You might be interested in pruning neural networks. Some networks can be reduced in size by over 90% without performance loss. The lottery ticket hypothesis paper is a good place to start, if you don’t already know about it, of course.
I am! And the lottery ticket (LT) hypothesis is something I have thought about quite a bit. I believe the LT hypothesis and our work are related in a subtle way: there, the larger structure is built out first and then pruned, while in our approach we conservatively construct the "winning ticket", so to speak.
Another difference with the LT hypothesis is that the pruning there is very specific to neural networks.
Pruning in general is about trimming larger structures down. This is nice for using a network, but indeed doesn’t help with training it.
Lottery tickets, on the other hand, are configurations of weights that rival the full network in performance when both are trained from scratch, with the ticket having far fewer parameters to optimize. For now, pruning is just the method used to find the tickets; the ideal would be a weight-initialization strategy that can create winning tickets in one shot. There is work being done on this front, and also on whether winning tickets found for one image classification task generalize to image classification as a whole. This would do a lot to reduce the size of networks from the start, and so far the results are promising.
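To make that "train, prune, rewind, retrain" loop concrete, here is a minimal PyTorch sketch of iterative magnitude pruning with weight rewinding, roughly in the spirit of the LT paper. The function name, the train_fn callback, and the hyperparameters are placeholders of mine, and it assumes train_fn re-applies the masks after each optimizer step so that pruned weights stay at zero:

    import copy
    import torch

    def find_winning_ticket(model, train_fn, prune_fraction=0.2, rounds=5):
        # Iterative magnitude pruning with rewinding to the original init.
        # model: any torch.nn.Module; train_fn(model) trains it in place and
        # is assumed to keep already-pruned (masked-out) weights at zero.
        init_state = copy.deepcopy(model.state_dict())  # pre-training weights
        masks = {name: torch.ones_like(p)
                 for name, p in model.named_parameters()
                 if p.dim() > 1}  # prune weight matrices/filters, not biases

        for _ in range(rounds):
            train_fn(model)  # train the currently masked network

            # Drop the smallest-magnitude surviving weights in each layer.
            for name, p in model.named_parameters():
                if name not in masks:
                    continue
                surviving = p.detach()[masks[name].bool()].abs()
                k = int(prune_fraction * surviving.numel())
                if k == 0:
                    continue
                threshold = surviving.kthvalue(k).values
                masks[name] = masks[name] * (p.detach().abs() > threshold).float()

            # Rewind the surviving weights to their original initialization.
            model.load_state_dict(init_state)
            with torch.no_grad():
                for name, p in model.named_parameters():
                    if name in masks:
                        p.mul_(masks[name])

        return model, masks

The returned masked, rewound network is the candidate winning ticket; you then train it from scratch and compare it against the full network.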
But yes, this is all very specific to neural networks. Do you have a blog or other place where you post about some of the things you’re working on? It would be nice to read about them.
I haven't had time to write a blog post about this. The paper linked to in the first comment - which is [1] - presents the first cut of our ideas. We have made two additional extensions beyond that, which we will be submitting to journals soon.
I have yet to release code publicly, but that will take a couple of months, given the papers are my priority right now - downsides of working on a PhD while having a day job :-|
To be clear, I am not claiming that compact models are related to the LT hypothesis, just that I suspect they are, and this, of course, needs to be rigorously established. For a while I will be spending time on the compact-model arc (my primary interest), until I can get to investigating its connections with the LT hypothesis. In fact, [1] doesn't talk about the LT hypothesis at all, focusing solely on compacting models in a model-agnostic manner.
https://arxiv.org/abs/1803.03635
Some pruning methods have recently been added in the latest feature release of PyTorch.
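For example, with the torch.nn.utils.prune module (the toy layer and the 30% pruning amount below are arbitrary choices for illustration):

    import torch.nn as nn
    import torch.nn.utils.prune as prune

    # A toy layer to prune; any module with a 'weight' parameter works.
    layer = nn.Linear(in_features=64, out_features=32)

    # Unstructured L1 pruning: zero out the 30% smallest-magnitude weights.
    prune.l1_unstructured(layer, name="weight", amount=0.3)

    # The layer now carries weight_orig and weight_mask; weight is their product.
    sparsity = float((layer.weight == 0).sum()) / layer.weight.nelement()
    print(f"sparsity: {sparsity:.2%}")

    # Fold the mask into the weight tensor to make the pruning permanent.
    prune.remove(layer, "weight")

There is also prune.global_unstructured for pruning across several layers at once.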