How to reduce the size of a neural network

I’ve read a couple of posts here and there but couldn’t extract much that’s useful for my case.

We are trying to train a neural net that will run in the browser, so we use ONNX to export the model after training.

However, the model has to be small.

There are two ideas I can see that could achieve this:

  1. Use half precision (still trying to work out how we could do it).
  2. Just remove layers (but this hurts accuracy).

I’d welcome some pointers here, as this is my first model and I’m a bit stuck on the last steps.

Some people use int8 to quantize their net for inference. This can greatly speed it up and shrink the model.
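A minimal sketch of post-training dynamic quantization in PyTorch, using a toy model: `Linear` weights are stored as int8 and dequantized on the fly at inference time.

```python
# Sketch: dynamic int8 quantization of Linear layers in PyTorch.
# The model is a toy example stand-in for the real one.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
out = quantized(torch.randn(1, 32))
print(out.shape)  # same interface as the original model
```

One caveat: exporting PyTorch-quantized models to ONNX is limited, so for a browser target it may be easier to export float32 ONNX first and then quantize with ONNX Runtime's own quantization tooling.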

Quantization works all the way down to 4-bit currently.
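To see why low-bit quantization shrinks things, here is a toy illustration of 4-bit affine quantization of a weight tensor: the weights are mapped to one of 16 integer codes plus a scale and zero point. Real 4-bit schemes are more involved than this.

```python
# Toy illustration of 4-bit (16-level) affine quantization of a weight
# tensor. Not a production scheme; just shows the mechanics.
import torch

w = torch.randn(64, 32)
qmin, qmax = 0, 15                       # 4 bits -> 16 levels
scale = (w.max() - w.min()) / (qmax - qmin)
zero_point = qmin - torch.round(w.min() / scale)

q = torch.clamp(torch.round(w / scale) + zero_point, qmin, qmax)
w_hat = (q - zero_point) * scale         # dequantized approximation

print(q.unique().numel() <= 16)          # at most 16 distinct codes
```

Each weight now needs only 4 bits of storage instead of 32, at the cost of rounding error of up to half a quantization step.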

You can also look at pruning your model: Pruning Tutorial — PyTorch Tutorials 2.0.1+cu117 documentation
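Following that tutorial, a minimal sketch of magnitude pruning on a single `Linear` layer (the layer sizes are examples). One caveat worth knowing: unstructured pruning only zeroes weights, so the exported file doesn't shrink unless you also use sparse storage or structured pruning that removes whole units.

```python
# Sketch: L1 magnitude pruning of a Linear layer with torch.nn.utils.prune.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(32, 64)
prune.l1_unstructured(layer, name="weight", amount=0.5)  # zero smallest 50%
prune.remove(layer, "weight")  # bake the mask into the weight tensor

sparsity = (layer.weight == 0).float().mean().item()
print(sparsity)  # half the weights are now exactly zero
```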

There is so much to read :frowning: