Generating Precomputed activations

In transfer learning, we usually train only the last fc layer. The number of epochs required to reach benchmark performance is large (around 300). Recomputing the activations up to the fc layer on every epoch seems wasteful, since they are the same for a given training image (all earlier layers are frozen). A lot of time could be saved by precomputing the activations just before the fc layer for both the train and test sets and reusing them. A course library I have used has such an approach. Can we implement this functionality using standard PyTorch?

What would happen to nn.Batchnorm and nn.Dropout layers?
Is the fixed part of the model also in eval() mode?
Could you post a link to the implementation?
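
To make the concern concrete, here is a minimal sketch in plain PyTorch. The tiny `block` model is a hypothetical stand-in, not taken from any library discussed here. In train() mode Dropout draws a new random mask on every call, so the output for the same input is not reproducible and cannot be cached; in eval() mode Dropout is disabled and BatchNorm uses its stored running statistics, so the output is fixed.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy block containing the two layers in question.
block = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4), nn.Dropout(p=0.5))
x = torch.randn(8, 4)

# train(): Dropout samples a fresh mask on each forward pass.
block.train()
train_a, train_b = block(x), block(x)

# eval(): Dropout is a no-op and BatchNorm uses running statistics.
block.eval()
with torch.no_grad():
    eval_a, eval_b = block(x), block(x)

print("train outputs identical:", torch.equal(train_a, train_b))
print("eval outputs identical: ", torch.equal(eval_a, eval_b))
```

So precomputed activations only make sense if the fixed part of the model is run in eval() mode.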

The course uses its own library, which is built on top of PyTorch. It has a model loading function where you can set whether you want precompute. I don't have access to the internals of these library functions, but they can be explored further. Their forum is very popular, and I will try to get answers to the issues you raised. Transformations, especially the random ones, would also raise issues.

Here is my speculation:
The layers prior to the fc layer are frozen, and batchnorm and dropout are not applied. Certain random transformations can't be used either. This reduces the problem to precomputing the activations prior to the fc layer once for each input image and reusing them over the epochs.
This of course is not ideal, but the speed at which you can validate is phenomenal: 3 epochs on Google Colab with a GPU take about 1-2 minutes. To handle transformations, they may be making multiple copies of the activations, one for each transformation, and choosing one at a time. This is only my speculation.
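
The idea above can be sketched in plain PyTorch roughly as follows. This is only an illustration under assumptions, not how the course library does it: `backbone`, `head`, `precompute` and the random toy data are hypothetical stand-ins for a pretrained network, its fc layer, and a real image dataset, and no random transformations are applied.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained backbone (everything before the fc layer).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(8, 2)  # the only part that is trained

# Toy data standing in for a real image dataset.
images = torch.randn(32, 3, 16, 16)
labels = torch.randint(0, 2, (32,))


@torch.no_grad()
def precompute(model, loader):
    """Run the frozen model once over the data and collect its outputs."""
    model.eval()  # BatchNorm uses running stats, Dropout is disabled
    feats, targets = [], []
    for x, y in loader:
        feats.append(model(x))
        targets.append(y)
    return torch.cat(feats), torch.cat(targets)


# Freeze the backbone so no gradients are ever needed for it.
for p in backbone.parameters():
    p.requires_grad_(False)

raw_loader = DataLoader(TensorDataset(images, labels), batch_size=8, shuffle=False)
features, targets = precompute(backbone, raw_loader)

# Train only the head on the cached activations; the backbone is not called again.
feat_loader = DataLoader(TensorDataset(features, targets), batch_size=8, shuffle=True)
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(3):
    for f, y in feat_loader:
        optimizer.zero_grad()
        loss = criterion(head(f), y)
        loss.backward()
        optimizer.step()
```

For a large dataset the cached `features` tensor could be written to disk with `torch.save` instead of being kept in memory. Random augmentations would need either several cached copies per image or to be dropped, as speculated above.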

I skimmed through the code and couldn’t find the precompute option.
Could you post the class which uses precompute?
I guessed it should be defined in learner, but as I’m not that familiar with the code base, I would need a hint. :wink:

Here is the code snippet:

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(lr, 3)  # lr: learning rate (value not shown in the original snippet); 3 epochs