Saving CNN Features

learning_pytorch · June 28, 2019, 5:46pm

Hello,

I plan to use a pre-trained CNN for image classification and train only the FC layers, so the convolutional features stay fixed. With this in mind, it's beneficial to generate the features only once and reuse them over all epochs. My question is: how can I save the features to disk efficiently? Assume that each image has an image ID, so during the forward pass I can save features keyed by image ID. But with a large number of images, the dictionary where I save the features will keep growing and eventually will not fit in RAM. So how can I efficiently append the current batch's features to a dictionary saved on disk without loading the whole dictionary into memory?

Or any other suggestions?

Thanks
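
To make the "dictionary saved on disk" idea concrete, here is a minimal sketch using Python's standard-library shelve module, which behaves like a dict backed by a file and only reads or writes the entries you touch. The helper names (append_batch, load_feature), the file name, and the toy feature values are assumptions for illustration; in a real run each value would be the feature vector for one image, converted to a picklable object such as a list or NumPy array.

```python
import os
import shelve
import tempfile


def append_batch(store_path, image_ids, features):
    """Add one batch of features to a disk-backed dict keyed by image ID."""
    # shelve opens a dbm file; existing entries are not loaded into memory.
    with shelve.open(store_path) as store:
        for image_id, feat in zip(image_ids, features):
            # Keys must be strings; values are pickled on write.
            store[str(image_id)] = feat


def load_feature(store_path, image_id):
    """Read back the features of a single image without loading the rest."""
    with shelve.open(store_path, flag="r") as store:
        return store[str(image_id)]


# Toy data standing in for CNN features (hypothetical values).
store_path = os.path.join(tempfile.mkdtemp(), "features")
append_batch(store_path, [0, 1], [[0.1, 0.2], [0.3, 0.4]])  # first batch
append_batch(store_path, [2], [[0.5, 0.6]])                 # a later batch
second = load_feature(store_path, 1)
```

Pickling every entry has some overhead, so for very large fixed-size features a single preallocated array on disk is probably faster; the dictionary form is mainly convenient when lookups by image ID matter more than raw throughput.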

ptrblck · June 29, 2019, 12:15am

If I’m not mistaken, fastai provides some kind of solution for this workflow.
Unfortunately, I cannot find the topic here (or am not sure which keywords were used in the discussion), but maybe @sgugger or @jphoward might give you more information.

sgugger · July 7, 2019, 12:20pm

It’s not in the library, but we have something that does this in this notebook for our DeViSE lesson. See the precompute_activations_dl function; it just takes a regular model and a regular dataloader.
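
For readers who cannot find the notebook, the general idea of precomputing activations from a model and a loader can be sketched as follows. This is not the notebook's precompute_activations_dl, only a rough illustration under stated assumptions: the function name precompute_features, its parameters, and the toy "model" are hypothetical, and the model is assumed to return something NumPy can convert to an array. With a real PyTorch model you would likely run it under torch.no_grad() and convert each output with .cpu().numpy() first. The features go into a preallocated .npy file through a memory map, so only the current batch sits in RAM.

```python
import os
import tempfile

import numpy as np


def precompute_features(model, batches, n_samples, feat_dim, out_path):
    """Run model over batches and write the outputs row by row to a .npy file."""
    # open_memmap creates the .npy file on disk and returns a memmap view of it.
    out = np.lib.format.open_memmap(
        out_path, mode="w+", dtype=np.float32, shape=(n_samples, feat_dim)
    )
    start = 0
    for x in batches:
        feats = np.asarray(model(x), dtype=np.float32)
        end = start + len(feats)
        out[start:end] = feats  # only this slice is touched
        start = end
    out.flush()  # make sure everything is written to disk
    del out
    return start  # number of rows written


# Toy stand-ins (assumptions): random "images" and a per-channel mean as the "CNN".
rng = np.random.default_rng(0)
images = rng.random((10, 3, 8, 8)).astype(np.float32)
model = lambda x: x.mean(axis=(2, 3))
batches = [images[i:i + 4] for i in range(0, len(images), 4)]

out_path = os.path.join(tempfile.mkdtemp(), "features.npy")
n_written = precompute_features(model, batches, len(images), 3, out_path)

# Later, e.g. inside a Dataset, open lazily and index by row.
features = np.load(out_path, mmap_mode="r")
```

Row i holds the features of the i-th sample in iteration order, so the loader must not shuffle during this pass; a separate list or table mapping image ID to row index would let you look features up by ID afterwards.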