Requires_grad = False after first batch

Hi,
I am implementing a model such that:
1- requires_grad = False is set under two conditions: only for the first two layers of the model, and only after the first batch of data has been fed. From the second batch onwards these layers are frozen and their weights are not updated.
2- After the whole training process finishes, I collect all of the parameters' grads, including the ones in the first two layers that were obtained only once, from the first data batch (a rough sketch of this setup is shown below).
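This is not my actual code, just a minimal toy sketch of the setup I described (the model, layer sizes, loss, and loop are placeholders for illustration only):

```python
import torch
import torch.nn as nn

# Toy stand-ins for my real model and data, just to illustrate the idea.
model = nn.Sequential(
    nn.Linear(10, 10),  # layer 1: frozen after the first batch
    nn.Linear(10, 10),  # layer 2: frozen after the first batch
    nn.Linear(10, 1),   # remaining layers keep training
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
frozen = list(model[0].parameters()) + list(model[1].parameters())

for batch_idx in range(5):  # stand-in for iterating over a DataLoader
    x, y = torch.randn(8, 10), torch.randn(8, 1)
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    optimizer.step()
    if batch_idx == 0:
        # The two conditions: only these two layers, only after the first batch.
        for p in frozen:
            p.requires_grad_(False)

# After training, collect every parameter's .grad, including the ones of the
# first two layers that were computed only with the first batch.
grads = {name: p.grad for name, p in model.named_parameters()}
```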

I implemented this around two months ago and it worked, but I recently set up a new conda environment with the latest PyTorch version and now I am getting an error saying that I have None-type gradients.
Are there new PyTorch releases that could cause this error? If yes, any suggestions?
Thanks!

The latest PyTorch 2.0.0 release sets the set_to_none argument to True by default in the zero_grad call, which sets the .grad attribute of the referenced parameters to None (freeing the old gradient tensors) and thus saves memory.
I don't know what exactly your code is doing, but maybe setting this argument back to False allows your code to run?
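A minimal standalone example of the difference (not your code, just a single parameter with a dummy optimizer):

```python
import torch

w = torch.nn.Parameter(torch.randn(3))
optimizer = torch.optim.SGD([w], lr=0.1)

w.sum().backward()
print(w.grad)                           # tensor([1., 1., 1.])

optimizer.zero_grad()                   # set_to_none=True by default in 2.0
print(w.grad)                           # None

w.sum().backward()
optimizer.zero_grad(set_to_none=False)  # restores the pre-2.0 behavior
print(w.grad)                           # tensor([0., 0., 0.]), not None
```

So if your code reads param.grad after zero_grad was called, passing set_to_none=False keeps the zeroed tensors around instead of replacing them with None.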