Hi!
I am working on quantizing a CNN during training.
At each batch iteration, a floating-point version of the network should be saved, a quantized version should be used for forward propagation etc., and then the floating-point version should be loaded again and the weights updated using optimizer.step().
I don't know what the proper way to copy, modify and load the network is. Would you please suggest the best way for this scenario?
Also, is optimizer.step() what actually updates the weights after backpropagation, or does something else do it?
Thank you very much!
Partial answers to your question, in case useful:
- the derivatives are calculated when you call .backward() on your loss/objective function
- optimizer.step() passes the derivatives to the optimizer, e.g. SGD, Adam etc., which then handles all the fancy momentum stuff and so on, and updates the weights
- in the backward pass, the derivatives are based on the forward pass. However, I kind of suspect that if you have e.g. .int() in your forward pass, this is theoretically non-differentiable, and you'd need to use some kind of REINFORCE algorithm or Gumbel-Softmax or similar (I haven't thought through the assertion that .int() is non-differentiable for more than ~2-3 seconds, so take this with a wodge of salt)
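To make the backward()/step() split above concrete, here is a minimal training step on a toy model (just an illustration, not your CNN): .backward() only fills each parameter's .grad; the weights change only when .step() runs.

```python
import torch
import torch.nn as nn

# Toy model and data, purely to illustrate the backward()/step() split.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

x = torch.randn(8, 4)
target = torch.randn(8, 2)

optimizer.zero_grad()                 # clear gradients from the previous iteration
loss = criterion(model(x), target)
loss.backward()                       # computes d(loss)/d(param) into each param.grad
optimizer.step()                      # the optimizer (SGD here) reads .grad and updates the weights
```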
Thank you so much for your reply.
Now things are hopefully clearer.
However, I am not using .int() in the forward pass. I am using a quantization function similar to this one here:
https://github.com/eladhoffer/utils.pytorch/blob/master/quantize.py
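For reference, here is a minimal uniform quantizer in the same spirit (a rough sketch of my own, not the linked quantize.py): it maps the tensor's range onto 2**num_bits levels and back to float.

```python
import torch

def quant(t, num_bits=8):
    # Hypothetical uniform quantizer: round onto 2**num_bits evenly spaced
    # levels spanning [t.min(), t.max()], then map back to float.
    qmin, qmax = 0, 2 ** num_bits - 1
    t_min, t_max = t.min(), t.max()
    scale = (t_max - t_min) / (qmax - qmin)
    if scale == 0:                    # constant tensor: nothing to quantize
        return t.clone()
    q = torch.round((t - t_min) / scale).clamp(qmin, qmax)
    return q * scale + t_min

w = torch.randn(3, 3)
wq = quant(w, num_bits=4)             # wq takes at most 2**4 = 16 distinct values
```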
For copying a network I use the following:
float_model_dict = cnn.state_dict()
and for quantization I use:
quant_model_dict = {}
for key, value in float_model_dict.items():
    quant_model_dict[key] = quant(value)
where quant is the quantization function. Then for loading a quantized model I use:
cnn.load_state_dict(quant_model_dict)
Finally, I load the original model back and I update the weights as follows:
cnn.load_state_dict(float_model_dict)
optimizer.step()
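Putting these pieces together, a single iteration of my loop looks roughly like this (a sketch with a toy model standing in for the CNN; quant stands in for my quantization function):

```python
import copy
import torch
import torch.nn as nn

def quant(t, num_bits=8):
    # stand-in for the real quantization function
    scale = (t.max() - t.min()) / (2 ** num_bits - 1)
    return t.clone() if scale == 0 else torch.round((t - t.min()) / scale) * scale + t.min()

model = nn.Linear(4, 2)               # toy stand-in for the CNN
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
x, target = torch.randn(8, 4), torch.randn(8, 2)

# 1. keep an independent full-precision copy
#    (deepcopy: state_dict() tensors are shared with the live parameters)
float_model_dict = copy.deepcopy(model.state_dict())

# 2. load quantized weights and run forward/backward on them
quant_model_dict = {k: quant(v) for k, v in float_model_dict.items()}
model.load_state_dict(quant_model_dict)
optimizer.zero_grad()
loss = criterion(model(x), target)
loss.backward()

# 3. restore the float weights, then let the optimizer update them
model.load_state_dict(float_model_dict)
optimizer.step()
```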
Is what I am doing right for that purpose? What is the best way to copy a network, do some math on it, and load it again?
Thanks
I guess I figured out what the problem is…
In Python, when you do object_new = object_old, then object_new is not really a new object; it is just another name bound to object_old. No copy is made at all, so whenever object_old is changed, the change is reflected in object_new.
Since I was using plain assignment (without knowing it at first), like:
float_model_dict = cnn.state_dict()
an independent copy of the state dict was not being created; I just had another reference to it. (On top of that, the tensors returned by state_dict() share storage with the model's parameters, so even a shallow copy of the dict would still see changes to the weights.)
In order to get the desired independent copy, I had to import copy first, then deep copy:
float_model_dict = copy.deepcopy(cnn.state_dict())
This was the way I followed to copy a model.
Now in order to modify a ‘deep copied’ copy, I did the following:
for key, value in some_model_dict.items():
    some_model_dict[key] = operation(value)
and that’s it.
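The aliasing behaviour is easy to demonstrate on a state_dict (toy model for illustration): zeroing the live parameters also zeroes the tensors in the plainly-assigned dict, while the deep-copied dict keeps the original values.

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(2, 2)

alias_dict = model.state_dict()                # tensors shared with the model
deep_dict = copy.deepcopy(model.state_dict())  # independent tensors

with torch.no_grad():
    model.weight.zero_()                       # modify the live parameters in place

# alias_dict['weight'] is now all zeros; deep_dict['weight'] is unchanged
```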