Hi!
I am working on quantizing a CNN during training. At each batch iteration, a floating-point version of the network should be saved, a quantized version should be used for forward propagation, and then the floating-point version should be loaded again so the weights can be updated with optimizer.step().
I don't know what the proper way to copy, modify, and load the network is. Could you please suggest the best way for this scenario?
Also, is it optimizer.step() that actually updates the weights after backpropagation, or does something else do it?
Thank you very much!
Partial answers to your question, in case useful:

The derivatives are calculated when you call .backward() on your loss/objective function.

optimizer.step() passes the derivatives to the optimizer, e.g. SGD, Adam etc., which then handles all the fancy momentum stuff and so on, and updates the weights.

In the backward pass, the derivatives are based on the forward pass. However, I kind of suspect that if you have e.g. .int() in your forward pass, this is theoretically non-differentiable, and you'd need to use some kind of REINFORCE algorithm or Gumbel-softmax or similar (I haven't thought through the assertion that .int() is non-differentiable for more than ~23 seconds, so take this with a wodge of salt).
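To make the division of labour between .backward() and optimizer.step() concrete, here is a minimal sketch of a standard training step; the model, data, and optimizer are placeholders, not anything from your code:

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(16, 3)
y = torch.randn(16, 1)

optimizer.zero_grad()        # clear gradients left over from the previous iteration
loss = loss_fn(model(x), y)
loss.backward()              # computes d(loss)/d(param) and stores it in param.grad
optimizer.step()             # reads param.grad and updates each param in place
```

So .backward() only fills in the gradients; nothing moves until optimizer.step() is called.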
Thank you so much for your reply.
Now things are hopefully clearer.
However, I am not using .int() in the forward path. I am using a quantization function similar to this one here:
https://github.com/eladhoffer/utils.pytorch/blob/master/quantize.py
For copying a network I use the following:
float_model_dict = cnn.state_dict()
and for quantization I use:
quant_model_dict = {}
for key, value in float_model_dict.items():
    quant_model_dict[key] = quant(value)
where quant is the quantization function. Then, to load the quantized model, I use:
cnn.load_state_dict(quant_model_dict)
Finally, I load the original model back and update the weights as follows:
cnn.load_state_dict(float_model_dict)
optimizer.step()
Is what I am doing right for this purpose? What is the best way to copy a network, do some math on it, and load it again?
Thanks
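For reference, the whole per-iteration scheme described in this thread can be sketched as below. The quant function here is a hypothetical stand-in (simple fixed-point rounding), not the one from the linked repository, and the toy model and data are placeholders:

```python
import copy
import torch
import torch.nn as nn

# Hypothetical quantization function: round to a fixed-point grid.
def quant(t, scale=2 ** 4):
    return torch.round(t * scale) / scale

cnn = nn.Sequential(nn.Conv2d(1, 4, 3), nn.Flatten(), nn.Linear(4 * 6 * 6, 10))
optimizer = torch.optim.SGD(cnn.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(8, 1, 8, 8)
targets = torch.randint(0, 10, (8,))

# One training iteration:
# 1. keep an independent float copy of the weights,
# 2. run forward/backward with quantized weights,
# 3. restore the float weights, 4. let the optimizer update them.
float_model_dict = copy.deepcopy(cnn.state_dict())            # save float weights
cnn.load_state_dict({k: quant(v) for k, v in float_model_dict.items()})

optimizer.zero_grad()
loss = criterion(cnn(inputs), targets)
loss.backward()                                               # grads w.r.t. quantized weights

cnn.load_state_dict(float_model_dict)                         # restore float weights
optimizer.step()                                              # update the float weights
```

Note that load_state_dict only copies tensor data into the existing parameters, so the gradients computed by .backward() survive the restore and are applied to the float weights by optimizer.step().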
I guess I figured out what the problem is…
In Python, when you do object_new = object_old, then object_new is not really a new object; it's just another name (a reference) for object_old. Whenever object_old is changed, the changes are reflected in object_new.
Since I was relying on such a reference (without knowing it at first), like:
float_model_dict = cnn.state_dict()
an independent copy of the state dict is not created: the tensors in the returned dict still refer to the model's live parameters, so loading different weights into the model changes float_model_dict too.
In order to get the desired independent copy, I had to import copy first, then deep copy:
import copy
float_model_dict = copy.deepcopy(cnn.state_dict())
This is the way I followed to copy a model.
Then, to modify a deep-copied dict, I did the following:
for key, value in some_model_dict.items():
    some_model_dict[key] = operation(value)
and that's it.
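Putting the two steps together (deep copy, then element-wise modification) on an actual state dict; operation here is a hypothetical stand-in (halving every tensor) for the real quantization function, and the model is a placeholder:

```python
import copy
import torch
import torch.nn as nn

cnn = nn.Linear(4, 2)

# Independent snapshot: deepcopy so later changes to the model
# (e.g. load_state_dict with quantized weights) don't leak into it.
float_model_dict = copy.deepcopy(cnn.state_dict())

# Modify a separate copy; halving stands in for the real quantization.
some_model_dict = copy.deepcopy(float_model_dict)
for key, value in some_model_dict.items():
    some_model_dict[key] = value * 0.5

cnn.load_state_dict(some_model_dict)   # model now holds the modified weights
# float_model_dict is untouched and can be loaded back later.
```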