Copy, modify and load networks

Hi!
I am working on quantizing a CNN during training.

Each batch iteration, a floating-point version of the network should be saved, a quantized version should be used for forward propagation, and then the floating-point version should be loaded again so the weights can be updated with optimizer.step().

I don’t know what the proper way to copy, modify and load the network is. Would you please suggest the best way to handle this scenario?

Also, is optimizer.step() what actually updates the weights after backpropagation, or does something else do it?

Thank you very much !

partial answers to your question, in case useful:

  • the derivatives are calculated when you call .backward() on your loss/objective function
  • optimizer.step() passes the derivatives to the optimizer, e.g. SGD, Adam, etc., which then handles all the fancy momentum stuff and so on, and updates the weights
  • in the backward pass, the derivatives are based on the forward pass. However, I suspect that if you have e.g. .int() in your forward pass, this is theoretically non-differentiable, and you’d need to use some kind of REINFORCE algorithm, Gumbel-Softmax, or similar (I haven’t thought through the assertion that .int() is non-differentiable for more than ~2-3 seconds, so take this with a wodge of salt)
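The first two bullets can be sketched with a single scalar parameter and plain SGD (a toy example, not anyone's actual training code):

```python
import torch

# one parameter w, simple loss (w - 3)^2
w = torch.tensor([0.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

loss = (w - 3.0) ** 2
loss.backward()  # fills w.grad with d(loss)/dw = 2*(w - 3) = -6
opt.step()       # SGD update: w <- w - lr * grad = 0 - 0.1 * (-6) = 0.6
```

So .backward() only computes and stores the gradients; nothing moves until the optimizer's step() applies them.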

Thank you so much for your reply.

Now things are hopefully clearer.

However, I am not using .int() in the forward pass. I am using a quantization function similar to this one here:

For copying a network I use the following:

        float_model_dict = cnn.state_dict()

and for quantization I use:

        quant_model_dict = {}
        for key, value in float_model_dict.items():
            quant_model_dict[key] = quant(value)

where quant is the quantization function. Then for loading a quantized model I use:


Finally, I load the original model back and update the weights as follows:


Is what I am doing right for that purpose? What is the best way to copy a network, do some math on it, and load it again?


I guess I figured out what the problem is…

It is that in Python, when you do object_new = object_old, then object_new is not really a new object; it is just another name bound to object_old.
In other words, no copy is made at all (not even a shallow one).
And whenever object_old is changed, the changes are reflected in object_new.
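A quick way to see this in plain Python (nothing PyTorch-specific):

```python
old = {"w": [1.0, 2.0]}
new = old               # no copy: both names refer to the same dict
new["w"][0] = 99.0
print(old["w"][0])      # 99.0, the mutation is visible through both names
print(new is old)       # True, they are literally the same object
```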

Since I was doing exactly that (without knowing it at first) with:
float_model_dict = cnn.state_dict()
no independent copy of the state dict was being created. On top of that, the tensors returned by state_dict() are references to the model’s parameters, so modifying them in place also modifies the model.

In order to get the desired independent copy, I had to:
import copy

first, then deep copy:
float_model_dict = copy.deepcopy(cnn.state_dict())
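With deepcopy the two dicts become fully independent, nested values included (toy dict standing in for cnn.state_dict()):

```python
import copy

original = {"w": [1.0, 2.0]}        # stand-in for cnn.state_dict()
snapshot = copy.deepcopy(original)  # recursively copies everything inside
original["w"][0] = 0.0
print(snapshot["w"][0])             # still 1.0, the snapshot is untouched
```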

This was the way I followed to copy a model.

Now, in order to modify the deep-copied dict, I did the following:

for key, value in some_model_dict.items():
    some_model_dict[key] = operation(value)

and that’s it.
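Putting the pieces together, one full training step then looks roughly like this (a sketch; quant here is a stand-in uniform quantizer, not my real function, and the model/optimizer are just placeholders):

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def quant(t, scale=0.1):
    # hypothetical uniform quantizer: snap values to a fixed grid
    return torch.round(t / scale) * scale

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(x, target):
    # 1. keep an independent full-precision snapshot
    float_model_dict = copy.deepcopy(model.state_dict())
    # 2. quantize every tensor and load the result into the model
    quant_model_dict = {k: quant(v) for k, v in float_model_dict.items()}
    model.load_state_dict(quant_model_dict)
    # 3. forward and backward with the quantized weights
    loss = F.mse_loss(model(x), target)
    optimizer.zero_grad()
    loss.backward()
    # 4. restore the full-precision weights, then apply the update
    model.load_state_dict(float_model_dict)
    optimizer.step()
    return loss.item()

loss = train_step(torch.randn(8, 4), torch.randn(8, 2))
```

Note that load_state_dict() copies values into the parameters without touching their .grad attributes, so the gradients computed with the quantized weights survive step 4 and are applied to the restored full-precision weights in optimizer.step().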