Where are the weight updates calculated when the Adam optimizer is used?

Hi everyone,

I am trying to figure out where the weight update is actually performed when training a GRU model. I am using the Adam optimizer, but I am new to PyTorch. I need this because I have to find the C++ code that performs the update, so I can change the precision used while the update is done. I did this before with Caffe, where I found a function called saxpy (built on top of the BLAS function with the same name) in which the update was done. I am looking for the equivalent in PyTorch, but I can't find the function that does it.

Do you know where I could find information about this process?

Thanks in advance.

Hi,

All the computations are done in this file: https://github.com/pytorch/pytorch/blob/master/torch/optim/adam.py

If you just want to increase the precision, you don't have to change any C++ code. You can make your own Adam optimizer (copy-paste the code from the file above) and convert the Tensors from single precision to double precision with .double(), which increases the precision of all operations performed on them.
Don't forget to convert the result back to a single-precision Tensor with .float() before performing the update on the original parameters.
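For reference, here is a minimal sketch of that idea (not the stock implementation: it subclasses torch.optim.Adam, omits weight decay and amsgrad, and the class name DoubleAdam is just an example). The moment math is done in float64 and the result is cast back before touching the original parameters:

```python
import math
import torch

# Minimal sketch: Adam step computed in double precision, result cast back to
# the parameter dtype before the update. Weight decay and amsgrad are omitted
# for brevity.
class DoubleAdam(torch.optim.Adam):
    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            beta1, beta2 = group['betas']
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad.double()                     # work in float64
                state = self.state[p]
                if len(state) == 0:
                    state['step'] = 0
                    state['exp_avg'] = torch.zeros_like(p, dtype=torch.float64)
                    state['exp_avg_sq'] = torch.zeros_like(p, dtype=torch.float64)
                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                state['step'] += 1
                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
                denom = exp_avg_sq.sqrt().add_(group['eps'])
                bias_correction1 = 1 - beta1 ** state['step']
                bias_correction2 = 1 - beta2 ** state['step']
                step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1
                update = (exp_avg / denom).mul_(-step_size)
                p.add_(update.to(p.dtype))                 # back to single precision
        return loss
```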

1 Like

Thanks for your answer. I will check the code there. The thing is that I need to reduce the precision using a mechanism that I implemented. With Caffe this worked because I could detect exactly where the weight update was done. However, with PyTorch, looking at the adam.py file, it seems that some .add methods are called to do the weight update, and as far as I understand they will be executed by some backend library such as MKL or MKL-DNN (I am only supporting CPU features). That is where I need to find the routine name, so I can tell my mechanism to avoid reducing precision while the weight update is being done, based on that routine name.

Oh, you want less precision? :open_mouth:

I think PyTorch is quite different from Caffe, and you might want to implement your modification differently than you did before. Why not write it as a Python function that modifies how the Adam optimizer works?
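For example, a rough sketch of what I mean (the float16 round-trip here is just an illustration of reducing precision at the Python level, not a specific recommendation):

```python
import torch

# Rough sketch: emulate reduced precision at the Python level by rounding the
# parameters through float16 after every optimizer step.
def reduce_precision(params):
    with torch.no_grad():
        for p in params:
            p.copy_(p.half().float())   # simulate a float16 round-trip

# hypothetical usage inside a training loop:
# model = torch.nn.GRU(input_size=8, hidden_size=16)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# loss.backward(); optimizer.step(); reduce_precision(model.parameters())
```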

My mechanism is based on Pin (a PinTool). Basically, I work at the instruction level: I take the operands of an instruction, reduce their precision, write the reduced-precision operands back into the instruction, and continue the execution. However, I cannot do that during the weight update calculation, which is why I need to avoid reducing precision there. By default my mechanism applies the reduced precision during the whole program execution but skips the weight update routines.

Oh, OK.
Then I would say:

  • Find out which function is used in your case (that will depend on how you installed PyTorch), and, if you can, change it the same way you changed the Caffe implementation.
  • Use cpp_extensions to create a custom C++ function that does what you want, and modify Adam to use that instead of the plain .add (see the sketch after this list).
  • You can also use the C++ extension to easily set a global flag in C++ to enable/disable your special code.
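To illustrate the second point, here is a rough sketch using torch.utils.cpp_extension.load_inline (it needs a C++ toolchain installed; the extension name and the function name fused_update are made up, the point is only to put the update inside a native routine with a name your tool can recognise):

```python
import torch
from torch.utils.cpp_extension import load_inline

# Hypothetical C++ routine that performs the Adam parameter update explicitly,
# so the update lives in a dedicated native function.
cpp_source = """
#include <torch/extension.h>

void fused_update(torch::Tensor p, torch::Tensor exp_avg,
                  torch::Tensor denom, double step_size) {
    // p -= step_size * exp_avg / denom
    p.addcdiv_(exp_avg, denom, -step_size);
}
"""

ext = load_inline(name="custom_adam_update",
                  cpp_sources=cpp_source,
                  functions=["fused_update"])

# In a copied adam.py step(), the generic call could then be replaced with:
# ext.fused_update(p.data, exp_avg, denom, step_size)
```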

Would that help?

1 Like

OK, so basically the weight update in the Adam optimizer does not use a specific routine; it is done with generic routines. I see a call to p.data.addcdiv_(-step_size, exp_avg, denom). By generic I mean that the same routine could be used for calculations other than the weight update across the different optimizers (Adam, Adagrad, SGD, etc.). If that is the case, then I need to do what you suggested.

Is what I am saying correct?

Thanks. I really appreciate your help. :slight_smile:

Yes, exactly, the Adam optimizer only uses generic functions; there is no specific low-level implementation for the update.
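If it helps, one way to see which low-level kernels that generic call dispatches to on your particular build (the exact names in the output depend on the PyTorch version and backend) is to run it under the profiler:

```python
import torch

# Run the same generic update call under the autograd profiler to list the
# kernel names it dispatches to on this build.
p = torch.randn(1000)
exp_avg = torch.randn(1000)
denom = torch.rand(1000) + 1e-8
step_size = 1e-3

with torch.autograd.profiler.profile() as prof:
    p.addcdiv_(exp_avg, denom, value=-step_size)

print(prof.key_averages().table(sort_by="cpu_time_total"))
```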

1 Like