Hello,
I am trying a new algorithm that I have implemented outside of PyTorch. Is this a legitimate way to copy PyTorch model parameters out and back in, or am I causing problems under the hood?
My code assumes that named_children() and parameters() always yield their modules and parameters in the same order from call to call.
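For what it's worth, that assumption holds: submodules and parameters are stored in ordered dicts, so named_children() and parameters() iterate in registration order, consistently across calls. A quick check (toy model for illustration):

```python
import torch.nn as nn

# Toy model; nn.Sequential names its children '0', '1', '2' in order.
net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

# Iteration order is registration order and is stable across calls.
first = [name for name, _ in net.named_children()]
second = [name for name, _ in net.named_children()]
print(first == second, first)  # True ['0', '1', '2']
```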
Pack into a NumPy vector, doing something like:

# Collect the parameters of the selected child modules, in order,
# then flatten them into one 1-D NumPy vector.
Qparams = []
Qmods = [mod for name, mod in self.named_children() if name in self.Qlayers]
for mod in Qmods:
    Qparams += [parm for parm in mod.parameters()]
Ws = np.hstack([np.ravel(netp.data.numpy()) for netp in Qparams])
return Ws
Do stuff to Ws
Unpack from the NumPy vector, doing something like:

# Walk the same modules in the same order, copying slices of Ws
# back into each parameter tensor.
startIdx = 0
for qmod in [mod for name, mod in self.named_children() if name in self.Qlayers]:
    for netp in qmod.parameters():
        stopIdx = startIdx + np.prod(netp.data.size())
        netp.data.copy_(torch.from_numpy(np.reshape(Ws[startIdx:stopIdx], netp.data.size())))
        startIdx = stopIdx
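Put together, the round trip can be sketched as a minimal self-contained example (the QNet module, its layer names, and the pack/unpack method names below are made up for illustration, not your actual model):

```python
import numpy as np
import torch
import torch.nn as nn

class QNet(nn.Module):
    # Hypothetical module; self.Qlayers names the children whose
    # parameters get packed into / unpacked from the flat vector.
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 3)
        self.fc2 = nn.Linear(3, 2)
        self.Qlayers = {"fc1", "fc2"}

    def pack(self):
        # Flatten all selected parameters into one 1-D NumPy vector.
        mods = [m for name, m in self.named_children() if name in self.Qlayers]
        params = [p for m in mods for p in m.parameters()]
        return np.hstack([p.data.numpy().ravel() for p in params])

    def unpack(self, Ws):
        # Copy slices of Ws back into the parameters, in the same order.
        start = 0
        for name, m in self.named_children():
            if name not in self.Qlayers:
                continue
            for p in m.parameters():
                stop = start + p.data.numel()
                p.data.copy_(torch.from_numpy(
                    Ws[start:stop].reshape(tuple(p.data.size()))))
                start = stop

net = QNet()
Ws = net.pack()
Ws2 = Ws * 2.0          # "do stuff to Ws" outside PyTorch
net.unpack(Ws2)
print(np.allclose(net.pack(), Ws2))  # True: the round trip preserves values
```

Because unpack() revisits the children and parameters in the same deterministic order pack() used, the slices line up with the right tensors.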
I also do something similar to grab gradients from the network; those gradients are used to update Ws outside of PyTorch. I realize this may be inefficient. I just want to make sure I am not breaking PyTorch in some way by reading and overwriting network weights and gradients like this.
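Grabbing gradients the same way might be sketched like this (assuming backward() has already run; the toy model is illustrative, and the None guard covers parameters that received no gradient):

```python
import numpy as np
import torch
import torch.nn as nn

# Tiny stand-in model for illustration.
net = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 2))
out = net(torch.randn(5, 4)).sum()
out.backward()

# Flatten gradients in the same fixed parameter order as the weights,
# substituting zeros where a parameter has no gradient.
grads = np.hstack([
    (p.grad if p.grad is not None else torch.zeros_like(p.data))
    .numpy().ravel()
    for p in net.parameters()
])
print(grads.shape)  # one entry per weight: (4*3 + 3) + (3*2 + 2) = (23,)
```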
Thank you, very much, for your guidance.