Running average of parameters

I would like to use the running average of parameters instead of using the parameters from the training directly at the test session.

To do this, I initialized the running average parameters from the network as

avg_param = torch.cat([param.view(-1) for param in model.parameters()], 0)

Then, I performed the running average at each training iteration as

avg_param = 0.9*avg_param + 0.1*torch.cat([param.data.view(-1) for param in model.parameters()], 0)

Finally, at the test session, I loaded the parameters as

i = 0
for param in model.parameters():
    param = avg_param[i:i+param.nelement()].resize(*param.size())
    i = i+param.nelement()

Is this process correct ?


There are a few problems I can see:

  1. You should use `` in the first point. You’re not going to backprop to the average, so build it from the tensors, not the Variables.
  2. Content of the tensor after resize is unspecified! You can get arbitrary garbage in your tensor. Use .view to change the shape if you need to.
  3. You’re only overwriting the local reference to param; it doesn’t change your model at all. It’s as if you did this: a = model.linear.weight; a = Variable(...)
  4. You never back up the original parameters of your model - they are overwritten by the average for the test and you won’t restore them to the previous state. Not sure if that’s what you wanted.
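Point 3 is plain Python name binding, not anything PyTorch-specific; a minimal torch-free sketch (the lists here stand in for model.parameters()):

```python
# Rebinding the loop variable only changes the local name;
# the list it came from is untouched.
params = [1.0, 2.0, 3.0]
for p in params:
    p = 0.0  # rebinds the name p; the list still holds the old values
assert params == [1.0, 2.0, 3.0]

# Mutating *through* the reference (the analogue of writing to
# does change the underlying objects.
boxed = [[1.0], [2.0], [3.0]]
for p in boxed:
    p[0] = 0.0
assert boxed == [[0.0], [0.0], [0.0]]
```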

This would be correct:

def flatten_params():
    return torch.cat([param.data.view(-1) for param in model.parameters()], 0)

def load_params(flattened):
    offset = 0
    for param in model.parameters():[offset:offset + param.nelement()]).view(param.size())
        offset += param.nelement()

avg_param = flatten_params() # initialize

def train():
    global avg_param  # rebind the module-level average, not a local
    avg_param = 0.9 * avg_param + 0.1 * flatten_params()

def test():
    original_param = flatten_params() # save current params
    load_params(avg_param) # load the average
    # ... run the evaluation here ...
    load_params(original_param) # restore parameters
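The offset bookkeeping in flatten_params/load_params can be sanity-checked without a model; a torch-free sketch with plain lists standing in for parameter tensors (all names illustrative):

```python
# "Parameters" of different sizes; len(p) plays the role of param.nelement().
params = [[1, 2], [3, 4, 5], [6]]

def flatten(ps):
    # analogue of torch.cat([p.view(-1) for p in ps], 0)
    return [x for p in ps for x in p]

def load(ps, flattened):
    # analogue of load_params: walk the flat buffer with a running offset
    offset = 0
    for p in ps:
        n = len(p)
        p[:] = flattened[offset:offset + n]  # in-place, like copy_
        offset += n

flat = flatten(params)
assert flat == [1, 2, 3, 4, 5, 6]
load(params, [0, 0, 0, 0, 0, 0])
assert params == [[0, 0], [0, 0, 0], [0]]
```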

Thanks for your reply.

Before employing the running average, my code occupied only half of the video memory.

But when I tried your suggestion, it stopped after 2 iterations with an ‘out of memory’ error.

The error is shown below.

THCudaCheck FAIL file=/data/users/soumith/miniconda2/conda-bld/pytorch-cuda80-0.1.9_1487349287443/work/torch/lib/THC/generic/ line=66 error=2 : out of memory
Traceback (most recent call last):
  File "", line 251, in <module>
train_iter_loss, avg_param = train(config, epoch, avg_param)
  File "", line 166, in train
avg_param = 0.9*avg_param + 0.1*flatten_params()
  File "/home/sypark/anaconda2/envs/py36/lib/python3.6/site-packages/torch/", line 320, in __mul__
return self.mul(other)
RuntimeError: cuda runtime error (2) : out of memory at /data/users/soumith/miniconda2/conda-bld/pytorch-cuda80-0.1.9_1487349287443/work/torch/lib/THC/generic/

Your model must have a lot of parameters. Instead of flattening them into a single big tensor, you can process them in parts:

from copy import deepcopy

avg_param = deepcopy(list( for p in model.parameters()))

def train():
    for p, avg_p in zip(model.parameters(), avg_param):
        avg_p.mul_(0.9).add_(0.1 *
Not sure if you’ll manage to fit another copy of the params in memory, so you can restore them after testing.
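The per-parameter update above is just an exponential moving average applied tensor by tensor; the arithmetic can be checked torch-free:

```python
# EMA update avg = 0.9*avg + 0.1*p, applied in place per "parameter".
avg = [1.0, 1.0]
steps = [[2.0, 0.0], [2.0, 0.0]]  # parameter values at two iterations

for p in steps:
    for i in range(len(avg)):
        avg[i] = 0.9 * avg[i] + 0.1 * p[i]

# First entry: 1 -> 1.1 -> 1.19; second entry: 1 -> 0.9 -> 0.81.
assert abs(avg[0] - 1.19) < 1e-12
assert abs(avg[1] - 0.81) < 1e-12
```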

Thanks for your reply.

As suggested, the train works well without memory issue.
[offset:offset + param.nelement()]).view(param.size())
RuntimeError: copy from list to FloatTensor isn't implemented

I think the load_params function should be modified to handle the list.

Yes, it should, but I left it as an exercise to the reader :wink:


I modified the functions as below.

def flatten_params():
    flatten = deepcopy(list( for p in model.parameters()))
    return flatten

def load_params(flattened):
    for p, avg_p in zip(model.parameters(), flattened): = deepcopy(avg_p)

Currently, it works without error.
But I am not sure it works as I intended.

This would be better:

def load_params(flattened):
    for p, avg_p in zip(model.parameters(), flattened):
Also, note that they’re no longer flattened, so you might want to change the name.


I will change them.

Why didn’t you wrap avg_param in a Variable with requires_grad set to False? As in something like:

W = Variable(w_init, requires_grad=True)
W_avg = Variable(torch.FloatTensor(W).type(dtype), requires_grad=False)
for i in range(nb_iterations):
    #some GD stuff...
    W_avg = (1/nb_iter)*W + W_avg
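Keeping the average in a gradient-free wrapper is reasonable, since nothing backprops to it. Note, though, that the update (1/nb_iter)*W + W_avg only yields the uniform average of all iterates if nb_iter is the total iteration count fixed in advance, and only at the very end; an incremental mean sidesteps both caveats. A torch-free sketch of the arithmetic:

```python
# Incremental arithmetic mean: avg_i = avg_{i-1} + (x_i - avg_{i-1}) / i.
# Equals sum(x_1..x_i)/i at every step, with no need to know N up front.
iterates = [2.0, 4.0, 6.0]  # stand-ins for parameter values per iteration
avg = 0.0
for i, x in enumerate(iterates, start=1):
    avg += (x - avg) / i
assert avg == 4.0  # mean of 2, 4, 6

# Summing (1/N)*x_i reaches the same mean, but only after all N steps
# and only if N was known in advance.
N = len(iterates)
total = sum(x / N for x in iterates)
assert abs(total - 4.0) < 1e-12
```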