Running average of parameters

I would like to use the running average of parameters instead of using the parameters from the training directly at the test session.

To do this, I initialized the running average parameters from the network as

avg_param = torch.cat([param.view(-1) for param in model.parameters()], 0)

Then, I performed the running average at each training iteration as

avg_param = 0.9*avg_param + 0.1*torch.cat([param.data.view(-1) for param in model.parameters()], 0)

Finally, at the test session, I loaded the parameters as

i = 0
for param in model.parameters():
    param = avg_param[i:i+param.nelement()].resize(*param.size())
    i = i+param.nelement()

Is this process correct ?


There are a few problems I can see:

  1. You should use `` in the first point. You’re not going to backprop to the average, so build it from the tensors, not the Variables.
  2. Content of the tensor after resize is unspecified! You can get arbitrary garbage in your tensor. Use .view to change the shape if you need to.
  3. You’re only overwriting the local reference to param; it doesn’t change your model at all. It’s as if you did this: a = model.linear.weight; a = Variable(...)
  4. You never back up the original parameters of your model - they are overwritten by the average for the test and you won’t restore them to the previous state. Not sure if that’s what you wanted.
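Point 3 is plain Python name binding, not anything PyTorch-specific; a minimal torch-free sketch (the lists here stand in for model.parameters()):

```python
# Rebinding the loop variable only changes the local name;
# the list it came from is untouched.
params = [1.0, 2.0, 3.0]
for p in params:
    p = 0.0  # rebinds the name p; the list still holds the old values
assert params == [1.0, 2.0, 3.0]

# Mutating *through* the reference (the analogue of writing to
# does change the underlying objects.
boxed = [[1.0], [2.0], [3.0]]
for p in boxed:
    p[0] = 0.0
assert boxed == [[0.0], [0.0], [0.0]]
```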

This would be correct:

def flatten_params():
    return torch.cat([param.data.view(-1) for param in model.parameters()], 0)

def load_params(flattened):
    offset = 0
    for param in model.parameters():[offset:offset + param.nelement()]).view(param.size())
        offset += param.nelement()

avg_param = flatten_params() # initialize

def train():
    global avg_param  # rebind the module-level average, not a local
    avg_param = 0.9 * avg_param + 0.1 * flatten_params()

def test():
    original_param = flatten_params() # save current params
    load_params(avg_param) # load the average
    # ... run the evaluation here ...
    load_params(original_param) # restore parameters
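The offset bookkeeping in flatten_params/load_params can be sanity-checked without a model; a torch-free sketch with plain lists standing in for parameter tensors (all names illustrative):

```python
# "Parameters" of different sizes; len(p) plays the role of param.nelement().
params = [[1, 2], [3, 4, 5], [6]]

def flatten(ps):
    # analogue of torch.cat([p.view(-1) for p in ps], 0)
    return [x for p in ps for x in p]

def load(ps, flattened):
    # analogue of load_params: walk the flat buffer with a running offset
    offset = 0
    for p in ps:
        n = len(p)
        p[:] = flattened[offset:offset + n]  # in-place, like copy_
        offset += n

flat = flatten(params)
assert flat == [1, 2, 3, 4, 5, 6]
load(params, [0, 0, 0, 0, 0, 0])
assert params == [[0, 0], [0, 0, 0], [0]]
```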

Thanks for your reply.

Before employing the running average, my code occupied only half of the video memory.

But when I tried your suggestion, it stopped after 2 iterations with an ‘out of memory’ error.

The error is shown below.

THCudaCheck FAIL file=/data/users/soumith/miniconda2/conda-bld/pytorch-cuda80-0.1.9_1487349287443/work/torch/lib/THC/generic/ line=66 error=2 : out of memory
Traceback (most recent call last):
  File "", line 251, in <module>
train_iter_loss, avg_param = train(config, epoch, avg_param)
  File "", line 166, in train
avg_param = 0.9*avg_param + 0.1*flatten_params()
  File "/home/sypark/anaconda2/envs/py36/lib/python3.6/site-packages/torch/", line 320, in __mul__
return self.mul(other)
RuntimeError: cuda runtime error (2) : out of memory at /data/users/soumith/miniconda2/conda-bld/pytorch-cuda80-0.1.9_1487349287443/work/torch/lib/THC/generic/

Your model must have a lot of parameters. Instead of flattening them into a single big tensor, you can process them in parts:

from copy import deepcopy

avg_param = deepcopy(list( for p in model.parameters()))

def train():
    for p, avg_p in zip(model.parameters(), avg_param):
        avg_p.mul_(0.9).add_(0.1 *
Not sure if you’ll manage to fit another copy of the params in memory, so you can restore them after testing.
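The per-parameter update above is just an exponential moving average applied tensor by tensor; the arithmetic can be checked torch-free:

```python
# EMA update avg = 0.9*avg + 0.1*p, applied in place per "parameter".
avg = [1.0, 1.0]
steps = [[2.0, 0.0], [2.0, 0.0]]  # parameter values at two iterations

for p in steps:
    for i in range(len(avg)):
        avg[i] = 0.9 * avg[i] + 0.1 * p[i]

# First entry: 1 -> 1.1 -> 1.19; second entry: 1 -> 0.9 -> 0.81.
assert abs(avg[0] - 1.19) < 1e-12
assert abs(avg[1] - 0.81) < 1e-12
```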

Thanks for your reply.

As suggested, the train works well without memory issue.
[offset:offset + param.nelement()]).view(param.size())
RuntimeError: copy from list to FloatTensor isn't implemented

I think the load_params function should be modified to handle the list.

Yes, it should, but I left it as an exercise to the reader :wink:


I modified the functions as below.

def flatten_params():
    flatten = deepcopy(list( for p in model.parameters()))
    return flatten

def load_params(flattened):
    for p, avg_p in zip(model.parameters(), flattened): = deepcopy(avg_p)

Currently, it works without error.
But I am not sure it works as I intended.

This would be better:

def load_params(flattened):
    for p, avg_p in zip(model.parameters(), flattened):
Also, note that they’re no longer flattened, so you might want to change the name.


I will change them.

Why didn’t you wrap avg_param in a Variable with requires_grad set to False? As in something like:

W = Variable(w_init, requires_grad=True)
W_avg = Variable(torch.FloatTensor(W).type(dtype), requires_grad=False)
for i in range(nb_iterations):
    #some GD stuff...
    W_avg = (1/nb_iter)*W + W_avg
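Keeping the average in a gradient-free wrapper is reasonable, since nothing backprops to it. Note, though, that the update (1/nb_iter)*W + W_avg only yields the uniform average of all iterates if nb_iter is the total iteration count fixed in advance, and only at the very end; an incremental mean sidesteps both caveats. A torch-free sketch of the arithmetic:

```python
# Incremental arithmetic mean: avg_i = avg_{i-1} + (x_i - avg_{i-1}) / i.
# Equals sum(x_1..x_i)/i at every step, with no need to know N up front.
iterates = [2.0, 4.0, 6.0]  # stand-ins for parameter values per iteration
avg = 0.0
for i, x in enumerate(iterates, start=1):
    avg += (x - avg) / i
assert avg == 4.0  # mean of 2, 4, 6

# Summing (1/N)*x_i reaches the same mean, but only after all N steps
# and only if N was known in advance.
N = len(iterates)
total = sum(x / N for x in iterates)
assert abs(total - 4.0) < 1e-12
```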