How to correctly assign values to a tensor

Hi, I’m trying to do something simple.
My simple feedforward model has an attribute that acts as a sort of memory/buffer of previous outputs: I want to store new outputs in it and push out the oldest values as new ones come in.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Lin, L1, L2, Lout and mem_y are hyperparameters defined elsewhere in my script
class pred_model(nn.Module):
    def __init__(self):
        super(pred_model, self).__init__()
        self.Layer_1 = nn.Linear(Lin, L1)
        self.Layer_2 = nn.Linear(L1, L2)
        self.Layer_out = nn.Linear(L2, Lout)
        # buffer holding the previous outputs (mem_y pairs of values)
        self.mem_y = torch.zeros([mem_y * 2])

    def forward(self, input):
        # concatenate the new input features with the memory buffer
        x = torch.cat((input, self.mem_y))
        x = F.relu(self.Layer_1(x))
        x = F.relu(self.Layer_2(x))
        x = self.Layer_out(x)
        # shift the buffer back by two and store the newest output in front
        self.mem_y[2:None] = self.mem_y[0:-2]
        self.mem_y[0:2] = x

        return x

Now, when I pass a sample through the forward pass for the first time, everything works as expected.
But the second time I pass in a new sample, the line:

self.mem_y[2:None] = self.mem_y[0:-2]

messes things up. Basically, I’m getting this:

initial value: self.mem_y = tensor([0., 0., 0., 0., 0., 0.])
after first sample: self.mem_y = tensor([-0.1605,  0.3055,  0.0000,  0.0000,  0.0000,  0.0000])
after second sample: self.mem_y = tensor([-0.1530,  0.3056, -0.1605,  0.3055, -0.1605,  0.3055])

Why does the vector after the second sample contain the pair -0.1605, 0.3055 twice?
What am I doing wrong?

I think this is because the slice on the right-hand side is a view of the same tensor (not a copy), so the assignment effectively copies the elements one by one.
In the statement above, self.mem_y[0] is copied to self.mem_y[2], and self.mem_y[1] to self.mem_y[3]. But self.mem_y[2] has already been changed at that point, so the changed value of self.mem_y[2] is copied to self.mem_y[4], and so on.
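
For illustration, here is a minimal standalone sketch (not the poster's model) that reproduces the effect on a plain tensor. The result of an overlapping in-place slice assignment is not something PyTorch guarantees, so the duplication noted in the comment is what the poster observed rather than defined behavior:

import torch

# buffer after the first sample: two real values followed by zeros
buf = torch.tensor([-0.1605, 0.3055, 0.0, 0.0, 0.0, 0.0])

# the source buf[0:-2] is a view of the same storage as the destination buf[2:]
buf[2:None] = buf[0:-2]
print(buf)  # already-shifted values can be copied again, duplicating the pair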

You can avoid this confusion by

self.mem_y[2:None] = self.mem_y[0:-2].clone()
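
Continuing the same standalone sketch, cloning the source slice materializes it into a separate tensor before the write, so the shift comes out as intended:

import torch

buf = torch.tensor([-0.1605, 0.3055, 0.0, 0.0, 0.0, 0.0])
buf[2:None] = buf[0:-2].clone()  # copy the source before writing into the overlapping region
print(buf)  # tensor([-0.1605,  0.3055, -0.1605,  0.3055,  0.0000,  0.0000])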

Oh, OK, thanks! One more clarification about the computation graph:
Let’s say I use this .clone() operation for every input sample. Will the computation graph be built as a combination of graphs, where the entry/input to each graph is a separate set of input features concatenated with the current memory buffer (like I do)?
Thanks

If you want to store the previous outputs, you can use a simple list and do append() and pop().
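
For example, a minimal sketch of that idea, with illustrative names (mem_len, history and the random stand-in output are not from the original code):

import torch

mem_len = 3   # hypothetical number of previous outputs to keep
history = []  # list of previous outputs

for step in range(5):
    out = torch.randn(2)        # stand-in for the model output
    history.append(out)         # newest output goes to the back
    if len(history) > mem_len:
        history.pop(0)          # oldest output is pushed out at the front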

Apart from this, to answer your question: the computation graph stays in memory until you call backward() on the terminal nodes, and it is freed during backpropagation. If you want to keep the computation graph, use loss.backward(retain_graph=True).
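
A minimal example of the retain_graph behavior, detached from the model above (x and y here are just illustrative tensors):

import torch

x = torch.randn(3, requires_grad=True)
y = (x * 2).sum()

y.backward(retain_graph=True)  # the graph is kept alive after this backward pass
y.backward()                   # works only because retain_graph=True was used above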