Why does this view and matrix multiplication code fail?

The following code, inside the forward of a network module, results in a module that learns correctly:

def forward(self, x, state_cell_tuple):
    ...
    xdot = x @ self.Wx
    xdot = xdot.view(batch_size, 4, self.embedding_size)
    ...
    i = F.tanh(xdot[:, 0] + hdot[:, 0] + self.bias[0])
    ... (etc) ...

However, transposing the view and the accessor gives a module that fails to learn correctly:

    xdot = x @ self.Wx
    xdot = xdot.view(4, batch_size, self.embedding_size)
    ...
    i = F.tanh(xdot[0] + hdot[0] + self.bias[0])
    ... (etc) ...

(from an LSTM cell implementation of course).

So, the question is: why does this view transposition fail? A matrix multiplication is effectively a fully-connected layer, so it should not matter which way around I do the .view(), I think? What am I missing here?

Have you transposed your target value?

As far as I know, if one permutes a matrix that is used as a fully-connected layer, the model should learn identically compared to any other permutation? i.e. if I do:

self.w = nn.Parameter(torch.Tensor(batch_size, neurons))
self.w.data.uniform_(-stdv, stdv)
# permute here stands for some arbitrary, fixed permutation of the flattened weights
self.w.data = permute(self.w.data.view(-1)).view(batch_size, neurons)
...
x = x @ self.w

… the model will converge exactly the same, no matter what permutation is used inside permute?

(edited, to correct a bit…)
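
For what it's worth, here is a minimal, self-contained sketch (the shapes and names below are mine, not from the module above) of the easiest case to check: a permutation that reorders whole output columns of the weight. That case is exactly equivalent, since the permuted layer just produces the same units in a different order:

import torch

torch.manual_seed(0)
in_features, neurons, batch = 5, 7, 3

w = torch.randn(in_features, neurons)
perm = torch.randperm(neurons)      # an arbitrary but fixed reordering of the output units

x = torch.randn(batch, in_features)
out = x @ w                         # original fully-connected output
out_perm = x @ w[:, perm]           # same layer with permuted weight columns

# identical values, just in a different column order
print(torch.allclose(out[:, perm], out_perm))  # True

Permuting individual entries of a freshly initialised weight, as in the snippet above this sketch, is a weaker statement: it gives another equally plausible random initialisation rather than the exact same model, so convergence should match in distribution rather than step for step.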

Hi,

The thing is that view is not a transposition. It just looks at the same data differently. When you do .view(4, batch_size, ...), data from different samples in your batch get mixed together (which most certainly confuses the learning a lot):

>>> import torch
>>> batch_size = 3
>>> seq_len = 2
>>> inp = torch.arange(batch_size).unsqueeze(-1).expand(batch_size, 4*seq_len).contiguous()
>>> print(inp)
tensor([[ 0,  0,  0,  0,  0,  0,  0,  0],
        [ 1,  1,  1,  1,  1,  1,  1,  1],
        [ 2,  2,  2,  2,  2,  2,  2,  2]])

>>> print(inp[0])
tensor([ 0,  0,  0,  0,  0,  0,  0,  0])

>>> print(inp.view(batch_size, 4, seq_len))
tensor([[[ 0,  0],
         [ 0,  0],
         [ 0,  0],
         [ 0,  0]],

        [[ 1,  1],
         [ 1,  1],
         [ 1,  1],
         [ 1,  1]],

        [[ 2,  2],
         [ 2,  2],
         [ 2,  2],
         [ 2,  2]]])

>>> print(inp.view(batch_size, 4, seq_len)[0])
tensor([[ 0,  0],
        [ 0,  0],
        [ 0,  0],
        [ 0,  0]])

>>> print(inp.view(4, batch_size, seq_len))
tensor([[[ 0,  0],
         [ 0,  0],
         [ 0,  0]],

        [[ 0,  0],
         [ 1,  1],
         [ 1,  1]],

        [[ 1,  1],
         [ 1,  1],
         [ 2,  2]],

        [[ 2,  2],
         [ 2,  2],
         [ 2,  2]]])

>>> print(inp.view(4, batch_size, seq_len)[:, 0])
tensor([[ 0,  0],
        [ 0,  0],
        [ 1,  1],
        [ 2,  2]])
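
For reference, here is a minimal sketch of the batch-preserving split (the tensor names are placeholders, not the original module's); torch.chunk along dim=1 gives the same gate slices as the working .view(batch_size, 4, ...)[:, k] indexing:

import torch

batch_size, embedding_size = 3, 5

# stand-in for x @ self.Wx: fused gate pre-activations of shape (batch, 4 * embedding)
xdot = torch.randn(batch_size, 4 * embedding_size)

# batch stays the leading dimension, so index 1 selects a gate, never a sample
gates_view = xdot.view(batch_size, 4, embedding_size)
i_view = gates_view[:, 0]

# equivalent: split the fused pre-activations into the four gate slices by position
i_chunk, f_chunk, g_chunk, o_chunk = xdot.chunk(4, dim=1)

print(torch.equal(i_view, i_chunk))  # True: same samples, same slice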


Oooo, right, good point 🙂