2D convolution on only one part of the matrix, or alternatively reshaping

Hi, I have a tensor q of size [150, 96, 96, 240] (that is, [batch_size, sentence_length, sentence_length, weight_size]). I need to multiply it with a weight vector of size [240] and then either reshape or use a convolution so that q ends up with size [150, 96, 96], basically collapsing the last [96, 96, 240] into [96, 96]. I have a short code snippet, but I don’t know how to reshape properly, or whether a convolution makes more sense in this case (and if so, how to do it).
Any help is appreciated.


import torch
import torch.nn as nn
import numpy as np

class SomeClass(nn.Module):

    def __init__(self):
        super().__init__()

        # create a learnable weight vector
        self.my_weights = nn.Parameter(torch.FloatTensor(240))
        nn.init.xavier_normal_(self.my_weights)

    def forward(self, q):

        # q = np.random.randn(150, 96, 96, 240)
        # a = torch.from_numpy(q)
        # size: [batch_size, sentence_length, sentence_length, weight_size]

        # multiply the weight vector with the last column of the tensor, is this correct?
        k = torch.matmul(q, self.my_weights)

        # TODO: reshape or do a 2D convolution
        # v = TODO(k)  [150, 96, 96, 240] --> [150, 96, 96, 1] --> [150, 96, 96]

Visual representation I drew up: https://imgur.com/a/yuIc3t4
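
For reference, a minimal sketch of what `torch.matmul` does with these shapes, using a small random tensor in place of the real data (the sizes below are stand-ins, not the ones from the post). When the second argument is 1-D, `matmul` contracts it against the last dimension of the first argument, so the result should already come out without a trailing dimension:

```python
import torch

# Small stand-in for q = [150, 96, 96, 240]; the sizes are illustrative only.
q = torch.randn(3, 5, 5, 8)
w = torch.randn(8)          # stand-in for the weight vector of size [240]

k = torch.matmul(q, w)      # dot product of w with every length-8 slice of q
print(k.shape)              # torch.Size([3, 5, 5])
```

Under that reading, no extra reshape or convolution step is needed after the `matmul` line.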

How do you want to reduce the last dimension?
Do you want to sum it?

q = torch.randn(15, 9, 9, 2)
w = torch.randn(15, 9, 9)
(q*w.unsqueeze(3)).sum(3)

I only have one tensor, q = [150, 96, 96, 240], and I have to somehow transform it into q = [150, 96, 96]. Your example uses an additional w of size [150, 96, 96], which I don’t have. A summation could work, I guess, but when I try q.sum(3) it doesn’t seem to do anything in my case: it does not remove the last dimension.

Sorry, my bad!
I read your problem and thought I still had it in mind when I created the example.
Apparently I mixed up the shapes. :wink:

Would this work then:

q = torch.randn(15, 9, 9, 2)
w = torch.randn(1, 1, 1, 2)
(q*w).sum(3)

Sorry, I don’t think I fully understand what you are suggesting.

So, instead of

self.my_weights = nn.Parameter(torch.FloatTensor(240, 240))

I should somehow make it into:
self.my_weights= nn.Parameter(torch.FloatTensor([1,1,1,240], [1,1,1,240])) <-- that doesn’t seem to work in PyTorch. How can I initialize parameters with multiple dimensions properly?

So, based on my code snippet in the original post, I would do something like the following:

import torch
import torch.nn as nn
import numpy as np

class SomeClass(nn.Module):

    def __init__(self):
        super().__init__()

        # create a learnable weight vector
        self.my_weights = nn.Parameter(torch.FloatTensor([1,1,1,240], [1,1,1,240]))
        nn.init.xavier_normal_(self.my_weights)

    def forward(self, q):

        # q = np.random.randn(150, 96, 96, 240)
        # size: [batch_size, sentence_length, sentence_length, weight_size]

        k = (q * self.my_weights).sum(3)

Is this what you are suggesting or am I misunderstanding you? Also, thanks for looking into this!

Maybe I’m misunderstanding your use case completely, so feel free to correct me. :wink:
You’ve said that you need to multiply q with a weight vector of size [240], but somehow you are trying to initialize a tensor of shape [240, 240] as my_weights.

If you just want random numbers, you can reuse my code example: my_weights = nn.Parameter(torch.randn(1, 1, 1, 240)).
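
Putting that into a module could look like the sketch below. The class name `WeightedSum` and the 0.01 scale are placeholders chosen for illustration, not something from this thread. Note also that `forward` needs `self` as its first argument.

```python
import torch
import torch.nn as nn

class WeightedSum(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, features=240):
        super().__init__()
        # Shape (1, 1, 1, features) broadcasts against [batch, len, len, features].
        # The 0.01 scale is an arbitrary small starting value (an assumption).
        self.my_weights = nn.Parameter(torch.randn(1, 1, 1, features) * 0.01)

    def forward(self, q):
        # Weight every feature, then sum the feature dimension away.
        return (q * self.my_weights).sum(3)

module = WeightedSum(features=8)
out = module(torch.randn(3, 5, 5, 8))
print(out.shape)  # torch.Size([3, 5, 5])
```

Wrapping the tensor in `nn.Parameter` is what registers it with the module, so an optimizer built from `module.parameters()` will update it.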

Oh, I think I had nn.Linear there before, so it was 240 size in and 240 size out. I messed the example up because of that. It was supposed to be just 240, my bad!

I tried what you have suggested, and it looks like the following now:

import torch
import torch.nn as nn
import numpy as np

class SomeClass(nn.Module):

    def __init__(self):
        super().__init__()

        # create a learnable weight vector
        self.my_weights = nn.Parameter(torch.rand(1, 1, 1, 240))  # I need the vector to be learned weights
        nn.init.xavier_normal_(self.my_weights)

    def forward(self, q):

        # q = np.random.randn(150, 96, 96, 240)
        # size: [batch_size, sentence_length, sentence_length, weight_size]

        k = (q * self.my_weights)
        k = k.sum(3)

But I am very quickly getting an out of memory error on the multiplication part:

    k = (q*self.my_weights)
RuntimeError: CUDA error: out of memory

The memory jumps to over 6 GB of VRAM (the limit on my Nvidia 980 Ti), while it usually only needs about 2-3 GB. Any ideas about this? Thanks.
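
A rough back-of-the-envelope check may help here. The product `q * self.my_weights` has the same shape as q, so it is a second full-size tensor on top of q itself. The sketch below only does the arithmetic; the shapes come from the post, while the dtypes are assumptions (float64 is what `torch.from_numpy` yields for NumPy's default arrays).

```python
# Rough size of one tensor of shape [150, 96, 96, 240].
# Shapes are taken from the post; the dtype sizes are assumptions.
batch, seq_len, feat = 150, 96, 240
elements = batch * seq_len * seq_len * feat

for name, bytes_per_elem in (("float32", 4), ("float64", 8)):
    gib = elements * bytes_per_elem / 2**30
    print(f"{name}: {elements:,} elements, about {gib:.2f} GiB")
```

That is roughly 1.2 GiB per copy in float32 and twice that in float64, before counting anything else the model holds. Contracting the last dimension directly, for example with `torch.matmul(q, w)` for a 1-D `w`, should avoid materializing the full-size product.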

Also, what does sum(3) actually do in this case? I hope the last dimension doesn’t just disappear; I need it applied to the previous two (as shown in the drawing I provided).
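
As a small worked example of what `sum(3)` does: after the multiplication, each of the feature values has been scaled by its weight, and `sum(3)` adds those scaled values together for every position, leaving one number per position. Nothing is discarded; the last dimension is folded into a single weighted sum. The numbers below are made up, with 3 features instead of 240:

```python
import torch

# One "cell" of q with 3 features instead of 240, so the numbers are easy to follow.
q = torch.tensor([[[[1.0, 2.0, 3.0]]]])    # shape [1, 1, 1, 3]
w = torch.tensor([[[[0.5, 1.0, 2.0]]]])    # shape [1, 1, 1, 3]

weighted = q * w          # [[[[0.5, 2.0, 6.0]]]], still shape [1, 1, 1, 3]
k = weighted.sum(3)       # adds 0.5 + 2.0 + 6.0 and drops dimension 3
print(k.shape, k.item())  # torch.Size([1, 1, 1]) 8.5
```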

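I think you can use conv2d to achieve what you want. Something like:

data = torch.randn(150, 96, 96, 240).cuda()
# 1x1 convolution over the 240 channels; kernel_size is a required argument
conv = nn.Conv2d(240, 1, kernel_size=1).cuda()
# Conv2d expects [batch, channels, height, width]; squeeze(1) drops the single output channel
output = conv(data.permute(0, 3, 1, 2)).squeeze(1)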

P. S. I didn’t check for syntax errors.
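
To make the connection to the earlier multiply-and-sum explicit, here is a sketch (on small stand-in shapes, CPU only) checking that a convolution with a 1x1 kernel and one output channel computes a weighted sum over the channel dimension plus a bias:

```python
import torch
import torch.nn as nn

data = torch.randn(3, 5, 5, 8)              # small stand-in for [150, 96, 96, 240]
conv = nn.Conv2d(8, 1, kernel_size=1)       # weight shape [1, 8, 1, 1], bias shape [1]

# Conv2d expects channels first: [batch, channels, height, width].
out_conv = conv(data.permute(0, 3, 1, 2)).squeeze(1)     # [3, 5, 5]

# Same computation written as a dot product over the last dimension.
out_dot = torch.matmul(data, conv.weight.view(-1)) + conv.bias

print(out_conv.shape, torch.allclose(out_conv, out_dot, atol=1e-5))
```

So the convolution layer brings its own learnable weight vector (and bias), initialized by PyTorch's defaults.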

Thanks, I think this is what I needed. Unfortunately, the first part of initializing the weights doesn’t work with this solution for some reason.

Now, I have the following:

import torch
import torch.nn as nn
import numpy as np

class SomeClass(nn.Module):

    def __init__(self):
        super().__init__()

        # create a learnable weight vector
        self.my_weights = nn.Parameter(torch.FloatTensor(240))  # I need the vector to be learned weights
        nn.init.xavier_normal_(self.my_weights)

        self.conv = nn.Conv2d(240, kernel_size=1, out_channels=1)  # are the arguments correct?

    def forward(self, q):

        # q = np.random.randn(150, 96, 96, 240)
        # size: [batch_size, sentence_length, sentence_length, weight_size]

        k = (q * self.my_weights)
        k = self.conv(k.permute((0, 3, 1, 2))).squeeze()

I get the following error:

raise ValueError("Fan in and fan out cannot be computed for tensor with less than 2 dimensions")
ValueError: Fan in and fan out cannot be computed for tensor with less than 2 dimensions

which is apparently coming from: nn.init.xavier_normal_(self.my_weights)

If I don’t do nn.init.xavier_normal_(self.my_weights), the loss is NaN.
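
Some background that may explain both symptoms. Xavier initialization scales the values by fan-in and fan-out, which are only defined for tensors with at least two dimensions, hence the ValueError on a 1-D parameter. And `torch.FloatTensor(240)` allocates memory without initializing it, so skipping the init leaves arbitrary values in the parameter, which could plausibly lead to a NaN loss. The sketch below shows two possible workarounds; the std of 0.02 is an arbitrary illustrative value, not a recommendation from this thread.

```python
import torch
import torch.nn as nn

w_1d = nn.Parameter(torch.empty(240))
try:
    nn.init.xavier_normal_(w_1d)          # 1-D: fan-in/fan-out are undefined
    raised = False
except ValueError:
    raised = True
print("xavier on 1-D raised ValueError:", raised)

# Option A: give the parameter a second dimension, then flatten it when used.
w_2d = nn.Parameter(torch.empty(1, 240))
nn.init.xavier_normal_(w_2d)              # fan_in=240, fan_out=1

# Option B: keep it 1-D and use an initializer that has no shape requirement.
# The std of 0.02 is an arbitrary illustrative value.
w_b = nn.Parameter(torch.empty(240))
nn.init.normal_(w_b, mean=0.0, std=0.02)

q = torch.randn(3, 5, 5, 240)
print(torch.matmul(q, w_2d.view(-1)).shape)   # torch.Size([3, 5, 5])
```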

Would doing:

self.concat_linear = nn.Linear(240, 240) 
nn.init.kaiming_normal_(self.concat_linear.weight)

k = self.concat_linear(q)

be an equivalent of:

self.my_weights= nn.Parameter(torch.FloatTensor(240))
nn.init.xavier_normal_(self.my_weights)
k = (q*self.my_weights)

?
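
As far as the math goes, these two are not equivalent. `nn.Linear(240, 240)` multiplies by a full 240 x 240 matrix (plus a bias), so every output feature mixes all input features, while `q * self.my_weights` scales each feature by its own single weight. Both keep the last dimension at 240, so neither one reduces q to [150, 96, 96]. A sketch on small stand-in sizes:

```python
import torch
import torch.nn as nn

features = 4                                   # small stand-in for 240
q = torch.randn(2, 3, 3, features)
w = torch.randn(features)

scaled = q * w                                 # one weight per feature

# nn.Linear computes q @ W.T + b with a full [features, features] matrix W.
linear = nn.Linear(features, features, bias=False)
print(torch.allclose(linear(q), scaled))       # almost surely False for a random W

# The two agree only in the special case where W is diagonal.
with torch.no_grad():
    linear.weight.copy_(torch.diag(w))
print(torch.allclose(linear(q), scaled, atol=1e-6))
```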

However, when I do that, I get the following later on in my code:

  File "...\lib\site-packages\torch\tensor.py", line 381, in __iter__
    raise TypeError('iteration over a 0-d tensor')
TypeError: iteration over a 0-d tensor

In my view, you do not need separate weights. nn.Conv2d already has learnable weights.

Thanks, I tried using only Conv2d without the nn.Parameter. Unfortunately, I am still getting the following error:

    raise TypeError('iteration over a 0-d tensor')
TypeError: iteration over a 0-d tensor

Can you post the new code here once more?

Sure thing. The more or less full code looks like this (with random data):

import torch
import torch.nn as nn
import numpy as np

class SomeClass(nn.Module):

    def __init__(self):
        super().__init__()

        self.relu = nn.RReLU()
        self.conv = nn.Conv2d(240, kernel_size=1, out_channels=1)

    def forward(self, q, k, values):

        # q = k = values
        q = np.random.randn(150, 96, 120)
        k = np.random.randn(150, 96, 120)

        # now transform both into one tensor of size: [150, 96, 96, 240]
        # not specifically related to the question above though
        # (.float() because np.random.randn returns float64 and the conv weights are float32)
        a = torch.from_numpy(q).float()
        b = torch.from_numpy(k).float()
        bs, s, v = a.size()
        a_ = a.repeat(1, 1, s).view(bs, s*s, v)
        b_ = b.repeat(1, s, 1)
        concat_vec = torch.stack((a_, b_), 2).view(bs, s, s, -1)
        # concat_vec size:
        # [batch_size, sentence_length, sentence_length, weight_size] = [150, 96, 96, 240]

        # now transform into [150, 96, 96]
        attention = self.conv(concat_vec.permute((0, 3, 1, 2))).squeeze()

        attention = self.relu(attention)
        output = torch.bmm(attention, values)

        return output

In case this helps: the code is a modification of the scaled dot-product attention from the self-attention paper (https://arxiv.org/abs/1703.03130). The modification implements the concatenation weighting function from this paper: https://arxiv.org/abs/1711.07971 (Section 3.2).
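
One observation that may help with the memory use: because the weight is applied to a concatenation, the score for a pair (i, j) splits into a term that depends only on position i and a term that depends only on position j, that is, w·[a_i; b_j] = w_a·a_i + w_b·b_j. Under that identity the scores could be computed without building the [150, 96, 96, 240] tensor at all. The sketch below checks this on small random data; the names `w_a`, `w_b`, and `pair` are made up for the example, and the bias is left out for brevity.

```python
import torch

bs, s, v = 2, 4, 3                      # small stand-ins for 150, 96, 120
a = torch.randn(bs, s, v)
b = torch.randn(bs, s, v)
w = torch.randn(2 * v)                  # one weight per concatenated feature
w_a, w_b = w[:v], w[v:]                 # hypothetical names for the two halves

# Explicit version: build every pair [a_i; b_j], then take the dot product with w.
pair = torch.cat(
    (a.unsqueeze(2).expand(bs, s, s, v), b.unsqueeze(1).expand(bs, s, s, v)),
    dim=3,
)                                        # [bs, s, s, 2 * v]
scores_full = torch.matmul(pair, w)      # [bs, s, s]

# Decomposed version: project each input once, then broadcast-add.
scores_split = torch.matmul(a, w_a).unsqueeze(2) + torch.matmul(b, w_b).unsqueeze(1)

print(scores_split.shape, torch.allclose(scores_full, scores_split, atol=1e-5))
```

The decomposed version only ever holds tensors of size [bs, s, v] and [bs, s, s].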

Without the modification, the original code looks like this and works just fine:

import torch
import torch.nn as nn
import numpy as np

class SomeClass(nn.Module):

    def __init__(self):
        super().__init__()
        # assumed definition, not shown in the post: normalize over the last dimension
        self.softmax = nn.Softmax(dim=2)

    def forward(self, q, k, values):

        attn = torch.bmm(q, k.transpose(1, 2))
        attn = self.softmax(attn)
        output = torch.bmm(attn, values)
        return output

Error I am getting (happens later on in the code that uses the transformed output):

File "...\lib\site-packages\torch\tensor.py", line 381, in __iter__
    raise TypeError('iteration over a 0-d tensor')
TypeError: iteration over a 0-d tensor
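
The traceback alone does not show where the 0-d tensor comes from, so the following is only one thing worth checking, not a diagnosis. `squeeze()` without an argument removes every dimension of size 1, so the attention tensor loses its batch dimension whenever a batch holds a single example (and further dimensions if the sentence length is 1). Passing the dimension explicitly keeps the shape stable. The sizes below are small stand-ins:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(8, 1, kernel_size=1)
x = torch.randn(1, 8, 5, 5)             # a batch that happens to hold one example

print(conv(x).squeeze().shape)          # torch.Size([5, 5]): batch dimension is gone
print(conv(x).squeeze(1).shape)         # torch.Size([1, 5, 5]): only channels removed

y = torch.randn(1, 8, 1, 1)             # extreme case: one example, one token
print(conv(y).squeeze().dim())          # 0, i.e. a 0-d tensor
```

Printing the shape of `attention` and `output` right before the failing line would confirm or rule this out.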