# 2D convolution on only one part of the matrix, or alternatively reshaping

Hi, I have a tensor of size `q = [150, 96, 96, 240]` (here: `[batch_size, sentence_length, sentence_length, weight_size]`). I need to multiply it with a weight vector of size `240` and either reshape or use a convolution to bring `q` to size `[150, 96, 96]`, basically combining the last `[96, 96, 240]` into `[96, 96]`. I have a short code snippet, but I don’t know how to properly reshape, and whether doing a convolution makes more sense in this case (and if so, how to do it).
Any help appreciated.

```
import torch
import torch.nn as nn
import numpy as np


class SomeClass(nn.Module):

    def __init__(self):
        super().__init__()

        # create a learnable weight vector
        self.my_weights = nn.Parameter(torch.FloatTensor(240, 240))
        nn.init.xavier_normal_(self.my_weights)

    def forward(self, q):

        # q = np.random.randn(150, 96, 96, 240)
        # a = torch.from_numpy(q)
        # size: [batch_size, sentence_length, sentence_length, weight_size]

        # multiply the weight vector with the last dimension of the tensor, is this correct?
        k = torch.matmul(q, self.my_weights)

        # reshape or do a 2D convolution:
        # [150, 96, 96, 240] --> [150, 96, 96, 1] --> [150, 96, 96]
        v = TODO(k)
        return v
```

Visual representation I drew up: https://imgur.com/a/yuIc3t4
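A minimal shape sketch of the `matmul` idea, assuming `my_weights` were a plain 1-D tensor of size 240 (hypothetical, not what the snippet above does): contracting the last dimension already yields the target shape.

```
import torch

q = torch.randn(150, 96, 96, 240)  # [batch, sent_len, sent_len, weight_size]
w = torch.randn(240)               # hypothetical 1-D weight vector

# matmul contracts the last dim of q against w:
# [150, 96, 96, 240] x [240] -> [150, 96, 96]
k = torch.matmul(q, w)
print(k.shape)  # torch.Size([150, 96, 96])
```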

How do you want to reduce the last dimension?
Do you want to sum it?

```
q = torch.randn(15, 9, 9, 2)
w = torch.randn(15, 9, 9)
(q * w.unsqueeze(3)).sum(3)
```

I only have the one tensor `q = [150, 96, 96, 240]`; I have to somehow transform it into `q = [150, 96, 96]`. In your example there is an additional `w` of size `[150, 96, 96]`, which I don’t have. A summation could work, I guess, but when I try it on `q` it doesn’t do anything in my case: `q.sum(3)` doesn’t take away the last dimension.

I’ve read your problem and thought I still had it in mind when I created the example.
Apparently I’ve mixed up the shapes. Would this work then:

```
q = torch.randn(15, 9, 9, 2)
w = torch.randn(1, 1, 1, 2)
(q * w).sum(3)
```
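With the sizes from the original post, the same pattern would be (a sketch, untested on your data):

```
import torch

q = torch.randn(150, 96, 96, 240)
w = torch.randn(1, 1, 1, 240)  # broadcasts over the first three dims
out = (q * w).sum(3)           # weighted sum over the last dim
print(out.shape)               # torch.Size([150, 96, 96])
```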

Sorry, I don’t think I fully understand what you are suggesting.

Currently, I have `self.my_weights = nn.Parameter(torch.FloatTensor(240, 240))`.

I should somehow make it into:
`self.my_weights= nn.Parameter(torch.FloatTensor([1,1,1,240], [1,1,1,240]))` <-- that doesn’t seem to work in PyTorch. How can I initialize parameters with multiple dimensions properly?

So, based on my code snippet in the original post, I would do something like the following:

```
import torch
import torch.nn as nn
import numpy as np


class SomeClass(nn.Module):

    def __init__(self):
        super().__init__()

        # create a learnable weight vector
        # (this line doesn't seem to work -- see the question above)
        self.my_weights = nn.Parameter(torch.FloatTensor([1, 1, 1, 240], [1, 1, 1, 240]))
        nn.init.xavier_normal_(self.my_weights)

    def forward(self, q):

        # q = np.random.randn(150, 96, 96, 240)
        # size: [batch_size, sentence_length, sentence_length, weight_size]

        k = (q * self.my_weights).sum(3)
        return k
```

Is this what you are suggesting or am I misunderstanding you? Also, thanks for looking into this!

Maybe I’m misunderstanding your use case completely, so feel free to correct me. You’ve said

> I need to multiply it with a weight vector of size `240`

but somehow you are trying to initialize a tensor of shape `[240, 240]` as `my_weights`.

If you just want random numbers, you can reuse my code example: `my_weights = nn.Parameter(torch.randn(1, 1, 1, 240))`.

Oh, I think I had `nn.Linear` there before, so it was `240` size in and `240` size out. I messed the example up because of that. It was supposed to be just `240`, my bad!

I tried what you have suggested, and it looks like the following now:

```
import torch
import torch.nn as nn
import numpy as np


class SomeClass(nn.Module):

    def __init__(self):
        super().__init__()

        # create a learnable weight vector (I need the vector to be learned weights)
        self.my_weights = nn.Parameter(torch.rand(1, 1, 1, 240))
        nn.init.xavier_normal_(self.my_weights)

    def forward(self, q):

        # q = np.random.randn(150, 96, 96, 240)
        # size: [batch_size, sentence_length, sentence_length, weight_size]

        k = q * self.my_weights
        k = k.sum(3)
        return k
```

But I am very quickly getting an out of memory error on the multiplication part:

```
    k = (q*self.my_weights)
RuntimeError: CUDA error: out of memory
```

The memory usage jumps to over 6 GB of VRAM (my limit on an Nvidia 980 Ti), while it usually only needs about 2-3 GB. Any ideas about this? Thanks.
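As a sanity check on the memory side: the same weighted sum can be written as a `matmul` over the last dimension, like in my original snippet, which avoids materializing the full `[150, 96, 96, 240]` product. A minimal sketch, assuming that intermediate is the allocation that fails:

```
import torch

q = torch.randn(150, 96, 96, 240)
w = torch.randn(240)

# (q * w).sum(3) first materializes a full [150, 96, 96, 240] product;
# matmul computes the same weighted sum without that intermediate
k = torch.matmul(q, w)  # [150, 96, 96]
```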

Also, what does the `sum(3)` actually do in this case? I hope the last dimension doesn’t just disappear; I need it applied to the previous two (as shown in the drawing I provided).

I think you can use `conv2d` to achieve what you want. Something like:

```
data = torch.randn(150, 96, 96, 240).cuda()
conv = nn.Conv2d(240, 1, kernel_size=1).cuda()
output = conv(data.permute(0, 3, 1, 2)).squeeze()
```

P.S. I didn’t check for syntax errors.

Thanks, I think this is what I needed. Unfortunately, the first part of initializing the weights doesn’t work with this solution for some reason.

Now, I have the following:

```
import torch
import torch.nn as nn
import numpy as np


class SomeClass(nn.Module):

    def __init__(self):
        super().__init__()

        # create a learnable weight vector (I need the vector to be learned weights)
        self.my_weights = nn.Parameter(torch.FloatTensor(240))
        nn.init.xavier_normal_(self.my_weights)

        self.conv = nn.Conv2d(240, kernel_size=1, out_channels=1)  # are the arguments correct?

    def forward(self, q):

        # q = np.random.randn(150, 96, 96, 240)
        # size: [batch_size, sentence_length, sentence_length, weight_size]

        k = q * self.my_weights
        k = self.conv(k.permute(0, 3, 1, 2)).squeeze()
        return k
```

I get the following error:

```
raise ValueError("Fan in and fan out cannot be computed for tensor with less than 2 dimensions")
ValueError: Fan in and fan out cannot be computed for tensor with less than 2 dimensions
```

which is apparently coming from: `nn.init.xavier_normal_(self.my_weights)`

If I don’t do `nn.init.xavier_normal_(self.my_weights)`, the `loss` is `NaN`.
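Side note: the fan-based schemes (Xavier/Kaiming) need at least two dimensions, so a 1-D parameter would need a different initializer. A minimal sketch, e.g. with a plain normal init (the std value is an arbitrary choice):

```
import torch
import torch.nn as nn

w = nn.Parameter(torch.empty(240))
nn.init.normal_(w, mean=0.0, std=0.1)  # works on 1-D tensors, unlike xavier_normal_
```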

Would doing:

```
self.concat_linear = nn.Linear(240, 240)
nn.init.kaiming_normal_(self.concat_linear.weight)

k = self.concat_linear(q)
```

be an equivalent of:

```
self.my_weights = nn.Parameter(torch.FloatTensor(240))
nn.init.xavier_normal_(self.my_weights)
k = (q * self.my_weights)
```

?

However, when I do that, I get the following later on in my code:

```
  File "...\lib\site-packages\torch\tensor.py", line 381, in __iter__
    raise TypeError('iteration over a 0-d tensor')
TypeError: iteration over a 0-d tensor
```
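For shape reference, the two variants side by side (a minimal sketch):

```
import torch
import torch.nn as nn

q = torch.randn(150, 96, 96, 240)

# variant A: a linear layer maps the last dim 240 -> 240, so the result stays 4-D
lin = nn.Linear(240, 240)
print(lin(q).shape)          # torch.Size([150, 96, 96, 240])

# variant B: elementwise weights also keep the shape; only sum(3) reduces it
w = torch.randn(240)
print((q * w).shape)         # torch.Size([150, 96, 96, 240])
print((q * w).sum(3).shape)  # torch.Size([150, 96, 96])
```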

In my view, you do not need separate weights; `nn.Conv2d` already has learnable weights.

Thanks, I tried using only `Conv2d` without the `nn.Parameter`. Unfortunately, still getting the following error:

```
    raise TypeError('iteration over a 0-d tensor')
TypeError: iteration over a 0-d tensor
```

Can you write the new code here once more?

Sure thing. The more or less full code looks like this (with random data):

```
import torch
import torch.nn as nn
import numpy as np


class SomeClass(nn.Module):

    def __init__(self):
        super().__init__()

        self.relu = nn.RReLU()
        self.conv = nn.Conv2d(240, kernel_size=1, out_channels=1)

    def forward(self, q, k, values):

        # q = k = values
        q = np.random.randn(150, 96, 120)
        k = np.random.randn(150, 96, 120)

        # now transform both into one tensor of size [150, 96, 96, 240]
        # (not specifically related to the question above, though)
        a = torch.from_numpy(q).float()  # float() so the dtype matches the conv weights
        b = torch.from_numpy(k).float()
        bs, s, v = a.size()
        a_ = a.repeat(1, 1, s).view(bs, s * s, v)
        b_ = b.repeat(1, s, 1)
        concat_vec = torch.stack((a_, b_), 2).view(bs, s, s, -1)
        # concat_vec size:
        # [batch_size, sentence_length, sentence_length, weight_size] = [150, 96, 96, 240]

        # now transform into [150, 96, 96]
        attention = self.conv(concat_vec.permute(0, 3, 1, 2)).squeeze()

        attention = self.relu(attention)
        output = torch.bmm(attention, values)

        return output
```

In case this helps: the code is a modification of the scaled dot-product attention from the self-attention paper (https://arxiv.org/abs/1703.03130); the modification implements the concatenation weighting function from this paper: https://arxiv.org/abs/1711.07971 (Section 3.2).
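For context, the concatenation form in that section is, as I understand it,

```
f(x_i, x_j) = \mathrm{ReLU}\bigl( w_f^{\top} \, [\theta(x_i), \phi(x_j)] \bigr)
```

where `w_f` is the weight vector projecting the concatenated vector down to a scalar, which is exactly the 240 -> 1 reduction the `Conv2d` above performs.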

Originally, the code looks like this without the modification and works just fine:

```
import torch
import torch.nn as nn
import numpy as np


class SomeClass(nn.Module):

    def __init__(self):
        super().__init__()
        self.softmax = nn.Softmax(dim=2)  # assumed definition; the original snippet omits it

    def forward(self, q, k, values):

        attn = torch.bmm(q, k.transpose(1, 2))
        attn = self.softmax(attn)
        output = torch.bmm(attn, values)
        return output
```

The error I am getting (it happens later on, in the code that uses the transformed `output`):

```
File "...\lib\site-packages\torch\tensor.py", line 381, in __iter__
    raise TypeError('iteration over a 0-d tensor')
TypeError: iteration over a 0-d tensor
```