Individual attention weights for every location of a feature map

I am trying to use an attention model. Specifically, I have a feature map of size (16, 256, 13, 13) (batch size, channels, height, width). I want to learn an attention weight at each spatial location, but with separately learned parameters per location. Concretely, I want to reduce the feature map to (16, 1, 13, 13) by applying 13*13 independent 1x1 convolutions, one per location, each mapping 256 channels to 1 (so each kernel has weight shape (1, 256, 1, 1)).
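(As a quick sanity check on the kernel shape, this is how PyTorch reports the weight of a single 256 -> 1 1x1 convolution:)

import torch.nn as nn

conv = nn.Conv2d(256, 1, kernel_size=1)
print(conv.weight.shape)  # torch.Size([1, 256, 1, 1])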

The network definition in my code is:
layers = []
for i in range(13):
    for j in range(13):
        # one 1x1 convolution (256 channels -> 1) per spatial location
        layers.append(nn.Conv2d(256, 1, kernel_size=1, stride=1))
# use nn.ModuleList instead of a plain Python list so the 169 convolutions
# are registered as sub-modules and their parameters reach the optimizer
self.attention_1 = nn.ModuleList(layers)

My forward code is:
# collect one scalar attention value per location into a (B, 1, 13, 13) map
atten = torch.zeros(input.size(0), 1, 13, 13, device=input.device)
for i in range(13):
    for j in range(13):
        # (B, 256) feature vector at location (i, j), reshaped to (B, 256, 1, 1)
        feature_location = input[:, :, i, j].unsqueeze(2).unsqueeze(3)
        # index the convolution that belongs to location (i, j)
        out = self.attention_1[i * 13 + j](feature_location).tanh()
        atten[:, :, i, j] = out.squeeze(3).squeeze(2)

I know that it is wrong to write it this way, but I don't know how to implement this part of the code. How should I solve this problem? Thanks for your advice.
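Edit: for reference, here is a vectorized sketch of the behavior I am after, without looping over 169 separate modules. It keeps all per-location weight vectors in a single parameter and contracts over the channel dimension with einsum; the module name PerLocationAttention, the initialization, and the bias term are just my own illustration, not fixed requirements.

import torch
import torch.nn as nn

class PerLocationAttention(nn.Module):
    """An independent 256 -> 1 linear map (i.e. a 1x1 conv) at every spatial location."""
    def __init__(self, channels=256, height=13, width=13):
        super().__init__()
        # weight[i, j] plays the role of the (i, j)-th 1x1 convolution kernel
        self.weight = nn.Parameter(torch.randn(height, width, channels) * 0.01)
        self.bias = nn.Parameter(torch.zeros(height, width))

    def forward(self, x):  # x: (B, C, H, W)
        # contract the channel dimension with a different weight vector at each (h, w)
        atten = torch.einsum('bchw,hwc->bhw', x, self.weight) + self.bias
        return atten.tanh().unsqueeze(1)  # (B, 1, H, W)

x = torch.randn(16, 256, 13, 13)
print(PerLocationAttention()(x).shape)  # torch.Size([16, 1, 13, 13])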