Learning on a simple model

Hello,
I would like to learn the weights of a convolution, but using torch.nn.functional.conv2d instead of torch.nn.Conv2d as in this tutorial: https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html . The reason is that the weight tensor of torch.nn.Conv2d has shape (C_out, C_in, H_k, W_k), and in my model I want weights of shape (C_out, C_in, 1, 1).
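Just to be explicit about the shapes I mean (a quick check, using 3 input channels and 10 output channels as an example):

import torch
import torch.nn as nn
import torch.nn.functional as F

# nn.Conv2d with a 3x3 kernel stores its weights as (C_out, C_in, H_k, W_k)
conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3)
print(conv.weight.shape)  # torch.Size([10, 3, 3, 3])

# F.conv2d takes an explicit weight tensor, so I can pass 1x1 kernels directly
w = torch.randn(10, 3, 1, 1)
x = torch.randn(1, 3, 5, 5)
print(F.conv2d(x, w).shape)  # torch.Size([1, 10, 5, 5])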

%matplotlib inline
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # self.w_i = torch.ones(10, 3, 1, 1, requires_grad=True)
        # learnable 1x1 kernels, randomly initialized
        self.w_i = nn.Parameter(torch.randn(3, 3, 1, 1))

    def forward(self, x):
        x = F.conv2d(x, self.w_i)
        return x

net = Net()
print(net)

input = torch.ones(1, 3, 5, 5)
true_weights = torch.Tensor([[[[2.]], [[3.]], [[1.]]],
                             [[[0.]], [[5.]], [[2.]]],
                             [[[1.]], [[1.]], [[2.]]]])
target = F.conv2d(input, true_weights)
criterion = nn.MSELoss()

import torch.optim as optim
optimizer = optim.SGD(net.parameters(), lr=0.01)

for epoch in range(10000):
    optimizer.zero_grad()   # zero the gradient buffers
    output = net(input)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()    # Does the update

list(net.parameters())

I am new to PyTorch, and the code above gives me weights in net.parameters() that are not correct (I want to recover true_weights).
What should I modify to make this work?

Are you sure about that? This shape specifies kernels of size 1x1, which simply multiply every entry of the input by a weight. There is no need for a convolution at all.
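To make that concrete, here is a small check that a 1x1 convolution is the same as mixing the input channels with a (C_out, C_in) matrix at every spatial location:

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 5, 5)
w = torch.randn(10, 3, 1, 1)

out_conv = F.conv2d(x, w)
# same channel mixing via einsum: out[b,o,h,w] = sum_i w[o,i] * x[b,i,h,w]
out_mix = torch.einsum('oi,bihw->bohw', w.view(10, 3), x)
print(torch.allclose(out_conv, out_mix, atol=1e-6))  # True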

Yes, this is exactly what I want ^^. Actually my model is more complicated than this; this is just a starting point. The reason I want weights with this shape is that it fits a real-world observation model, but describing it would take too long.

What I want to model afterwards:

1. Given input and weights (the latter must have shape (blabla, blabla, 1, 1)), apply a 2D convolution and obtain the 4D output.
2. Apply some transformations to it and obtain a 2D matrix (let's name it a_{i,j}) that for me represents a probability matrix.
3. Draw a multinomial sample (of size, say, 3000) from it and obtain another 2D matrix that sums to 3000 and represents my data (let's name it b_{i,j}).

The goal is to maximize log-likelihood(data = b_{i,j}; params = input, weights) = Sum_{i,j} b_{i,j} * log(a_{i,j}) = Sum_{i,j} b_{i,j} * log(transformations(torch.nn.functional.conv2d(input, weights))_{i,j}).

(That also means I actually want to learn not only the weights but also the input. Whatever, for now.)
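A rough sketch of this objective in code (assuming a and b are 2D tensors of the same shape, with a > 0 wherever b > 0):

import torch

def log_likelihood(b, a):
    # Sum_{i,j} b[i,j] * log(a[i,j]), skipping entries where b is zero
    mask = b > 0
    return (b[mask] * torch.log(a[mask])).sum()

Maximizing this is equivalent to minimizing its negative, which is what the optimizer would do.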

I cannot detect any obvious mistakes in your code. I tried it with various learning rates and an even larger number of iterations, but I also don't get the desired weights.

Have you tried other optimizers?

Hello. I've tried other optimizers, but the result is the same.

I've also written the code for what I described in my previous post:


%matplotlib inline
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

def minus_log_likelihood(A, B):
    # negative log-likelihood: -Sum_{i,j} A[i,j] * log(B[i,j])
    # (A holds the observed counts b_{i,j}, B the model probabilities a_{i,j})
    loss = 0
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            if B[i, j] > 0:
                loss += A[i, j] * torch.log(B[i, j])
    return -loss

def four_to_two_dim(doc):
    # drop the batch dimension: (1, C, H, W) -> (C, H, W)
    doc = doc.view(doc.shape[1], doc.shape[2], doc.shape[3])
    # zero-pad the last dimension so each channel can be shifted by its index
    doc = torch.cat((doc, torch.zeros(doc.shape[0], doc.shape[1], doc.shape[0] - 1)), dim=2)
    # circularly shift channel i by i positions along the last dimension
    # (out of place, so autograd never sees an in-place modification)
    doc = torch.stack([torch.roll(doc[i], shifts=i, dims=-1) for i in range(doc.shape[0])])
    # sum over channels: (C, H, W + C - 1) -> (H, W + C - 1)
    return torch.sum(doc, 0)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # learnable 1x1 kernels; nn.Parameter requires grad by default
        self.w_i = nn.Parameter(torch.ones(10, 3, 1, 1))

    def forward(self, x):
        x = F.conv2d(x, self.w_i)
        x = four_to_two_dim(x)
        return x

net = Net()

A = torch.Tensor([[1, 1, 0, 0, 0], [0, 4, 30, 1, 0]])
B = torch.Tensor([[2, 1, 0, 0, 3], [4, 5, 0, 0, 0]])
C = torch.Tensor([[1, 1, 20, 10, 0], [0, 4, 1, 1, 0]])

# normalize each map so it sums to 1
A = A / A.sum()
B = B / B.sum()
C = C / C.sum()

# stack the three maps as the channels of a single input
input = torch.zeros(1, 3, 2, 5)
input[0, 0] = A
input[0, 1] = B
input[0, 2] = C

true_weights = torch.zeros(10, 3, 1, 1)
true_weights[0, 2, 0, 0] = 1
true_weights[3, 0, 0, 0] = 1
true_weights[7, 2, 0, 0] = 1

obs = F.conv2d(input, true_weights)
obs = four_to_two_dim(obs)
# draw multinomial counts (3000 draws per row) from the probability matrix
obs = torch.distributions.multinomial.Multinomial(3000, obs).sample()

criterion = minus_log_likelihood
optimizer = optim.SGD(net.parameters(), lr=0.01)
# optimizer = optim.Adam(net.parameters())

for epoch in range(50000):
    if epoch % 5000 == 0: 
        print(epoch)
    optimizer.zero_grad()   # zero the gradient buffers
    output = net(input)
    loss = criterion(obs,output)
    loss.backward()
    optimizer.step()    # Does the update

print("learnt weights:")
test = list(net.parameters())[0].detach()
print(test.view(test.shape[0],test.shape[1]))

print("true weights:")
print(true_weights.view(true_weights.shape[0],true_weights.shape[1]))

but the learnt weights are
([[ 26.6613, 54.1032, 36.4842],
[ 24.9853, 34.6588, 29.3577],
[ 24.1140, 25.9870, 25.9867],
[ 23.9944, 25.5071, 27.3805],
[ 23.8598, 25.1517, 27.7695],
[ 23.9057, 25.1781, 27.8207],
[ 23.6072, 26.0832, 27.1767],
[ 22.1335, 25.3652, 27.1994],
[ 23.1807, 30.0675, 28.5578],
[ 39.4541, 42.3686, 45.3418]])
and the true weights are
([[ 0., 0., 1.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 1., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 1.],
[ 0., 0., 0.],
[ 0., 0., 0.]])
I don't know if I can call this a success lol.
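For reference, a tiny sanity check of what four_to_two_dim (defined above) does: each channel i is shifted right by i positions after zero-padding, then the channels are summed.

x = torch.tensor([[[[1., 2., 3.]],
                   [[4., 5., 6.]]]])  # shape (1, 2, 1, 3): 2 channels of 1x3 maps
print(four_to_two_dim(x))
# channel 0 is padded to [1, 2, 3, 0], channel 1 is shifted right by one
# to [0, 4, 5, 6]; their sum is tensor([[1., 6., 8., 6.]])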