Very simple categorisation problem

Hello,

I’m new to the world of pytorch and machine learning.

I’m working on a small project, and I’ve made a very simplified version of it here in case someone wants to help me understand the problem.

I have tens of thousands of categories.
In this example, I’m trying to get the categories 42, 4300, 55000 for an input of 0, 0, 0.

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np


outputExcepted = torch.tensor([[42, 4300, 55000]])
data = torch.tensor([[0, 0, 0]], dtype=torch.float32)

class FCNet(nn.Module):
    def __init__(self):
        super(FCNet, self).__init__()
        self.layer1 = nn.Linear(3, 3)

    def forward(self, x):
        x = torch.relu(self.layer1(x))
        return x


model = FCNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)
num_epochs = 10000

model.train()

for epoch in range(num_epochs):
    optimizer.zero_grad()
    outputs = model(data)
    loss = criterion(outputs, outputExcepted.float())    
    loss.backward()
    optimizer.step()
    print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))

My first concern is that the neural network does not converge:

Epoch [10000/10000], Loss: 20850.6680

I also wonder why I have to pass floats to the criterion function:

loss = criterion(outputs, outputExcepted.float())    

If I don’t, I get this error:
RuntimeError: Expected floating point type for target with class probabilities, got Long

But from the CrossEntropyLoss documentation, I understood that the target has to be a Long tensor.

I guess the cause isn’t very complicated, but there is surely something I haven’t learned yet, and it’s blocking my little project.

It would be great if someone could help me!

Thanks a lot,

If your target has the same shape as the model output, CrossEntropyLoss will assume you are passing class probabilities (and those are required to be a floating-point type).
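
Probability targets are only supported in more recent PyTorch versions (1.10+), by the way. Here is a minimal sketch of my own (just an illustration, not your code) contrasting the two target formats CrossEntropyLoss accepts: class indices as a Long tensor of shape (batch,), or class probabilities as a float tensor with the same shape as the logits.

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(1, 5)          # model output: 1 sample, 5 classes

# Option 1: class indices -- a Long tensor of shape (batch,)
target_indices = torch.tensor([3])
loss1 = criterion(logits, target_indices)

# Option 2: class probabilities -- a float tensor with the same shape as the logits
target_probs = torch.zeros(1, 5)
target_probs[0, 3] = 1.0            # one-hot probabilities for class 3
loss2 = criterion(logits, target_probs)

print(loss1.item(), loss2.item())   # the two losses match for a one-hot target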

Thank you so much for taking the time to respond!

I found your answer surprising, so I changed the network’s input to a different shape, but both problems are still there.

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np


outputExcepted = torch.tensor([[42, 4300, 55000]])
data = torch.tensor([[0, 0]], dtype=torch.float32)

class FCNet(nn.Module):
    def __init__(self):
        super(FCNet, self).__init__()
        self.layer1 = nn.Linear(2, 3)

    def forward(self, x):
        x = torch.relu(self.layer1(x))
        return x


model = FCNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)
num_epochs = 10000

model.train()

for epoch in range(num_epochs):
    optimizer.zero_grad()
    outputs = model(data)
    loss = criterion(outputs, outputExcepted)    
    loss.backward()
    optimizer.step()
    print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))

First, CrossEntropyLoss is probably not the appropriate loss function in this case; try L1Loss or MSELoss instead (see the sketch below).

Second, the Linear layer in your model has two components, weights and biases. The weights are matrix-multiplied with the inputs, but since your inputs are always zero, the weights won’t (and shouldn’t) go anywhere. That just leaves the bias, which is simply added, so it should eventually equal your expected output. The optimal “solution” in this case is therefore zeros for the weights (or any values at all, since they get multiplied by zeros) and your expected values for the bias. Point being, your weights are useless in this example.
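
Here is one minimal sketch of that regression framing, as my own illustration rather than a fix of your exact code: I drop the ReLU on the output (a negative initial bias would otherwise be stuck at zero) and raise the learning rate so the bias can reach values as large as 55000.

import torch
import torch.nn as nn
import torch.optim as optim

target = torch.tensor([[42.0, 4300.0, 55000.0]])  # float targets, treated as a regression
data = torch.tensor([[0.0, 0.0, 0.0]])

model = nn.Linear(3, 3)                  # no ReLU: with zero inputs the output is just the bias
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)  # larger lr so the bias can grow toward 55000

for epoch in range(10000):
    optimizer.zero_grad()
    loss = criterion(model(data), target)
    loss.backward()
    optimizer.step()

print(model.bias)                        # converges to approximately [42, 4300, 55000]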

If, for some reason, you’d like to frame the problem so that it is usable with CrossEntropyLoss, here is one way you might go about it:

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np


outputExcepted = torch.tensor([[42, 4300, 55000]])
data = torch.tensor([[0, 0, 0]], dtype=torch.float32)

class FCNet(nn.Module):
    def __init__(self):
        super(FCNet, self).__init__()
        self.layer1 = nn.Linear(3, 55001*3)  # 55001 logits for each of the 3 target positions

    def forward(self, x):
        x = torch.relu(self.layer1(x))
        return x.view(3, 55001)  # reshape to (3 positions, 55001 classes)


model = FCNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
num_epochs = 10000

model.train()

for epoch in range(num_epochs):
    optimizer.zero_grad()
    outputs = model(data)
    loss = criterion(outputs, outputExcepted.view(-1))  # flatten targets to 3 class indices
    loss.backward()
    optimizer.step()
    print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))

print("Predictions", torch.argmax(outputs, dim=1))

When it has finished, you can check that the predictions are correct and match your outputExcepted values.

It’s going to be hit and miss, though, depending on how the initial weights and biases get instantiated, because the model is so shallow and the only inputs are zeros. You could try playing around with the learning rate, or try initializing your biases to ones in your init with self.layer1.bias = nn.Parameter(torch.ones_like(self.layer1.bias)), as sketched below.
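
For example, a minimal sketch of where that bias initialization would go (my own illustration of the same model):

import torch
import torch.nn as nn

class FCNet(nn.Module):
    def __init__(self):
        super(FCNet, self).__init__()
        self.layer1 = nn.Linear(3, 55001*3)
        # start all biases at 1 so the ReLU does not zero out units (and their gradients) from the start
        self.layer1.bias = nn.Parameter(torch.ones_like(self.layer1.bias))

    def forward(self, x):
        x = torch.relu(self.layer1(x))
        return x.view(3, 55001)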
