Consistently high loss values

NeuralFoX · June 13, 2024, 2:41am

Hi! I am trying to train a model to multiply 2 numbers accurately. My previous task was to train a model to add 2 numbers, which worked fine. Unfortunately, during training on multiplication, the loss stays extremely high, at around 300.

Here is my code:

import torch
import torch.nn as nn
import torch.optim as optim
import csv
import random
import numpy as np

# Create the dataset.
# Variable and file names are carried over from the earlier addition task;
# the third column actually holds the product of the two factors.

addend1 = []
addend2 = []
for x in range(100000):
    addend1.append(random.randint(0, 100))
for y in range(100000):
    addend2.append(random.randint(0, 100))
sumslst = []
for z in range(100000):
    sumslst.append(addend1[z] * addend2[z])

with open('additiondataset1.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    for x in range(100000):
        writer.writerow([addend1[x], addend2[x], sumslst[x]])

device = "mps" if torch.backends.mps.is_available() else "cpu"
dataset = np.loadtxt('additiondataset1.csv', delimiter=',')
print(dataset)
X = dataset[:, 0:2]  # the two factors
y = dataset[:, 2]    # the product
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
test = torch.tensor([30, 10], dtype=torch.float32)

# Build the model

class Adder(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden1 = nn.Linear(2, 5)
        self.act1 = nn.ReLU()
        self.hidden2 = nn.Linear(5, 2)
        self.act2 = nn.ReLU()
        self.output = nn.Linear(2, 1)
        self.act3 = nn.ReLU()

    def forward(self, x):
        x = self.act1(self.hidden1(x))
        x = self.act2(self.hidden2(x))
        x = self.act3(self.output(x))
        return x

# Train
model = Adder()
loss_fn = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
n_epochs = 10
batch_size = 100

for epoch in range(n_epochs):
    for i in range(0, len(X), batch_size):
        Xbatch = X[i:i+batch_size]
        y_pred = model(Xbatch)
        ybatch = y[i:i+batch_size]
        loss = loss_fn(y_pred, ybatch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f'Finished epoch {epoch}, latest loss {loss}')

with torch.no_grad():
    y_pred = model(X)
accuracy = (y_pred.round() == y).float().mean()
print(f"Accuracy {accuracy}")
print(model(test))

# Test on the numbers 2 and 3; the output should be 6
print(model(torch.tensor([2, 3], dtype=torch.float32)))

Here is a sample of my dataset, if it helps. I am using 1000 data points to train the model. The first 2 numbers are the factors, and the third number is the product.

10,5,50
7,4,28
5,4,20
7,4,28
6,1,6
8,1,8
2,5,10
4,5,20
2,2,4
7,3,21
9,1,9
4,2,8
7,2,14
2,2,4
10,1,10
4,3,12
2,5,10
2,4,8
10,4,40
5,2,10
6,2,12
8,2,16
1,1,1

Since my code writes a new dataset every time the program runs, I delete the dataset after each run. When I run the program, the loss values are extremely high and never change. Here is the output:

Finished epoch 0, latest loss 321.25
Finished epoch 1, latest loss 321.25
Finished epoch 2, latest loss 321.25
Finished epoch 3, latest loss 321.25
Finished epoch 4, latest loss 321.25
Finished epoch 5, latest loss 321.25
Finished epoch 6, latest loss 321.25
Finished epoch 7, latest loss 321.25
Finished epoch 8, latest loss 321.25
Finished epoch 9, latest loss 321.25
Accuracy 0.0
tensor([0.], grad_fn=<ReluBackward0>)
tensor([0.], grad_fn=<ReluBackward0>)

I’ve tried larger and smaller numbers of epochs with no improvement. I’ve also decreased and increased the learning rate, which I understand controls the size of the steps the gradient descent algorithm takes, but that made no difference either. Can someone help? Thank you.

ptrblck · June 13, 2024, 6:32pm

Could you remove the last nn.ReLU activation and return the raw logits?
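
To illustrate the suggestion, here is a minimal sketch rather than a confirmed fix for the posted script. It assumes the frozen loss and the `tensor([0.], grad_fn=<ReluBackward0>)` outputs come from the final ReLU clamping a negative pre-activation to zero, which also zeroes the gradient flowing back. The class name `Multiplier` and the hand-set output weights are hypothetical and exist only to reproduce that stuck state deterministically.

```python
import torch
import torch.nn as nn


class Multiplier(nn.Module):  # hypothetical name; the posted class is called Adder
    def __init__(self):
        super().__init__()
        self.hidden1 = nn.Linear(2, 5)
        self.act1 = nn.ReLU()
        self.hidden2 = nn.Linear(5, 2)
        self.act2 = nn.ReLU()
        self.output = nn.Linear(2, 1)  # no activation after this layer

    def forward(self, x):
        x = self.act1(self.hidden1(x))
        x = self.act2(self.hidden2(x))
        return self.output(x)  # raw, unbounded regression output


torch.manual_seed(0)
model = Multiplier()
x = torch.tensor([[2.0, 3.0]])
target = torch.tensor([[6.0]])

# Assumption for illustration only: force the output layer to produce -1
# for any input, mimicking a network whose last pre-activation is negative.
with torch.no_grad():
    model.output.weight.zero_()
    model.output.bias.fill_(-1.0)

# Case 1: a ReLU applied on top of the output, as in the posted model.
clamped = torch.relu(model(x))  # -1 is clamped to 0
nn.functional.mse_loss(clamped, target).backward()
grad_with_relu = model.output.bias.grad.clone()

# Case 2: the raw output, as suggested in the reply above.
model.zero_grad()
nn.functional.mse_loss(model(x), target).backward()
grad_without_relu = model.output.bias.grad.clone()

print("gradient with final ReLU:   ", grad_with_relu)
print("gradient without final ReLU:", grad_without_relu)
```

With the ReLU in place the gradient on the output bias is zero, so the optimizer has nothing to update and the loss cannot move. Without it the gradient is non-zero and training can proceed. Whether the resulting network then learns multiplication accurately is a separate question that this sketch does not address.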