[Solved] Why is my loss not decreasing

Hi, I am new to deep learning and PyTorch. I wrote a very simple demo, but the loss doesn't decrease during training.
Any comments are highly appreciated!

I want to use one-hot vectors to represent groups and resources; there are 2 groups and 4 resources in the training data:
group1 (1, 0) can access resource1 (1, 0, 0, 0) and resource2 (0, 1, 0, 0)
group2 (0, 1) can access resource3 (0, 0, 1, 0) and resource4 (0, 0, 0, 1)
the result is true (0) or false (1)

So the first line in the input, [1, 0, 1, 0, 0, 0], means group1 accesses resource1, and the first value in the labels is 0, which means true.

All the code is listed below:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.datasets as dsets
from torch.autograd import Variable
import numpy as np
import random
from torch.utils.data import DataLoader, Dataset

input_size = 6
hidden_size = 6
num_classes = 2
num_epochs = 300
batch_size = 1
learning_rate = 0.01

training_data = torch.FloatTensor([[1, 0, 1, 0, 0, 0],
                 [1, 0, 0, 1, 0, 0],
                 [1, 0, 0, 0, 1, 0],
                 [1, 0, 0, 0, 0, 1],
                 [0, 1, 1, 0, 0, 0],
                 [0, 1, 0, 1, 0, 0],
                 [0, 1, 0, 0, 1, 0],
                 [0, 1, 0, 0, 0, 1]])
training_label = torch.LongTensor([0, 0, 1, 1, 1, 1, 0, 0])



class MyDataset(Dataset):
    def __init__(self, datas, labels):
        self.datas = datas
        self.labels = labels

    def __getitem__(self, index):
        data, target = self.datas[index], self.labels[index] 
        return data, target

    def __len__(self):
        return len(self.datas)


train_dataloader = DataLoader(MyDataset(training_data, training_label), batch_size=batch_size)

class Net(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size) 
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, num_classes)
    
    def forward(self, x):
        out = self.fc1(x)
        out = self.fc2(out)
        out = self.fc3(out)
        return F.log_softmax(out, dim=1)
    
net = Net(input_size, hidden_size, num_classes)

    
criterion = nn.NLLLoss()  
optimizer = torch.optim.SGD(net.parameters(), lr=learning_rate)

# Train the Model
for epoch in range(num_epochs):
    for data, labels in train_dataloader:  
        # Convert torch tensor to Variable
        data = Variable(data)
        labels = Variable(labels)
        
        # Forward + Backward + Optimize
        optimizer.zero_grad()  # zero the gradient buffer
        outputs = net(data)
        loss = criterion(outputs, labels.long())
        loss.backward()
        optimizer.step()
        print('Epoch [%d/%d], Loss: %.4f'
              % (epoch + 1, num_epochs, loss.item()))

Thanks for any comment or suggestion!

Try adding nonlinearities between your layers, e.g. out = F.relu(self.fc1(x)). Right now your model is just a stack of linear layers, which is equivalent to a single linear transformation.
After adding the activation functions, you could also play around with the learning rate.
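
For example, a minimal sketch of what the forward method of the Net above could look like with ReLU activations added (nothing else needs to change):

def forward(self, x):
    out = F.relu(self.fc1(x))    # nonlinearity after the first linear layer
    out = F.relu(self.fc2(out))  # and after the second, so the layers no longer collapse into one
    out = self.fc3(out)
    return F.log_softmax(out, dim=1)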


With ReLUs and a learning rate of 0.1 it will often converge to the solution, but it will also often converge to a degenerate solution where it assigns the same probability to both labels.

Using tanh instead of relu seems to work better in this case.
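
For comparison, the same sketch with tanh in place of ReLU:

def forward(self, x):
    out = torch.tanh(self.fc1(x))
    out = torch.tanh(self.fc2(out))
    out = self.fc3(out)
    return F.log_softmax(out, dim=1)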


Thanks a lot! Thanks ptrblck and jpeg729!
The activation functions resolved my issue. Now the loss is decreasing.

P.S. Both ReLU and tanh seem to work in my test; I haven't seen much difference in the results.


Hey Peter!

Was just reviewing this thread and had two questions.

As of right now, my model code is:

    def __init__(self, input_length=7, lstm_size=64, lstm_layers=1, output_size=1,
                 drop_prob=0.2):
        super().__init__()
        self.input_length = input_length
        self.output_size = output_size
        self.lstm_size = lstm_size
        self.lstm_layers = lstm_layers
        self.drop_prob = drop_prob
        self.lstm = nn.LSTM(input_length, lstm_size, lstm_layers,
                            dropout=drop_prob, batch_first=False)
        self.dropout = nn.Dropout(drop_prob)
        self.fc = nn.Linear(lstm_size, output_size)

    def forward(self, nn_input, hidden_state):
        lstm_out, hidden_state = self.lstm(nn_input, hidden_state)
        lstm_out = lstm_out[-1, :, :]  # this gets the final LSTM output for each sequence in the batch
        lstm_out = self.dropout(self.fc(lstm_out))

        return lstm_out, hidden_state

    def init_hidden(self, batch_size):
        weight = next(self.parameters()).data
        hidden = (weight.new(self.lstm_layers, batch_size, self.lstm_size).zero_(),
                  weight.new(self.lstm_layers, batch_size, self.lstm_size).zero_())

        return hidden

1: Where would I add the nonlinear activation function in here? From what I’ve read, where to put it really depends on which activation function you use, which leads to the 2nd question.
2: I’m working with pct_change time series data and am attempting to predict a future pct_change. Would I be correct in assuming that I’m better off using tanh than ReLU?

Thanks!


@JStavy

I think you might’ve figured it out by now, but just for the sake of it I’m going to respond. (I’m no expert, still a practitioner.)

I believe you’re better off adding an activation to your LSTM output, with an activation of your choice, e.g. ELU via the functional API:

lstm_out = F.elu(lstm_out[-1, :, :])

You can then feed this output to your linear layer. Hope it helps someone.
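
For reference, a minimal sketch of where that activation could sit in the forward method from the question above (this just adds F.elu to that code; F.relu or torch.tanh would slot in the same way):

def forward(self, nn_input, hidden_state):
    lstm_out, hidden_state = self.lstm(nn_input, hidden_state)
    lstm_out = F.elu(lstm_out[-1, :, :])  # nonlinearity on the last time step's output
    lstm_out = self.dropout(self.fc(lstm_out))
    return lstm_out, hidden_state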