I have an image dataset with 40 classes, and I am trying to classify them using a neural network.
The shape of my input tensor is (1, 1, 64, 64), so the batch size is 1 and the images are grayscale.
I ran the following code, but I am getting an initial loss of around 3.4, which is more than 1. What could be the reason that my model is not learning?
class neural_net(nn.Module):
    def __init__(self):
        super().__init__()
        self.input = nn.Linear(64 * 64, 256)
        self.hidden1 = nn.Linear(256, 256)
        self.output = nn.Linear(256, 40)

    def forward(self, x):
        x = F.relu(self.input(x))
        x = F.relu(self.hidden1(x))
        x = self.output(x)
        return F.log_softmax(x, dim=1)
network1 = neural_net()
optimizer = optim.Adam(network1.parameters(), lr = 0.001)
for epoch in range(10):
    for data in train_dl:
        X, y = data
        network1.zero_grad()
        output = network1(X.view(-1, 4096))
        loss = F.cross_entropy(output, y)
        loss.backward()
        optimizer.step()
    print(loss)
Also, there is another issue I found: I am using cross_entropy to calculate the loss, but I am seeing NLLLossBackward in the output. What could be the reason for that?
During the first forward pass, the loss is expected to be around log(num_classes), i.e. log(40) ≈ 3.689.
Adding to your point about loss > 1: it is not necessary for the loss to always be < 1.
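You can verify that expectation with a quick calculation (a hypothetical one-liner, not from your code):

import math
# With 40 classes and roughly uniform initial predictions,
# the expected cross-entropy loss is -log(1/40) = log(40).
print(math.log(40))  # ≈ 3.689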
nn.CrossEntropyLoss() is actually a combination of nn.LogSoftmax() and nn.NLLLoss(). Refer to this doc.
That's why, when you print the loss, it shows grad_fn=<NLLLossBackward>.
Your code uses LogSoftmax() in the last layer.
In that case, the loss function you should use is F.nll_loss(output, y) instead of F.cross_entropy(output, y).
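Here is a minimal sketch of that equivalence, using random logits and labels (the tensor names are illustrative, not from your code):

import torch
import torch.nn.functional as F

logits = torch.randn(8, 40)            # hypothetical batch of raw scores for 40 classes
targets = torch.randint(0, 40, (8,))   # hypothetical integer labels

# cross_entropy on raw logits equals nll_loss on log_softmax outputs
loss_ce = F.cross_entropy(logits, targets)
loss_nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(loss_ce, loss_nll))  # True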
In the training loop, you might also want to zero out the optimizer's gradients instead of the network's.
So, try:
for epoch in range(10):
    for data in train_dl:
        X, y = data
        optimizer.zero_grad()
        output = network1(X.view(-1, 4096))
        # the model already applies log_softmax, so use nll_loss here
        loss = F.nll_loss(output, y)
        loss.backward()
        optimizer.step()
    print(loss)
Okay, then you can try increasing the number of layers in the neural network.
Also, just take one mini-batch and keep training on that same mini-batch, and see if the loss really goes to zero or the accuracy approaches 100%.
That’s one way to debug your network.
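A minimal sketch of that single-mini-batch check, reusing network1, optimizer, and train_dl from the code above (the step count and print interval are arbitrary choices):

X, y = next(iter(train_dl))              # grab one mini-batch and reuse it every step
for step in range(500):
    optimizer.zero_grad()
    output = network1(X.view(-1, 4096))
    loss = F.nll_loss(output, y)         # model outputs log-probabilities
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(step, loss.item())
# If the model and training loop are correct, this loss should approach zero.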