Gradients always 0 during training

Hi,
I am training a classification neural network, but the loss barely changes from one epoch to the next. I checked the gradients of my network parameters and found them to be 0 or on the order of 1e-05 to 1e-10. I have tried normalizing the inputs, reducing the number of hidden layers, and changing activation functions, but still no improvement, and I don't think it is a problem with requires_grad=True. The code is below.
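(A quick sanity check along these lines, looping over named_parameters, can confirm that every parameter is tracked by autograd:)

for name, param in model.named_parameters():
    print(name, param.requires_grad)  # should print True for every parameter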

model:
import torch.nn as nn
from torch.nn.functional import relu

class Sic_Detector(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim=4):
        super(Sic_Detector, self).__init__()
        self.input = nn.Linear(in_features=input_dim, out_features=hidden_dim)
        self.hidden1 = nn.Linear(in_features=hidden_dim, out_features=hidden_dim*3)
        self.output = nn.Linear(in_features=hidden_dim, out_features=output_dim)
        self.softmax = nn.Softmax(dim=0)

    def forward(self, x):
        x = self.input(x)
        #x = self.hidden1(relu(x))
        x = self.output(relu(x))
        return self.softmax(x)

Training block:

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.005)

for epoch in range(10):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        inputs = inputs.requires_grad_(True)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        loss.backward()

        running_loss += loss.item()
        optimizer.step()

Gradient values:

Layer: input.weight | Size: torch.Size([5, 2]) | Gradients:
tensor([[ 1.6471e-04, 6.5578e-05],
[ 0.0000e+00, 0.0000e+00],
[-8.7129e-05, -5.4855e-05],
[-5.7131e-05, -4.4586e-06],
[-2.4803e-06, -5.0209e-05]])
Layer: input.bias | Size: torch.Size([5]) | Gradients:
tensor([-1.5109e-04, 0.0000e+00, 5.9544e-06, -9.2206e-05, 3.3566e-05])
Layer: output.weight | Size: torch.Size([4, 5]) | Gradients:
tensor([[ 1.4060e-04, 0.0000e+00, 2.8602e-06, -1.1239e-04, 3.6894e-05],
[-1.3274e-04, 0.0000e+00, 2.4924e-04, -5.3653e-05, 9.1451e-05],
[-4.9264e-05, 0.0000e+00, -1.4066e-04, 9.7468e-05, -3.0539e-05],
[ 6.1543e-05, 0.0000e+00, -7.1362e-05, 4.6664e-05, -9.2642e-05]])
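
(For reference, I printed these after loss.backward() with a loop roughly like this:)

for name, param in model.named_parameters():
    if param.grad is not None:
        print(f"Layer: {name} | Size: {param.grad.size()} | Gradients:\n{param.grad}")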

nn.CrossEntropyLoss expects raw logits, so remove the self.softmax call and pass the output of self.output directly to the criterion. CrossEntropyLoss already applies log-softmax internally; feeding it probabilities applies softmax twice, which squashes the outputs toward a uniform distribution and drives the gradients toward zero, which is exactly what you are seeing. (Note also that nn.Softmax(dim=0) normalizes across the batch dimension rather than across the classes.)
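
A minimal sketch of the corrected module, keeping your layer names (the unused hidden1 layer is dropped here for brevity):

class Sic_Detector(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim=4):
        super().__init__()
        self.input = nn.Linear(input_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Return raw logits; CrossEntropyLoss applies log-softmax internally
        return self.output(relu(self.input(x)))

If you need class probabilities at inference time, apply torch.softmax(outputs, dim=1) outside the loss computation, or just take outputs.argmax(dim=1) for predictions.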
