Weights grad = 0 and predicted values don't change!

AhmdNassar · March 24, 2019, 7:28am

Hi
I’m new to PyTorch.
When I built a small model to predict hand signs (0, 1, 2, 3, 4, 5), I found that the gradients for the layers are 0, so the model didn’t learn anything. I don’t know why. I classified FashionMNIST before this and it worked well, so I think the problem could be in how I load the H5 file into PyTorch. Here is my code for that part:

keys = list(train_file.keys())
labels = list(train_file[keys[0]])
train_x = np.divide(np.array(train_file[keys[1]],dtype=np.float32),255)
train_y = np.array(train_file[keys[2]])
#-----------#
keys = list(test_file.keys())
test_labels=list(test_file[keys[0]])
test_x = np.divide(np.array(test_file[keys[1]],dtype=np.float32),255)
test_y = np.array(test_file[keys[2]])


# reshape the data before loading it into PyTorch
train_y = np.reshape(train_y,(1080,1)) # pytorch expects labels with shape (N,1) not (N,)
train_x = np.reshape(train_x,(1080,3,64,64)) # pytorch expects channels first (3,w,h) not (w,h,3)
test_y = np.reshape(test_y,(120,1)) # pytorch expects labels with shape (N,1) not (N,)
test_x = np.reshape(test_x,(120,3,64,64)) # pytorch expects channels first (3,w,h) not (w,h,3)
# convert to tensors and wrap them in datasets and data loaders
train_x = torch.stack([torch.Tensor(i) for i in train_x])
train_y = torch.stack([torch.Tensor(i) for i in train_y])
test_x = torch.stack([torch.Tensor(i) for i in test_x])
test_y = torch.stack([torch.Tensor(i) for i in test_y])
train_dataset = torch.utils.data.TensorDataset(train_x,train_y)
test_dataset = torch.utils.data.TensorDataset(test_x,test_y)

trainloader = DataLoader(train_dataset,batch_size=32,shuffle=True)
testloader = DataLoader(test_dataset,batch_size=32,shuffle=True)
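A note on the reshape calls above, assuming the H5 arrays are stored channels-last, i.e. (N, 64, 64, 3), as the code comments suggest: np.reshape only reinterprets the same flat buffer under a new shape and does not move the channel axis, so the pixels of each image end up in the wrong places. np.transpose (or permute on a tensor) reorders the axes instead. A minimal sketch with a random stand-in array (the array and variable names are hypothetical, not the poster's data):

```python
import numpy as np

# Hypothetical stand-in for the H5 array: 2 images, 64x64, channels-last.
rng = np.random.default_rng(0)
images_nhwc = rng.random((2, 64, 64, 3), dtype=np.float32)

# reshape keeps the flat memory order, so pixel values are not regrouped by channel.
reshaped = np.reshape(images_nhwc, (2, 3, 64, 64))

# transpose moves the channel axis to position 1 without mixing pixels.
transposed = np.transpose(images_nhwc, (0, 3, 1, 2))

# First channel of the first image, as stored in the original array.
first_channel = images_nhwc[0, :, :, 0]

reshape_ok = np.array_equal(reshaped[0, 0], first_channel)
transpose_ok = np.array_equal(transposed[0, 0], first_channel)
print("reshape preserves channel 0:", reshape_ok)
print("transpose preserves channel 0:", transpose_ok)
```

Scrambled inputs could plausibly contribute to a model collapsing to a single class, though the thread does not confirm that this was the cause.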

Kushaj · March 24, 2019, 8:18am

Have you tried printing or plotting the results of the train dataset?

AhmdNassar · March 25, 2019, 5:07am

If you mean the predicted values for the train dataset, yes, I did, and they are all the same, like [1,0,0,0,0,0],
which means the model predicts the same class for every image.

Kushaj · March 25, 2019, 7:00am

No, not the predicted values. I am referring to the input images from the dataloader.
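A minimal sketch of that check, using random stand-in tensors shaped like the ones in the thread (the data and variable names here are hypothetical): pull one batch from the loader, print its shape, dtype and value range, and convert one image back to channels-last for display.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 8 images of shape (3, 64, 64), labels of shape (N, 1).
images = torch.rand(8, 3, 64, 64)
labels = torch.randint(0, 6, (8, 1)).float()
loader = DataLoader(TensorDataset(images, labels), batch_size=4, shuffle=True)

# Take a single batch and look at what the model would actually receive.
batch_x, batch_y = next(iter(loader))
print(batch_x.shape, batch_x.dtype)
print(batch_x.min().item(), batch_x.max().item())
print(batch_y.squeeze(1).tolist())

# Channels-last copy of the first image, the layout image viewers expect.
first_image_hwc = batch_x[0].permute(1, 2, 0).numpy()
# For example, matplotlib.pyplot.imshow(first_image_hwc) would display it.
```

Using permute here matters: if the images were moved to channels-first with a reshape, reshaping them back for plotting would undo the scrambling and the picture would look correct even though the tensor fed to the model is not.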

AhmdNassar · March 25, 2019, 2:48pm

Yes, I thought that might be the problem too, so I tried that today and it worked fine.
Could this be the problem (the .long() call, I mean)?

 loss = criterion(outputs, torch.squeeze(labels).long())

If I remove it, I get this error:

Expected object of type torch.cuda.LongTensor but found type torch.cuda.FloatTensor for argument #2 'target'

Thank you for your help.

Kushaj · March 25, 2019, 3:12pm

The required type of the labels depends on the loss function you are using. Some torch loss functions require long targets (for example nn.CrossEntropyLoss and nn.NLLLoss with class indices), while others require float targets (for example nn.MSELoss and nn.BCELoss).
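As a rough illustration, assuming float labels of shape (N, 1) like the ones built earlier in the thread (the tensors below are made-up stand-ins): nn.CrossEntropyLoss takes class-index targets of dtype long and shape (N,), while nn.MSELoss takes float targets with the same shape as the model output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Made-up model outputs: 4 samples, 6 classes.
logits = torch.randn(4, 6)
# Float labels of shape (N, 1), as produced by torch.Tensor(...) on the label array.
labels = torch.tensor([[0.0], [3.0], [5.0], [1.0]])

# CrossEntropyLoss: class indices, dtype long, shape (N,).
# squeeze(1) drops only the label axis, so a batch of size 1 keeps its batch dimension.
class_targets = labels.squeeze(1).long()
ce_loss = nn.CrossEntropyLoss()(logits, class_targets)

# MSELoss: float targets with the same shape as the prediction.
regression_out = torch.randn(4, 1)
mse_loss = nn.MSELoss()(regression_out, labels)

print(class_targets.dtype, class_targets.shape)
print(ce_loss.item(), mse_loss.item())
```

So the .long() call in the loss line is what nn.CrossEntropyLoss needs for class-index targets and is unlikely to be the source of the zero gradients by itself.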