I'm working on real-time gender and expression detection.
I built the model in PyTorch, but the maximum accuracy it reaches is below 50%, while the same model with the same data and the same configuration in Keras reaches 99% accuracy. What could be the reason for this?
This is a link to a Colab notebook containing both models and the training steps:
https://colab.research.google.com/drive/1knoJrk-P3I0V2ZRfEV40CWC2lvZ-oEFR
Are the hyperparameters exactly the same for each model? It's quite hard to parse the training logs - it'd be nice if you showed a graph of the loss, accuracy, etc. over time for each model.
From what I can see, though, your Keras model converges to almost 0 training loss, whereas the PyTorch model's training loss seems to increase. It looks as though the learning rate in PyTorch is 0.1, whereas you use 0.001 in Keras; this could be the issue. When training loss increases, it's usually because the learning rate is too high.
The models are totally separate from each other; they just share the same configuration, data, and model structure.
I tried 0.001, 0.01, and 0.1, and they all give the same situation - what you see is just one experiment.
Also, the accuracy of the PyTorch model stops increasing after it reaches 51%, and sometimes after that the accuracy even decreases.
Does the training loss continue to decrease, and approach 0? If not, you have a problem.
No, it's increasing most of the time. At some epochs the validation loss drops below the minimum validation loss seen so far in training, but the accuracy stays low; then it returns to a higher value and starts increasing again, and so on. It looks like the training process works for the first few epochs and then becomes random.
If the loss is increasing then perhaps even 0.001 is too high for the learning rate. Accuracy is not a great metric for debugging the code. Basically any neural network should be able to fully learn the training data if you train it for long enough; otherwise you don't have enough parameters in the network. So you should focus on getting the training loss to converge to 0. If the loss isn't decreasing, your model isn't training correctly.
So why does a learning rate of 0.001 work well in Keras?
What do you advise me to do?
> So why does a learning rate of 0.001 work well in Keras?
It could be any number of factors. You'd have to investigate the source code for every layer to see whether it is implemented exactly the same as in PyTorch. Maybe something simple like learning rate decay is enabled by default in Keras.
> What do you advise me to do?
Try training the PyTorch model with a lower learning rate until you observe a consistent decrease in training loss throughout training. Don't worry about validation accuracy for now. If the training loss does not decrease with a lower LR, then something is wrong with either the data, the labels, the model architecture, or the training script. I can't really help much with debugging those. Just remember that PyTorch will not necessarily do everything in exactly the same way as Keras by default.
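One way to follow this advice is to sweep a few learning rates on a tiny setup and watch only the training loss. A minimal sketch (the model, data, and `final_train_loss` helper here are placeholders, not the poster's actual code):

```python
import torch
import torch.nn as nn

# Hypothetical LR sweep: same tiny model and data each time, only the
# learning rate changes. If no LR yields a steadily decreasing training
# loss, suspect the data, labels, architecture, or training script.
torch.manual_seed(0)
X = torch.randn(256, 10)
y = torch.randint(0, 3, (256,))

def final_train_loss(lr, steps=200):
    model = nn.Linear(10, 3)  # stand-in for the real network
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
    return loss.item()

for lr in (0.1, 0.01, 0.001, 0.0001):
    print(f"lr={lr}: final train loss {final_train_loss(lr):.4f}")
```

With a very small LR the loss barely moves; with a reasonable one it should drop consistently, which is the signal to look for before worrying about validation accuracy.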
I tried some experiments and discovered that all models in PyTorch work well with the same data, training steps, and any configuration - except the ResNet model, whether my own implementation or the PyTorch one.
I tried removing the normalization transform, freezing the batch norm layers, using the SGD or Adam optimizer, and using a learning rate as low as 0.0000001.
Nothing worked.
Does PyTorch have a problem with the ResNet architecture?
> Does PyTorch have a problem with the ResNet architecture?
Nope, there are plenty of examples of ResNet implemented correctly in PyTorch.
Are you passing the network outputs through a softmax before computing the loss with nn.CrossEntropyLoss()? If you read the documentation you'll notice that CrossEntropyLoss() actually takes logits as input.
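To illustrate that point: `nn.CrossEntropyLoss` already applies a log-softmax internally, so feeding it softmax outputs applies softmax twice, which squashes the loss toward log(C) and shrinks the gradients. A small sketch (the tensors here are made-up examples):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw network outputs (logits)
targets = torch.tensor([1, 0, 4, 9])

ce = nn.CrossEntropyLoss()

# Correct: pass logits straight in.
# CrossEntropyLoss is equivalent to LogSoftmax followed by NLLLoss.
loss_ok = ce(logits, targets)
loss_equiv = nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets)

# Wrong: softmax first, then CrossEntropyLoss. The inputs are now all in
# [0, 1], so the internal softmax yields a near-uniform distribution and
# training crawls or stalls.
loss_bad = ce(torch.softmax(logits, dim=1), targets)

print(loss_ok.item(), loss_equiv.item(), loss_bad.item())
```

The first two values match exactly, while the double-softmax loss sits close to log(10) ≈ 2.30 regardless of how good the predictions are.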
No, I'm feeding the output directly from the ResNet into CrossEntropyLoss() without applying anything.
Well, in the code you posted, the model is comprised of a feature extractor and a classifier, and the classifier has x = self.softmax(x) as its final operation. Without seeing the actual code you're using I can't help much, and I'm not going to debug your entire script.
No, that's old code; I'm now using the PyTorch model - sorry about that.

```python
model_ft = models.resnet18(pretrained=False)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 10)
net = model_ft
```
This is the actual, full code that I'm using.

Data:
```python
import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)
```
Optimizer and loss:

```python
import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
```
Model:

```python
import torchvision.models as models

model_ft = models.resnet18(pretrained=False)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 10)
net = model_ft
```
Train:

```python
net.cuda()

for epoch in range(15):  # loop over the dataset multiple times
    acc = 0
    train_acc = 0
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        inputs, labels = inputs.cuda(), labels.cuda()

        # zero the parameter gradients
        optimizer.zero_grad()
        labels = labels.squeeze()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # count correct predictions (use j so we don't shadow the batch index i)
        for j in range(len(labels)):
            if labels[j] == torch.argmax(outputs[j]):
                acc += 1

        # print statistics
        running_loss += loss.item()
        ps = torch.exp(outputs)
        top_p, top_class = ps.topk(1, dim=1)
        equals = top_class == labels.view(*top_class.shape)
        train_acc += torch.mean(equals.type(torch.FloatTensor))
        # if i % 2000 == 1999:  # print every 2000 mini-batches
        print('[%d, %5d] loss: %.3f acc: %.3f' %
              (epoch + 1, i + 1, running_loss, train_acc))
        print(acc)
        acc = 0
        running_loss = 0.0
        train_acc = 0

print('Finished Training')
```
You should initialise the optimiser after creating the model, and after sending the model to CUDA.
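A minimal sketch of that ordering, using a `Linear` layer as a stand-in for the ResNet, and `.to(device)` so it also runs without a GPU:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Recommended order: build the model, move it to the target device,
# THEN construct the optimizer from the (moved) model's parameters.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

net = nn.Linear(10, 2)   # stand-in for the ResNet above
net.to(device)           # move the model first ...
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)  # ... then build the optimizer

# The optimizer now holds references to the parameters that actually live
# on `device`, so optimizer.step() updates the model being trained.
print(next(net.parameters()).device)
```

This guarantees the optimizer is stepping on exactly the parameters the forward pass uses.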
It's working now!
I think my model was suffering from several problems, not just one, as you said:
1- The final layer contained a softmax, which made training much slower.
2- The order of initializing the model and its optimizer: training ran on one model while the optimizer was stepping on another.
3- I was using batch gradient descent while the Keras model used SGD, so the Keras model converged much faster.
I don't know how to thank you for your help and your patience.
Thanks a lot - I wish you a great future and a good life.
I'm having the same issue with U-Net. In Keras it converges nicely with no issue, but in PyTorch it does not - it diverges after 4 to 5 epochs, even though I have checked the architecture implementations and I don't see any differences between the Keras and PyTorch implementations or in the number of trainable parameters. How can I solve the divergence of the PyTorch model? Any idea why this is happening?
```
epoch 1/20, loss 0.11540429561613606, dice 0.5524536604077167
saving the model ...
epoch 2/20, loss 0.030815164163202097, dice 0.7417304706613025
saving the model ...
epoch 3/20, loss 0.025856498361659385, dice 0.7701306981134968
saving the model ...
epoch 4/20, loss 0.0235783447655131, dice 0.7951608874509188
saving the model ...
epoch 5/20, loss 0.022023522508196273, dice 0.8102755615762612
saving the model ...
epoch 6/20, loss 1.5251818179503502, dice 0.4822449768164751
epoch 7/20, loss 2.747159472943143, dice 0.1791747296810478
epoch 8/20, loss 2.7447210459369136, dice 0.17914873635318582
epoch 9/20, loss 2.744714410941597, dice 0.17913664069644042
```
Try using the TenCrop transformation.
I found the issue. I will mention it here for anyone who comes across this later.
I'm using Adam, where the default eps in Keras is 1e-7, whereas the default value in PyTorch is 1e-8. Also, the default Conv2d initialization is different from Keras.
After fixing both the eps and the initialization, it converged nicely, just as it did in Keras.
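For anyone wanting to reproduce the fix, a sketch of matching those two Keras defaults in PyTorch (the two-layer model here is just a stand-in for the U-Net; Keras uses `glorot_uniform` for Conv2D weights and `eps=1e-7` for Adam):

```python
import torch.nn as nn
import torch.optim as optim

# Stand-in for the actual U-Net architecture.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1),
)

# Match Keras's glorot_uniform (Xavier uniform) weight init and zero biases;
# PyTorch's Conv2d default is a Kaiming-style init instead.
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# Match Keras's Adam epsilon (1e-7); PyTorch's default is 1e-8.
optimizer = optim.Adam(model.parameters(), lr=1e-3, eps=1e-7)
```

The eps difference matters most late in training, when second-moment estimates are tiny and the update size becomes sensitive to the denominator.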