Model Is Stuck on One Category

I have been working with PyTorch for quite a while. I’ve read several books and looked into others, but no matter what I do, the models I build keep predicting the same output and never move away from it. Here is the latest example. The original model (written in TensorFlow) reaches 92% accuracy, but when I copy it exactly as it is, I get stuck in a single category. I’ve been working on this for 3 months. I’ve thrown in the towel and decided to look for outside help.

This is the code.


import torch
import torch.nn as nn
from torchvision import datasets, transforms
from torch.utils.data import DataLoader


train_path = "D:/Datasets/Pnemonia/train_set/"
val_path = "D:/Datasets/Pnemonia/val_set/"
test_path = "D:/Datasets/Pnemonia/test/"
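# Grayscale and ToTensor are added here as assumptions: the model takes one input
# channel and Normalize needs a tensor, but neither step survives in the posted code.
train_transform = transforms.Compose([transforms.Grayscale(), transforms.Resize((224, 224)),
                                      transforms.ToTensor(),
                                      transforms.Normalize(mean=[0.4799], std=[0.05077])])

val_transform = transforms.Compose([transforms.Grayscale(), transforms.Resize((224, 224)),
                                    transforms.ToTensor(),
                                    transforms.Normalize(mean=[0.4799], std=[0.05077])])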

trainset = datasets.ImageFolder(train_path, transform=train_transform)  # labels come from the subfolder names
trainloader = DataLoader(trainset, batch_size=32, shuffle=True)

valset = datasets.ImageFolder(val_path, transform=val_transform)
valloader = DataLoader(valset, batch_size=32, shuffle=True)

testset = datasets.ImageFolder(test_path, transform=val_transform)
testloader = DataLoader(testset, batch_size=32, shuffle=True)


class PneModel(nn.Module):
    def __init__(self) -> None:
        super(PneModel, self).__init__()
        # Five conv blocks, each halving the spatial size: 224 -> 112 -> 56 -> 28 -> 14 -> 7
        self.convBlock01 = self.convBlock(1, 32, (3, 3), 1)
        self.convBlock02 = self.convBlock(32, 64, (3, 3), 1, 0.1)
        self.convBlock03 = self.convBlock(64, 64, (3, 3), 1)
        self.convBlock04 = self.convBlock(64, 128, (3, 3), 1, 0.2)
        self.convBlock05 = self.convBlock(128, 256, (3, 3), 1, 0.2)
        self.flatten = nn.Flatten()
        self.linear01 = nn.Linear(12544, 128)  # 256 channels * 7 * 7
        self.act01 = nn.ReLU()
        self.dropout01 = nn.Dropout(0.2)
        self.linear02 = nn.Linear(128, 2)
        self.out = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.convBlock01(x)
        x = self.convBlock02(x)
        x = self.convBlock03(x)
        x = self.convBlock04(x)
        x = self.convBlock05(x)
        x = self.flatten(x)
        x = self.linear01(x)
        x = self.act01(x)
        x = self.dropout01(x)
        x = self.linear02(x)
        # The posted forward() stops at linear02; applying self.out and
        # returning the result is assumed here so that the method is complete.
        x = self.out(x)
        return x

    def convBlock(self, ic, oc, ks, s, dropout=None):
        # Conv -> ReLU -> (optional Dropout) -> BatchNorm -> 2x2 MaxPool
        if dropout is None:
            model = nn.Sequential(nn.Conv2d(in_channels=ic, out_channels=oc, kernel_size=ks, stride=s, padding="same"),
                                  nn.ReLU(), nn.BatchNorm2d(oc), nn.MaxPool2d((2, 2), stride=2))
        else:
            model = nn.Sequential(nn.Conv2d(in_channels=ic, out_channels=oc, kernel_size=ks, stride=s, padding="same"),
                                  nn.ReLU(), nn.Dropout(dropout), nn.BatchNorm2d(oc), nn.MaxPool2d((2, 2), stride=2))
        return model


images, labels = next(iter(valloader))

model = PneModel()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)


from fastai.vision.all import *  # module path assumed; it was lost from the original post
data = DataLoaders(trainloader, valloader)
learn = Learner(data, model, loss_func=F.nll_loss, opt_func=RMSProp, metrics=accuracy)

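# Training call: the method name and epoch count were lost from the original post.
# learn.fit and n_epochs are placeholders; only the 1e-3 learning rate is original.
learn.fit(n_epochs, 1e-3)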


Hi Salih!

As I’m sure you are aware, but just to be clear, you can’t copy a tensorflow
model “exactly as it is” and have it work in pytorch.

Pytorch and tensorflow do similar things, but do them differently in detail,
so you have to translate a tensorflow model into pytorch. Make sure that
you understand how the tensorflow features that you are using work. Find
the pytorch features that do approximately the same thing, make sure that
you understand how they work, and then modify your code to account for
any differences.

It appears that you are passing the two outputs of a Linear through
Softmax and then using the “probabilities” produced by Softmax as your
model’s predictions, which then get passed into your loss function, nll_loss().

According to the documentation for NLLLoss (which fills in the details
for nll_loss()), nll_loss() expects log-probabilities (not probabilities)
for the predictions you pass in. You should therefore use LogSoftmax
(instead of Softmax) for your self.out layer.
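
As a minimal sketch of the difference, the snippet below compares what
nll_loss() returns for probabilities versus log-probabilities. The logits
and targets are made-up stand-ins (not your data), meant to play the role
of the output of your final Linear(128, 2) and the integer class labels.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins: raw outputs of a final Linear(128, 2) for a batch
# of 4 images, and integer class labels like those ImageFolder produces.
logits = torch.randn(4, 2)
targets = torch.tensor([0, 1, 1, 0])

# What the posted model does: Softmax probabilities fed to nll_loss.
# nll_loss only picks -input[i, targets[i]] and averages, so this is minus
# the mean target probability. It lies in [-1, 0] and is not a log-likelihood.
wrong_loss = F.nll_loss(F.softmax(logits, dim=1), targets)

# What nll_loss expects: log-probabilities, as produced by log_softmax.
right_loss = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# cross_entropy applies log_softmax internally, so it takes the raw logits.
ce_loss = F.cross_entropy(logits, targets)

print(f"softmax + nll_loss:     {wrong_loss.item():.4f}")
print(f"log_softmax + nll_loss: {right_loss.item():.4f}")
print(f"cross_entropy:          {ce_loss.item():.4f}")
```

The last two values should agree. If you would rather return raw logits
from linear02 and drop self.out entirely, cross_entropy (or
nn.CrossEntropyLoss) does the same job in one step.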

Also print out the shape, type, and some sample values for the predictions
and targets that get passed into nll_loss() – similar-looking loss functions
in tensorflow and pytorch often have different requirements for the inputs
they expect.
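
One way to do that check is sketched below. The describe() helper, the toy
model, and the random batch are all hypothetical stand-ins so that the
snippet runs on its own; in your code you would call the helper on the
real predictions and labels just before the loss is computed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def describe(name, t):
    """Print the shape, dtype, value range, and a few leading values of a tensor."""
    print(f"{name}: shape={tuple(t.shape)}, dtype={t.dtype}, "
          f"min={t.min().item():.4f}, max={t.max().item():.4f}")
    print(f"  sample: {t.flatten()[:6].tolist()}")


torch.manual_seed(0)

# Hypothetical stand-ins for the real network and one batch from the loader:
# a tiny classifier ending in LogSoftmax, fed random 1-channel 224x224 images.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(224 * 224, 2), nn.LogSoftmax(dim=1))
images = torch.randn(8, 1, 224, 224)
labels = torch.randint(0, 2, (8,))

with torch.no_grad():
    preds = toy_model(images)

describe("predictions", preds)
describe("targets", labels)

# nll_loss wants float log-probabilities of shape (N, C)
# and int64 class indices of shape (N,).
loss = F.nll_loss(preds, labels)
print(f"loss: {loss.item():.4f}")
```

Things to look for: predictions of shape (batch, 2) whose values are all
at or below zero (log-probabilities), and targets of shape (batch,) with
dtype torch.int64 holding class indices rather than one-hot vectors.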

Good luck!

K. Frank