My model gives 0.42 accuracy. What is the problem?

I have a private dataset of 413 samples for Diabetic Retinopathy classification.
I used a pre-trained ResNet50, but the training accuracy is poor. I need someone to tell me why my model doesn't learn.

This is the code:
# preprocessing and data augmentation
train_transforms = transforms.Compose([
    transforms.ToPILImage(),  # convert the ndarray from cv2 to a PIL Image
    transforms.CenterCrop(2300),
    transforms.Resize((224, 224)),
    transforms.RandomRotation(90),  # the training set is small, so I apply some augmentations (1)
    transforms.AugMix(),  # (2)
    transforms.RandomHorizontalFlip(p=0.4),  # (3)
    transforms.RandomVerticalFlip(p=0.4),  # (4)
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),  # ImageNet stats
])

For the test set we don't apply augmentation:

test_transforms = transforms.Compose([
    transforms.ToPILImage(),  # convert the ndarray from cv2 to a PIL Image
    transforms.CenterCrop(2500),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),  # ImageNet stats
])

# Our own custom Dataset class

class createDataset(Dataset):
    def __init__(self, df_data, data_dir="/content/drive/MyDrive/Dia_dataset", transform=None):
        super().__init__()
        self.df = df_data.values
        self.data_dir = data_dir
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        img_name, label = self.df[index]
        img_path = os.path.join(self.data_dir, img_name + '.jpg')
        image = cv2.imread(img_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # cv2 loads BGR; convert to RGB to match the ImageNet normalization
        if self.transform is not None:
            image = self.transform(image)
        return image, label

Create the training/validation/testing datasets:

train_data=createDataset(df_data=training_csv,data_dir=train_path,transform=train_transforms)

valid_data=createDataset(df_data=training_csv,data_dir=train_path,transform=test_transforms)

test_data=createDataset(df_data=testing_csv,data_dir=test_path,transform=test_transforms)
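
A hypothetical quick check that the Dataset returns transformed tensors:

# hypothetical sanity check: fetch one sample and confirm its shape and dtype
img, lbl = train_data[0]
print(img.shape, img.dtype, lbl)  # expect torch.Size([3, 224, 224]) torch.float32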

np.random.seed(13)

Split the training set into 80% training / 20% validation:

validation_size=0.2

num_train=len(train_data)

indices=list(range(num_train))

np.random.shuffle(indices)

split=int(np.floor(validation_size*num_train))
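
The samplers used by the loaders below are built from these shuffled indices; a minimal sketch, assuming the standard SubsetRandomSampler pattern (the workers count is an assumed value):

from torch.utils.data import SubsetRandomSampler

train_idx, valid_idx = indices[split:], indices[:split]
train_sampler = SubsetRandomSampler(train_idx)        # 80% of the shuffled indices
validation_sampler = SubsetRandomSampler(valid_idx)   # 20% of the shuffled indices
workers = 2  # assumed number of DataLoader worker processes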

trainloader=torch.utils.data.DataLoader(train_data,batch_size=10, sampler=train_sampler, num_workers=workers)

The validation set is sampled from a separate dataset that doesn't include the image augmentation transforms:

valloader=torch.utils.data.DataLoader(valid_data,batch_size=10, sampler=validation_sampler, num_workers=workers)

testloader=torch.utils.data.DataLoader(test_data,batch_size=10, num_workers=workers)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model=models.resnet50(pretrained=True)

Disable gradients on all model parameters to freeze the weights

for param in model.parameters():
    param.requires_grad = False

n_inputs = model.fc.in_features
out_feature=5

Replace the final fully connected ResNet layer with a two-layer fully connected head and a LogSoftmax output:

model.fc = nn.Sequential(nn.Linear(n_inputs, 512),
                         nn.ReLU(),
                         nn.Linear(512, out_feature),
                         nn.LogSoftmax(dim=1))

for param in model.fc.parameters():
    param.requires_grad = True
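
A hypothetical quick check that the new head produces five outputs per image:

# hypothetical check: dummy forward pass on CPU (eval mode, so BatchNorm uses running stats)
model.eval()
with torch.no_grad():
    out = model(torch.randn(2, 3, 224, 224))
print(out.shape)  # torch.Size([2, 5]): one log-probability vector per image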

Unfreeze the last few layers of the model

for param in model.layer4.parameters():
    param.requires_grad = True
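
To confirm what will actually train after the freeze/unfreeze steps, a hypothetical parameter count:

# hypothetical check: count trainable vs. total parameters
n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
n_total = sum(p.numel() for p in model.parameters())
print(f'{n_trainable:,} trainable / {n_total:,} total parameters')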

For the training loop I use the one from the PyTorch website:

Specify loss function and optimizer

criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)  # note: only the fc head's parameters are passed to the optimizer

'''
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.00001)
'''

exp_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
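
With step_size=7 and gamma=0.1, the learning rate is multiplied by 0.1 every 7 epochs: 0.001 for epochs 0-6, 0.0001 for epochs 7-13, and so on.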

model.to(device)

(The training loop) ---------------------------
dataloaders = {'train': trainloader, 'val': valloader}
dataset_sizes = {'train': 331, 'val': 82}
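
(331 + 82 = 413, so the hard-coded sizes match the 80/20 split of the full dataset.)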

import os
import time
from tempfile import TemporaryDirectory

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()
    acc_validation = []
    acc_training = []
    loss_validation = []
    loss_training = []

    # Create a temporary directory to save training checkpoints
    with TemporaryDirectory() as tempdir:
        best_model_params_path = os.path.join(tempdir, 'best_model_params.pt')

        torch.save(model.state_dict(), best_model_params_path)
        best_acc = 0.0

        for epoch in range(num_epochs):
            print(f'Epoch {epoch + 1}/{num_epochs}')
            print('-' * 10)

            # Each epoch has a training and validation phase
            for phase in ['train', 'val']:
                if phase == 'train':
                    model.train()  # Set model to training mode
                else:
                    model.eval()   # Set model to evaluate mode

                running_loss = 0.0
                running_corrects = 0

                # Iterate over data.
                for inputs, labels in dataloaders[phase]:
                    inputs = inputs.to(device)
                    labels = labels.to(device)
                    labels = labels.squeeze().long()

                    # zero the parameter gradients
                    optimizer.zero_grad()

                    # forward
                    # track history only if in train
                    with torch.set_grad_enabled(phase == 'train'):
                        outputs = model(inputs)
                        _, preds = torch.max(outputs, 1)
                        loss = criterion(outputs, labels)

                        # backward + optimize only if in training phase
                        if phase == 'train':
                            loss.backward()
                            optimizer.step()

                    # statistics
                    running_loss += loss.item() * inputs.size(0)
                    running_corrects += torch.sum(preds == labels.data)
                if phase == 'train':
                    scheduler.step()

                epoch_loss = running_loss / dataset_sizes[phase]
                epoch_acc = running_corrects.double() / dataset_sizes[phase]
                if phase == 'val':
                    acc_validation.append(epoch_acc.item())
                    loss_validation.append(epoch_loss)
                elif phase == 'train':
                    acc_training.append(epoch_acc.item())
                    loss_training.append(epoch_loss)

                print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

                # deep copy the model
                if phase == 'val' and epoch_acc > best_acc:
                    best_acc = epoch_acc
                    torch.save(model.state_dict(), best_model_params_path)

            print()

        time_elapsed = time.time() - since
        print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
        print(f'Best val Acc: {best_acc:.4f}')

        # load best model weights
        model.load_state_dict(torch.load(best_model_params_path))
    return model, acc_validation, acc_training, loss_validation, loss_training

Hi Shatha!

There’s too much code here to look at it all, but one thing did catch
my eye.

Remove the LogSoftmax from your Sequential. CrossEntropyLoss
has LogSoftmax built into it so it expects its input to come directly from
the final Linear layer of your model.

(If you need to use LogSoftmax for some reason – and you probably
don’t – use NLLLoss as your loss criterion.)
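
A minimal check of that pairing (illustrative only):

import torch

logits = torch.randn(5, 3)
targ = torch.randint(3, (5,))
log_probs = torch.nn.LogSoftmax(dim=1)(logits)
# CrossEntropyLoss on raw logits matches NLLLoss on LogSoftmax output
print(torch.allclose(torch.nn.CrossEntropyLoss()(logits, targ),
                     torch.nn.NLLLoss()(log_probs, targ)))  # prints True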

Edit: Let me correct what I said here:

Using LogSoftmax here is unnecessary (and therefore ever so slightly
inefficient), but not harmful. In short, this is because two LogSoftmaxes
in a row are the same as one, so feeding the output of a model
through LogSoftmax and into CrossEntropyLoss is the same as feeding
the output directly into CrossEntropyLoss.

Consider:

>>> import torch
>>> torch.__version__
'2.1.0'
>>> _ = torch.manual_seed (2023)
>>> pred = torch.randn (5, 3)
>>> targ = torch.randint (3, (5,))
>>> pred_lsm = torch.nn.LogSoftmax (1)(pred)
>>> torch.nn.CrossEntropyLoss() (pred, targ)
tensor(1.2384)
>>> torch.nn.CrossEntropyLoss() (pred_lsm, targ)
tensor(1.2384)
>>> pred
tensor([[-1.2075,  0.5493, -0.3856],
        [ 0.6910, -0.7424,  0.1570],
        [ 0.0721,  1.1055,  0.2218],
        [-0.0794, -1.0846, -1.5421],
        [ 0.9377, -0.9787,  2.0930]])
>>> pred_lsm
tensor([[-2.2048, -0.4480, -1.3829],
        [-0.6015, -2.0348, -1.1354],
        [-1.6038, -0.5705, -1.4541],
        [-0.4685, -1.4737, -1.9312],
        [-1.4638, -3.3801, -0.3084]])
>>> torch.nn.LogSoftmax (1)(pred_lsm)
tensor([[-2.2048, -0.4480, -1.3829],
        [-0.6015, -2.0348, -1.1354],
        [-1.6038, -0.5705, -1.4541],
        [-0.4685, -1.4737, -1.9312],
        [-1.4638, -3.3801, -0.3084]])

(You can think of this as follows: LogSoftmax “normalizes”
unnormalized log-probabilities into normalized log-probabilities. The
second LogSoftmax normalizes the log-probabilities a second time,
but, because they are already normalized, nothing further happens.)
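
Concretely, LogSoftmax computes y_i = x_i - log(sum_j exp(x_j)); if the x_i
are already normalized log-probabilities, then sum_j exp(x_j) = 1, the log
term is 0, and the values pass through unchanged.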

So your LogSoftmax is not the cause of your problem.

Best.

K. Frank