My dataloader is not working properly for validation

Hi all

I have been struggling with my data loader for 5 days :frowning: and I can't find where the error is in my code.

I would really appreciate it if anyone could read my code and help me fix the issue.

I am training a ResNet-50 on a multi-class classification task with three classes (0, 1, 2).

My data is organized as follows:
|----------training.csv (contains file_name: image1.png, label: 0)
|-----------------------Folder: training_image (contains all training images, e.g. image1.png, image2.png, etc.)

|----------valid.csv (contains file_name: image1.png, label: 1)
|-----------------------Folder: valid_image (contains all validation images, e.g. image1.png, image2.png, etc.)

|----------test.csv (contains file_name: image1.png, label: 2)
|-----------------------Folder: testing_image (contains all testing images, e.g. image1.png, image2.png, etc.)
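
For reference, each CSV is assumed to hold two columns, a file name and an integer class label, e.g.:

file_name,label
image1.png,0
image2.png,2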

This is my custom dataset class:

import os
import numpy as np
import pandas as pd
import torch
from PIL import Image
from torch import optim
from torch.utils.data import DataLoader
from torchvision import models, transforms

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# ImageNet normalization statistics
means = [0.485, 0.456, 0.406]
stds = [0.229, 0.224, 0.225]

train_transforms = transforms.Compose([
    # Resize(224) alone only fixes the shorter side; (224, 224) guarantees
    # equally sized images that can be stacked into batches
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=means, std=stds)
])




class CustomDataset(torch.utils.data.Dataset):
    
    def __init__(self, csv_path, images_folder, transform=None):
        df = pd.read_csv(csv_path)
        self.images_folder = images_folder
        self.images_name = df['file_name']
        self.y = df['label']
        self.transform = transform

    def __len__(self):
        return self.y.shape[0]

    def __getitem__(self, index):
        img_path = os.path.join(self.images_folder,
                                self.images_name[index])
        # convert to RGB so grayscale/RGBA PNGs also yield 3 channels
        image = Image.open(img_path).convert('RGB')
        y_label = self.y[index]

        if self.transform is not None:
            image = self.transform(image)
        return image, y_label

Here is how I am calling it:
#training files and folder
train_csv='train_patches.csv'
train_images_dir= 'TRAIN_Patches'
# validation files and folder
val_csv='val_patches.csv'
val_images_dir= 'VALID_Patches'

#testing files and folder
test_csv='test_patches.csv'
test_images_dir= 'TEST_Patches'


train_data = CustomDataset(csv_path=train_csv, images_folder=train_images_dir, transform=train_transforms)
test_data = CustomDataset(csv_path=test_csv, images_folder=test_images_dir, transform=train_transforms)
val_data = CustomDataset(csv_path=val_csv, images_folder=val_images_dir, transform=train_transforms)

train_loader = DataLoader(dataset=train_data, batch_size=12, drop_last=True, shuffle=True, num_workers=0)
val_loader = DataLoader(dataset=val_data, batch_size=6, drop_last=True, shuffle=False, num_workers=0)
test_loader = DataLoader(dataset=test_data, batch_size=6, drop_last=True, shuffle=False, num_workers=0)
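
As a quick sanity check (a minimal sketch, not part of the original script), one batch from each loader can be inspected:

# pull one batch from the validation loader and check its shapes
imgs, labels = next(iter(val_loader))
print(imgs.shape)    # expected: torch.Size([6, 3, 224, 224])
print(labels.shape)  # expected: torch.Size([6])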

model = models.resnet50(pretrained=True)
# replace the final fully connected layer so the model predicts 3 classes
# instead of the 1000 ImageNet classes
model.fc = torch.nn.Linear(model.fc.in_features, 3)

loss_fn = torch.nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr = 0.003)
val_count = len(val_loader)
train_count = len(train_loader)

def train_one_epoch(epoch, model, loss_fn, optimizer, train_loader):
    # put the model in training mode and move it to the target device
    model.train()
    model.to(device=DEVICE, dtype=torch.float32)

    running_loss = 0
    for imgs, labels in train_loader:
        imgs = imgs.to(device=DEVICE, dtype=torch.float32)
        labels = labels.to(device=DEVICE, dtype=torch.long)

        optimizer.zero_grad()
        preds = model(imgs)
        loss = loss_fn(preds, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
    print('Epoch {} avg Training loss: {:.3f}'.format(epoch, running_loss/len(train_loader)))

    return running_loss/len(train_loader)
        
def valid_one_epoch(epoch, model, loss_fn, test_loader):
    # eval mode disables dropout and uses the running batch-norm statistics;
    # calling model.train() inside the validation loop (as before) would
    # silently switch batch norm back to training behaviour after the first batch
    model.eval()
    model.to(device=DEVICE, dtype=torch.float32)

    running_loss = 0
    actual_labels = []
    pred_labels = []

    with torch.no_grad():
        for imgs, labels in test_loader:
            imgs = imgs.to(device=DEVICE, dtype=torch.float32)
            labels = labels.to(device=DEVICE, dtype=torch.long)

            logits = model(imgs)
            loss = loss_fn(logits, labels)
            running_loss += loss.item()

            # the model outputs raw logits; argmax gives the predicted class
            # (torch.exp was unnecessary since the model has no LogSoftmax layer)
            top_class = torch.argmax(logits, dim=1)
            pred_labels += list(top_class.cpu().numpy())
            actual_labels += list(labels.cpu().numpy())

    # divide by the number of evaluated samples, not len(test_data):
    # drop_last=True can drop the final incomplete batch
    accuracy = (np.array(pred_labels) == np.array(actual_labels)).sum() / len(actual_labels)
    print('Epoch {} avg Valid loss: {:.3f}'.format(epoch, running_loss/len(test_loader)))
    print('Epoch {} Valid accuracy: {:.1%}\n'.format(epoch, accuracy))

    return running_loss/len(test_loader)

train_losses = []
valid_losses = []
for epoch in range(10):
    train_loss = train_one_epoch(epoch, model, loss_fn, optimizer, train_loader)
    train_losses += [train_loss]
    # validate on the validation set; the test set should stay held out
    valid_loss = valid_one_epoch(epoch, model, loss_fn, val_loader)
    valid_losses += [valid_loss]

    # save a checkpoint whenever the validation loss improves
    # (the previous condition skipped epoch 0, so the first model was never saved)
    if len(valid_losses) == 1 or valid_loss < min(valid_losses[:-1]):
        print(epoch)
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': valid_loss
            }, 'checkpoint.tar')
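
For completeness, restoring this checkpoint later would look roughly like this (a minimal sketch):

# load the best checkpoint back into the model and optimizer
checkpoint = torch.load('checkpoint.tar')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1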

@ptrblck May I have the benefit of your thoughts on this issue?

I’m not sure what the current issue is. Are you seeing any errors or unexpected behavior?

Thanks for the reply.

Yes, there is unexpected behaviour. The data loader seems to read only the first file of my dataset, which is training.csv. It doesn't read val.csv; as a result, I get a very low validation accuracy when training the model. I hope you get my point!

Is it correct to read several files in __init__? Or should I place the reading statement in __getitem__?

I’m not sure how this could happen as it seems you are properly passing the different csv file paths and folder names to the CustomDatasets:

train_data = CustomDataset(csv_path=train_csv, images_folder=train_images_dir, transform=train_transforms)
test_data = CustomDataset(csv_path=test_csv, images_folder=test_images_dir, transform=train_transforms)
val_data = CustomDataset(csv_path=val_csv, images_folder=val_images_dir, transform=train_transforms)

Could you add a print statement to the __init__ method and check what csv_path is set to?
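
Something along these lines (a minimal sketch of the suggested check):

def __init__(self, csv_path, images_folder, transform=None):
    # debug: confirm which annotation file this dataset actually loads
    print('loading csv from:', csv_path)
    df = pd.read_csv(csv_path)
    ...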

I really appreciate your help. I don’t know why I didn’t think of that before!

I found out that the data loader was working fine and there was nothing wrong with it.

But I found another issue!

To find the source of the problem, I trained the ResNet-50 on 9 images (with 3 labels: cats: 0, dogs: 1, pandas: 2) and discovered that the model was always predicting one class, resulting in a very low validation accuracy. Is my issue perhaps the way I labelled the targets in the CSV? It is as follows: the first 500 rows are labelled 0, the next 300 rows are labelled 1, and the last 290 rows are labelled 2. I'm using cross-entropy loss, and eventually I have {0, 1, 2} labels.
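
One quick way to confirm a collapsed model is to count the predicted classes (a minimal sketch using the pred_labels list collected in valid_one_epoch above):

from collections import Counter

# a healthy model spreads predictions across all classes;
# a collapsed one shows a single dominant key here
print(Counter(pred_labels))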

What is the proper way to handle multi-class labels when using a custom data loader?

Label values of 0, 1, and 2 are fine and expected for 3 classes.
Are you training the model on 9 images only? If so, your model would have a hard time generalizing and would overfit to these samples pretty quickly.
And if so, how are these 9 samples related to the 500/300/290 rows of the DataFrame?

@ptrblck

Are you training the model on 9 images only?

Yes. I was originally training the ResNet on my full dataset (500/300/290 samples) and saw the exact same behaviour as on the 9 images (low validation accuracy). I used the 9 images to narrow down the source of the problem. Do you get my point?

Moreover, when I print the shape of my images and labels I get the following:

size of the image tensor: torch.Size([1, 3, 224, 224])
size of the label tensor: torch.Size([1]) 

Is the size of the label correct? Shouldn't it be of size [1, 3] since I have 3 labels?

Yes, it’s correct if you are passing class indices.

nn.CrossEntropyLoss expects a target tensor in the shape of [batch_size] if you are passing class indices and [batch_size, nb_classes] if you are passing probabilities. Assuming you are using the former approach, the size is correct.
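
Here is a minimal sketch of both target formats (note that probability targets require a newer PyTorch release, 1.10+):

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
logits = torch.randn(4, 3)                 # [batch_size, nb_classes]

# class indices: shape [batch_size], integer values in {0, 1, 2}
target_idx = torch.tensor([0, 2, 1, 0])
print(loss_fn(logits, target_idx))

# class probabilities: shape [batch_size, nb_classes], each row sums to 1
target_prob = torch.tensor([[1.0, 0.0, 0.0],
                            [0.0, 0.0, 1.0],
                            [0.0, 1.0, 0.0],
                            [1.0, 0.0, 0.0]])
print(loss_fn(logits, target_prob))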

I'm afraid this is expected, as the model might also be unable to generalize from only ~1,000 images.
Your dataset is quite tiny (MNIST uses 60k training/validation samples and 10k test samples and is still considered a “toy dataset”).

Thank you so much for your continued support!

You are right. I tried it with 13k training / 10k validation samples and 2,000 test samples, and it worked.