PyTorch beginner struggling with data processing

Hi there!!
I am new to PyTorch, and although I am already able to work with, for example, FCN-GCN for segmentation problems, I have some general questions about PyTorch. When I try to use a new network, I always struggle in the data processing phase, in particular when writing a Dataset class to read my data, transform it, and convert it to tensors. I always get errors in the network, in particular with image or label shapes, etc. If it is not a pain, I would kindly ask if someone could explain to me what the best way is of handling data for a general classification or segmentation problem. For example, right now I am trying to develop a CNN for a binary classification problem, and I am struggling with the labels ‘0’ and ‘1’: I always get shape and data type errors in my CNN. I am not sure if it could also be related to the criterion I am using: nn.CrossEntropyLoss().
I was following this tutorial: https://www.pluralsight.com/guides/image-classification-with-pytorch, but I am using my own dataset.

Thank you in advance, and I understand if this is too confusing or difficult to answer :confused:

This might help:
If you have images with label 0 and label 1, you can put them in folders like /train/0 and /train/1 for your train split, and /test/0 and /test/1 for your test split.
Then you can define a DataLoader like so:

import torch
import torchvision.transforms as transforms
import torchvision.datasets as dset

image_size = 64   # example values; use whatever fits your data and network
batch_size = 16
workers = 2

train_root = '/train'
train_set = dset.ImageFolder(root=train_root,
                             transform=transforms.Compose([
                                 transforms.Resize(image_size),
                                 transforms.CenterCrop(image_size),
                                 transforms.ToTensor(),
                                 transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
                             ]))
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size,
                                           shuffle=True, num_workers=workers)

Do the same for the test set.

Since you have a binary classification problem, you can use torch.nn.Sigmoid() together with torch.nn.BCELoss(), or just torch.nn.BCEWithLogitsLoss(), as the loss function.

torch.nn.CrossEntropyLoss() should also work, though.
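
For reference, the shapes and dtypes these two setups expect are the usual source of errors; here is a minimal sketch (with a hypothetical batch of 4 samples and random logits, just for illustration):

import torch
import torch.nn as nn

batch = 4

# Option 1: two output units + CrossEntropyLoss
# logits: (N, 2) float, target: (N,) long with values 0 or 1
logits = torch.randn(batch, 2)
target = torch.randint(0, 2, (batch,))            # dtype is int64 (long)
loss = nn.CrossEntropyLoss()(logits, target)

# Option 2: one output unit + BCEWithLogitsLoss (applies the sigmoid internally)
# logits: (N, 1) float, target: (N, 1) float with values 0.0 or 1.0
logits = torch.randn(batch, 1)
target = torch.randint(0, 2, (batch, 1)).float()  # dtype must be float here
loss = nn.BCEWithLogitsLoss()(logits, target)

A float target with CrossEntropyLoss (or a long target with BCEWithLogitsLoss) is the typical cause of the dtype errors you describe.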

If you have size mismatches during the forward pass, you might want to look at your feature maps. Check whether the out_channels and in_channels of your conv layers match up, as well as the in_features and out_features of your linear layers. Just printing x.shape at a few points might help you find and debug the mismatches.
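
For example, something like this (a generic sketch, not your exact model; the hypothetical Net and the 64x64 input are just for illustration):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 10, kernel_size=3)
        self.fc1 = nn.Linear(10 * 31 * 31, 2)       # must match the flattened size printed below

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        print('after conv1 + pool:', x.shape)       # torch.Size([1, 10, 31, 31]) for a 64x64 input
        x = x.view(x.shape[0], -1)
        print('flattened:', x.shape)                # the second dim is what fc1 needs as in_features
        return self.fc1(x)

Net()(torch.randn(1, 3, 64, 64))                    # one fake RGB image, just to check the shapes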

Thank you for the response!! What you are saying is that I should put my data into label-specific folders, but why can’t I do it like this?


import os
import numpy as np
import cv2
from PIL import Image
from skimage.transform import resize
from torchvision import transforms

class Dataset(object):
 #----Initializing the class----
    def __init__(self, train_names, masks_dir, img_dir,train, im_size):
        self.train_names=train_names
        self.masks_dir=masks_dir
        self.img_dir=img_dir
        self.train=train 
        self.im_size=im_size

   
    def __len__(self):
        return len(self.train_names)
    
    def __getitem__(self, idx):

      image = Image.open(self.img_dir + self.train_names[idx])
      target = np.zeros((1,1))

      if self.train == True:

        try:
          masks = Image.open(self.masks_dir + os.path.splitext(self.train_names[idx])[0]+'beard.jpg') 
          target[0] = 1
            #----------------------------------------------------------------------- 
        except:
          target[0] = 0
      image=np.array(image)
      if image.shape[2] == 4:

        image = cv2.cvtColor(image, cv2.COLOR_RGBA2RGB)  # drop the alpha channel
      image=resize(image, (self.im_size , self.im_size),anti_aliasing=True)
      image = transforms.ToTensor()(image)
      image=image.float() 

      target = np.array(target)
      target = transforms.ToTensor()(target)

      sample = {'image': image, 'target': target}
      
      return sample

My code was just an example of how you can use PyTorch’s DataLoader with your own images.
You can of course define your own Dataset and DataLoader as well.
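
If it helps, a minimal custom Dataset for binary image classification could look roughly like this (a sketch only; image_paths and labels are hypothetical lists you would build yourself, and the point is that the image comes out as a float tensor and the label as a long tensor, which is what nn.CrossEntropyLoss expects):

import torch
from torch.utils.data import Dataset
from torchvision import transforms
from PIL import Image

class BinaryImageDataset(Dataset):
    def __init__(self, image_paths, labels, im_size):
        self.image_paths = image_paths                 # list of image file paths
        self.labels = labels                           # list of 0/1 ints, same length
        self.transform = transforms.Compose([
            transforms.Resize((im_size, im_size)),     # PIL transform, applied before ToTensor
            transforms.ToTensor(),                     # float tensor, shape (3, im_size, im_size)
        ])

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert('RGB')   # also drops a possible alpha channel
        target = torch.tensor(self.labels[idx], dtype=torch.long)  # scalar long tensor
        return {'image': self.transform(image), 'target': target}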

yes, I know!! But what I sent is not working, and I would like to understand why!

I get this error: pic should be PIL Image or ndarray. Got <class ‘str’>

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable

batch_size = 1
learning_rate = 0.001
threads = 1
# train_dataset and val_dataset are assumed to be instances of the Dataset class above
train_loader = torch.utils.data.DataLoader(train_dataset, num_workers=threads, batch_size=batch_size, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_dataset, num_workers=threads, batch_size=batch_size, shuffle=False)
class CNN(nn.Module): 
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=3)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(720, 1024)
        self.fc2 = nn.Linear(1024, 2)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(x.shape[0],-1)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return x
model = CNN()

cuda = torch.cuda.is_available()
print(cuda)
if cuda:
   model = model.cuda()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(),lr = learning_rate)
starting_epoch=0
epochs=50
best_dice=0
loss_accum = []

for epoch in range(starting_epoch, epochs):  
    print ('Running epoch ',epoch)
    train_loss=0
    val_loss=0
    val_dice=[]

    for phase in ['train','val']:

      if phase=='train':
        loader=train_loader
        model.train()
      else:  
        loader=val_loader
        model.eval()

      for batch in loader:
        image = batch['image']
        target = batch['target']
        if cuda:
          image = image.cuda()
          target = target.cuda()

        image = Variable(image)
        target = Variable(target)


        optimizer.zero_grad()
        with torch.set_grad_enabled(phase == 'train'):
          output = model(image)
          output = output.squeeze(0)
                    
          loss = criterion(output, target) 
         
          if phase=='train':
            loss.backward()
            loss=loss.item()
            optimizer.step()
            train_loss=train_loss+loss*image.size(0)      
          else:
            loss=loss.item()
            val_loss=val_loss+loss*image.size(0)         
            dice=Dice(target.detach().cpu().numpy(),output.detach().cpu().numpy())
            print('Dice score: ',dice)
            val_dice.append(dice)  
          
          print('Loss: ',loss*image.size(0))   
           

yes, I know!! But what I sent is not working, and I would like to understand why!

I get this error: RuntimeError: size mismatch, m1: [1 x 3920], m2: [720 x 1024] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:283

(Same training code as in my previous post.)

All the transformations listed under “Transforms on PIL Image” need to be applied to a PIL image, not to a tensor or ndarray.
That means they should come before torchvision.transforms.ToTensor(), because that transform converts your PIL image or ndarray to a tensor.
If you need to, you can also go in the opposite direction and convert a tensor or an ndarray back to a PIL image using torchvision.transforms.ToPILImage().
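
In code the ordering looks roughly like this (a sketch; Resize and CenterCrop work on the PIL image, ToTensor converts it, and Normalize then works on the resulting tensor):

from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize(64),                                   # PIL image in, PIL image out
    transforms.CenterCrop(64),                               # still a PIL image
    transforms.ToTensor(),                                   # PIL image / ndarray -> float tensor in [0, 1]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # tensor-only transform
])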

I think you have a size mismatch when passing x into the first linear layer (self.fc1). Please check whether x.shape (right before it is passed into fc1) really is 720 in the second dimension; the error message suggests it is actually 3920.
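
If I read the error right, the flattened tensor has 3920 features (20 channels of 14x14 feature maps, which is what this architecture produces for 64x64 inputs), while fc1 expects 720 (20 x 6 x 6, i.e. a 32x32 input). You can verify it with a quick check like the one below (assuming the CNN class from your post and a 64x64 im_size, which would explain the 3920), and then either set im_size to 32 or change fc1 to nn.Linear(3920, 1024):

import torch
import torch.nn.functional as F

model = CNN()                          # the CNN class posted above
x = torch.randn(1, 3, 64, 64)          # one fake 64x64 RGB image
x = F.relu(F.max_pool2d(model.conv1(x), 2))
x = F.relu(F.max_pool2d(model.conv2_drop(model.conv2(x)), 2))
print(x.view(x.shape[0], -1).shape)    # torch.Size([1, 3920]) -> fc1 would need in_features=3920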