Error when iterating over train_loader

I’m training my ResNet 50 with the following code:


However, I’m running into an IndexError when I try to iterate over train_loader.

Because my images and labels were stored separately, I created a list that pairs each image path with its label (not sure if this is where the error lies):

train_data = []
for i in range(len(train_files)):
    train_data.append([train_files[i], train_y[i]])

train_loader = DataLoader(train_data, batch_size=batch_size,
                          sampler=train_sampler, num_workers=num_workers)

Any suggestions on how to deal with this error?

Could you post the entire error message? I’m currently unsure how your DataLoader would work.
It seems you might be passing a list of lists containing file paths, which the DataLoader should not be able to handle. If train_files contains paths to the actual data, you would need to load and process each sample in a Dataset first before passing it to the DataLoader.
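As a rough illustration of what "load and process each sample in a Dataset" could look like, here is a minimal sketch. The class name PathImageDataset and the temporary PNG files are assumptions made up for this example; train_files and train_y stand in for the lists from the question, and the conversion to a tensor is done by hand rather than with any particular transform.

```python
import os
import tempfile

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader


class PathImageDataset(Dataset):
    """Hypothetical dataset: loads one image per file path on demand."""

    def __init__(self, files, labels):
        assert len(files) == len(labels)
        self.files = files
        self.labels = labels

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        # Load the image from disk and convert it to a float CHW tensor in [0, 1].
        image = Image.open(self.files[idx]).convert("RGB")
        array = np.asarray(image, dtype=np.float32) / 255.0
        image_tensor = torch.from_numpy(array).permute(2, 0, 1)
        label = torch.tensor(self.labels[idx], dtype=torch.float32)
        return image_tensor, label


# Stand-in data: four small random PNG files written to a temporary directory.
tmp_dir = tempfile.mkdtemp()
train_files = []
for i in range(4):
    path = os.path.join(tmp_dir, f"img_{i}.png")
    pixels = np.random.randint(0, 256, (8, 8, 3), dtype=np.uint8)
    Image.fromarray(pixels).save(path)
    train_files.append(path)
train_y = [0, 1, 0, 1]

train_loader = DataLoader(PathImageDataset(train_files, train_y), batch_size=2)
images, labels = next(iter(train_loader))
print(images.shape, labels.shape)
```

The point of the sketch is that the DataLoader only ever sees tensors: the file path is turned into image data inside __getitem__, one sample at a time.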

train_files does contain paths to the actual data. When you say load and process each sample, what exactly do you mean? Would I need to pass each sample into, say, an open_img function which loads each individual image?

full error message:

Yes, you could use e.g. ImageFolder if the data structure meets the requirements or implement a custom Dataset as described here.

I created my custom Dataset using the tutorial mentioned above, and then my train_loader:

transformed_dataset = BreastCancerImages(csv_file='/kaggle/input/rsna-breast-cancer-detection/train.csv',
                                         root_dir='/kaggle/input/rsna-mammography-images-as-pngs/images_as_pngs/train_images_processed',
                                         transform=train_transform)

train_loader = DataLoader(transformed_dataset, batch_size=batch_size,
                          num_workers=num_workers, shuffle=True)

But when I go to train my model:


it gives me the following error: pic should be PIL Image or ndarray. Got <class 'dict'>

How should I adjust my code to fix this error?

I don’t know how you’ve defined your custom Dataset, but based on the error message, a transformation fails because you are passing a dict to it instead of a PIL.Image or numpy array, as seen here:

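import numpy as np
from torchvision import transforms

transform = transforms.ToTensor()

# fails: ToTensor only accepts a PIL Image or a numpy array
try:
    transform(dict())
except TypeError as e:
    print(e)  # pic should be PIL Image or ndarray. Got <class 'dict'>

# works: a uint8 HWC array is converted to a float CHW tensor in [0, 1]
x = np.random.randint(0, 256, (224, 224, 3)).astype(np.uint8)
out = transform(x)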

This is my custom dataset, so would it still work if I converted it to a Tensor object?

class BreastCancerImages(Dataset):

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.d_train = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.d_train)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        img_name = list(glob(os.path.join(imageFilesDir, "**", "*.png")))[idx]
        image = Image.open(img_name)
        labels = self.d_train.iloc[idx, 6]
        labels = np.array([labels])
        labels = torch.tensor(labels.astype('float'))

        return image, labels

I don’t know where the dict is created and why the error is raised.
However, you should transform the PIL.Image to a tensor before returning it in the __getitem__ method.