Error when iterating over train_loader

Naina_K · March 12, 2023, 10:38pm

I’m training my ResNet 50 with the following code:

However, I’m running into an IndexError when I try to iterate over train_loader.

Because my image and label data were separate, I built a list of [file, label] pairs (not sure if this is where the error lies):

train_data = []
for i in range(len(train_files)):
    train_data.append([train_files[i], train_y[i]])

train_loader = DataLoader(train_data, batch_size=batch_size,
                          sampler=train_sampler, num_workers=num_workers)

Any suggestions on how to deal with this error?

ptrblck · March 12, 2023, 10:50pm

Could you post the entire error message as I’m currently unsure how your DataLoader would work.
It seems you might be passing a list of lists containing file paths, which the DataLoader should not be able to handle. If train_files contains paths to the actual data, you would need to load and process each sample in a Dataset first before passing it to the DataLoader.

Naina_K · March 12, 2023, 10:54pm

train_files does contain paths to the actual data. When you say load and process each sample, what exactly do you mean? Would I need to pass each sample into, say, an open_img function which loads each individual image?

full error message:

ptrblck · March 12, 2023, 10:56pm

Yes, you could use e.g. ImageFolder if the data structure meets the requirements or implement a custom Dataset as described here.

Naina_K · March 13, 2023, 1:18am

I created my custom Dataset using the tutorial mentioned above, along with my train_loader:

transformed_dataset = BreastCancerImages(
    csv_file='/kaggle/input/rsna-breast-cancer-detection/train.csv',
    root_dir='/kaggle/input/rsna-mammography-images-as-pngs/images_as_pngs/train_images_processed',
    transform=train_transform)

train_loader = DataLoader(transformed_dataset, batch_size=batch_size,
                          num_workers=num_workers, shuffle=True)

But when I go to train my model:

it gives me the following error: pic should be PIL Image or ndarray. Got <class 'dict'>

How should I adjust my code to fix this error?

ptrblck · March 13, 2023, 3:49am

I don’t know how you’ve defined your custom Dataset, but based on the error message, a transformation fails because you are passing a dict to it instead of a PIL.Image or NumPy array, as seen here:

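import numpy as np
from torchvision import transforms

transform = transforms.ToTensor()

# fails: ToTensor accepts only a PIL Image or ndarray
try:
    transform(dict())
except TypeError as e:
    print(e)  # pic should be PIL Image or ndarray. Got <class 'dict'>

# works
x = np.random.randint(0, 256, (224, 224, 3)).astype(np.uint8)
out = transform(x)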

Naina_K · March 14, 2023, 1:15am

This is my custom dataset. Would it still work if I converted the image to a Tensor object?

class BreastCancerImages(Dataset):

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.d_train = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.d_train)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        img_name = list(glob(os.path.join(imageFilesDir, "**", "*.png")))[idx]
        image = Image.open(img_name)
        labels = self.d_train.iloc[idx, 6]
        labels = np.array([labels])
        labels = torch.tensor(labels.astype('float'))

        return image, labels

ptrblck · March 14, 2023, 1:28am

I don’t know where the dict is created and why the error is raised.
However, you should transform the PIL.Image to a tensor before returning it in the __getitem__ method.