Data Loader seems to be pulling the same image

Hi. My data loader seems to be pulling the same image every time. The transform is applied differently to each copy, but it's still the same underlying image.

I would expect it to pull 10 different images with each batch of 10.

Below is the data loader:

class Inspection_Dataset(Dataset):
        """
        df: DataFrame containing all categorical, numerical and image columns
        numerical_columns: list of numerical columns
        cat_columns: list of categorical columns
        image: column containing the image file name
        root_dir: column containing the root directory
        
        """
        def __init__(self, df, numerical_columns = None,
                     cat_columns = None,
                     image = None, 
                     root_dir = None, 
                     label = None, 
                     transform = None):
            
            #df
            self.df = df
            #transform
            self.transform = transform
            #image
            self.image_column = image
            self.root_dir = root_dir
            
            #length
            self.n = df.shape[0]
            
            #output column
            self.label = np.array(self.df.loc[:, label])
           
            #cat columns
            self.cat_columns = cat_columns if cat_columns else []
            self.numerical_columns = [col for col in df[numerical_columns]]
                            
            if self.cat_columns:
                for column in self.cat_columns:
                    df[column] = df.loc[:, column].astype('category')
                    df[column] = df[column].cat.codes
                self.cat_columns = np.array(df[cat_columns]) 
            else:
                self.cat_columns = np.zeros((self.n, 1))  
            
            #numerical columns
            if self.numerical_columns:
                self.numerical_columns = df[self.numerical_columns].astype(np.float32).values
            else:
                self.numerical_columns = np.zeros((self.n, 1))   
                     
        def __len__(self):
            return self.n

            
        def __getitem__(self, idx):
            idx = list(self.df.index)
            
            image = Image.open(os.path.join(self.df.loc[idx, self.root_dir].values[0],
                                              self.df.loc[idx, self.image_column].values[0]))
            
            image = self.transform(image)

            return self.label[idx], self.numerical_columns[idx], self.cat_columns[idx], image

Below is the code I used to test the data loader/print the images.

train_data = Inspection_Dataset(train_sample,
                                numerical_columns = numerical_columns,
                                cat_columns = non_loca_cat_columns,
                                image = 'file',
                                root_dir = 'root',
                                label = 'target',
                                transform = train_transform)

train_loader = DataLoader(train_data, batch_size = 10, shuffle = True)

count = 50
for i in range(count):
    for b , (label, numericals, cats, image) in enumerate(train_loader):
        break
   
print('Label:', label.numpy())

im = make_grid(image, nrow=5)  # the default nrow is 8

# Inverse normalize the images
inv_normalize = transforms.Normalize(
    mean=[-0.485/0.229, -0.456/0.224, -0.406/0.225],
    std=[1/0.229, 1/0.224, 1/0.225]
)
im_inv = inv_normalize(im)

# Print the images
plt.figure(figsize=(12,4))
plt.imshow(np.transpose(im_inv.numpy(), (1, 2, 0)));


What am I missing here?

Could you check the path you are using to open the image:

os.path.join(self.df.loc[idx, self.root_dir].values[0], self.df.loc[idx, self.image_column].values[0])

Could it be that .values[0] is indexing the first image only?
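
For illustration, here is a minimal standalone sketch with a toy DataFrame (hypothetical column name, just for this example) showing that when the .loc selection covers multiple rows, .values[0] always returns the first row, no matter which sample was requested:

import pandas as pd

# Hypothetical toy DataFrame standing in for self.df
df = pd.DataFrame({'file': ['a.png', 'b.png', 'c.png']})

# If idx covers every row (e.g. the full index), .loc returns all rows
idx = list(df.index)
print(df.loc[idx, 'file'].values)     # ['a.png' 'b.png' 'c.png']

# and .values[0] then always picks the first one,
# regardless of which sample the DataLoader asked for
print(df.loc[idx, 'file'].values[0])  # 'a.png' every time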

The image path is valid. When I take the [0] off of .values, I get the following error:


  File "<ipython-input-120-d05164b59f9d>", line 79, in <module>
    for label, numericals, cats, image in train_loader:

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
    data = self._next_data()

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\torch\utils\data\dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]

  File "<ipython-input-120-d05164b59f9d>", line 57, in __getitem__
    self.df.loc[idx, self.image_column].values))

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\ntpath.py", line 76, in join
    path = os.fspath(path)

TypeError: expected str, bytes or os.PathLike object, not numpy.ndarray

I get the same error if I remove .values[0] entirely.

When I use .values[i], I get different pictures across batches, but the same picture repeated within each batch.

I think it’s something to do with batches. When I change to a batch size of 1, I get different pictures each time. I’m not sure how batching works with custom datasets or how to get them to pull the right samples.

Dataset.__getitem__ will receive an index as its argument from the DataLoader when it creates a batch.
These indices will be in the range [0, len(dataset) - 1], so you only have to make sure each corresponding sample is loaded using the passed index.

I would still recommend checking the path and making sure that idx is used to load the right file.
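
As a minimal sketch (keeping the rest of your posted class as-is, and using .iloc so the positional index from the DataLoader maps to the right row), __getitem__ could look like this:

    def __getitem__(self, idx):
        # Use the index passed in by the DataLoader instead of rebuilding it
        # from self.df.index, so each call returns a different sample.
        row = self.df.iloc[idx]

        image = Image.open(os.path.join(row[self.root_dir], row[self.image_column]))

        if self.transform:
            image = self.transform(image)

        return self.label[idx], self.numerical_columns[idx], self.cat_columns[idx], image

With that change, a batch of 10 should contain 10 different rows, and therefore 10 different images, when shuffle=True.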