Returning multiple images with Dataloader

Mrig · June 24, 2020, 1:34pm

So I was working on a Siamese network problem that requires the DataLoader to output two random images plus a 1/0 label indicating whether they belong to the same class.

class Siamese(Dataset):
    def __init__(self,train_df):
        self.train_df=train_df
      
    def __len__(self):
        return(len(self.train_df))
    
    def __getitem__(self,idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        img1=self.train_df.iloc[random.randint(0,177),:]
        img2=self.train_df.iloc[random.randint(0,177),:]
        while(img1['target']!=img2['target']):
            IMG1=img1.iloc[0:-1]
            IMG2=img2.iloc[0:-1]
            return IMG1,IMG2, torch.from_numpy(np.array([img1['target']==img2['target']],dtype=np.float32))

data=Siamese(digits.data)
vis_dataloader = DataLoader(data,
                        shuffle=True,
                           batch_size=2)
dataiter = iter(vis_dataloader)
next(dataiter)

This gave me an error:
TypeError Traceback (most recent call last)
in
----> 1 next(dataiter)

G:\ana\lib\site-packages\torch\utils\data\dataloader.py in __next__(self)
558 if self.num_workers == 0: # same-process loading
559 indices = next(self.sample_iter) # may raise StopIteration
--> 560 batch = self.collate_fn([self.dataset[i] for i in indices])
561 if self.pin_memory:
562 batch = _utils.pin_memory.pin_memory_batch(batch)

G:\ana\lib\site-packages\torch\utils\data\_utils\collate.py in default_collate(batch)
68 return [default_collate(samples) for samples in transposed]
69
---> 70 raise TypeError((error_msg_fmt.format(type(batch[0]))))

TypeError: batch must contain tensors, numbers, dicts or lists; found <class 'NoneType'>

Sorry if this is a really silly question; I am new to PyTorch and coding.

Nikronic · June 24, 2020, 7:37pm

Hi,

I think there is a problem with your conditioning that leads to NoneType: when the two sampled rows already have the same target, the while loop body never runs, so __getitem__ reaches its end without a return and hands None to the DataLoader. Try writing a plain Python function outside the class to test whether your approach is correct. I do not know your data structure, though, and I could not follow the idea behind imgx and IMGx.
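To illustrate how the conditioning can end in None, here is a minimal sketch without PyTorch. The function names and the toy label list are hypothetical; only the control flow mirrors the question.

```python
import random


def pair_label_buggy(labels, rng):
    """Mirrors the question's control flow: the only return sits inside the loop."""
    a, b = rng.choice(labels), rng.choice(labels)
    while a != b:
        return a, b, 0.0
    # Reaching this point means a == b; with no return statement, Python returns None.


def pair_label_fixed(labels, rng):
    """Returns on every path, so the caller never sees None."""
    a, b = rng.choice(labels), rng.choice(labels)
    return a, b, float(a == b)


rng = random.Random(0)
labels = [0, 1, 2]  # toy class labels, an assumption for the demo
buggy = [pair_label_buggy(labels, rng) for _ in range(200)]
fixed = [pair_label_fixed(labels, rng) for _ in range(200)]
print(sum(r is None for r in buggy), "of 200 buggy calls returned None")
print(sum(r is None for r in fixed), "of 200 fixed calls returned None")
```

Note that the buggy version can only ever emit the label 0.0, because it returns solely when the two targets differ.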

I tried to simulate your case using MNIST and found no error, except that IMG1 and IMG2 cannot be pandas Series objects; you need to convert them to np.array.
In this simulation, each row of the dataframe has 785 columns: the first is the target label and the other 784 are the 28x28 pixels of the image.
Here is the code I used for MNIST:


import random

import numpy as np
import pandas as pd
import torch
from torch.utils import data


class Siamese(data.Dataset):
    def __init__(self, train_df):
        self.train_df = train_df

    def __len__(self):
        return len(self.train_df)

    def __getitem__(self, idx):
        # idx is ignored: both rows of the pair are drawn at random.
        # random.randrange excludes the upper bound, so the index is always valid
        # (random.randint(0, len) could produce an out-of-range index).
        row1 = self.train_df.iloc[random.randrange(len(self))]
        row2 = self.train_df.iloc[random.randrange(len(self))]
        # Column 0 is the label; the remaining columns are the pixels.
        img1 = torch.from_numpy(row1.iloc[1:].to_numpy(dtype=np.float32))
        img2 = torch.from_numpy(row2.iloc[1:].to_numpy(dtype=np.float32))
        same = torch.tensor([float(row1.iloc[0] == row2.iloc[0])])
        # Return on every path so the DataLoader never receives None.
        return img1, img2, same


train_data = pd.read_csv('sample_data/mnist_train_small.csv')
dataset_train = Siamese(train_data)
vis_dataloader = data.DataLoader(dataset_train, shuffle=True, batch_size=1)
dataiter = iter(vis_dataloader)
next(dataiter)

Bests

Mrig · June 25, 2020, 5:35am

Thank you very much.

IMG1=img1.iloc[0:-1]
IMG2=img2.iloc[0:-1]

The CSV I was using had the pixel values, with the label in the last column.
So I used IMGx to store the pixel values.
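
For a frame laid out this way (pixels first, label in a last column named 'target'), a dataset might look roughly like the sketch below. The class name SiamesePairs, the toy frame, and the choice to use idx for the first image are all assumptions made for illustration, not the code from the original post.

```python
import random

import numpy as np
import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset


class SiamesePairs(Dataset):
    """Hypothetical dataset for a frame whose last column, 'target', holds the label."""

    def __init__(self, df):
        # Split once up front so __getitem__ only does cheap array indexing.
        self.pixels = df.drop(columns="target").to_numpy(dtype=np.float32)
        self.targets = df["target"].to_numpy()

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, idx):
        # Use idx for the first image and draw the partner at random.
        j = random.randrange(len(self))
        same = float(self.targets[idx] == self.targets[j])
        return (
            torch.from_numpy(self.pixels[idx]),
            torch.from_numpy(self.pixels[j]),
            torch.tensor([same]),
        )


# Toy stand-in for the real CSV: 6 rows, 4 pixel columns, label last.
toy = pd.DataFrame(np.arange(24).reshape(6, 4), columns=["p0", "p1", "p2", "p3"])
toy["target"] = [0, 0, 1, 1, 2, 2]

loader = DataLoader(SiamesePairs(toy), shuffle=True, batch_size=2)
img1, img2, same = next(iter(loader))
print(img1.shape, img2.shape, same.shape)
```

Because every call returns three tensors, the default collate function can stack them into a batch of two image tensors and one label tensor.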