Getting 5 random crops - TypeError: pic should be PIL Image or ndarray. Got <type 'tuple'>

I apply transformations to images as shown below, which works fine with RandomCrop (the code is from this dataloader script: https://github.com/jeffreyhuang1/two-stream-action-recognition/blob/master/dataloader/motion_dataloader.py)

def train(self):
    training_set = motion_dataset(dic=self.dic_video_train, in_channel=self.in_channel, root_dir=self.data_path,
                                  mode='train',
                                  transform=transforms.Compose([
                                      transforms.Resize([256, 256]),
                                      transforms.FiveCrop([224, 224]),
                                      #transforms.RandomCrop([224, 224]),
                                      transforms.ToTensor(),
                                      #transforms.Normalize([0.5], [0.5])
                                  ]))
    print '==> Training data :', len(training_set), ' videos', training_set[1][0].size()

    train_loader = DataLoader(
        dataset=training_set, 
        batch_size=self.BATCH_SIZE,
        shuffle=True,
        num_workers=self.num_workers,
        pin_memory=True
        )

    return train_loader

But when I try to use FiveCrop, I get this error:

Traceback (most recent call last):
  File "motion_cnn.py", line 267, in <module>
    main()
  File "motion_cnn.py", line 51, in main
    train_loader, test_loader, test_video = data_loader.run()
  File "/media/d/DATA_2/two-stream-action-recognition-master/dataloader/motion_dataloader.py", line 120, in run
    train_loader = self.train()
  File "/media/d/DATA_2/two-stream-action-recognition-master/dataloader/motion_dataloader.py", line 156, in train
    print '==> Training data :', len(training_set), ' videos', training_set[1][0].size()
  File "/media/d/DATA_2/two-stream-action-recognition-master/dataloader/motion_dataloader.py", line 77, in __getitem__
    data = self.stackopf()
  File "/media/d/DATA_2/two-stream-action-recognition-master/dataloader/motion_dataloader.py", line 51, in stackopf
    H = self.transform(imgH)
  File "/media/d/DATA_2/two-stream-action-recognition-master/venv/local/lib/python2.7/site-packages/torchvision/transforms/transforms.py", line 60, in __call__
    img = t(img)
  File "/media/d/DATA_2/two-stream-action-recognition-master/venv/local/lib/python2.7/site-packages/torchvision/transforms/transforms.py", line 91, in __call__
    return F.to_tensor(pic)
  File "/media/d/DATA_2/two-stream-action-recognition-master/venv/local/lib/python2.7/site-packages/torchvision/transforms/functional.py", line 50, in to_tensor
    raise TypeError('pic should be PIL Image or ndarray. Got {}'.format(type(pic)))
TypeError: pic should be PIL Image or ndarray. Got <type 'tuple'>

Since FiveCrop returns a tuple of five images instead of a single PIL image, I have to handle a tuple of images rather than one image,

  • so I use a Lambda, but then I get the following error at line 55, in stackopf, at flow[2*(j),:,:] = H:

RuntimeError: expand(torch.FloatTensor{[5, 1, 224, 224]}, size=[224, 224]): the number of sizes provided (2) must be greater or equal to the number of dimensions in the tensor (4)
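For reference, the Lambda follows the pattern from the torchvision FiveCrop docs: ToTensor is applied to each crop and the results are stacked, so every transformed flow image now comes out as [5, 1, 224, 224] instead of [1, 224, 224]. Roughly:

    import torch
    from torchvision import transforms

    transform = transforms.Compose([
        transforms.Resize([256, 256]),
        transforms.FiveCrop([224, 224]),  # returns a tuple of 5 PIL images
        # apply ToTensor to each crop, then stack into one [5, C, H, W] tensor
        transforms.Lambda(lambda crops: torch.stack(
            [transforms.ToTensor()(crop) for crop in crops])),
    ])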

and when I try to preallocate flow = torch.FloatTensor(5, 2*self.in_channel, self.img_rows, self.img_cols),

I get this error at motion_dataloader.py, line 55, in stackopf, at flow[:,2*(j),:,:] = H:

RuntimeError: expand(torch.FloatTensor{[5, 1, 224, 224]}, size=[5, 224, 224]): the number of sizes provided (3) must be greater or equal to the number of dimensions in the tensor (4)
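The mismatch is that H now carries an extra crops dimension: it is [5, 1, 224, 224] (five crops of a single-channel flow image), while flow[:, 2*(j), :, :] is only a [5, 224, 224] slice. A minimal sketch of an assignment that lines the shapes up, assuming the preallocated 4-D flow above (V stands for the corresponding vertical-flow tensor from the same loop in the repo's stackopf):

    # H is [5, 1, 224, 224]: five crops, one channel per flow image.
    # The target slice flow[:, 2*j, :, :] is [5, 224, 224], so the
    # singleton channel dimension has to be squeezed out first.
    flow[:, 2*j, :, :] = H.squeeze(1)
    flow[:, 2*j+1, :, :] = V.squeeze(1)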

I also get the same error when I multiply the returned train batch size by 5.
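For completeness, the torchvision FiveCrop docs handle the extra crop dimension on the training side rather than by scaling the batch size: the crops are folded into the batch before the forward pass and the predictions are averaged afterwards. A sketch of that pattern, assuming batches of shape [bs, 5, c, h, w] come out of the DataLoader (model is whatever network is being trained):

    # inputs: [bs, ncrops, c, h, w] once the crops are stacked per sample
    bs, ncrops, c, h, w = inputs.size()
    # fold the crops into the batch dimension for the forward pass
    outputs = model(inputs.view(-1, c, h, w))       # [bs*ncrops, num_classes]
    # average the predictions over the five crops
    outputs = outputs.view(bs, ncrops, -1).mean(1)  # [bs, num_classes]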

Hi @Nikronic, I use the same GitHub repo as linked above; the code is from this dataloader script (with minimal changes): https://github.com/jeffreyhuang1/two-stream-action-recognition/blob/master/dataloader/motion_dataloader.py

So basically, as below: it takes clips from a video and gets two flow images (horizontal and vertical) for each frame; it defines 10 channels for each flow direction, so it amounts to 20 channels per sample with stackopf. I think you might get a better idea from the motion_dataloader script on GitHub.

def __getitem__(self, idx):
    #print ('mode:', self.mode, 'calling Dataset:__getitem__ @ idx=%d' % idx)

    if self.mode == 'train':
        self.video, nb_clips = self.keys[idx].split('-')
        self.clips_idx = random.randint(1,int(nb_clips))
    elif self.mode == 'val':
        self.video,self.clips_idx = self.keys[idx].split('-')
    else:
        raise ValueError('There are only train and val mode')

    label = self.values[idx]
    label = int(label)-1 
    data = self.stackopf()

    if self.mode == 'train':
        sample = (data,label)
    elif self.mode == 'val':
        sample = (self.video,data,label)
    else:
        raise ValueError('There are only train and val mode')
    return sample
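
And for what it's worth, this is roughly how stackopf would need to change to carry the five crops through; a sketch only, assuming the Lambda stacking above and names matching the repo's stackopf (the u/ and v/ folders of horizontal and vertical flow JPEGs, the frame-index zero-padding, and so on):

    def stackopf(self):
        name = 'v_' + self.video
        u = self.root_dir + 'u/' + name
        v = self.root_dir + 'v/' + name

        # extra leading dimension for the five crops
        flow = torch.FloatTensor(5, 2*self.in_channel, self.img_rows, self.img_cols)
        i = int(self.clips_idx)

        for j in range(self.in_channel):
            frame_idx = 'frame' + str(i + j).zfill(6)
            imgH = Image.open(u + '/' + frame_idx + '.jpg')
            imgV = Image.open(v + '/' + frame_idx + '.jpg')

            # each transform now returns [5, 1, 224, 224]; squeeze the
            # singleton channel dim to fit the [5, 224, 224] slices
            H = self.transform(imgH)
            V = self.transform(imgV)
            flow[:, 2*j, :, :] = H.squeeze(1)
            flow[:, 2*j+1, :, :] = V.squeeze(1)

            imgH.close()
            imgV.close()

        return flow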