Hi All,
I have a network that takes three images in the input layer. The three images must be frames of the same video (this can be known only from the filename of the image). I understand how to change the dataset’s __getitem__ to return multiple inputs. But how do I make sure that the three images are chosen from the same video and that none of them is repeated within the same epoch?
It depends a bit on the current way of storing these files.
Did you store each image from a video in a corresponding folder?
If so, how would you like to sample the files if their number is not divisible by 3? Would you like to drop the last files or fill up the last sample by repeating frames?
Should the frames be contiguous in a sample or would you like to shuffle the frames from a single video?
Also, would you like to shuffle the video folders or should the videos be loaded in a consecutive way?
Thanks for replying, @ptrblck
Currently all images are together in the test, train and val folders. If the number of frames is not divisible by 3, dropping the last frames is no problem. The frames from a single video need not be contiguous. It’s fine as long as they are from the same video.
The video folders (I can change my dataset to that format) can be shuffled freely. There’s no order among the videos, and the order of the frames in a single video can also be ignored.
Here is a small dummy example using multiple video folders.
Note that I’ve used tensors directly, so you should add your frame loading logic into the Dataset.
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, videos, transform=None, nb_frames=3):
        self.nb_frames = nb_frames
        self.transform = transform
        # Crop data to multiple of nb_frames
        self.data = [v[:-(len(v) % self.nb_frames)] if len(v) % self.nb_frames != 0 else v
                     for v in videos]
        # calculate lengths (number of samples per video)
        self.lens = [len(d) // self.nb_frames for d in self.data]
        # calculate offsets (index of the first sample of each video)
        self.offsets = np.concatenate(([0], np.cumsum(self.lens[:-1])))

    def __getitem__(self, index):
        # subtract offset
        print('index: {}'.format(index))
        # get corresponding video file
        found = False
        for i, offset in enumerate(self.offsets):
            if index < offset:
                print('subtracting {} from index'.format(self.offsets[i-1]))
                index -= self.offsets[i-1]
                index *= self.nb_frames
                found = True
                break
        # handle last video separately
        if not found:
            index -= self.offsets[-1]
            index *= self.nb_frames
            i += 1
        # select corresponding data
        print('selecting video {}'.format(i-1))
        data = self.data[i-1]
        # get frames
        print('reading frames {}'.format(list(range(index, index+self.nb_frames))))
        x = []
        for idx in range(index, index+self.nb_frames):
            tmp = data[idx]
            if self.transform:
                tmp = self.transform(tmp)
            x.append(tmp)
        x = torch.cat(x)
        return x

    def __len__(self):
        # return a plain Python int
        return int(np.sum(self.lens))

# five dummy "videos" with 3 to 11 frames each; video i is filled with the value i
videos = [torch.ones(torch.randint(3, 12, (1,)).item(), 1) * i for i in range(5)]
dataset = MyDataset(videos)
for data in dataset:
    print(data)

loader = DataLoader(
    dataset,
    batch_size=2,
    shuffle=True)
for data in loader:
    print(data)
The code currently uses nb_frames consecutive frames for each video folder and removes the trailing frames. Shuffling using a DataLoader will work.
I’ve also added some debug print statements for better understanding, but let me know if you need more information about this code.
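Since the original question says the video can only be identified from the filename, the grouping step could look roughly like the sketch below. It assumes a hypothetical naming scheme of `<video_id>_<frame_number>.jpg`, and the helper names `video_id_from_name` and `build_triplets` are made up for illustration. Frames are shuffled within each video, leftover frames are dropped, and every frame ends up in at most one triplet.

```python
import random
from collections import defaultdict


def video_id_from_name(filename):
    # Assumption: names look like "<video_id>_<frame_number>.jpg".
    return filename.rsplit("_", 1)[0]


def build_triplets(filenames, nb_frames=3, seed=0):
    """Group frames by video and cut each group into disjoint chunks."""
    rng = random.Random(seed)
    by_video = defaultdict(list)
    for name in filenames:
        by_video[video_id_from_name(name)].append(name)

    triplets = []
    for video in sorted(by_video):
        frames = sorted(by_video[video])
        rng.shuffle(frames)  # frames need not be contiguous
        usable = len(frames) - len(frames) % nb_frames  # drop the remainder
        for start in range(0, usable, nb_frames):
            triplets.append(tuple(frames[start:start + nb_frames]))
    return triplets


# 7 frames of vidA give 2 triplets, 3 frames of vidB give 1 triplet.
files = [f"vidA_{i:03d}.jpg" for i in range(7)] + [f"vidB_{i:03d}.jpg" for i in range(3)]
triplets = build_triplets(files)
print(len(triplets))  # 3
```

A Dataset could then load `triplets[index]` inside `__getitem__` and report `len(triplets)` as its length. With `shuffle=True`, the DataLoader permutes the triplets across videos, and no frame is visited twice within an epoch.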
Hi @ptrblck, I was wondering how the usual DataLoader workflow fits with this particular Dataset. That is, how does the usual structure of
train/
    classA/
    classB/
val/
    classA/
    classB/
test/
    classA/
    classB/
fit with this type of loading? Inside a particular train, test or val folder, do we have to additionally pass the paths of the subfolders corresponding to each of the classes? (This was detected automatically in the default approach.)
I would create separate Datasets for the train, val, and test folders, so that you could stick to your current Dataset implementation and have a clean cut between the data splits to avoid data leakage.
Thanks, that makes much more sense. Should I worry about the labels, or will they be handled automatically?
In my code snippet I’m not handling the labels currently, so you would need to add them for your use case.
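One possible way to attach labels is sketched below. It assumes each split is laid out as `split/class_name/video_name/frame.jpg`; this layout and the helper name `index_split` are hypothetical and not confirmed anywhere in the thread. The class folder name becomes the label, much like `ImageFolder` derives targets from subfolder names, and every chunk of frames from a video inherits that label.

```python
import tempfile
from pathlib import Path


def index_split(root, nb_frames=3):
    """Return (samples, class_to_idx); each sample is (frame_paths, label)."""
    root = Path(root)
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        for video_dir in sorted(p for p in (root / name).iterdir() if p.is_dir()):
            frames = sorted(video_dir.glob("*.jpg"))
            usable = len(frames) - len(frames) % nb_frames  # drop leftovers
            for start in range(0, usable, nb_frames):
                samples.append((frames[start:start + nb_frames], class_to_idx[name]))
    return samples, class_to_idx


# Tiny fake "train" split: two classes, one video each, empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for cls, n_frames in [("classA", 4), ("classB", 6)]:
        video_dir = Path(tmp) / "train" / cls / "video0"
        video_dir.mkdir(parents=True)
        for i in range(n_frames):
            (video_dir / f"{i:03d}.jpg").touch()
    samples, class_to_idx = index_split(Path(tmp) / "train")

print(class_to_idx)                      # {'classA': 0, 'classB': 1}
print([label for _, label in samples])   # [0, 1, 1]
```

Calling the same function once per split folder would give three independent sample lists, which matches the suggestion of one Dataset per split.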
Hi @ptrblck, I noticed in the source here that the __getitem__ method of the image folder class returns a (sample, target) pair, where sample is the image and target is the target label. In the case of my dataset, how do I handle three images having the same label?
The forward method of my final model looks like this:
def forward(self, image1, image2, image3):
    x1 = self.model1(image1)
    x2 = self.model2(image2)
    x3 = self.model3(image3)
    x4 = torch.cat((x1, x2, x3), dim=1)
    x5 = self.classifier1(F.relu(x4))
    x6 = self.classifier2(F.relu(x5))
    x7 = self.classifier3(F.relu(x6))
    x8 = self.logSoftmax(x7)
    #print(x6.shape)
    return x8
In the train function, the images are fetched like this:
for inputs, labels in dataloaders[phase]:
    inputs = inputs.to(device)
    labels = labels.to(device)
    .....
If I concatenate the three images into one tensor and pass along a single label, does the model understand that it is actually three images and treat it as such? I feel a bit lost here.
I would rather write a custom Dataset and return these three images in your __getitem__ method:
def __getitem__(self, index):
    # load the images and your labels according to your code logic
    ...
    return image1, image2, image3, target
    # or concatenate the images and return them as a single one
    # return images, target
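To make the first variant concrete, here is a minimal sketch with random tensors standing in for real frames (the class name `TripletFrames` and all shapes are assumptions for illustration). The default collate function of `DataLoader` batches each element of the returned tuple separately, so the loop receives three image batches plus one label batch.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class TripletFrames(Dataset):
    """Hypothetical dataset: three RGB frames and one label per sample."""

    def __init__(self, nb_samples=8):
        # Random data in place of real image loading.
        self.frames = torch.randn(nb_samples, 3, 3, 32, 32)  # [N, frame, C, H, W]
        self.targets = torch.randint(0, 2, (nb_samples,))

    def __getitem__(self, index):
        image1, image2, image3 = self.frames[index]
        return image1, image2, image3, self.targets[index]

    def __len__(self):
        return len(self.frames)


loader = DataLoader(TripletFrames(), batch_size=4)
for image1, image2, image3, labels in loader:
    print(image1.shape, labels.shape)
    # torch.Size([4, 3, 32, 32]) torch.Size([4])
```

With this layout the training loop can call `model(image1, image2, image3)` directly, and the single `labels` batch applies to all three frames of each sample.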
I’m not sure I understand this question properly.
Your current code snippet will work, but I’m not sure what your use case is.
My use case is that I’m cropping certain portions of frames from a video and storing them in folders (hence the images have to be from the same folder). Then three images (which are frames of the same video) are passed through three models before the models join at the fully connected layers.
I was not sure if it would work, thank you.
That should work with your code snippet, however you could alternatively also use a single base model and pass these 3 frames through the same model and concatenate the output afterwards.
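The single-backbone alternative could look like the following sketch. The tiny convolutional backbone, the layer sizes, and the name `SharedBackboneNet` are placeholders chosen only to keep the example small; in practice the backbone would be whatever feature extractor is already in use.

```python
import torch
import torch.nn as nn


class SharedBackboneNet(nn.Module):
    def __init__(self, nb_classes=2):
        super().__init__()
        # Placeholder feature extractor, shared by all three frames.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(3 * 8, nb_classes)

    def forward(self, image1, image2, image3):
        # Same weights for every frame; features are concatenated afterwards.
        feats = [self.backbone(img) for img in (image1, image2, image3)]
        return self.classifier(torch.cat(feats, dim=1))


model = SharedBackboneNet()
frames = [torch.randn(4, 3, 32, 32) for _ in range(3)]
print(model(*frames).shape)  # torch.Size([4, 2])
```

Compared with three separate sub-models, this uses one set of backbone weights, which reduces the parameter count; whether that helps accuracy depends on the data.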
Hi @ptrblck,
I tried both return x, target and return x[0], x[1], x[2], target, and the error I got was:
TypeError Traceback (most recent call last)
<ipython-input-83-e03efae61dea> in <module>
----> 1 model, history_model = train_model(model, criterion, optimizer, exp_lr_scheduler, num_epochs=10)
<ipython-input-47-f9a9be299e22> in train_model(model, criterion, optimizer, scheduler, num_epochs)
34 # track history if only in train
35 with torch.set_grad_enabled(phase == 'train'):
---> 36 outputs = model(inputs)
37 #print(outputs.shape)
38 _, preds = torch.max(outputs, 1)
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)
TypeError: forward() missing 2 required positional arguments: 'image2' and 'image3'
in both cases.
Since your forward method is defined as:
def forward(self, image1, image2, image3)
you should pass the inputs separately or unwrap them:
outputs = model(*inputs)
# or
outputs = model(inputs[0], inputs[1], inputs[2])  # index in the dimension you've concatenated the inputs
Thanks a lot.
Let me try that and get back to you.
For some reason, the size of inputs in model(*inputs) depends on the batch size.
For a batch size of 16, doing model(*inputs) gives me Expected 3 inputs, got 16. I must have made a mistake.
My code snippet was probably wrong.
Could you check the shape of inputs and split it in the right dimension?
E.g. if your inputs have the shape [batch_size, 3, ...], split it in dim 1.
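The `Expected 3 inputs, got 16` symptom is consistent with how star-unpacking works on a tensor: it iterates over dim 0, which is the batch dimension once the `DataLoader` has collated the samples. A quick check, using a smaller spatial size than in the thread:

```python
import torch

inputs = torch.randn(16, 3, 8, 8)  # [batch_size, frames, H, W]

unpacked = [*inputs]  # same iteration that model(*inputs) performs
print(len(unpacked), unpacked[0].shape)  # 16 torch.Size([3, 8, 8])

per_frame = inputs.unbind(dim=1)  # iterate over the frame dimension instead
print(len(per_frame), per_frame[0].shape)  # 3 torch.Size([16, 8, 8])
```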
Yes, the inputs have the shape torch.Size([16, 3, 224, 224]).
What do I put as split_size_or_sections in torch.split(inputs, split_size_or_sections, dim=1)?
You could use one of the following (depending on the expected shape inside forward):
outputs = model(*x.split(1, dim=1))  # each input will have shape [16, 1, 224, 224]
# or
outputs = model(x[:, 0], x[:, 1], x[:, 2])  # each input will have shape [16, 224, 224]
Using the first option, I get the error
RuntimeError Traceback (most recent call last)
<ipython-input-118-e03efae61dea> in <module>
----> 1 model, history_model = train_model(model, criterion, optimizer, exp_lr_scheduler, num_epochs=10)
<ipython-input-117-f1d1a61a074d> in train_model(model, criterion, optimizer, scheduler, num_epochs)
36 with torch.set_grad_enabled(phase == 'train'):
37 #outputs = model(inputs[:, 0], inputs[:, 1], inputs[:, 2])
---> 38 outputs = model(*inputs.split(1, dim=1))
39 #outputs = model(inputs)
40 #print(outputs.shape)
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)
<ipython-input-11-f590b21f7c95> in forward(self, image1, image2, image3)
11
12 def forward(self,image1,image2,image3):
---> 13 x1 = self.model1(image1)
14 x2 = self.model2(image2)
15 x3 = self.model3(image3)
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)
/opt/conda/lib/python3.6/site-packages/torchvision/models/vgg.py in forward(self, x)
40
41 def forward(self, x):
---> 42 x = self.features(x)
43 x = self.avgpool(x)
44 x = x.view(x.size(0), -1)
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
90 def forward(self, input):
91 for module in self._modules.values():
---> 92 input = module(input)
93 return input
94
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/conv.py in forward(self, input)
336 _pair(0), self.dilation, self.groups)
337 return F.conv2d(input, self.weight, self.bias, self.stride,
--> 338 self.padding, self.dilation, self.groups)
339
340
RuntimeError: Given groups=1, weight of size 64 3 3 3, expected input[16, 1, 224, 224] to have 3 channels, but got 1 channels instead
This is strange, the torch tensor originally had a size of [16, 3, 224, 224]. Is the problem in the architecture of the model?
Using the second option, I get a similar error saying that the expected tensor is 4D while [16, 224, 224] is 3D.
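Both errors point in the same direction: a `[16, 3, 224, 224]` batch holds only three channels in total, so each of the three slices is a single-channel image, while the first VGG convolution (weight of size `64 3 3 3`) expects a 4D input with 3 channels. One way out, sketched here under the assumption that every frame is loaded as an RGB tensor of shape `[3, H, W]`, is to `torch.stack` the frames in `__getitem__` so that a separate frame dimension exists, and to split along that dimension before calling the model:

```python
import torch

# Assumption: three RGB frames per sample, small spatial size for the demo.
frames = [torch.randn(3, 32, 32) for _ in range(3)]

sample = torch.stack(frames)          # [3 frames, 3 channels, 32, 32]
batch = torch.stack([sample] * 16)    # what the DataLoader would collate
print(batch.shape)                    # torch.Size([16, 3, 3, 32, 32])

image1, image2, image3 = batch.unbind(dim=1)
print(image1.shape)                   # torch.Size([16, 3, 32, 32])
```

If the frames are in fact grayscale, the sub-models would instead need a first layer that accepts one channel, or the single channel would have to be repeated three times before the forward pass.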