Video Classification using UCF-101 dataset

I am trying to use the video classification models from torchvision. The official code uses the Kinetics dataset; however, when I try the UCF-101 dataset instead, I get the runtime error below. Link to the train code:

Thanks in advance

torch              1.12.0+cu113
torchaudio         0.12.0+cu113
torchvision        0.13.0+cu113


Traceback (most recent call last):
  File "/home/pyler/PycharmProjects/res-ufc101/", line 388, in <module>
  File "/home/pyler/PycharmProjects/res-ufc101/", line 287, in main
    train_one_epoch(model, criterion, optimizer, lr_scheduler, data_loader, device, epoch, args.print_freq, scaler)
  File "/home/pyler/PycharmProjects/res-ufc101/", line 24, in train_one_epoch
    for video, target in metric_logger.log_every(data_loader, print_freq, header):
  File "/home/pyler/PycharmProjects/res-ufc101/", line 127, in log_every
    for obj in iterable:
  File "/home/pyler/Python/envs/torch/lib/python3.8/site-packages/torch/utils/data/", line 652, in __next__
    data = self._next_data()
  File "/home/pyler/Python/envs/torch/lib/python3.8/site-packages/torch/utils/data/", line 1347, in _next_data
    return self._process_data(data)
  File "/home/pyler/Python/envs/torch/lib/python3.8/site-packages/torch/utils/data/", line 1373, in _process_data
  File "/home/pyler/Python/envs/torch/lib/python3.8/site-packages/torch/", line 461, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/pyler/Python/envs/torch/lib/python3.8/site-packages/torch/utils/data/_utils/", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/pyler/Python/envs/torch/lib/python3.8/site-packages/torch/utils/data/_utils/", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/pyler/Python/envs/torch/lib/python3.8/site-packages/torch/utils/data/_utils/", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/pyler/Python/envs/torch/lib/python3.8/site-packages/torchvision/datasets/", line 128, in __getitem__
    video = self.transform(video)
  File "/home/pyler/PycharmProjects/res-ufc101/", line 26, in __call__
    return self.transforms(x)
  File "/home/pyler/Python/envs/torch/lib/python3.8/site-packages/torchvision/transforms/", line 94, in __call__
    img = t(img)
  File "/home/pyler/Python/envs/torch/lib/python3.8/site-packages/torch/nn/modules/", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pyler/Python/envs/torch/lib/python3.8/site-packages/torchvision/transforms/", line 269, in forward
    return F.normalize(tensor, self.mean, self.std, self.inplace)
  File "/home/pyler/Python/envs/torch/lib/python3.8/site-packages/torchvision/transforms/", line 360, in normalize
    return F_t.normalize(tensor, mean=mean, std=std, inplace=inplace)
  File "/home/pyler/Python/envs/torch/lib/python3.8/site-packages/torchvision/transforms/", line 959, in normalize
RuntimeError: The size of tensor a (240) must match the size of tensor b (3) at non-singleton dimension 1

Based on this error:

RuntimeError: The size of tensor a (240) must match the size of tensor b (3) at non-singleton dimension 1

I would guess you might be passing the input tensors in a channels-last format, while a channels-first layout ([batch_size, channels, height, width]) is expected. Could you check if this is the case?
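To illustrate the guess, here is a minimal sketch that reproduces the same message with plain torch. It assumes a [T, H, W, C] clip of 240x320 frames and placeholder mean/std values of 0.5; the reshape to (C, 1, 1) mimics what Normalize does before subtracting, so the channel statistics end up lined up against the height dimension.

```python
import torch

# Assumed input: 16 frames of 240x320 RGB in channels-last [T, H, W, C] layout.
clip_thwc = torch.rand(16, 240, 320, 3)

# Placeholder statistics (assumption), reshaped to (C, 1, 1) so they
# broadcast over a channels-first (..., C, H, W) tensor.
mean = torch.tensor([0.5, 0.5, 0.5]).view(-1, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(-1, 1, 1)

# Channels-last: dimension 1 is H (240), but mean expects C (3) there.
try:
    (clip_thwc - mean) / std
    error_message = None
except RuntimeError as err:
    error_message = str(err)
print(error_message)

# Channels-first: the same arithmetic broadcasts cleanly.
clip_tchw = clip_thwc.permute(0, 3, 1, 2)
normalized = (clip_tchw - mean) / std
print(normalized.shape)
```

If the first print shows the "non-singleton dimension 1" message and the second shows a [16, 3, 240, 320] shape, the layout is the likely culprit.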

Thanks @ptrblck for your answer, but it did not resolve my issue.
I only changed the dataset and some of its parameters, and the same problem still exists. Original code for the Kinetics dataset:

        dataset = torchvision.datasets.Kinetics(

I changed it to the following:

dataset = torchvision.datasets.UCF101(

Print the shape of x in line 26 in /home/pyler/PycharmProjects/res-ufc101/ and check what it’s returning. I would still guess that the memory format might be wrong and thus the transformation fails.
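One way to do that without editing the transform itself is to wrap it. This is only a sketch: `ShapeLogger` is a hypothetical helper, not part of torchvision, and the identity lambda stands in for the real transform pipeline.

```python
import torch


class ShapeLogger:
    """Hypothetical debugging wrapper: prints the input shape, then calls the wrapped transform."""

    def __init__(self, transform):
        self.transform = transform

    def __call__(self, x):
        # Show what the dataset actually hands to the transform.
        print(f"transform input: shape={tuple(x.shape)}, dtype={x.dtype}")
        return self.transform(x)


# Stand-in clip and transform (assumptions for the demo).
clip = torch.zeros(16, 240, 320, 3, dtype=torch.uint8)
logged_transform = ShapeLogger(lambda x: x)
out = logged_transform(clip)
```

Passing the wrapped object as the dataset's transform argument would print the shape for every clip, so it is best removed once the layout is known.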

Yes, the problem is coming from that transformation: x = {Tensor: (16, 240, 320, 3)}.
I specified the output_format for the dataset as TCHW while creating the dataset, but it did not work.

I don’t think the output_format would fix the issue, as the transformation is expected to work on [T, H, W, C] frames as seen in the docs:

transform (callable , optional) – A function/transform that takes in a TxHxWxC video and returns a transformed version.

If your transformation doesn’t support it, you could permute the data inside the transform via:

transforms.Lambda(lambda x: x.permute(0, 3, 1, 2)),

Thank you so much, it worked for me.