RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 8 and 4 in dimension 1, but the sizes of my tensors only differ in dim 0?

for feature in features:
    print("feature.shape = {}".format(feature.shape))

Output:

feature.shape = torch.Size([4, 2048, 2, 4])
feature.shape = torch.Size([2, 2048, 2, 4])
feature.shape = torch.Size([3, 2048, 2, 4])
feature.shape = torch.Size([7, 2048, 2, 4])
feature.shape = torch.Size([15, 2048, 2, 4])
feature.shape = torch.Size([6, 2048, 2, 4])
feature.shape = torch.Size([5, 2048, 2, 4])
feature.shape = torch.Size([3, 2048, 2, 4])
feature.shape = torch.Size([3, 2048, 2, 4])
feature.shape = torch.Size([8, 2048, 2, 4])

But when I run result = torch.stack(features), I get this error:

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 8 and 4 in dimension 1 at C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src\THC/generic/THCTensorMath.cu:71

Does anyone know why?
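In case it helps, here is a self-contained way to reproduce it with random tensors instead of my real features (the shapes are just made up to mimic the output above):

import torch

# Dummy per-video features: everything matches except dim 0 (the frame count),
# mimicking the shapes printed above.
features = [torch.randn(4, 2048, 2, 4), torch.randn(8, 2048, 2, 4)]

result = torch.stack(features)  # raises the "Sizes of tensors must match" RuntimeError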

I’m not sure what your code is doing, so could you explain a bit what the dimensions mean?
Are you using nn.DataParallel or DDP?

Ah, sorry about that.

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((240, 320)),
    transforms.ToTensor(),
    transforms.Normalize(MEAN, STD)
])

features = []
for video in videos:
    frames = [transform(frame) for frame in video]  # per-frame preprocessing
    frames = torch.stack(frames)                    # [num_frames, 3, 240, 320]
    frames = frames.cuda()
    feature = feature_extractor(frames)             # [num_frames, 2048, 2, 4]
    features.append(feature)

I am trying to perform action recognition using a CNN, and the above is part of the Dataset class I am trying to write. Frames are images at a certain point in the video. I am not too sure what the dimensions mean, to be honest… But I’m using ResNet-50 as the feature extractor, so that explains the 2048.
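For reference, a ResNet-50 feature extractor is commonly built by chopping off the classification head; below is a minimal sketch assuming torchvision. The 2048 channels come from the last convolutional block, while the spatial size depends on the input resolution and on which layers are kept (my actual setup may differ slightly):

import torch
import torchvision

# Minimal sketch of a ResNet-50 feature extractor (assuming torchvision):
# dropping the final avgpool + fc layers keeps the convolutional feature maps,
# which always have 2048 channels; the spatial size depends on the input
# resolution and on which layers are kept.
backbone = torchvision.models.resnet50(pretrained=True)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2]).eval()

with torch.no_grad():
    dummy = torch.randn(1, 3, 240, 320)
    print(feature_extractor(dummy).shape)  # torch.Size([1, 2048, 8, 10])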

Other things I’ve tried:

  • converting the frames to RGB using PIL.Image (shouldn’t be required since the training videos are already in colour)
  • setting batch_size = 1

Neither worked…

As for DataParallel and DDP, I’m not too sure what those two are. I just write a Dataset class, get the data using a DataLoader, and iterate over it.

DataParallel and DistributedDataParallel are ways of leveraging multiple GPUs (and multiple machines for Distributed) to train faster. Pretty practical!
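In case you want to try the simpler one later, nn.DataParallel just wraps the model; a minimal sketch (assuming more than one visible GPU):

import torch
import torch.nn as nn

# Minimal sketch: nn.DataParallel replicates the module on each visible GPU,
# splits every input batch along dim 0, and gathers the outputs back.
model = nn.Linear(2048, 10)              # stand-in for any nn.Module
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.cuda()

x = torch.randn(32, 2048).cuda()
out = model(x)                           # the batch is scattered across GPUs under the hood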

As for your error: what is the type of features? I suspect torch.stack tries to use the first dimension of your collection of features, which apparently has shape [10, X, 2048, 2, 4], for stacking; hence the error at dimension 1, because of the varying sizes. If features is not a list, try converting it into one, and torch.stack should work with dim=0 (the default).

features is a list! []
I’m a little confused when you say my features have the shape [10, X, 2048, 2, 4], because based on the printed output I wrote in my question, they have the shape [X, 2048, 2, 4], don’t they?

A single feature has shape [X, 2048, 2, 4], but the features list holds 10 of them, so after stacking, axis 0 would have 10 elements…
Can you try torch.stack(features, dim=1) maybe?

Ah, I see! Somehow I get the same error, except in a different dimension:
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 2 and 4 in dimension 0 at C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src\THC/generic/THCTensorMath.cu:71

But I was looking through a GitHub repo, and they did what would be the equivalent of
features.append(torch.mean(feature, 0)) in my code. That seems to solve the issue (though I don’t get why), but I get a different error now.
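Edit: playing around with dummy tensors, I think I see now why the mean helps. Averaging over the frame dimension collapses the varying dim 0 away, so every pooled feature ends up with the same shape and can be stacked (a small sketch, not my real data):

import torch

# Dummy per-video features: frame counts differ, everything else matches.
features = [torch.randn(n, 2048, 2, 4) for n in (4, 2, 7)]

# torch.stack(features) would fail here, but averaging over dim 0 gives a
# fixed [2048, 2, 4] per video...
pooled = [torch.mean(f, 0) for f in features]

# ...so the pooled features all share one shape and stack cleanly.
result = torch.stack(pooled)
print(result.shape)  # torch.Size([3, 2048, 2, 4])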