RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 8 and 4 in dimension 1, but the sizes of my tensors only differ in dim 0?

for feature in features:
    print("feature.shape = {}".format(feature.shape))

Output:

feature.shape = torch.Size([4, 2048, 2, 4])
feature.shape = torch.Size([2, 2048, 2, 4])
feature.shape = torch.Size([3, 2048, 2, 4])
feature.shape = torch.Size([7, 2048, 2, 4])
feature.shape = torch.Size([15, 2048, 2, 4])
feature.shape = torch.Size([6, 2048, 2, 4])
feature.shape = torch.Size([5, 2048, 2, 4])
feature.shape = torch.Size([3, 2048, 2, 4])
feature.shape = torch.Size([3, 2048, 2, 4])
feature.shape = torch.Size([8, 2048, 2, 4])

But when I run result = torch.stack(features), I get this error:

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 8 and 4 in dimension 1 at C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src\THC/generic/THCTensorMath.cu:71

Does anyone know why?
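In case it helps, here is a self-contained way to reproduce it with random tensors instead of my real features (the shapes are just made up to mimic the output above):

import torch

# Dummy per-video features: everything matches except dim 0 (the frame count),
# mimicking the shapes printed above.
features = [torch.randn(4, 2048, 2, 4), torch.randn(8, 2048, 2, 4)]

result = torch.stack(features)  # raises the "Sizes of tensors must match" RuntimeError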

I’m not sure what your code is doing, so could you explain a bit what the dimensions mean?
Are you using nn.DataParallel or DDP?

Ah, sorry about that.

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((240, 320)),
    transforms.ToTensor(),
    transforms.Normalize(MEAN, STD)
])

features = []
for video in videos:
    frames = [transform(frame) for frame in video]  # per-frame preprocessing
    frames = torch.stack(frames)                    # [num_frames, 3, 240, 320]
    frames = frames.cuda()
    feature = feature_extractor(frames)             # [num_frames, 2048, 2, 4]
    features.append(feature)

I am trying to perform action recognition using a CNN, and the above is part of the Dataset class I am trying to write. Frames are images at a certain point in the video. I am not too sure what the dimensions mean, to be honest… But I’m using ResNet-50 as the feature extractor, so that explains the 2048.
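For reference, a ResNet-50 feature extractor is commonly built by chopping off the classification head; below is a minimal sketch assuming torchvision. The 2048 channels come from the last convolutional block, while the spatial size depends on the input resolution and on which layers are kept (my actual setup may differ slightly):

import torch
import torchvision

# Minimal sketch of a ResNet-50 feature extractor (assuming torchvision):
# dropping the final avgpool + fc layers keeps the convolutional feature maps,
# which always have 2048 channels; the spatial size depends on the input
# resolution and on which layers are kept.
backbone = torchvision.models.resnet50(pretrained=True)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2]).eval()

with torch.no_grad():
    dummy = torch.randn(1, 3, 240, 320)
    print(feature_extractor(dummy).shape)  # torch.Size([1, 2048, 8, 10])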

Other things I’ve tried:

  • converting the frames to RGB using PIL.Image (shouldn’t be required since the training videos are already in colour)
  • setting batch_size = 1

Neither worked…

As for DataParallel and DDP, I’m not too sure what those two are. I just write a Dataset class, get the data using a DataLoader, and iterate over it.

DataParallel and DistributedDataParallel are ways of leveraging multiple GPUs (and multiple machines for Distributed) to train faster. Pretty practical!
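In case you want to try the simpler one later, nn.DataParallel just wraps the model; a minimal sketch (assuming more than one visible GPU):

import torch
import torch.nn as nn

# Minimal sketch: nn.DataParallel replicates the module on each visible GPU,
# splits every input batch along dim 0, and gathers the outputs back.
model = nn.Linear(2048, 10)              # stand-in for any nn.Module
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.cuda()

x = torch.randn(32, 2048).cuda()
out = model(x)                           # the batch is scattered across GPUs under the hood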

As for your error: what is the type of features? I suspect torch.stack tries to use the first dimension of your collection of features, which apparently has shape [10, X, 2048, 2, 4], for stacking; hence the error at dimension 1, because of the varying sizes. If features is not a list, try converting it into one, and torch.stack should work with dim=0 (the default).

features is a list! []
I’m a little confused when you say my features have the shape [10, X, 2048, 2, 4], because based on the printed output I wrote in my question, they have the shape [X, 2048, 2, 4], don’t they?

A single feature has shape [X, 2048, 2, 4], but the features list holds 10 of them, so after stacking, axis 0 would have 10 elements…
Can you try torch.stack(features, dim=1) maybe?

Ah, I see! Somehow I get the same error, except in a different dimension:
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 2 and 4 in dimension 0 at C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src\THC/generic/THCTensorMath.cu:71

But I was looking through a GitHub repo, and they did what would be the equivalent of
features.append(torch.mean(feature, 0)) in my code. That seems to solve the issue (though I don’t get why), but I get a different error now.
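Edit: playing around with dummy tensors, I think I see now why the mean helps. Averaging over the frame dimension collapses the varying dim 0 away, so every pooled feature ends up with the same shape and can be stacked (a small sketch, not my real data):

import torch

# Dummy per-video features: frame counts differ, everything else matches.
features = [torch.randn(n, 2048, 2, 4) for n in (4, 2, 7)]

# torch.stack(features) would fail here, but averaging over dim 0 gives a
# fixed [2048, 2, 4] per video...
pooled = [torch.mean(f, 0) for f in features]

# ...so the pooled features all share one shape and stack cleanly.
result = torch.stack(pooled)
print(result.shape)  # torch.Size([3, 2048, 2, 4])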