[Object Detection via VGG Network] TypeError: conv2d() received an invalid combination of arguments

Hi, everyone!
I am working on an object detection project using the VGG network on the PASCAL VOC dataset. I wrote a custom dataset loader for PASCAL VOC and coded the network from scratch (both are similar to the torchvision implementations).

Currently, I’m getting the following error:

Traceback (most recent call last):
  File "/home/khushi/Documents/deep-learning/benchmarking-deep-neural-networks/vgg/main.py", line 59, in <module>
    main()
  File "/home/khushi/Documents/deep-learning/benchmarking-deep-neural-networks/vgg/main.py", line 55, in main
    train(data, model, num_epochs, criteria, optimizer)
  File "/home/khushi/Documents/deep-learning/benchmarking-deep-neural-networks/vgg/main.py", line 21, in train
    outputs = model(image)
  File "/home/khushi/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/khushi/Documents/deep-learning/benchmarking-deep-neural-networks/vgg/vgg_torch.py", line 39, in forward
    x = self.features(x)
  File "/home/khushi/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/khushi/.local/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/khushi/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/khushi/.local/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 446, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/khushi/.local/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
TypeError: conv2d() received an invalid combination of arguments - got (Image, Parameter, Parameter, tuple, tuple, tuple, int), but expected one of:
 * (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, tuple of ints padding, tuple of ints dilation, int groups)
      didn't match because some of the arguments have invalid types: (Image, Parameter, Parameter, tuple, tuple, tuple, int)
 * (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, str padding, tuple of ints dilation, int groups)
      didn't match because some of the arguments have invalid types: (Image, Parameter, Parameter, tuple, tuple, tuple, int)

Here is the code I am running:

import vgg_torch
import voc_loader

import time
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

num_epochs = 5
learning_rate = 0.01

# https://github.com/khushi-411/tutorials/pytorch
def train(data, model, num_epochs, criteria, optimizer):
    steps = len(data)
    for epoch in range(num_epochs):
        for i, (image, target) in enumerate(data):
            # forward pass
            # https://stackoverflow.com/questions/57237381
            #outputs = model(image[None, ...])
            outputs = model(image)
            loss = criteria(outputs, target)

            # backward pass and optimization
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            if (i+1) % 100 == 0:
                print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, i+1, steps, loss.item()))

def main():
    # give absolute path to dataset
    # https://stackoverflow.com/questions/56741108
    data = voc_loader.VOCDetection('/home/khushi/Documents/deep-learning/datasets/pascal-voc/')
    """, 
            transform=transforms.Compose([
                transforms.ToTensor(),
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
            ])
        )
    """
    # Load model: vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, vgg19_bn
    model = vgg_torch.vgg11()
    print(model)
    
    # Loss function and optimizer
    criteria = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    # training
    start = time.time()
    train(data, model, num_epochs, criteria, optimizer)
    print("Total time taken to train: ", time.time() - start)

if __name__ == "__main__":
    main()

Dependencies

  • PyTorch: 1.10.0+cu102
  • OS: Manjaro Linux
  • RAM: 16GB

Could anyone please help me resolve this error? Thanks!

Based on the error message, you are trying to pass a PIL.Image to the model while a tensor is expected.
Add the transformation back (or at least convert the inputs to tensors) and it should work.
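For a quick check (assuming image is still a PIL.Image, as the traceback indicates), you could convert it manually right before the forward pass:

import torchvision.transforms.functional as TF

x = TF.to_tensor(image)  # PIL.Image -> FloatTensor of shape (C, H, W) in [0, 1]
x = x.unsqueeze(0)       # add a batch dimension -> (1, C, H, W)
outputs = model(x)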

Hi @ptrblck,
Thanks for responding.
I tried converting the image from a PIL.Image to a tensor and got the following error:

Traceback (most recent call last):
  File "/home/khushi/Documents/deep-learning/benchmarking-deep-neural-networks/vgg/main.py", line 58, in <module>
    main()
  File "/home/khushi/Documents/deep-learning/benchmarking-deep-neural-networks/vgg/main.py", line 54, in main
    train(data, model, num_epochs, criteria, optimizer)
  File "/home/khushi/Documents/deep-learning/benchmarking-deep-neural-networks/vgg/main.py", line 17, in train
    for i, (image, target) in enumerate(data):
  File "/home/khushi/Documents/deep-learning/benchmarking-deep-neural-networks/vgg/voc_loader.py", line 85, in __getitem__
    img, target = self.transforms(img, target)
  File "/home/khushi/.local/lib/python3.9/site-packages/torchvision/datasets/vision.py", line 93, in __call__
    input = self.transform(input)
  File "/home/khushi/.local/lib/python3.9/site-packages/torchvision/transforms/transforms.py", line 61, in __call__
    img = t(img)
  File "/home/khushi/.local/lib/python3.9/site-packages/torchvision/transforms/transforms.py", line 98, in __call__
    return F.to_tensor(pic)
  File "/home/khushi/.local/lib/python3.9/site-packages/torchvision/transforms/functional.py", line 114, in to_tensor
    raise TypeError('pic should be PIL Image or ndarray. Got {}'.format(type(pic)))
TypeError: pic should be PIL Image or ndarray. Got <class 'torch.Tensor'>

The transformation block that I added is:

data = voc_loader.VOCDetection('/home/khushi/Documents/deep-learning/datasets/pascal-voc/', 
            transform=transforms.Compose([
                transforms.ToTensor(),
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
            ])
        )

Is there any other mistake? Thanks again!

Your current transformation contains two ToTensor() transforms, so you might need to remove one of them.
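E.g. something like this (a sketch based on your snippet; CenterCrop also works directly on PIL images, so ToTensor can come afterwards):

data = voc_loader.VOCDetection('/home/khushi/Documents/deep-learning/datasets/pascal-voc/',
            transform=transforms.Compose([
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
            ])
        )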


@ptrblck, I tried that and got the following error:

TypeError: cross_entropy_loss(): argument 'target' (position 2) must be Tensor, not dict

According to this, I need to convert the target (a dict) into a tensor. But how do I convert the dict object into a tensor?

Another interesting thing I noticed: the model output tensor is:

tensor([[ 0.3085, -0.0681,  0.1395, -0.2118, -0.2023, -0.1892, -0.0647, -0.0718,
         -0.1249, -0.0592,  0.0486,  0.0403,  0.0439, -0.0640, -0.0333, -0.1105,
          0.0386, -0.2595, -0.1005, -0.0038]], grad_fn=<AddmmBackward0>)

And the target value is:

{'annotation': {'folder': 'VOC2012', 'filename': '2008_000008.jpg', 'source': {'database': 'The VOC2008 Database', 'annotation': 'PASCAL VOC2008', 'image': 'flickr'}, 'size': {'width': '500', 'height': '442', 'depth': '3'}, 'segmented': '0', 'object': [{'name': 'horse', 'pose': 'Left', 'truncated': '0', 'occluded': '1', 'bndbox': {'xmin': '53', 'ymin': '87', 'xmax': '471', 'ymax': '420'}, 'difficult': '0'}, {'name': 'person', 'pose': 'Unspecified', 'truncated': '1', 'occluded': '0', 'bndbox': {'xmin': '158', 'ymin': '44', 'xmax': '289', 'ymax': '167'}, 'difficult': '0'}]}}

How should I compute the loss?

The error is raised because you are trying to pass the dict to the loss function, which expects tensors.
I don’t know where your current model is coming from, but I would guess that you might need to pass the different model outputs and their corresponding targets to separate loss functions.
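As a rough sketch of the dict-to-tensor part (the VOC_CLASSES list and target_to_tensors helper are mine, not part of your loader, and this treats the task as plain classification of the first annotated object):

import torch

# The 20 PASCAL VOC object classes (standard ordering; adjust to your loader)
VOC_CLASSES = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle',
               'bus', 'car', 'cat', 'chair', 'cow',
               'diningtable', 'dog', 'horse', 'motorbike', 'person',
               'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']

def target_to_tensors(target):
    # A single object may come back as a dict, multiple objects as a list
    objs = target['annotation']['object']
    if isinstance(objs, dict):
        objs = [objs]
    # class indices, shape (num_objects,)
    labels = torch.tensor([VOC_CLASSES.index(o['name']) for o in objs])
    # bounding boxes (xmin, ymin, xmax, ymax), shape (num_objects, 4)
    boxes = torch.tensor([[float(o['bndbox'][k])
                           for k in ('xmin', 'ymin', 'xmax', 'ymax')]
                          for o in objs])
    return labels, boxes

# inside the training loop, e.g. a classification loss on the first object only:
labels, boxes = target_to_tensors(target)
loss = criteria(outputs, labels[:1])

For actual detection you would also regress the boxes (e.g. with a smooth L1 loss) against dedicated box outputs, which your current VGG classification head does not produce.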
