RuntimeError: expected type torch.cuda.FloatTensor but got torch.FloatTensor

aast226 · December 18, 2019, 11:55pm

I am getting the error:
RuntimeError: expected type torch.cuda.FloatTensor but got torch.FloatTensor
when running:
scores, classification, transformed_anchors = retinanet(batch)

See code below:

import torch
import model
from torchvision import transforms
from PIL import Image

torch.set_default_tensor_type(torch.cuda.FloatTensor)

def image_loader(loader, image_name):
    image = Image.open(image_name)
    image = loader(image).float()
    image = image.unsqueeze(0)
    return image

def main():

    retinanet = model.resnet50(num_classes=80, pretrained=True, device='cuda:0')
    state_dict_path = 'coco_resnet_50_map_0_335_state_dict.pt'
    retinanet.load_state_dict(torch.load(state_dict_path))
    retinanet = retinanet.cuda()
    retinanet.eval()

    for name, param in retinanet.named_parameters():
        if param.device.type != 'cuda':
            print('param {}, not on GPU'.format(name))

    image_path = 'image.jpg'

    data_transforms = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor()
    ])

    batch = image_loader(data_transforms, image_path).cuda()
    print(batch.type())
    print(batch.device)

    with torch.no_grad():
        scores, classification, transformed_anchors = retinanet(batch)

if __name__ == '__main__':
    main()

There is no output when running:

    for name, param in retinanet.named_parameters():
        if param.device.type != 'cuda':
            print('param {}, not on GPU'.format(name))

This implies that every parameter has device type ‘cuda’.

And the output from running:

    print(batch.type())
    print(batch.device)

Is:

torch.cuda.FloatTensor
cuda:0

Given that the model and the input tensor are both on the GPU, I’m not sure why I am getting this error.
(Note that import model refers to the file https://github.com/AljoSt/pytorch-retinanet/blob/replace_nms/model.py from the repo https://github.com/AljoSt/pytorch-retinanet/tree/replace_nms)

ptrblck · December 19, 2019, 12:49am

Could you rerun the code with CUDA_LAUNCH_BLOCKING=1 python script.py args and post the stack trace here, please?
I cannot find any obvious error, so I hope the error might point to the line of code causing this issue.

aast226 · December 19, 2019, 1:11am

Here is the stack trace

/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:129: UserWarning: nn.Upsample is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.{} is deprecated. Use nn.functional.interpolate instead.".format(self.name))
Traceback (most recent call last):
  File "min.py", line 41, in <module>
    main()
  File "min.py", line 38, in main
    scores, classification, transformed_anchors = retinanet(batch)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/projects/pytorch-retinanet/model.py", line 268, in forward
    transformed_anchors = self.regressBoxes(anchors, regression)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/projects/pytorch-retinanet/utils.py", line 105, in forward
    pred_ctr_x = ctr_x + dx * widths
RuntimeError: expected type torch.cuda.FloatTensor but got torch.FloatTensor

ptrblck · December 19, 2019, 2:07am

Thanks for the information.
Could you add a print statement to BBoxTransform and check the device of these tensors?
I would assume that self.mean and self.std are not pushed to the device correctly (line of code), as they are not registered as a buffer or parameter.
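To illustrate what I mean by "not registered as a buffer", here is a minimal sketch. The class names, the placeholder values, and the stray_tensors helper are made up for this example and are not taken from the repository. It uses .double() instead of .cuda() so it also runs on a machine without a GPU; both calls go through the same module-wide conversion, which only touches parameters and buffers.

```python
import torch
import torch.nn as nn


class PlainAttr(nn.Module):
    """Stores constants as plain tensor attributes (the suspected pattern)."""

    def __init__(self):
        super().__init__()
        # Placeholder values, for illustration only.
        self.mean = torch.zeros(4)
        self.std = torch.tensor([0.1, 0.1, 0.2, 0.2])


class WithBuffers(nn.Module):
    """Stores the same constants as registered buffers."""

    def __init__(self):
        super().__init__()
        self.register_buffer("mean", torch.zeros(4))
        self.register_buffer("std", torch.tensor([0.1, 0.1, 0.2, 0.2]))


def stray_tensors(model):
    """Hypothetical helper: list tensor attributes that are neither
    parameters nor buffers, so .cuda()/.to() will not move them."""
    found = []
    for mod_name, mod in model.named_modules():
        # Parameters and buffers live in private dicts, so any tensor found
        # directly in the instance __dict__ is a plain attribute.
        for attr, value in vars(mod).items():
            if isinstance(value, torch.Tensor):
                found.append(f"{mod_name}.{attr}" if mod_name else attr)
    return found


# Module-wide conversion: only parameters and buffers are converted.
plain = PlainAttr().double()
buffered = WithBuffers().double()

print(plain.mean.dtype)        # plain attribute is left untouched
print(buffered.mean.dtype)     # buffer follows the module
print(stray_tensors(plain))    # names that a named_parameters() check misses
print(stray_tensors(buffered))
```

A loop over named_parameters() like the one in the first post would not report such attributes, since they are not parameters at all.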

aast226 · December 19, 2019, 3:35am

I added the lines:

print(f'Mean is of type: {self.mean.type()}')
print(f'std is of type: {self.std.type()}')

And the output was:

Mean is of type: torch.FloatTensor
std is of type: torch.FloatTensor

So it confirms that they were not the correct type. I also added the following two lines:

self.std = self.std.cuda()
self.mean = self.mean.cuda()

Which yielded the output:

Mean is of type: torch.cuda.FloatTensor
std is of type: torch.cuda.FloatTensor

But, when I run the program I get the same error message as before. I went through and checked to see if any other variables were the wrong type and found that ctr_x, ctr_y, widths, and heights were also not the correct type. By changing the device for these variables I was able to run inference.
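For anyone hitting the same error, the change was roughly along these lines. This is a simplified sketch and not the actual code from utils.py: the function name decode_center_x and the argument layout are hypothetical, and only the last expression mirrors the line from the stack trace. The idea is to move every tensor involved onto the device of the network output instead of hard-coding .cuda().

```python
import torch


def decode_center_x(boxes, deltas, mean, std):
    """Hypothetical, simplified decoding of the predicted x centre.

    boxes:  (N, 4) anchors as x1, y1, x2, y2
    deltas: (N, 4) regression output of the network
    mean, std: (4,) normalisation constants
    """
    # Use the device of the network output as the single source of truth.
    device = deltas.device
    boxes = boxes.to(device)
    mean = mean.to(device)
    std = std.to(device)

    widths = boxes[:, 2] - boxes[:, 0]
    ctr_x = boxes[:, 0] + 0.5 * widths
    dx = deltas[:, 0] * std[0] + mean[0]

    # Same expression as the failing line in the stack trace.
    pred_ctr_x = ctr_x + dx * widths
    return pred_ctr_x


# Small CPU example; on a GPU machine only `deltas` would need to be on cuda.
boxes = torch.tensor([[0.0, 0.0, 10.0, 20.0]])
deltas = torch.tensor([[0.5, 0.0, 0.0, 0.0]])
mean = torch.zeros(4)
std = torch.tensor([0.1, 0.1, 0.2, 0.2])
result = decode_center_x(boxes, deltas, mean, std)
print(result)
```

Calling .to(device) on a tensor that is already on that device returns it unchanged, so the extra calls cost nothing when everything is in the right place.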

ptrblck · December 19, 2019, 4:00am

Good to hear it’s working!
However, I couldn’t find any related issues in the repository, and I would have expected this problem to be common.

aast226 · December 19, 2019, 4:42pm

That’s true. Before I posted this, I also looked through that repo for related issues and couldn’t find any. I tested this on multiple devices running different versions of PyTorch and got effectively the same error on each device.