RuntimeError: expected scalar type Float but found Half in deform_conv2d

jsetty · December 3, 2021, 10:42pm

Hi, I have a large network that contains some deformable convolutions (torch.ops.torchvision.deform_conv2d). I am using apex 0.1, CUDA 11, and torch 1.11, with the optimization level set to O1 for mixed-precision training.

I am using nn.DataParallel and have moved the model to the GPU. Unfortunately, I get the error below. Could you please help me fix it?

  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "<path>/model/dcn.py", line 56, in forward
    x = torchvision.ops.deform_conv2d(input=x,
  File "/opt/conda/lib/python3.8/site-packages/torchvision/ops/deform_conv.py", line 89, in deform_conv2d
    return torch.ops.torchvision.deform_conv2d(
RuntimeError: expected scalar type Float but found Half

Thanks!

ptrblck · December 3, 2021, 11:28pm

apex.amp is deprecated, so please use the native mixed-precision training utility via torch.cuda.amp. You can find examples here.

jsetty · December 4, 2021, 12:12am

Hi @ptrblck,

Thanks for your inputs. I moved to torch.cuda.amp.

I am using scaler = torch.cuda.amp.GradScaler().

I am doing the casting as follows:

with torch.cuda.amp.autocast():
    preds = model(inputs)
    loss = criterion(preds, labels.float())
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()

But now I get an error in a different module, at a torch.einsum() call.

Traceback (most recent call last):
  File "main.py", line 333, in <module>
    main()
  File "main.py", line 327, in main
    train(model, train_loader, test_loader, args, criterion, optimizer, device)
  File "main.py", line 229, in train
    scaler.scale(loss).backward()
  File "/opt/conda/lib/python3.8/site-packages/torch/_tensor.py", line 352, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/function.py", line 199, in apply
    return user_fn(self, *args)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/function.py", line 340, in wrapper
    outputs = fn(ctx, *args)
  File "<path>/model/pac.py", line 179, in backward
    grad_in_mul_k = torch.einsum('iomn,ojkl->ijklmn', (grad_output, weight))
  File "/opt/conda/lib/python3.8/site-packages/torch/functional.py", line 325, in einsum
    return einsum(equation, *_operands)
  File "/opt/conda/lib/python3.8/site-packages/torch/functional.py", line 327, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
RuntimeError: expected scalar type Half but found Float

Please take a look.

Thanks!

ptrblck · December 4, 2021, 1:26am

I guess you might be using an older torchvision release, since the operation is available via torchvision.ops.DeformConv2d and this PR added autocast support ~1 year ago.

This code snippet works fine for me:

import torch
from torchvision import ops

deform_layer = ops.DeformConv2d(in_channels=3, out_channels=64, kernel_size=3).cuda()
x = torch.randn((1,3,9,9)).cuda()
# offset needs 2 * kernel_h * kernel_w channels; a 3x3 kernel on a 9x9 input gives a 7x7 output
offset = torch.randn(1,3*3*2,7,7).cuda()
with torch.cuda.amp.autocast():
    out = deform_layer(x, offset)
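
On the einsum error from the second traceback: it is raised inside the backward of a custom torch.autograd.Function (model/pac.py, line 179). A likely cause is that the forward ran under autocast while the hand-written backward runs outside of it, so grad_output and weight can end up with different dtypes. The custom_fwd / custom_bwd decorators from torch.cuda.amp are meant for this case. Below is a minimal sketch, not the actual pac.py code: the class name EinsumMul and the forward equation are hypothetical stand-ins, and only the einsum equation for the input gradient is taken from the traceback.

```python
import torch
from torch.cuda.amp import custom_fwd, custom_bwd
# Note: recent PyTorch releases expose the same decorators as
# torch.amp.custom_fwd / torch.amp.custom_bwd with a device_type argument.


class EinsumMul(torch.autograd.Function):
    """Hypothetical stand-in for the custom op in pac.py."""

    @staticmethod
    @custom_fwd(cast_inputs=torch.float32)  # under CUDA autocast: cast inputs to float32, run forward with autocast off
    def forward(ctx, x, weight):
        ctx.save_for_backward(x, weight)
        # x: (i, j, k, l, m, n), weight: (o, j, k, l) -> out: (i, o, m, n)
        return torch.einsum('ijklmn,ojkl->iomn', x, weight)

    @staticmethod
    @custom_bwd  # run backward with the same autocast state that forward used
    def backward(ctx, grad_output):
        x, weight = ctx.saved_tensors
        # Same equation as the failing line in the traceback
        grad_x = torch.einsum('iomn,ojkl->ijklmn', grad_output, weight)
        grad_w = torch.einsum('iomn,ijklmn->ojkl', grad_output, x)
        return grad_x, grad_w


if torch.cuda.is_available():
    x = torch.randn(2, 3, 3, 3, 5, 5, device="cuda", requires_grad=True)
    w = torch.randn(4, 3, 3, 3, device="cuda", requires_grad=True)
    with torch.cuda.amp.autocast():
        out = EinsumMul.apply(x, w)
    out.sum().backward()
```

With cast_inputs=torch.float32 the op always runs in full precision, which is the conservative choice. Leaving cast_inputs unset and keeping only the two decorators should instead let both forward and backward run under autocast, which may be faster if the op is numerically safe in half precision.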