Native CUDA amp input and weight dtype mismatch

Hi all,

When trying to use the native AMP functionality, I run into the following error:

RuntimeError: Input type (torch.cuda.DoubleTensor) and weight type (torch.cuda.HalfTensor) should be the same

I use the @autocast() decorator and the GradScaler class.
PyTorch version is: 1.7.0.dev20200716

Thanks in advance for any hints or help!

Could you post a code snippet to reproduce this issue, please? :slight_smile:

Thanks for the crazy fast reply :slight_smile:

Sure, here is a gist with some code (sorry for the many lines):

I can post the entire code if you want, I just thought it might be too much code.

Thanks for the code!
Could you post the complete config and upload the kwargs used in the model?
Also, I assume you are using [batch_size, 3, 224, 224] shaped inputs?

Sure, I updated the gist to include the configs.
The kwargs are listed in the config and are sorted in the same way they are used in the model.

Yes, that is exactly my input’s shape.

Thanks!
Based on the code, it seems that you might be creating numpy arrays for your input data before transforming them to tensors. numpy uses float64 as its default dtype, so the resulting tensors would be float64 as well.
Could you cast the input tensors to float32 via tensor = tensor.float() before passing them to the model and rerun the code?
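
To illustrate the suggestion, here is a minimal sketch of how the dtype carries over from numpy to PyTorch. The array name `data` and its shape are placeholders and are not taken from the gist:

```python
import numpy as np
import torch

# NumPy creates float64 arrays by default.
data = np.random.rand(2, 3, 224, 224)
print(data.dtype)  # float64

# torch.from_numpy keeps the NumPy dtype, so this is a float64 (Double) tensor.
tensor = torch.from_numpy(data)
print(tensor.dtype)  # torch.float64

# Cast to float32 before passing the tensor to the model.
tensor = tensor.float()
print(tensor.dtype)  # torch.float32
```

Creating the array with data.astype(np.float32) before the conversion should have the same effect.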

Alright, I made the changes you suggested. I don’t know if I’m a step forward or still at the same point :slight_smile:

Now I get this error:

Traceback (most recent call last):
  File "train.py", line 342, in <module>
    model = train(data_loader,
  File "train.py", line 172, in train
    scaler.scale(loss).backward()
  File "/sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/autograd/__init__.py", line 125, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Found dtype Float but expected Half
Exception raised from compute_types at /pytorch/aten/src/ATen/native/TensorIterator.cpp:183 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f212400e1e2 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: at::TensorIterator::compute_types(at::TensorIteratorConfig const&) + 0x259 (0x7f21603aa429 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #2: at::TensorIterator::build(at::TensorIteratorConfig&) + 0x6b (0x7f21603adbcb in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #3: at::TensorIterator::TensorIterator(at::TensorIteratorConfig&) + 0xdd (0x7f21603ae23d in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #4: at::native::mse_loss_backward_out(at::Tensor&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long) + 0x18a (0x7f216020f6fa in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0xf11ad0 (0x7f2125388ad0 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #6: at::native::mse_loss_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, long) + 0x90 (0x7f216020c240 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0xf11b70 (0x7f2125388b70 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #8: <unknown function> + 0xf357e6 (0x7f21253ac7e6 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #9: at::mse_loss_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, long) + 0xfb (0x7f2160710c5b in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x26301c9 (0x7f2161b2f1c9 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0xab4cc6 (0x7f215ffb3cc6 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #12: at::mse_loss_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, long) + 0xfb (0x7f2160710c5b in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #13: torch::autograd::generated::MseLossBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x1af (0x7f2161a7393f in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #14: <unknown function> + 0x2b31037 (0x7f2162030037 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #15: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x1400 (0x7f216202b880 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #16: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x451 (0x7f216202c421 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x89 (0x7f2162024599 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #18: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x4a (0x7f217020eb2a in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #19: <unknown function> + 0xc70f (0x7f216f8bd70f in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch.so)
frame #20: <unknown function> + 0x76ba (0x7f2175d116ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #21: clone + 0x6d (0x7f21753374dd in /lib/x86_64-linux-gnu/libc.so.6)

Any more help on this?
Thanks in advance!
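
The trace above fails inside mse_loss_backward, so one possible reading is that the loss receives a float16 prediction (produced under autocast) together with a float32 target. This is an assumption and is not confirmed in this thread. Under that assumption, a workaround worth trying is to cast the prediction to float32 before computing the loss. The sketch below only mimics the dtypes on the CPU; `logits`, `output` and `target` are hypothetical stand-ins, not names from the gist:

```python
import torch
import torch.nn.functional as F

# Stand-in for a model output: the .half() cast mimics what autocast
# would produce for a layer that runs in float16.
logits = torch.randn(4, 1, requires_grad=True)
output = logits.half()

# Stand-in for a float32 target tensor.
target = torch.randn(4, 1)
print(output.dtype, target.dtype)

# Cast the prediction back to float32 so both loss inputs share one dtype.
loss = F.mse_loss(output.float(), target)
loss.backward()
print(loss.dtype, logits.grad.dtype)
```

In a real training loop the scaler.scale(loss).backward() call would stay unchanged, and only the loss inputs would be cast.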

I encountered a similar issue.

Previously, my dataset.py had this line: torch.from_numpy(data), and I received RuntimeError: Input type (torch.cuda.DoubleTensor) and weight type (torch.cuda.HalfTensor) should be the same

I’ve changed it to torch.from_numpy(data).float() and the error went away.
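
For reference, a minimal sketch of where such a cast could live inside a dataset. The class name `ArrayDataset` and the sample shapes are made up for illustration and do not come from the original dataset.py:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class ArrayDataset(Dataset):
    """Hypothetical dataset that wraps a list of NumPy arrays."""

    def __init__(self, arrays):
        self.arrays = arrays

    def __len__(self):
        return len(self.arrays)

    def __getitem__(self, idx):
        # from_numpy keeps float64, so cast to float32 here.
        return torch.from_numpy(self.arrays[idx]).float()


# Four float64 samples, as np.random.rand produces by default.
arrays = [np.random.rand(3, 8, 8) for _ in range(4)]
loader = DataLoader(ArrayDataset(arrays), batch_size=2)

batch = next(iter(loader))
print(batch.dtype, batch.shape)
```

Doing the cast in __getitem__ means every batch from the DataLoader already arrives as float32, so the training loop needs no extra conversion.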

Environment

  • torch==1.10.1+cu113
  • torchaudio==0.10.1+cu113
  • torchinfo==1.5.2
  • torchvision==0.11.2+cu113