PrimeF (Fabrizio Primerano) | July 17, 2020, 11:25am | #1
Hi all,
when trying to use the native amp functionality I run into the following error:
RuntimeError: Input type (torch.cuda.DoubleTensor) and weight type (torch.cuda.HalfTensor) should be the same
I use the @autocast() decorator and the GradScaler class.
PyTorch version is: 1.7.0.dev20200716
Thanks in advance for any hints or help!
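For context, the decorator-based usage pattern mentioned above looks roughly like this (a minimal sketch with a hypothetical module; the real model and layer sizes are placeholders):

```python
import torch
import torch.nn as nn
from torch.cuda.amp import autocast

class Net(nn.Module):
    """Hypothetical model; autocast is applied as a decorator on forward."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)

    @autocast()
    def forward(self, x):
        # Under autocast, eligible CUDA ops run in float16 automatically.
        return self.fc(x)

net = Net()
out = net(torch.randn(2, 10))
```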
Could you post a code snippet to reproduce this issue, please?
PrimeF (Fabrizio Primerano) | July 17, 2020, 11:35am | #3
Thanks for the crazy fast reply
Sure, here is a gist with some code (sorry for the many lines):
dataset.py
# Python standard libraries
import os
import glob
# Installed libraries
import numpy as np
from PIL import Image, ImageCms
import torch
from torch.utils.data import DataLoader
from torch.utils.data.dataset import Dataset
(file truncated)
model.py
# Installed libraries
import torch
import torch.nn as nn
from torch.cuda.amp import autocast
# Custom libraries
from utils import resolve_class
class Model(nn.Module):
(file truncated)
train.py
def train(data_loader_, model_, optimizer, optim_params, loss_fn, epochs, checkpoint_freq, model_dir_, checkpoint=None):
"""
Trains the model
:param data_loader_: DataLoader to retrieve the training data
:param model_: Model to train
:param optimizer: Optimizer
:param loss_fn: Loss function
:param optim_params: Optimizer parameters
:param epochs: Number of epochs to train
(file truncated)
I can post the entire code if you want; I just thought it might be too much.
Thanks for the code!
Could you post the complete config and upload the kwargs used in the model?
Also, I assume you are using [batch_size, 3, 224, 224]-shaped inputs?
PrimeF (Fabrizio Primerano) | July 17, 2020, 11:45am | #5
Sure, I updated the gist to include the configs.
The kwargs are listed in the config and are sorted in the same way they are used in the model.
Exactly, that is my input's shape.
Thanks!
Based on the code it seems that you might be creating numpy arrays for your input data, which use float64 as the default dtype, before transforming them to tensors.
Could you convert the input tensors to float32 via tensor = tensor.float() before passing them to the model and rerun the code?
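A minimal sketch of the dtype issue described above (the batch shape here is a placeholder): NumPy creates float64 arrays by default, which become DoubleTensors in PyTorch unless explicitly converted.

```python
import numpy as np
import torch

# NumPy defaults to float64, and torch.from_numpy keeps that dtype.
data = np.random.rand(4, 3, 8, 8)   # hypothetical input batch
tensor = torch.from_numpy(data)
print(tensor.dtype)                 # torch.float64

# Convert to float32 before passing the batch to the model.
tensor = tensor.float()
print(tensor.dtype)                 # torch.float32
```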
PrimeF (Fabrizio Primerano) | July 17, 2020, 12:18pm | #7
Alright, I made the changes you suggested; I don't know if I'm a step forward or still at the same point.
Now I get this error:
Traceback (most recent call last):
File "train.py", line 342, in <module>
model = train(data_loader,
File "train.py", line 172, in train
scaler.scale(loss).backward()
File "/sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/tensor.py", line 185, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/autograd/__init__.py", line 125, in backward
Variable._execution_engine.run_backward(
RuntimeError: Found dtype Float but expected Half
Exception raised from compute_types at /pytorch/aten/src/ATen/native/TensorIterator.cpp:183 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f212400e1e2 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: at::TensorIterator::compute_types(at::TensorIteratorConfig const&) + 0x259 (0x7f21603aa429 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #2: at::TensorIterator::build(at::TensorIteratorConfig&) + 0x6b (0x7f21603adbcb in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #3: at::TensorIterator::TensorIterator(at::TensorIteratorConfig&) + 0xdd (0x7f21603ae23d in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #4: at::native::mse_loss_backward_out(at::Tensor&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long) + 0x18a (0x7f216020f6fa in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0xf11ad0 (0x7f2125388ad0 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #6: at::native::mse_loss_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, long) + 0x90 (0x7f216020c240 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0xf11b70 (0x7f2125388b70 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #8: <unknown function> + 0xf357e6 (0x7f21253ac7e6 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #9: at::mse_loss_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, long) + 0xfb (0x7f2160710c5b in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x26301c9 (0x7f2161b2f1c9 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0xab4cc6 (0x7f215ffb3cc6 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #12: at::mse_loss_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, long) + 0xfb (0x7f2160710c5b in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #13: torch::autograd::generated::MseLossBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x1af (0x7f2161a7393f in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #14: <unknown function> + 0x2b31037 (0x7f2162030037 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #15: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x1400 (0x7f216202b880 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #16: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x451 (0x7f216202c421 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x89 (0x7f2162024599 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #18: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x4a (0x7f217020eb2a in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #19: <unknown function> + 0xc70f (0x7f216f8bd70f in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch.so)
frame #20: <unknown function> + 0x76ba (0x7f2175d116ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #21: clone + 0x6d (0x7f21753374dd in /lib/x86_64-linux-gnu/libc.so.6)
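The traceback points at mse_loss_backward, i.e. a dtype mismatch between the loss tensors during the backward pass. For reference, this is the usual native amp training-step pattern (a sketch with a hypothetical model, data, and loss; inputs and targets stay in float32, and the forward pass plus loss are computed under autocast so the dtypes stay consistent):

```python
import torch
import torch.nn as nn
from torch.cuda.amp import autocast, GradScaler

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

model = nn.Linear(10, 1).to(device)                      # hypothetical model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
scaler = GradScaler(enabled=use_amp)

inputs = torch.randn(8, 10, device=device)               # float32 inputs
targets = torch.randn(8, 1, device=device)               # float32 targets

optimizer.zero_grad()
# Forward pass and loss computation both run under autocast.
with autocast(enabled=use_amp):
    outputs = model(inputs)
    loss = loss_fn(outputs, targets)
# Backward and optimizer step go through the GradScaler.
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```

Manually casting the model output or the targets to half precision outside autocast would produce exactly this kind of Float/Half mismatch in the loss backward.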
PrimeF (Fabrizio Primerano) | July 20, 2020, 12:19pm | #8
Any more help on this?
Thanks in advance!
I encountered a similar issue.
Previously, my dataset.py had the line torch.from_numpy(data) and I received RuntimeError: Input type (torch.cuda.DoubleTensor) and weight type (torch.cuda.HalfTensor) should be the same.
I changed it to torch.from_numpy(data).float() and the error went away.
Environment
torch==1.10.1+cu113
torchaudio==0.10.1+cu113
torchinfo==1.5.2
torchvision==0.11.2+cu113
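A small sketch of why that fix works: torch.from_numpy preserves the NumPy dtype, so a default float64 array becomes a DoubleTensor, while appending .float() produces a float32 copy.

```python
import numpy as np
import torch

data = np.zeros((2, 3))                 # NumPy defaults to float64
t = torch.from_numpy(data)              # shares memory, keeps float64
print(t.dtype)                          # torch.float64

t32 = torch.from_numpy(data).float()    # float32 copy, no longer shared
print(t32.dtype)                        # torch.float32
```

Note that .float() copies the data, so t32 no longer shares memory with the original NumPy array.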