PrimeF (Fabrizio Primerano) | July 17, 2020, 11:25am | #1
Hi all,
when trying to use the native amp functionality I run into the following error:
RuntimeError: Input type (torch.cuda.DoubleTensor) and weight type (torch.cuda.HalfTensor) should be the same
I use the @autocast() decorator and the GradScaler class.
PyTorch version is: 1.7.0.dev20200716
Thanks in advance for any hints or help!
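For context, the decorator-based usage pattern mentioned above looks roughly like this (a minimal sketch with a hypothetical module; the real model and layer sizes are placeholders):

```python
import torch
import torch.nn as nn
from torch.cuda.amp import autocast

class Net(nn.Module):
    """Hypothetical model; autocast is applied as a decorator on forward."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)

    @autocast()
    def forward(self, x):
        # Under autocast, eligible CUDA ops run in float16 automatically.
        return self.fc(x)

net = Net()
out = net(torch.randn(2, 10))
```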
Could you post a code snippet to reproduce this issue, please?
PrimeF (Fabrizio Primerano) | July 17, 2020, 11:35am | #3
Thanks for the crazy fast reply
Sure, here is a gist with some code (sorry for the many lines):
dataset.py
# Python standard libraries
import os
import glob
# Installed libraries
import numpy as np
from PIL import Image, ImageCms
import torch
from torch.utils.data import DataLoader
from torch.utils.data.dataset import Dataset
(file truncated)
model.py
# Installed libraries
import torch
import torch.nn as nn
from torch.cuda.amp import autocast
# Custom libraries
from utils import resolve_class
class Model(nn.Module):
(file truncated)
train.py
def train(data_loader_, model_, optimizer, optim_params, loss_fn, epochs, checkpoint_freq, model_dir_, checkpoint=None):
"""
Trains the model
:param data_loader_: DataLoader to retrieve the training data
:param model_: Model to train
:param optimizer: Optimizer
:param loss_fn: Loss function
:param optim_params: Optimizer parameters
:param epochs: Number of epochs to train
(file truncated)
I can post the entire code if you want; I just thought it might be too much.
Thanks for the code!
Could you post the complete config and upload the kwargs used in the model?
Also, I assume you are using [batch_size, 3, 224, 224]-shaped inputs?
PrimeF (Fabrizio Primerano) | July 17, 2020, 11:45am | #5
Sure, I updated the gist to include the configs.
The kwargs are listed in the config and are sorted in the same way they are used in the model.
Exactly, that is my input's shape.
Thanks!
Based on the code it seems that you might be creating numpy arrays for your input data, which use float64 as the default dtype, before transforming them to tensors.
Could you convert the input tensors to float32 via tensor = tensor.float() before passing them to the model and rerun the code?
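A minimal sketch of the dtype issue described above (the batch shape here is a placeholder): NumPy creates float64 arrays by default, which become DoubleTensors in PyTorch unless explicitly converted.

```python
import numpy as np
import torch

# NumPy defaults to float64, and torch.from_numpy keeps that dtype.
data = np.random.rand(4, 3, 8, 8)   # hypothetical input batch
tensor = torch.from_numpy(data)
print(tensor.dtype)                 # torch.float64

# Convert to float32 before passing the batch to the model.
tensor = tensor.float()
print(tensor.dtype)                 # torch.float32
```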
PrimeF (Fabrizio Primerano) | July 17, 2020, 12:18pm | #7
Alright, I made the changes you suggested; I don't know if I'm a step forward or still at the same point.
Now I get this error:
Traceback (most recent call last):
File "train.py", line 342, in <module>
model = train(data_loader,
File "train.py", line 172, in train
scaler.scale(loss).backward()
File "/sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/tensor.py", line 185, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/autograd/__init__.py", line 125, in backward
Variable._execution_engine.run_backward(
RuntimeError: Found dtype Float but expected Half
Exception raised from compute_types at /pytorch/aten/src/ATen/native/TensorIterator.cpp:183 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f212400e1e2 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: at::TensorIterator::compute_types(at::TensorIteratorConfig const&) + 0x259 (0x7f21603aa429 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #2: at::TensorIterator::build(at::TensorIteratorConfig&) + 0x6b (0x7f21603adbcb in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #3: at::TensorIterator::TensorIterator(at::TensorIteratorConfig&) + 0xdd (0x7f21603ae23d in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #4: at::native::mse_loss_backward_out(at::Tensor&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long) + 0x18a (0x7f216020f6fa in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0xf11ad0 (0x7f2125388ad0 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #6: at::native::mse_loss_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, long) + 0x90 (0x7f216020c240 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0xf11b70 (0x7f2125388b70 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #8: <unknown function> + 0xf357e6 (0x7f21253ac7e6 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #9: at::mse_loss_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, long) + 0xfb (0x7f2160710c5b in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x26301c9 (0x7f2161b2f1c9 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0xab4cc6 (0x7f215ffb3cc6 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #12: at::mse_loss_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, long) + 0xfb (0x7f2160710c5b in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #13: torch::autograd::generated::MseLossBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x1af (0x7f2161a7393f in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #14: <unknown function> + 0x2b31037 (0x7f2162030037 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #15: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x1400 (0x7f216202b880 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #16: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x451 (0x7f216202c421 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x89 (0x7f2162024599 in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #18: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x4a (0x7f217020eb2a in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #19: <unknown function> + 0xc70f (0x7f216f8bd70f in /sapmnt/home/D067751/.local/share/virtualenvs/project-VOVyZKpb/lib/python3.8/site-packages/torch/lib/libtorch.so)
frame #20: <unknown function> + 0x76ba (0x7f2175d116ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #21: clone + 0x6d (0x7f21753374dd in /lib/x86_64-linux-gnu/libc.so.6)
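The traceback points at mse_loss_backward, i.e. a dtype mismatch between the loss tensors during the backward pass. For reference, this is the usual native amp training-step pattern (a sketch with a hypothetical model, data, and loss; inputs and targets stay in float32, and the forward pass plus loss are computed under autocast so the dtypes stay consistent):

```python
import torch
import torch.nn as nn
from torch.cuda.amp import autocast, GradScaler

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

model = nn.Linear(10, 1).to(device)                      # hypothetical model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
scaler = GradScaler(enabled=use_amp)

inputs = torch.randn(8, 10, device=device)               # float32 inputs
targets = torch.randn(8, 1, device=device)               # float32 targets

optimizer.zero_grad()
# Forward pass and loss computation both run under autocast.
with autocast(enabled=use_amp):
    outputs = model(inputs)
    loss = loss_fn(outputs, targets)
# Backward and optimizer step go through the GradScaler.
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```

Manually casting the model output or the targets to half precision outside autocast would produce exactly this kind of Float/Half mismatch in the loss backward.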
PrimeF (Fabrizio Primerano) | July 20, 2020, 12:19pm | #8
Any more help on this?
Thanks in advance!
I encountered a similar issue.
Previously, my dataset.py had the line torch.from_numpy(data) and I received RuntimeError: Input type (torch.cuda.DoubleTensor) and weight type (torch.cuda.HalfTensor) should be the same.
I changed it to torch.from_numpy(data).float() and the error went away.
Environment
torch==1.10.1+cu113
torchaudio==0.10.1+cu113
torchinfo==1.5.2
torchvision==0.11.2+cu113
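A small sketch of why that fix works: torch.from_numpy preserves the NumPy dtype, so a default float64 array becomes a DoubleTensor, while appending .float() produces a float32 copy.

```python
import numpy as np
import torch

data = np.zeros((2, 3))                 # NumPy defaults to float64
t = torch.from_numpy(data)              # shares memory, keeps float64
print(t.dtype)                          # torch.float64

t32 = torch.from_numpy(data).float()    # float32 copy, no longer shared
print(t32.dtype)                        # torch.float32
```

Note that .float() copies the data, so t32 no longer shares memory with the original NumPy array.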