Converting model to torch.float16

I converted my 3D training data to float16 to reduce memory usage, but now I get the following error:
RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.cuda.FloatTensor) should be the same
How can I convert the model to torch.float16 (or torch.half) before I start training?

The full error stack looks like this:

pred, info = model.update(imgs, gt, dataset, learning_rate, training=True)
  File "/home/hamid/Desktop/OpticalFlow/FlowSciVis/Flow-3D/model/RIFE.py", line 126, in update
    flow, mask, merged, flow_teacher, merged_teacher, loss_distill = self.flownet(torch.cat((imgs, gt), 1), scale=[1, 1, 1])
  File "/home/hamid/miniconda3/envs/gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/hamid/miniconda3/envs/gpu/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 511, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/hamid/miniconda3/envs/gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/hamid/Desktop/OpticalFlow/FlowSciVis/Flow-3D/model/IFNet.py", line 177, in forward
    flow, mask = stu[i](torch.cat((img0, img1), 1), None, scale=scale[i]) # stu[0]
  File "/home/hamid/miniconda3/envs/gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/hamid/Desktop/OpticalFlow/FlowSciVis/Flow-3D/model/IFNet.py", line 94, in forward
    x = self.conv0(x)
  File "/home/hamid/miniconda3/envs/gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/hamid/miniconda3/envs/gpu/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/hamid/miniconda3/envs/gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/hamid/miniconda3/envs/gpu/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/hamid/miniconda3/envs/gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/hamid/miniconda3/envs/gpu/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 567, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.cuda.FloatTensor) should be the same

If you want to use “pure” float16 training, you would have to call model.half() to transform all parameters and buffers to float16, too.
We generally recommend using torch.cuda.amp for mixed-precision training, as it will be more stable than pure float16 training.
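
For completeness, both options look roughly like this (a minimal sketch; model, data, target, loader, optimizer, and criterion are placeholders for your actual objects, not code from the project above):

import torch

# Option 1: "pure" float16 -- parameters, buffers, and inputs all in half precision
model = model.half()           # converts all parameters and buffers to torch.float16
output = model(data.half())    # inputs must have the same dtype as the weights

# Option 2 (recommended): mixed-precision training via torch.cuda.amp
scaler = torch.cuda.amp.GradScaler()
for data, target in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        output = model(data)              # autocast picks float16/float32 per op
        loss = criterion(output, target)
    scaler.scale(loss).backward()         # scale the loss to avoid float16 gradient underflow
    scaler.step(optimizer)                # unscales gradients; skips the step on inf/nan
    scaler.update()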


Thanks @ptrblck, that solved the previous issue. Now I am starting the training with torch.cuda.amp.autocast(True). I noticed, however, that each iteration now takes more time; could that be related to the autocast mode? I am using an NVIDIA TITAN V with CUDA version 11.2.
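
For reference, I measure the iteration time roughly like this (a sketch; without the synchronize calls the timings of asynchronous CUDA ops would be misleading):

import time
import torch

torch.cuda.synchronize()   # wait for pending GPU work before starting the timer
start = time.perf_counter()
pred, info = model.update(imgs, gt, dataset, learning_rate, training=True)
torch.cuda.synchronize()   # wait until the step has actually finished on the GPU
print(f"iteration time: {time.perf_counter() - start:.3f}s")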

This should not be the case. Could you post your model definition as well as the input shapes here, please?

I checked again and found that it is actually not the time per iteration that increased, but the time between epochs: the start of each new epoch takes increasingly long, which eventually slows down the whole training. I am not sure why this happens. I have 3D volumetric data of size (128, 128, 128) with 4 channels (density and velocities), and I am currently using a batch size of 45 after enabling the autocast mode. My model consists of 3 blocks with several convolutional layers each and PReLU activations. I recently added batch normalization after the activations to stabilize training, but now the loss increases early in training and becomes NaN; I'll need to find out why.
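
One suspicion for the slowdown between epochs is that the DataLoader workers are shut down and re-created at every epoch start. I'll try keeping them alive; a sketch of what I'd change (num_workers is a placeholder, and persistent_workers requires PyTorch >= 1.7):

train_loader = torch.utils.data.DataLoader(
    dataset,
    batch_size=45,
    shuffle=True,
    num_workers=4,            # placeholder; tune to the machine
    pin_memory=True,          # faster host-to-GPU copies
    persistent_workers=True,  # keep workers alive between epochs (PyTorch >= 1.7)
)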