RuntimeError: expected scalar type Float but found Half

I am using torch.cuda.amp for mixed precision.

My forward pass calls many functions, each with its own forward pass.
I also tried decorating all of these nested forward passes with torch.cuda.amp.autocast(enabled=True), but the error persists.

Forward pass:

with torch.cuda.amp.autocast(enabled=True):
    h, chunk, preds, labels = model.forward(batch, alphaSG, device)
    label = labels
    for worker in model.classification_workers:
        loss = worker.loss_weight * worker.loss(preds[worker.name], label[worker.name])
        losses[worker.name] = loss
        tot_loss += loss

    for worker in model.regression_workers:
        loss = worker.loss_weight * worker.loss(preds[worker.name], label[worker.name])
        losses[worker.name] = loss
        tot_loss += loss
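
(For reference, a single outer autocast region already covers the forward passes of all sub-modules called inside it, so the nested decorators shouldn't be necessary. Below is a minimal, self-contained sketch of the usual torch.cuda.amp pattern with GradScaler; the tiny model and the names are placeholders, not the model from this post.)

import torch
import torch.nn as nn

# Placeholder model, optimizer, and data, only to show the pattern
model = nn.Linear(10, 2).to('cuda')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

inputs = torch.randn(8, 10, device='cuda')
targets = torch.randint(0, 2, (8,), device='cuda')

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=True):
    outputs = model(inputs)            # autocast also covers nested forward calls
    loss = criterion(outputs, targets)
scaler.scale(loss).backward()          # backward runs outside the autocast region
scaler.step(optimizer)
scaler.update()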

Error I am getting:

RuntimeError                              Traceback (most recent call last)
<ipython-input-10-f4ce5bf32b0d> in <module>()
   2327 
   2328             with torch.cuda.amp.autocast(enabled=True):
-> 2329                 h, chunk, preds, labels = model.forward(batch, alphaSG, device)
   2330                 label = labels
   2331                 for worker in model.classification_workers:

10 frames
/usr/local/lib/python3.6/dist-packages/torch/cuda/amp/autocast_mode.py in decorate_autocast(*args, **kwargs)
    133         def decorate_autocast(*args, **kwargs):
    134             with self:
--> 135                 return func(*args, **kwargs)
    136         return decorate_autocast
    137 

<ipython-input-10-f4ce5bf32b0d> in forward(self, x, alpha, device)
   1945             # remove key if it exists
   1946             x_.pop('cchunk', None)
-> 1947         h = self.frontend(x_, device)
   1948         if len(h) > 1:
   1949             assert len(h) == 2, len(h)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    556             result = self._slow_forward(*input, **kwargs)
    557         else:
--> 558             result = self.forward(*input, **kwargs)
    559         for hook in self._forward_hooks.values():
    560             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torch/cuda/amp/autocast_mode.py in decorate_autocast(*args, **kwargs)
    133         def decorate_autocast(*args, **kwargs):
    134             with self:
--> 135                 return func(*args, **kwargs)
    136         return decorate_autocast
    137 

<ipython-input-10-f4ce5bf32b0d> in forward(self, batch, device, mode)
   1827             dskips = []
   1828         for n, block in enumerate(self.blocks):
-> 1829             h = block(h)
   1830             if denseskips and (n + 1) < len(self.blocks):
   1831                 # denseskips happen til the last but one layer

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    556             result = self._slow_forward(*input, **kwargs)
    557         else:
--> 558             result = self.forward(*input, **kwargs)
    559         for hook in self._forward_hooks.values():
    560             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torch/cuda/amp/autocast_mode.py in decorate_autocast(*args, **kwargs)
    133         def decorate_autocast(*args, **kwargs):
    134             with self:
--> 135                 return func(*args, **kwargs)
    136         return decorate_autocast
    137 

<ipython-input-10-f4ce5bf32b0d> in forward(self, x)
   1494                 P = (pad, pad)
   1495             x = F.pad(x, P, mode=self.pad_mode)
-> 1496         h = self.conv(x)
   1497         if hasattr(self, 'norm'):
   1498             h = forward_norm(h, self.norm)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    556             result = self._slow_forward(*input, **kwargs)
    557         else:
--> 558             result = self.forward(*input, **kwargs)
    559         for hook in self._forward_hooks.values():
    560             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torch/cuda/amp/autocast_mode.py in decorate_autocast(*args, **kwargs)
    133         def decorate_autocast(*args, **kwargs):
    134             with self:
--> 135                 return func(*args, **kwargs)
    136         return decorate_autocast
    137 

<ipython-input-10-f4ce5bf32b0d> in forward(self, waveforms)
   1334         band=(high-low)[:,0]
   1335 
-> 1336         f_times_t_low = torch.matmul(low, self.n_)
   1337         f_times_t_high = torch.matmul(high, self.n_)

Could you post a code snippet to reproduce this issue, please?

Hi,
Here is the code

It's 115 lines of code. Please let me know if this doesn't help.

The error points towards a device mismatch:

RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_mm

amp needs a GPU to run properly, so you would need to call .to('cuda') on the model and input.
Also, you might need to register self.n_ as a buffer via:

self.register_buffer('n_', ...)

so that it will also be pushed to the device.
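
For illustration, here is a minimal sketch of how that could look in a SincNet-style frontend; the module name, shapes, and defaults below are just assumptions for the example. Registering n_ as a buffer means model.to('cuda') moves it together with the parameters, so the matmul in the traceback no longer mixes a CPU tensor with a CUDA tensor.

import torch
import torch.nn as nn

class SincFrontend(nn.Module):  # hypothetical module, only to illustrate register_buffer
    def __init__(self, out_channels=80, kernel_size=251, sample_rate=16000):
        super().__init__()
        # learnable cutoff frequencies -> nn.Parameter
        self.low = nn.Parameter(torch.rand(out_channels, 1))
        # fixed time axis -> buffer: moved by .to('cuda') and saved in the
        # state_dict, but not returned by parameters(), so it is never optimized
        n = torch.linspace(0, (kernel_size - 1) / sample_rate, kernel_size).unsqueeze(0)
        self.register_buffer('n_', n)

    def forward(self):
        # self.n_ is on the same device as self.low, so this matmul runs on the GPU
        return torch.matmul(self.low, self.n_)   # shape: [out_channels, kernel_size]

frontend = SincFrontend().to('cuda')  # .to('cuda') moves parameters and buffers alike
print(frontend.n_.device)             # cuda:0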

Your code doesn't provide the definition of format_frontend_output, so I cannot verify whether it's working after these fixes.

Thank you!! This solves the error.

Two points -

  1. Where/how did you get the device mismatch error?

  2. self.n_ is already on the device, so why use register_buffer at all? There are many other variables in the actual code; how do you decide whether a tensor should be registered as a buffer or not? For example, in this code I have self.window_ along with self.n_; why not register it as a buffer as well?

I’m glad it solved the error! :wink:

  1. The error was raised in my environment. If you have executed the script in Colab, the error message might have been lost?

  2. Usually, every tensor that should be pushed to the same device as the model parameters but doesn't require gradients should be registered as a buffer. On the other hand, if the tensor requires gradients, use nn.Parameter. (See the small sketch below.)
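
A tiny sketch of that rule (the module and tensor names here are only illustrative): parameters and buffers both follow .to('cuda') and both end up in the state_dict, but only parameters are returned by model.parameters() and updated by the optimizer; a plain tensor attribute is neither moved nor saved.

import torch
import torch.nn as nn

class Example(nn.Module):  # hypothetical module, only to illustrate the rule
    def __init__(self):
        super().__init__()
        # trainable tensor -> nn.Parameter (requires gradients, seen by the optimizer)
        self.weight = nn.Parameter(torch.randn(4, 4))
        # fixed tensor that should follow the module's device -> buffer
        self.register_buffer('window_', torch.hann_window(4))
        # a plain attribute is NOT moved by .to() and NOT saved in the state_dict
        self.plain = torch.randn(4)

m = Example().to('cuda')
print(m.weight.device)    # cuda:0 (parameter)
print(m.window_.device)   # cuda:0 (buffer)
print(m.plain.device)     # cpu    (plain attribute, left behind)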


Can you please help me out with this? I am facing a similar problem and getting the error:
Expected object of scalar type Float but got scalar type Half for argument #2 'mat2' in call to _th_mm

Link to my problem: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_mm

Thanks!