RuntimeError: Expected isFloatingType(grads[i].type().scalarType()) to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

Hello,
I am getting this error when I run this code:

source = source - torch.mean(source, dim=1, keepdim=True)
template = template - torch.mean(template, dim=1, keepdim=True)

output = model(template, source)
loss_val = ChamferDistanceLoss()(template, output['transformed_source'])
print(loss_val)

# forward + backward + optimize
optimizer.zero_grad()
loss_val.backward()
optimizer.step()

print(loss_val) ==> tensor(0.2791, device='cuda:0', grad_fn=<DivBackward0>)
How can I solve this problem?
Thanks in advance,

Hi,

Could you enable anomaly mode to see where it comes from? Add torch.autograd.set_detect_anomaly(True) at the beginning of your script.
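For reference, a minimal sketch of where the call goes (it is a global switch, so setting it once before the first forward pass is enough):

import torch

# Enable anomaly detection before any forward pass so a backward error
# is traced back to the forward op that produced it.
torch.autograd.set_detect_anomaly(True)

output = model(template, source)  # forward pass is now recorded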

Also if you have custom autograd.Function, could you share their code here?

Hi @albanD,
Anomaly mode: <torch.autograd.anomaly_mode.set_detect_anomaly object at 0x7eff1bdba390>
loss_val tensor(0.2113, device='cuda:0', grad_fn=<DivBackward0>)
/opt/conda/conda-bld/pytorch_1573049304260/work/torch/csrc/autograd/python_anomaly_mode.cpp:40: UserWarning: No forward pass information available. Enable detect anomaly during forward pass for more information.

I don’t understand why this error occurs.
loss_val: tensor(0.2113, device='cuda:0', grad_fn=<DivBackward0>)
loss_val.backward() didn’t work.

thanks,

Interesting… The error seems to be that some gradients are not floating-point numbers.
Do you have a custom autograd Function in your code?
If not, can you share a small code sample (~30 lines) that reproduces the issue?

Hi @albanD,

def train_one_epoch(device, model, train_loader, optimizer):
	model.train()
	train_loss = 0.0
	pred  = 0.0
	count = 0
	for i, data in enumerate(tqdm(train_loader)):
		template, source, igt = data

		template = template.to(device)
		source = source.to(device)
		igt = igt.to(device)

		# mean subtraction
		source = source - torch.mean(source, dim=1, keepdim=True)
		template = template - torch.mean(template, dim=1, keepdim=True)

		output = model(template, source)
		loss_val = ChamferDistanceLoss()(template, output['transformed_source'])
		# print(loss_val.item())

		# forward + backward + optimize
		optimizer.zero_grad()
		loss_val.backward()
		optimizer.step()

		train_loss += loss_val.item()
		count += 1

	train_loss = float(train_loss)/count
	return train_loss

The error occurs in loss_val.backward().

Hi,

I am missing a few key pieces here:
What is ChamferDistanceLoss?
What is model?
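
In the meantime, a quick sanity check you could run (a debugging sketch, using the loss_val and model names from your training loop) is to confirm that everything reaching backward() is floating-point:

# Debugging sketch: a non-floating-point tensor feeding autograd would
# explain the isFloatingType(grads[i].type().scalarType()) failure.
print(loss_val.dtype)  # expected: torch.float32
for name, p in model.named_parameters():
    if not p.dtype.is_floating_point:
        print('non-float parameter:', name, p.dtype)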

The chamfer_distance function:

import torch
import torch.nn as nn
import torch.nn.functional as F

def chamfer_distance(template: torch.Tensor, source: torch.Tensor):
	from .cuda.chamfer_distance import ChamferDistance
	# nearest-neighbour squared distances in both directions
	cost_p0_p1, cost_p1_p0 = ChamferDistance()(template, source)
	cost_p0_p1 = torch.mean(torch.sqrt(cost_p0_p1))
	cost_p1_p0 = torch.mean(torch.sqrt(cost_p1_p0))
	# symmetric Chamfer loss: average of the two directed means
	chamfer_loss = (cost_p0_p1 + cost_p1_p0) / 2.0
	return chamfer_loss


class ChamferDistanceLoss(nn.Module):
	def __init__(self):
		super(ChamferDistanceLoss, self).__init__()

	def forward(self, template, source):
		return chamfer_distance(template, source)

The error is raised in train_pcrnet.py, line 99 (loss_val.backward()).

And you’re sure that these outputs here: https://github.com/vinits5/pcrnet_pytorch/blob/5ee2a2b5c12faa9c43c4c7ae1e503e098ecd4b46/pcrnet/losses/cuda/chamfer_distance/chamfer_distance.py#L61 are all floats?
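
A quick way to verify, assuming the same template and source tensors as in the training loop:

# Hypothetical check of the custom CUDA Function's outputs:
cost_p0_p1, cost_p1_p0 = ChamferDistance()(template, source)
print(cost_p0_p1.dtype, cost_p1_p0.dtype)  # both should be floating dtypes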