abd
(gues)
May 14, 2020, 3:02pm
1
hello,
I am getting this error when I try this code:
source = source - torch.mean(source, dim=1, keepdim=True)
template = template - torch.mean(template, dim=1, keepdim=True)
output = model(template, source)
loss_val = ChamferDistanceLoss()(template, output['transformed_source'])
print(loss_val)
#model.named_parameters()
# forward + backward + optimize
optimizer.zero_grad()
loss_val.backward()
optimizer.step()
print(loss_val)  # ==> tensor(0.2791, device='cuda:0', grad_fn=<DivBackward0>)
How can I solve this problem?
Thanks in advance,
albanD
(Alban D)
May 14, 2020, 3:32pm
2
Hi,
Could you enable anomaly mode to see where it comes from? Add torch.autograd.set_detect_anomaly(True)
at the beginning of your script.
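For example (a minimal sketch — the call enables the mode globally for the whole script; you don't need to keep the object it returns):

```python
import torch

# Enable anomaly detection before the forward pass, so that a backward
# error can be traced back to the forward operation that caused it.
torch.autograd.set_detect_anomaly(True)

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x * 3).sum()
y.backward()  # any autograd error here would now carry a forward traceback
print(x.grad)
```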
Also if you have custom autograd.Function, could you share their code here?
abd
(gues)
May 16, 2020, 11:43pm
3
Hi @albanD ,
Anomaly mode: <torch.autograd.anomaly_mode.set_detect_anomaly object at 0x7eff1bdba390>
loss_val: tensor(0.2113, device='cuda:0', grad_fn=<DivBackward0>)
/opt/conda/conda-bld/pytorch_1573049304260/work/torch/csrc/autograd/python_anomaly_mode.cpp:40: UserWarning: No forward pass information available. Enable detect anomaly during forward pass for more information.
I don't understand why I get this error.
loss_val: tensor(0.2113, device='cuda:0', grad_fn=<DivBackward0>)
loss_val.backward() still didn't work.
thanks,
albanD
(Alban D)
May 18, 2020, 3:37pm
4
Interesting… The error seems to be that some gradients are not floating point numbers.
Do you have a custom autograd Function in your code?
If not, can you share a small code sample (~30 lines) that reproduces the issue?
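Something self-contained along these lines (dummy stand-in names, just to show the shape of a useful repro):

```python
import torch
import torch.nn as nn

# Minimal stand-ins so the failure can be reproduced without the full project.
model = nn.Linear(3, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

source = torch.rand(4, 16, 3)    # (batch, points, xyz)
template = torch.rand(4, 16, 3)

output = model(source)
loss_val = ((output - template) ** 2).mean()  # placeholder loss

optimizer.zero_grad()
loss_val.backward()
optimizer.step()
print(loss_val.item())
```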
abd
(gues)
May 18, 2020, 4:50pm
5
Hi @albanD ,
def train_one_epoch(device, model, train_loader, optimizer):
    model.train()
    train_loss = 0.0
    pred = 0.0
    count = 0
    for i, data in enumerate(tqdm(train_loader)):
        template, source, igt = data
        template = template.to(device)
        source = source.to(device)
        igt = igt.to(device)

        # mean subtraction
        source = source - torch.mean(source, dim=1, keepdim=True)
        template = template - torch.mean(template, dim=1, keepdim=True)

        output = model(template, source)
        loss_val = ChamferDistanceLoss()(template, output['transformed_source'])
        # print(loss_val.item())

        # forward + backward + optimize
        optimizer.zero_grad()
        loss_val.backward()
        optimizer.step()

        train_loss += loss_val.item()
        count += 1
    train_loss = float(train_loss) / count
    return train_loss
The error occurs in loss_val.backward().
albanD
(Alban D)
May 18, 2020, 4:55pm
6
Hi,
I am missing a few key pieces here:
What is ChamferDistanceLoss?
What is model?
abd
(gues)
May 18, 2020, 6:10pm
7
Function chamfer_distance:
import torch
import torch.nn as nn
import torch.nn.functional as F

def chamfer_distance(template: torch.Tensor, source: torch.Tensor):
    from .cuda.chamfer_distance import ChamferDistance
    cost_p0_p1, cost_p1_p0 = ChamferDistance()(template, source)
    cost_p0_p1 = torch.mean(torch.sqrt(cost_p0_p1))
    cost_p1_p0 = torch.mean(torch.sqrt(cost_p1_p0))
    chamfer_loss = (cost_p0_p1 + cost_p1_p0) / 2.0
    return chamfer_loss

class ChamferDistanceLoss(nn.Module):
    def __init__(self):
        super(ChamferDistanceLoss, self).__init__()

    def forward(self, template, source):
        return chamfer_distance(template, source)
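For reference, assuming the custom CUDA ChamferDistance returns the squared nearest-neighbor distances in both directions (as the sqrt/mean reduction above suggests), a brute-force pure-PyTorch equivalent would look like this — a sketch, not the project's actual kernel:

```python
import torch

def chamfer_distance_bruteforce(template: torch.Tensor, source: torch.Tensor):
    # template, source: (batch, num_points, 3)
    # Pairwise squared distances: (batch, n_template, n_source)
    diff = template.unsqueeze(2) - source.unsqueeze(1)
    dist2 = (diff ** 2).sum(dim=-1)
    cost_p0_p1 = dist2.min(dim=2).values  # template -> source
    cost_p1_p0 = dist2.min(dim=1).values  # source -> template
    cost_p0_p1 = torch.mean(torch.sqrt(cost_p0_p1))
    cost_p1_p0 = torch.mean(torch.sqrt(cost_p1_p0))
    return (cost_p0_p1 + cost_p1_p0) / 2.0

t = torch.rand(2, 16, 3)
print(chamfer_distance_bruteforce(t, t))  # identical clouds -> 0
```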
File train_pcrnet.py, line 99 (loss_val.backward()):
import argparse
import os
import sys
import logging
import numpy
import numpy as np
import torch
import torch.utils.data
import torchvision
from torch.utils.data import DataLoader
from tensorboardX import SummaryWriter
from tqdm import tqdm
# Only if the files are in example folder.
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
if BASE_DIR[-8:] == 'examples':
    sys.path.append(os.path.join(BASE_DIR, os.pardir))
    os.chdir(os.path.join(BASE_DIR, os.pardir))
from pcrnet.models import PointNet
(This file has been truncated.)
albanD
(Alban D)
May 18, 2020, 7:26pm
8