When I feed data to the GPUs, I find that the image data sometimes becomes NaN and the label data sometimes becomes -1. The code is below, followed by the output it produces:
import numpy as np
import torch

# `dataloader` and `engine` are created elsewhere in the training script.
minibatch = dataloader.next()
imgs = minibatch['data']
gts = minibatch['label']

# Check the batch on the CPU, before it is sent to the GPU.
if torch.isnan(imgs).any().item():
    print('There is NAN in imgs before sent to cuda')
else:
    print('There is no NAN in imgs before sent to cuda')
print('imgs:', np.amax(imgs.numpy()), np.amin(imgs.numpy()))
print('gts:', np.unique(gts.numpy()))
print('gts:', type(gts))

if engine.distributed:
    # for multiple-gpu
    imgs = imgs.cuda(non_blocking=True)
    gts = gts.cuda(non_blocking=True)

# Repeat the checks on the tensors after the transfer.
print('gts tensor on cuda:', torch.unique(gts))
if torch.isnan(imgs).any().item():
    print('There is NAN in imgs on cuda')
#print('no move to cuda')
Output:
There is no NAN in imgs before sent to cuda
There is no NAN in imgs before sent to cuda
There is no NAN in imgs before sent to cuda
There is no NAN in imgs before sent to cuda
imgs: 0.594 -0.46539214
imgs: 0.594 -0.46539214
imgs: 0.594 -0.44970587
imgs: 0.594 -0.46147057
gts: [ 0 1 2 4 5 8 9 10 11 13 18 255]
gts: <class 'torch.Tensor'>
gts: [ 0 1 2 5 7 8 10 11 13 15 18 255]
gts: <class 'torch.Tensor'>
gts tensor on cuda: tensor([ 0, 1, 2, 4, 5, 8, 9, 10, 11, 13, 18, 255],
       device='cuda:3')
gts tensor: tensor([ 0, 1, 2, 4, 5, 8, 9, 10, 11, 13, 18, 255],
       device='cuda:3')
gts: [ 0 1 2 3 5 6 7 8 9 11 13 18 255]
gts: <class 'torch.Tensor'>
gts tensor on cuda: tensor([ 0, 1, 2, 5, 7, 8, 10, 11, 13, 15, 18, 255],
       device='cuda:1')
There is NAN in imgs on cuda
gts tensor: tensor([ 0, 1, 2, 5, 7, 8, 10, 11, 13, 15, 18, 255],
       device='cuda:1')
Environment:
Python 3.6
torch 1.1.0
torchvision 0.3.0
CUDA 9.0
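
To narrow this down further, a check along the following lines could compare a batch on the host with the same batch read back from the device after an explicit synchronization. This is only a sketch: `check_transfer` is a hypothetical helper name, not part of the training script above, and it assumes (without evidence so far) that the corruption appears during or right after the copy with `non_blocking=True`.

```python
import torch


def check_transfer(cpu_tensor, device):
    """Copy a CPU tensor to `device`, read it back, and compare.

    Hypothetical diagnostic helper (not from the original script).
    Returns the NaN count before and after the copy, and whether the
    round-tripped values are identical to the original ones.
    """
    is_float = cpu_tensor.is_floating_point()

    # NaN count on the host side (integer tensors cannot hold NaN).
    nan_before = int(torch.isnan(cpu_tensor).sum().item()) if is_float else 0

    # Same kind of copy as in the training loop.
    dev_tensor = cpu_tensor.to(device, non_blocking=True)

    # Wait for all pending CUDA work so the copy is known to be finished.
    if dev_tensor.is_cuda:
        torch.cuda.synchronize()

    # Bring the data back to the host for a value-by-value comparison.
    round_trip = dev_tensor.cpu()
    nan_after = int(torch.isnan(round_trip).sum().item()) if is_float else 0

    return {
        'nan_before': nan_before,
        'nan_after': nan_after,
        # torch.equal is False if either tensor contains NaN, since NaN != NaN.
        'identical': torch.equal(cpu_tensor, round_trip),
    }
```

In the loop above this could be called as `check_transfer(minibatch['data'], 'cuda')` and `check_transfer(minibatch['label'], 'cuda')` on each process. If `identical` comes back `False` while `nan_before` is 0, the values changed somewhere between the host tensor and the device copy rather than in the data loader.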