Good morning guys.
I am trying to run a bunch of (complicated) code but I keep getting the following weird error:
INFO:root:Random state initialized with seed 28794055
INFO:root:Ani1 will be loaded...
INFO:root:cached statistics was loaded...
Traceback (most recent call last):
File "schnetpack_ani1.py", line 296, in <module>
train(args, model, train_loader, val_loader, device)
File "schnetpack_ani1.py", line 156, in train
File "/home/kim.a.nicoli/Projects/Schnetpack_release/src/schnetpack/train.py", line 175, in train
File "/home/kim.a.nicoli/Projects/Schnetpack_release/src/schnetpack/train.py", line 118, in train
loss = self.loss_fn(train_batch, result)
File "schnetpack_ani1.py", line 149, in loss
diff = batch[args.property] - result
TypeError: sub() received an invalid combination of arguments - got (map), but expected one of:
* (Tensor other, float alpha)
* (float other, float alpha)
What I don't get is that if I insert some print statements at the point where it crashes, the printed types are Tensors, not map objects.
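Just to illustrate what the error message itself means: a lazy map object slipping into the subtraction reproduces the same kind of failure. This is a toy snippet of mine, not the SchNetPack code, and the exact wording of the error depends on the PyTorch version:

import torch

target = torch.randn(4)
prediction = map(float, [0.1, 0.2, 0.3, 0.4])  # a lazy map object, not a Tensor
diff = target - prediction  # raises a TypeError like the one above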
Do any of you have any suggestion on what to look for or what might be the problem? I didn't find much about this kind of issue elsewhere.
Thanks in advance for the help
Could you post some more code showing where you create result?
result comes from here:

from collections import namedtuple

y = self.atom_pool(yi, atom_mask)
result = [y]
props = ['y']
at_func = namedtuple('atomwise', props)
# at_func is used to wrap result before it is returned from forward
As for batch, it is just a tensor of a given batch_size created by the DataLoader.
Anyway, I feel the error might be related to how result is created.
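For more context, a stripped-down, runnable version of what the output module does would look roughly like this (the SchNetPack internals are simplified here: atom_pool is replaced by a masked sum and the shapes are made up):

from collections import namedtuple
import torch

props = ['y']
atomwise = namedtuple('atomwise', props)

def forward_output(yi, atom_mask):
    # simplified stand-in for self.atom_pool(yi, atom_mask)
    y = (yi * atom_mask.unsqueeze(-1)).sum(dim=1)
    result = [y]
    # wrapping the list into the namedtuple is the part under discussion
    return atomwise(*result)

yi = torch.randn(8, 20, 1)      # (batch, atoms, features), made-up shapes
atom_mask = torch.ones(8, 20)
out = forward_output(yi, atom_mask)
print(type(out), type(out.y), out.y.shape)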
Thanks for the update. Could you print out the type of result[0] just before the error is thrown?
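Something along these lines, right before the line that crashes (the placement inside your loss function is hypothetical):

# inside loss(), right before: diff = batch[args.property] - result
print(type(result))                 # namedtuple, list, or something else?
print(type(result[0]))              # Tensor, or a lazy map object?
print(type(batch[args.property]))   # the other operand of the subtraction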
I did it already.
That's actually why I was confused. It doesn't make sense to me that the error is raised, since the printed output shows that result is of type Tensor!
I assume you've already checked the type of the other operand, batch[args.property]?
Could you create a small executable code snippet so that I could run it on my machine?
It seems that there is a conflict with the multi-GPU setup.
If I run the code on a CPU or on a single GPU, it works.
When I try to go parallel on multiple GPUs, it raises the error I reported previously.
I double-checked and my guess was right. It seems the issue is related to the type returned in the code I showed you.
This means that if I simply return result as a list, it works (with DataParallel on multiple GPUs), whereas when I return it as a namedtuple it breaks.
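My suspicion is that this comes from how the per-GPU outputs are recombined after the forward pass. The snippet below is only a simplified, self-contained mimic of that recombination step, not the actual DataParallel source, but it reproduces the difference: with a single-field namedtuple the lazy map object ends up inside the field, while a plain list materialises it.

from collections import namedtuple
import torch

atomwise = namedtuple('atomwise', ['y'])

# pretend these are the outputs of two GPU replicas
outputs = [atomwise(torch.randn(4, 1)), atomwise(torch.randn(4, 1))]

def combine(per_device_tensors):
    # stand-in for gathering the per-device tensors back onto one device
    return torch.cat(per_device_tensors, dim=0)

# sequence outputs get rebuilt as type(out)(map(...))
rebuilt = type(outputs[0])(map(combine, zip(*outputs)))
print(type(rebuilt))     # atomwise
print(type(rebuilt.y))   # <class 'map'> -> this is what later breaks the subtraction

# with a plain list the map is materialised, so the entry stays a Tensor
list_outputs = [[o.y] for o in outputs]
rebuilt_list = type(list_outputs[0])(map(combine, zip(*list_outputs)))
print(type(rebuilt_list[0]))  # <class 'torch.Tensor'>

That would also explain why my prints showed a Tensor on a single GPU: without the gather step the namedtuple is returned as-is and its field is never rebuilt.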
Nonetheless, returning a plain list is not that nice; I would like to keep returning the namedtuple object. Any clue on how I could overcome the issue?
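One possible direction, just a sketch of a workaround rather than a tested fix: let forward keep returning the plain list so that DataParallel can gather it correctly, and rebuild the namedtuple outside the parallel model, right after the model call. ToyModel below is a made-up stand-in, not the SchNetPack model:

from collections import namedtuple
import torch
import torch.nn as nn

props = ['y']
atomwise = namedtuple('atomwise', props)

class ToyModel(nn.Module):
    # stand-in for the real model; forward keeps returning a plain list
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1)

    def forward(self, x):
        return [self.linear(x)]        # a list survives the multi-GPU gather

model = ToyModel()
x = torch.randn(8, 3)
if torch.cuda.device_count() > 1:      # wrap only when several GPUs are available
    model = nn.DataParallel(model.cuda())
    x = x.cuda()

raw = model(x)                         # list of gathered tensors
result = atomwise(*raw)                # rebuild the namedtuple outside DataParallel
print(type(result), type(result.y))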