Runtime error in cluster but not local

Akis_Linardos · September 6, 2017, 12:19pm

I’ve recently implemented a DCGAN in PyTorch. It works fine on my local machine, but when I try to run it on a cluster I get this error:

RuntimeError: Assertion `x >= 0. && x <= 1.' failed. input value should be between 0~1, but got -0.020724 at /opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THNN/generic/BCECriterion.c:34

The error apparently originates in the loss function, but that doesn’t make much sense to me, since the exact same code works on my local machine.
I haven’t worked on a cluster before, and I could only think of the following reasons:

A messed-up installation. But shouldn’t there be an error when importing the corresponding libraries if that were the case? Importing works fine, and running “conda install pytorch torchvision cuda80 -c soumith” again reports that everything is in order.
Python version. It’s 3.6 on both; I double-checked.
Seeding. Maybe it has something to do with the way the cluster seeds, so that these values only appear there? I used manual seeding to test this, but it didn’t change anything.

One last thing I thought of was that the paths are somehow messed up. /opt/conda/conda-bld/pytorch_1503970438496, which appears in the error, doesn’t exist on the cluster, but how can I fix that?
Any thoughts?
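
One way to compare the two machines is to print the same short environment report on each and diff the output. The following is a minimal sketch, not something posted in the thread: the helper name `environment_report` is hypothetical, and it assumes that `torch.__version__` and `torch.version.cuda` are the relevant details (the latter is read with `getattr` in case a build does not expose it).

```python
import platform
import sys


def environment_report():
    """Collect version details that commonly differ between machines."""
    report = {
        "python": platform.python_version(),
        "executable": sys.executable,
    }
    try:
        import torch  # imported lazily so the report still works without it
    except ImportError:
        report["torch"] = None
        report["cuda_build"] = None
    else:
        report["torch"] = torch.__version__
        # CUDA version the binary was built against; None for CPU-only builds.
        report["cuda_build"] = getattr(torch.version, "cuda", None)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running this locally and on the cluster would show whether the Python interpreter matches while the PyTorch build differs, which a successful import alone does not reveal.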

smth · September 30, 2017, 9:39pm

/opt/conda/conda-bld/pytorch_1503970438496, which appears in the error, doesn’t exist on the cluster, but how can I fix that?

This is the path where we built PyTorch. The binaries simply carry that build path with them, so it does not need to exist on your machine.

It is super weird that the output of an nn.Sigmoid on the cluster is giving you -0.020724.
Can you post your DCGAN implementation, or at least the part with the DCGAN model definition?
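
To illustrate why a negative input is surprising here, the sketch below writes out the logistic sigmoid and binary cross-entropy in plain Python. It is an illustration of the formulas only, not PyTorch's implementation; the function names and the clamping constant `eps` are assumptions made for the example.

```python
import math


def sigmoid(z):
    """Logistic function; mathematically its output lies in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for very negative z
    return e / (1.0 + e)


def binary_cross_entropy(p, target):
    """BCE for a single prediction p and a target in [0, 1].

    Raises ValueError when p is outside [0, 1], mirroring the kind of
    range check reported in the error message above.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"input value should be between 0~1, but got {p}")
    eps = 1e-12  # illustrative clamp that keeps log() finite at 0 or 1
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


if __name__ == "__main__":
    print(binary_cross_entropy(sigmoid(0.3), 1.0))  # a valid probability
    try:
        binary_cross_entropy(-0.020724, 1.0)  # the value from the error
    except ValueError as err:
        print("rejected:", err)
```

Because a sigmoid cannot produce a negative value, a value such as -0.020724 reaching the loss suggests that the tensor passed to it did not actually come straight out of the sigmoid.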

Akis_Linardos · January 17, 2018, 1:34pm

Sorry for not replying sooner.
If I remember correctly, the issue ended up being that I was using a deprecated function. It worked on my local machine because the installation there was outdated, but not with the updated version on the cluster.

You can have a look at the new version if you want
https://github.com/Linardos/PyTorch/blob/master/DCGAN.py
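
The thread does not say which function was deprecated. As a general, hedged sketch of how such calls can be surfaced on an older installation, Python's standard `warnings` module can promote `DeprecationWarning` to an exception. This only helps when the library actually emits that warning; `old_api` and `call_strict` below are hypothetical names used for illustration.

```python
import warnings


def old_api():
    """Hypothetical stand-in for a deprecated library function."""
    warnings.warn("old_api is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def call_strict(fn, *args, **kwargs):
    """Call fn with DeprecationWarning promoted to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return fn(*args, **kwargs)


if __name__ == "__main__":
    try:
        call_strict(old_api)
    except DeprecationWarning as err:
        print("deprecated call detected:", err)
```

The same effect is available for a whole script by running it with `python -W error::DeprecationWarning`.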