This is a related issue to this one.
I am trying to use PyTorch with PySpark.
In order to deploy my model on a large dataset on HDFS I need to add the PyTorch wheel with
or directly when I submit the spark job with
I don’t have privileges to install PyTorch on the machines of the cluster, so I really need to use the whl file. When I try to import torch, I get this error:
File "/tmp/spark-940d3edb-efdf-4ceb-8955-f1f1e1c59939/userFiles-f88461bb-66de-4d0d-97ad-30a6716c3339/torch-0.4.1-cp27-cp27mu-linux_x86_64.zip/torch/__init__.py", line 80, in <module>
ImportError: No module named _C
I know I should call import torch from a directory other than the root one, but in this case I have no clue how to do it.
Thanks for your help!
I don’t know how PySpark works, but it seems it’s not handling our C-built modules properly. Are libraries like numpy supported properly?
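Some background that may help here: the traceback shows torch being imported from inside a .zip archive, and Python's zip importer only loads .py and .pyc files, not compiled extension modules such as torch's _C. Below is a minimal sketch for checking whether an archive contains such binaries. The helper name native_members and the suffix list are illustrative assumptions, not part of PyTorch or PySpark.

```python
import zipfile

# Suffixes that usually mark compiled binaries (assumed list, not exhaustive;
# versioned names such as "libfoo.so.1" are not matched).
NATIVE_SUFFIXES = (".so", ".pyd", ".dylib")


def native_members(archive_path):
    """Return the sorted archive members that look like compiled binaries.

    Python can import .py/.pyc files straight from a zip, but it cannot
    load extension modules from one, so any hit here means the package
    has to be extracted to disk before it can be imported.
    """
    with zipfile.ZipFile(archive_path) as zf:
        return sorted(n for n in zf.namelist() if n.endswith(NATIVE_SUFFIXES))
```

If this returns a non-empty list for the file you ship to the cluster, importing the package directly from the archive is likely to fail in the way shown in the traceback.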
Even without admin rights, you can create a local Python virtualenv where you install PyTorch. That might be the simplest thing to do here.
Even creating a local Python virtualenv wouldn’t help, because in that case I would install it only on the master node, while I need PyTorch on each machine that I am using with PySpark. That’s why I need to send the whl file to each node of the cluster.
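One possible workaround, sketched here under assumptions and not tested on a real cluster: since a wheel is just a zip archive, each worker could extract it to a local directory and add that directory to sys.path before importing. The function name unpack_wheel and the choice of target directory are hypothetical.

```python
import os
import sys
import zipfile


def unpack_wheel(wheel_path, target_dir):
    """Extract a wheel (a zip archive) into target_dir and put it on sys.path.

    This skips everything pip would normally do (dependency resolution,
    platform checks), so the wheel must already match the worker's
    Python version and platform.
    """
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    with zipfile.ZipFile(wheel_path) as zf:
        zf.extractall(target_dir)
    if target_dir not in sys.path:
        # Put the extracted copy ahead of other entries so it is found first.
        sys.path.insert(0, target_dir)
    return target_dir
```

On the Spark side, the idea would be to ship the file with sc.addFile, look up its local path on the executor with SparkFiles.get, and call unpack_wheel at the start of the function passed to mapPartitions, before import torch. Several tasks on one machine may try to extract at the same time, so a per-task target directory or a lock is probably needed.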
Don’t you think that the problem is the same as in here?
You can try uninstalling numpy and then installing numpy+mkl. It can be downloaded from https://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy
No, that issue was just because he was trying to import torch from the root of the GitHub repo. There is a torch folder there, so Python was loading that folder instead of the installed torch.
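To tell which copy of a package Python would pick up, you can ask the import system where it resolves the name without actually importing it. This is a small sketch that assumes Python 3 (importlib.util.find_spec is not available on 2.7, where printing torch.__file__ after a successful import gives similar information); module_origin is a hypothetical helper name.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None.

    None means the name cannot be found on sys.path (or, for a namespace
    package without __init__.py, that there is no single origin file).
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin
```

If module_origin("torch") points into your current working directory instead of site-packages, a local torch folder is shadowing the installed package.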