Trying to run inference with custom model but get Runtime error asking for Cuda

Ben_Sturgeon · July 6, 2020, 11:53am

I am trying to run inference with a model on a device which only has access to CPU. I am loading the model in with:
gtf = torch.load(model_dir, map_location=torch.device("cpu"))
But I get a Runtime Error saying:
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu.

How do I get it to stop using Cuda for this operation?
My eventual goal is to run this on an android device using chaquopy so I will not be able to use Cuda at all.
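
For context, that message is the one nn.DataParallel raises in its forward pass when the wrapped module is not on its first GPU. A likely cause (an assumption, since the checkpoint itself is not shown here) is that the pickled object is a DataParallel wrapper, which map_location alone does not remove. A minimal sketch of unwrapping it for CPU inference follows; `unwrap_data_parallel` and the small `nn.Linear` stand-in are hypothetical names used only for illustration:

```python
import torch
import torch.nn as nn


def unwrap_data_parallel(model: nn.Module) -> nn.Module:
    """Return the wrapped module if `model` is an nn.DataParallel, else `model`."""
    if isinstance(model, nn.DataParallel):
        return model.module
    return model


# Stand-in for the object returned by torch.load(...); assumed to be wrapped.
loaded = nn.DataParallel(nn.Linear(4, 2))

# Unwrap, move to CPU, and switch to inference mode.
cpu_model = unwrap_data_parallel(loaded).to("cpu").eval()
with torch.no_grad():
    out = cpu_model(torch.rand(1, 4))
print(type(cpu_model).__name__, tuple(out.shape))
```

Calling the unwrapped module directly bypasses the device check that DataParallel performs, so no CUDA device is needed.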

ptrblck · July 7, 2020, 7:54am

I assume the torch.load line of code is raising the error?
If so, did you store the model directly or the state_dict (which is the recommended way)?
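
A rough sketch of the state_dict workflow, in case it helps. `build_model` is a hypothetical placeholder for whatever code constructs your network, and the in-memory buffer stands in for a checkpoint file:

```python
import io

import torch
import torch.nn as nn


def build_model() -> nn.Module:
    # Hypothetical stand-in for the real model constructor.
    return nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())


trained = build_model()

# Save only the parameters and buffers, not the pickled Python object.
buffer = io.BytesIO()  # in-memory file; a path such as "model_state.pth" works too
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# Recreate the architecture from source, then restore the weights on CPU.
restored = build_model()
state_dict = torch.load(buffer, map_location=torch.device("cpu"))
restored.load_state_dict(state_dict)
restored.eval()
```

Because only tensors are stored, the checkpoint does not depend on the exact source code of the classes at save time.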

Ben_Sturgeon · July 7, 2020, 10:46am

I am loading just the stored model, rather than the state_dict. If you think this might be the root cause I will look further into that.

The error actually occurs when I try to do the actual inference. When I have full access to the GPU it runs fine, but when I direct it to the CPU it throws the error that it was expecting a Cuda device.

To give some more context, I’m actually trying to use jit.trace to serialize the inference so I can deploy on mobile. Here is the code I have:

model_dir = "/home/benjamin/imagelift/Odom_reader/trained_char_det/signatrix_efficientdet_coco.pth"
gtf = torch.load(model_dir)

example_inputs = torch.rand(1, 3, 512, 512)
odom_detect = torch.jit.trace(gtf, example_inputs)

The error this returns refers to the jit.trace line:

/home/benjamin/.local/lib/python3.6/site-packages/torch/serialization.py:657: SourceChangeWarning: source code of class 'torch.nn.modules.conv.Conv2d' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/benjamin/AndroidStudioProjects/Odom_detect/app/src/main/python/src/model.py:251: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if len(inputs) == 2:
/home/benjamin/AndroidStudioProjects/Odom_detect/app/src/main/python/src/utils.py:84: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  image_shape = np.array(image_shape)
/home/benjamin/AndroidStudioProjects/Odom_detect/app/src/main/python/src/utils.py:96: TracerWarning: torch.from_numpy results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  anchors = torch.from_numpy(all_anchors.astype(np.float32))
/home/benjamin/AndroidStudioProjects/Odom_detect/app/src/main/python/src/model.py:282: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if scores_over_thresh.sum() == 0:
Traceback (most recent call last):
  File "/home/benjamin/.config/JetBrains/PyCharmCE2020.1/scratches/scratch_1.py", line 137, in <module>
    infer_test()
  File "/home/benjamin/.config/JetBrains/PyCharmCE2020.1/scratches/scratch_1.py", line 113, in infer_test
    odom_detect = torch.jit.trace(gtf,example_inputs)
  File "/home/benjamin/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 875, in trace
    check_tolerance, _force_outplace, _module_class)
  File "/home/benjamin/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 1027, in trace_module
    module._c._create_method_from_trace(method_name, func, example_inputs, var_lookup_fn, _force_outplace)
RuntimeError: 0 INTERNAL ASSERT FAILED at /pytorch/torch/csrc/jit/ir/alias_analysis.cpp:318, please report a bug to PyTorch. We don't have an op for aten::to but it isn't a special case. Argument types: Tensor, None, int, Device, bool, bool, bool, int,
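
As a side note on the TracerWarning lines above: tracing records only the branch taken for the example input, whereas scripting keeps the condition. The toy function below is a hypothetical illustration of that difference, loosely modelled on the `if scores_over_thresh.sum() == 0:` check; it is not the actual model code.

```python
import warnings

import torch


def double_unless_empty(scores: torch.Tensor) -> torch.Tensor:
    # Data-dependent branch: the result depends on the tensor's values.
    if scores.sum() == 0:
        return torch.zeros([1])
    return scores * 2


with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the TracerWarning for this demo
    traced = torch.jit.trace(double_unless_empty, torch.ones(3))
scripted = torch.jit.script(double_unless_empty)

empty = torch.zeros(3)
print(traced(empty).shape)    # the non-empty branch was baked in at trace time
print(scripted(empty).shape)  # the branch is evaluated at run time
```

If the detection post-processing really depends on such checks, scripting that part (or moving it outside the traced module) may be safer than tracing it.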

ptrblck · July 8, 2020, 10:02am

Try to recreate the model and load the state_dict. As the warning already shows, the source code seems to have been modified, so the restored model might not work properly anymore.

Once these warnings are gone, feel free to update this topic in case you encounter further errors.
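
Something along these lines might work as a starting point. It assumes the checkpoint was saved from an nn.DataParallel-wrapped model (hence the "module." key prefix); `build_model` and `strip_module_prefix` are hypothetical helpers, and the tiny network is only a stand-in for the detector:

```python
from collections import OrderedDict

import torch
import torch.nn as nn


def build_model() -> nn.Module:
    # Hypothetical stand-in for recreating the detector from its source code.
    return nn.Sequential(nn.Conv2d(3, 4, 3, padding=1), nn.ReLU())


def strip_module_prefix(state_dict):
    """Drop the leading "module." that nn.DataParallel adds to every key."""
    prefix = "module."
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )


# Stand-in for a checkpoint saved from a DataParallel-wrapped model (assumed).
checkpoint = nn.DataParallel(build_model()).state_dict()

model = build_model()
model.load_state_dict(strip_module_prefix(checkpoint))
model.eval()

# Trace on CPU with an input of the size used earlier in the thread.
example_inputs = torch.rand(1, 3, 512, 512)
with torch.no_grad():
    odom_detect = torch.jit.trace(model, example_inputs)
```

The freshly constructed model lives on the CPU, so neither loading the weights nor tracing touches CUDA.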

Ben_Sturgeon · July 9, 2020, 11:05am

Thank you for the support on this. I’ve managed to get rid of the errors above, and I have succeeded in running the trace on the loaded module. However, I now get an error when trying to save the model.

I tried 2 approaches, one in which I loaded the default model with pretrained weights, and one where I load in the necessary module for my custom trained model. The error is the same with both.

odom_detect.save("odom_detect_traced.pt")

Traceback (most recent call last):
  File "", line 1, in <module>
  File "/home/benjamin/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 1648, in save
    return self._c.save(*args, **kwargs)
RuntimeError:
Could not export Python function call 'SwishImplementation'. Remove calls to Python functions before export. Did you forget add @script or @script_method annotation? If this is a nn.ModuleList, add it to __constants__:
/home/benjamin/.local/lib/python3.6/site-packages/efficientnet_pytorch/utils.py(52): forward
/home/benjamin/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(534): _slow_forward
/home/benjamin/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(548): __call__
/home/benjamin/.local/lib/python3.6/site-packages/efficientnet_pytorch/model.py(78): forward
/home/benjamin/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(534): _slow_forward
/home/benjamin/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(548): __call__
/home/benjamin/imagelift/opencv-text-recognition/src/model.py(193): forward
/home/benjamin/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(534): _slow_forward
/home/benjamin/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(548): __call__
/home/benjamin/imagelift/opencv-text-recognition/src/model.py(258): forward
/home/benjamin/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(534): _slow_forward
/home/benjamin/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(548): __call__
/home/benjamin/.local/lib/python3.6/site-packages/torch/jit/__init__.py(1027): trace_module
/home/benjamin/.local/lib/python3.6/site-packages/torch/jit/__init__.py(875): trace
(1):

I have tried adding script methods “@torch.jit.script_method” to many of my forward functions, but I am not sure how deep it is possible for me to go with this, and if it will end up working or not.

ptrblck · July 9, 2020, 11:25pm

Could you post the definition of SwishImplementation? It seems that some Python calls are used inside this method, which cannot be exported.
One workaround would be to split the method into a “Python method” and an exportable method using @torch.jit.ignore.
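
Another option, sketched under assumptions: if SwishImplementation is a custom torch.autograd.Function (which the traceback path in efficientnet_pytorch/utils.py suggests, but which is not confirmed here), replacing it with a Swish built from standard ops lets the tracer record ordinary operators instead of a Python call. `PlainSwish` and the small block below are hypothetical stand-ins, not the library's code:

```python
import io

import torch
import torch.nn as nn


class PlainSwish(nn.Module):
    """Swish written with built-in ops only, so tracing records regular operators."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(x)


# Hypothetical stand-in for a backbone block that uses Swish.
block = nn.Sequential(nn.Conv2d(3, 4, 3, padding=1), PlainSwish()).eval()

example_inputs = torch.rand(1, 3, 32, 32)
with torch.no_grad():
    traced = torch.jit.trace(block, example_inputs)

# No Python function call remains in the graph, so the module can be saved.
buffer = io.BytesIO()  # a file path such as "odom_detect_traced.pt" works too
torch.jit.save(traced, buffer)
buffer.seek(0)
reloaded = torch.jit.load(buffer)
```

Some versions of efficientnet_pytorch appear to offer a `set_swish(memory_efficient=False)` method on the model for this purpose; please check that against the installed package before relying on it. Also, as far as I know, functions marked with @torch.jit.ignore stay as Python calls, so a module containing them may still fail to save.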