For reference, I'm running this on the Lambda Cloud GPU service in a Jupyter notebook, with the environment variable CUDA_LAUNCH_BLOCKING=1 set, and I got the following output.
The error comes from a test cell containing only this one line:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_2510/3640003828.py in <module>
----> 1 model = Reconstruction(decoder_embedding_size = 512, additional_encoder_nhead=5, additional_encoder_dim_feedforward = 2048).to(device)
/usr/lib/python3/dist-packages/torch/nn/modules/module.py in to(self, *args, **kwargs)
1338 raise
1339
-> 1340 return self._apply(convert)
1341
1342 def register_full_backward_pre_hook(
/usr/lib/python3/dist-packages/torch/nn/modules/module.py in _apply(self, fn, recurse)
898 if recurse:
899 for module in self.children():
--> 900 module._apply(fn)
901
902 def compute_should_use_set_data(tensor, tensor_applied):
/usr/lib/python3/dist-packages/torch/nn/modules/module.py in _apply(self, fn, recurse)
898 if recurse:
899 for module in self.children():
--> 900 module._apply(fn)
901
902 def compute_should_use_set_data(tensor, tensor_applied):
/usr/lib/python3/dist-packages/torch/nn/modules/module.py in _apply(self, fn, recurse)
898 if recurse:
899 for module in self.children():
--> 900 module._apply(fn)
901
902 def compute_should_use_set_data(tensor, tensor_applied):
/usr/lib/python3/dist-packages/torch/nn/modules/module.py in _apply(self, fn, recurse)
925 # `with torch.no_grad():`
926 with torch.no_grad():
--> 927 param_applied = fn(param)
928 p_should_use_set_data = compute_should_use_set_data(param, param_applied)
929
/usr/lib/python3/dist-packages/torch/nn/modules/module.py in convert(t)
1324 memory_format=convert_to_format,
1325 )
-> 1326 return t.to(
1327 device,
1328 dtype if t.is_floating_point() or t.is_complex() else None,
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
However, running
model = Reconstruction(decoder_embedding_size = 512, additional_encoder_nhead=5, additional_encoder_dim_feedforward = 2048)
works with no error.
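For completeness, this is roughly how the environment variable is set (a minimal sketch of my setup; it has to run in the very first cell, before anything touches CUDA):

```python
import os

# CUDA_LAUNCH_BLOCKING=1 makes CUDA kernel launches synchronous,
# so a device-side assert is reported at the call that triggered it
# rather than at some later, unrelated API call.
# This must be set before torch initializes CUDA, i.e. in the first
# notebook cell, before importing torch or running any GPU code.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```
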
How could this be? If needed, I can post the code for Reconstruction().