Hi all,
Has anyone run into the following error when running a test suite with pytest?
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
The tests cover a large training pipeline (e.g. training a complete model for a couple of epochs, retraining models, etc.), so I’m not sure I can reduce the failure to a simple snippet that demonstrates the issue.
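For reference, the message itself can be produced in isolation. The sketch below is not taken from our pipeline; it assumes (unconfirmed) that autograd is switched off globally when `backward()` is called, and `backward_error_message` is a hypothetical helper name used only for illustration.

```python
import torch


def backward_error_message(grad_enabled: bool) -> str:
    """Run a tiny forward/backward pass; return the error text, or "" on success."""
    previous = torch.is_grad_enabled()
    # Assumption: some earlier code left the global grad mode in this state.
    torch.set_grad_enabled(grad_enabled)
    try:
        model = torch.nn.Linear(4, 1)
        loss = model(torch.randn(2, 4)).sum()
        try:
            loss.backward()
        except RuntimeError as exc:
            return str(exc)
        return ""
    finally:
        # Restore the global mode so this helper does not leak state itself.
        torch.set_grad_enabled(previous)


if __name__ == "__main__":
    print(repr(backward_error_message(True)))
    print(repr(backward_error_message(False)))
```

With grad mode disabled, the loss has no `grad_fn`, so `backward()` raises the same RuntimeError as above. Whether that is what happens in the full suite is still a guess.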
Basically, if I run the entire test suite using:
pytest tests/
then some of the tests will fail.
If I run those failing tests at the module level:
pytest tests/modulename/module/test_module.py
then they pass.
The tests all pass on my local machine (I have to exclude some because my machine can’t handle the overhead), and they also pass when I run them individually on the remote machine. It’s pretty annoying because it’s interfering with our deployment procedure, even though I know the tests pass.
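Since the failures only show up when everything runs in one pytest process, one thing that could be checked is whether an earlier test leaves the global autograd mode disabled. Below is a minimal sketch of a `conftest.py` guard for that. It assumes the leaked state is the grad mode (it may well be something else), and `guard_grad_mode` and `_grad_mode_guard` are hypothetical names, not part of pytest or PyTorch.

```python
import contextlib

import pytest
import torch


@contextlib.contextmanager
def guard_grad_mode():
    """Record the global grad mode, then restore it and report any change."""
    previous = torch.is_grad_enabled()
    report = {"leaked": False}
    try:
        yield report
    finally:
        report["leaked"] = torch.is_grad_enabled() != previous
        torch.set_grad_enabled(previous)


@pytest.fixture(autouse=True)
def _grad_mode_guard():
    # Wraps every test: restores the grad mode afterwards and flags the
    # test that changed it, so the culprit shows up in the report.
    with guard_grad_mode() as report:
        yield
    assert not report["leaked"], "test changed the global autograd mode"
```

If this flags a test, wrapping the offending code in `with torch.no_grad():` instead of calling `torch.set_grad_enabled(False)` directly would keep the change local. If nothing is flagged, the cause is probably some other shared state.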
Local environment (Windows):
NVIDIA-SMI 497.29 Driver Version: 497.29 CUDA Version: 11.5
pytorch 1.11.0 py3.8_cuda11.3_cudnn8_0 pytorch
pytorch-mutex 1.0 cuda pytorch
torchaudio 0.11.0 py38_cu113 pytorch
torchio 0.18.43 pypi_0 pypi
torchsummary 1.5.1 pypi_0 pypi
torchvision 0.12.0 py38_cu113 pytorch
Remote environment (Ubuntu):
NVIDIA-SMI 470.103.01 Driver Version: 470.103.01 CUDA Version: 11.4
pytorch 1.11.0 py3.8_cuda11.3_cudnn8.2.0_0 pytorch
pytorch-mutex 1.0 cuda pytorch
torchaudio 0.11.0 py38_cu113 pytorch
torchio 0.18.43 pypi_0 pypi
torchsummary 1.5.1 pypi_0 pypi
torchvision 0.12.0 py38_cu113 pytorch