I am getting the following error:
Traceback (most recent call last):
  File "/home/kitoo/mlpractical/pytorch_mlp_framework/train_evaluate_image_classification_system.py", line 74, in <module>
    experiment_metrics, test_metrics = conv_experiment.run_experiment() # run experiment and return experiment metrics
  File "/home/kitoo/mlpractical/pytorch_mlp_framework/experiment_builder.py", line 258, in run_experiment
    loss, accuracy = self.run_train_iter(x=x, y=y) # take a training iter step
  File "/home/kitoo/mlpractical/pytorch_mlp_framework/experiment_builder.py", line 182, in run_train_iter
    out = self.model.forward(x) # forward the data in the model
  File "/home/kitoo/mlpractical/pytorch_mlp_framework/model_architectures.py", line 319, in forward
    out = self.layer_dict['input_conv'].forward(out)
  File "/home/kitoo/mlpractical/pytorch_mlp_framework/model_architectures.py", line 138, in forward
    out = self.layer_dict['conv_0'].forward(out)
  File "/opt/conda/envs/mlp/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/envs/mlp/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_ARCH_MISMATCH
The same code was working until a few days ago. I was running it on a GCP compute instance, and I had to delete that instance and recreate it from the same image. The error has occurred ever since I created the new instance.
I am using a Tesla K80 GPU with the following NVIDIA driver:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.199.02   Driver Version: 470.199.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P0    73W / 149W |      0MiB / 11441MiB |     73%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
The PyTorch version is 2.1.1, with CUDA 11.8.
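In case it helps with diagnosis, here is a minimal sketch of a check that could be run on the instance. It compares the GPU's compute capability (3.7 for the Tesla K80) against the architectures the installed PyTorch build reports it was compiled for. The helper names `parse_arch_list` and `build_targets_device` are my own, made up for illustration. The exact-match test is a simplification, and it says nothing about which GPU architectures the bundled cuDNN itself supports, so treat the result as a hint rather than a definitive answer.

```python
import re


def parse_arch_list(arch_list):
    """Turn entries such as 'sm_37' or 'compute_90' into (major, minor) tuples."""
    caps = set()
    for arch in arch_list:
        # The last digit is the minor version; everything before it is the major.
        match = re.fullmatch(r"(?:sm|compute)_(\d+)(\d)[a-z]?", arch)
        if match:
            caps.add((int(match.group(1)), int(match.group(2))))
    return caps


def build_targets_device(capability, arch_list):
    """Return True if the device's exact compute capability is in arch_list.

    Hypothetical helper: an exact match is only a rough indicator of support.
    """
    return tuple(capability) in parse_arch_list(arch_list)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)   # e.g. (3, 7) on a K80
        archs = torch.cuda.get_arch_list()          # architectures in this build
        print("PyTorch:", torch.__version__, "built with CUDA:", torch.version.cuda)
        print("cuDNN version:", torch.backends.cudnn.version())
        print("Device capability:", cap)
        print("Compiled arch list:", archs)
        print("Exact arch match:", build_targets_device(cap, archs))
```

If the capability is missing from the compiled list, that would at least suggest the installed PyTorch package on the new instance differs from the one that was working before.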