I would be immensely grateful for some help. My new GPUs are not working in my conda environment.
In [2]: torch.version.cuda
Out[2]: '10.1.243'
cudatoolkit 10.1.243 h6bb024c_0 anaconda
cudnn 7.6.5.32 hc0a50b0_1 conda-forge
python 3.7.5 h0371630_0 anaconda
python-dateutil 2.8.0 py37_0 anaconda
python_abi 3.7 1_cp37m conda-forge
pytorch 1.3.1 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
torchsummary 1.5.1 pypi_0 pypi
torchvision 0.4.2 py37_cu101 pytorch
I tried updating my PyTorch install within my conda environment using
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
as well as updating cuDNN, but thus far no luck.
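In case it helps narrow things down, here is a minimal sketch for checking which build the interpreter in this environment actually picks up after the update. The helper name `describe_torch_env` is my own, and `torch.cuda.get_arch_list` may not exist on older releases, so that call is guarded.

```python
def describe_torch_env():
    """Collect the versions and devices that the active interpreter sees."""
    try:
        import torch
    except ImportError:
        # torch is not importable from this interpreter at all
        return {"torch": None}

    info = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,          # CUDA version the binary was built against
        "cudnn": torch.backends.cudnn.version(),   # None when cuDNN is unavailable
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }

    # Guarded: this helper is assumed to be missing on older PyTorch releases.
    get_arch_list = getattr(torch.cuda, "get_arch_list", None)
    if get_arch_list is not None:
        info["arch_list"] = get_arch_list()

    if info["cuda_available"]:
        for i in range(torch.cuda.device_count()):
            name = torch.cuda.get_device_name(i)
            capability = torch.cuda.get_device_capability(i)
            info["devices"].append((name, capability))
    return info


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```

If `torch` and `cuda_build` here still report the old versions, the `conda install` probably did not replace the package in this environment.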
GPUs:
NVIDIA GeForce RTX 3090, 00000000:1A:00.0, 94.02.42.00.B0
NVIDIA GeForce RTX 3090, 00000000:1B:00.0, 94.02.42.00.B0
NVIDIA GeForce RTX 3090, 00000000:3D:00.0, 94.02.42.00.B0
NVIDIA GeForce RTX 3090, 00000000:3E:00.0, 94.02.42.00.B0
NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3
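A rough way to compare the installed build against these cards is to read the CUDA version out of the conda build string. This is only a sketch: the function names are hypothetical, and the threshold rests on my understanding that the RTX 3090 is compute capability 8.6, which needs a binary built with CUDA 11.1 or newer.

```python
import re

# Assumption: compute capability 8.6 (RTX 30-series) needs a CUDA 11.1+ build.
MIN_CUDA_FOR_SM86 = (11, 1)


def cuda_version_from_build(build):
    """Return (major, minor) parsed from a conda build string such as
    'py3.7_cuda10.1.243_cudnn7.6.3_0', or None if it has no 'cudaX.Y' tag."""
    match = re.search(r"cuda(\d+)\.(\d+)", build)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def build_supports_sm86(build):
    """True if the build string names a CUDA version at or above the threshold."""
    version = cuda_version_from_build(build)
    return version is not None and version >= MIN_CUDA_FOR_SM86


# Build string of the pytorch package listed above
print(build_supports_sm86("py3.7_cuda10.1.243_cudnn7.6.3_0"))  # prints False
```

By this check the pytorch 1.3.1 build listed above is a CUDA 10.1 binary, even though the driver itself reports CUDA 11.3.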
(porch) pae2115@dendrite:~/ValveNet_v2$ CUDA_IDX=0 python CADnet_ECG_001.py
View CADnet_ECG_ONLY_001_2021-11-12--07:54:27 for study log dir
[I 2021-11-12 07:54:27,246] A new study created in memory with name: CADnet_ECG_ONLY_001
data_config <OptunaDataConfig(batch_size=16, shuffle=True, train_sampler=None, validation_sampler=None, data_type=2, pin_memory=True, train_features_path=/home/pae2115/CADnet/amyloid_any_type_ECG_plus_echo_train_wander_removed_pct_truncated_mean_normalized_waveform_features.npy, train_labels_path=/home/pae2115/CADnet/amyloid_any_type_ECG_plus_echo_train_label.npy, train_tabular_path=None, validation_features_path=/home/pae2115/CADnet/amyloid_any_type_ECG_plus_echo_eval_wander_removed_pct_truncated_mean_normalized_waveform_features.npy, validation_labels_path=/home/pae2115/CADnet/amyloid_any_type_ECG_plus_echo_eval_label.npy, validation_tabular_path=None, features_permute_axes_order=[], labels_permute_axes_order=[]) @12fd0>
Loading features data from /home/pae2115/CADnet/amyloid_any_type_ECG_plus_echo_train_wander_removed_pct_truncated_mean_normalized_waveform_features.npy
Loading labels data from /home/pae2115/CADnet/amyloid_any_type_ECG_plus_echo_train_label.npy
Loading tabular data from None
result shape before permute_axes_order (7364, 1, 2500, 12)
result shape before permute_axes_order (7364,)
Done loading features data from /home/pae2115/CADnet/amyloid_any_type_ECG_plus_echo_train_wander_removed_pct_truncated_mean_normalized_waveform_features.npy (7364, 1, 2500, 12)
Done loading labels data from /home/pae2115/CADnet/amyloid_any_type_ECG_plus_echo_train_label.npy (7364,)
Loading features data from /home/pae2115/CADnet/amyloid_any_type_ECG_plus_echo_eval_wander_removed_pct_truncated_mean_normalized_waveform_features.npy
Loading labels data from /home/pae2115/CADnet/amyloid_any_type_ECG_plus_echo_eval_label.npy
Loading tabular data from None
result shape before permute_axes_order (2161, 1, 2500, 12)
result shape before permute_axes_order (2161,)
Done loading features data from /home/pae2115/CADnet/amyloid_any_type_ECG_plus_echo_eval_wander_removed_pct_truncated_mean_normalized_waveform_features.npy (2161, 1, 2500, 12)
Done loading labels data from /home/pae2115/CADnet/amyloid_any_type_ECG_plus_echo_eval_label.npy (2161,)
Computing train label counts
Computing eval label counts
train label count: Counter({0.0: 5611, 1.0: 1753})
eval label count: Counter({0.0: 1468, 1.0: 693})
POSITIVE_CLASS_WEIGHT: tensor([3.2008], dtype=torch.float64)
High class imbalance in train, adding positive label weight(s): tensor([3.2008], dtype=torch.float64)
Optuna Train on: cuda:0
trial params dict_items([('batch_size', 16), ('lr', 5e-05), ('weight_decay', 0.01), ('optimizer_name', 'Adam'), ('filter_size', 16), ('dropout', 0.5)])
checkpoint_log_dir CADnet_ECG_ONLY_001_2021-11-12--07:54:27/trial_batch_size=16,lr=5e-05,weight_decay=0.01,optimizer_name=Adam,filter_size=16,dropout=0.5,
loss is on cuda:0 with pos weight tensor([3.2008], device='cuda:0', dtype=torch.float64)
2021-11-12 08:03:37,966 trainer INFO: Engine run starting with max_epochs=80.
2021-11-12 08:03:38,363 trainer ERROR: Current run is terminating due to exception: cuDNN error: CUDNN_STATUS_MAPPING_ERROR.
2021-11-12 08:03:38,364 trainer ERROR: Engine run is terminating due to exception: cuDNN error: CUDNN_STATUS_MAPPING_ERROR.
[W 2021-11-12 08:03:38,364] Trial 0 failed because of the following error: RuntimeError('cuDNN error: CUDNN_STATUS_MAPPING_ERROR')
Traceback (most recent call last):
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/optuna/study/_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "/home/pae2115/ValveNet_v2/optuna_common/objective.py", line 110, in __call__
    trainer.run(train_data_loader, max_epochs=self.model_config.num_epochs)
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/ignite/engine/engine.py", line 691, in run
    return self._internal_run()
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/ignite/engine/engine.py", line 762, in _internal_run
    self._handle_exception(e)
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/ignite/engine/engine.py", line 467, in _handle_exception
    raise e
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/ignite/engine/engine.py", line 730, in _internal_run
    time_taken = self._run_once_on_dataset()
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/ignite/engine/engine.py", line 828, in _run_once_on_dataset
    self._handle_exception(e)
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/ignite/engine/engine.py", line 467, in _handle_exception
    raise e
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/ignite/engine/engine.py", line 811, in _run_once_on_dataset
    self.state.output = self._process_function(self, self.state.batch)
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/ignite/engine/__init__.py", line 99, in _update
    y_pred = model(x)
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pae2115/ValveNet_v2/models/optuna_pytorch_models.py", line 84, in forward
    x = self.conv1(x)
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 202, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
Traceback (most recent call last):
  File "CADnet_ECG_001.py", line 111, in <module>
    redirect=args.redirect
  File "/home/pae2115/ValveNet_v2/optuna_common/trial_utils.py", line 50, in run_trials
    study.optimize(objective, n_trials=num_trials, gc_after_trial=True)
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/optuna/study/study.py", line 409, in optimize
    show_progress_bar=show_progress_bar,
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/optuna/study/_optimize.py", line 76, in _optimize
    progress_bar=progress_bar,
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/optuna/study/_optimize.py", line 163, in _optimize_sequential
    trial = _run_trial(study, func, catch)
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/optuna/study/_optimize.py", line 264, in _run_trial
    raise func_err
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/optuna/study/_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "/home/pae2115/ValveNet_v2/optuna_common/objective.py", line 110, in __call__
    trainer.run(train_data_loader, max_epochs=self.model_config.num_epochs)
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/ignite/engine/engine.py", line 691, in run
    return self._internal_run()
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/ignite/engine/engine.py", line 762, in _internal_run
    self._handle_exception(e)
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/ignite/engine/engine.py", line 467, in _handle_exception
    raise e
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/ignite/engine/engine.py", line 730, in _internal_run
    time_taken = self._run_once_on_dataset()
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/ignite/engine/engine.py", line 828, in _run_once_on_dataset
    self._handle_exception(e)
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/ignite/engine/engine.py", line 467, in _handle_exception
    raise e
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/ignite/engine/engine.py", line 811, in _run_once_on_dataset
    self.state.output = self._process_function(self, self.state.batch)
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/ignite/engine/__init__.py", line 99, in _update
    y_pred = model(x)
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pae2115/ValveNet_v2/models/optuna_pytorch_models.py", line 84, in forward
    x = self.conv1(x)
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pae2115/anaconda3/envs/porch/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 202, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
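Since the failure is in the very first convolution, a standalone test outside the Optuna/Ignite stack might show whether the training script matters at all. Below is a minimal sketch, not the real model: the input shape is copied from the log above, while the layer settings (16 output channels, 3x3 kernel) and the function name `try_single_conv` are placeholders, because the actual `conv1` definition is not shown here.

```python
def try_single_conv(device_index=0):
    """Run one Conv2d forward pass on the given GPU and report what happens."""
    try:
        import torch
    except ImportError:
        return "skipped: torch not installed"
    if not torch.cuda.is_available():
        return "skipped: CUDA not available"

    device = torch.device(f"cuda:{device_index}")
    # One batch shaped like the training data: (batch, channels, samples, leads)
    x = torch.randn(16, 1, 2500, 12, device=device)
    # Placeholder layer; the real conv1 settings are an assumption.
    conv = torch.nn.Conv2d(1, 16, kernel_size=3, padding=1).to(device)

    try:
        y = conv(x)
        # .item() copies the result to the host, which forces the kernel to finish
        y.sum().item()
    except RuntimeError as err:
        return f"failed: {err}"
    return f"ok: output shape {tuple(y.shape)}"


if __name__ == "__main__":
    print(try_single_conv(0))
```

If this small test also fails with a cuDNN error, the problem is more likely in the installed PyTorch/CUDA/cuDNN build than in the model or data-loading code.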