RuntimeError: CUDA error: device-side assert triggered. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions

Jyun · June 6, 2023, 4:47am

Hello, this is my code. The error is raised at torch.manual_seed(myseed), and I have already added os.environ['CUDA_LAUNCH_BLOCKING'] = "1". I don't know how to fix it.

My code

myseed = cfg['seed']  # set a random seed for reproducibility
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(myseed)
torch.manual_seed(myseed)
random.seed(myseed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(myseed)

save_path = os.path.join(cfg['save_dir'], cfg['exp_name'])  # create saving directory
os.makedirs(save_path, exist_ok=True)

log_fw = open(f"{save_path}/log.txt", 'w')  # open log file to save log outputs
def log(text):  # define a logging function to trace the training process
    print(text)
    log_fw.write(str(text) + '\n')
    log_fw.flush()

log(cfg) # log your configs to the log file

ptrblck · June 6, 2023, 5:13am

Are you getting the error directly when torch.manual_seed(myseed) is executed, or are you running some code beforehand?
In the former case, was the setup working before and if so what changed?
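
One way to answer this is to run only the seeding calls in a fresh Python process and see which one fails first. The following is a minimal sketch, not code from the thread: `run_steps` and `build_seed_steps` are hypothetical helper names, and setting `CUDA_LAUNCH_BLOCKING` before importing torch is an assumption about the safest ordering.

```python
import os
import random
import traceback

# Assumption: set this before torch is imported so it is already in place
# when a CUDA context is created.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def run_steps(steps):
    """Run each (name, callable) pair in order and stop at the first failure."""
    passed = []
    for name, fn in steps:
        try:
            fn()
        except Exception:
            print(f"FAILED at: {name}")
            traceback.print_exc()
            return passed, name
        print(f"ok: {name}")
        passed.append(name)
    return passed, None


def build_seed_steps(seed):
    """Seeding calls from the original post, one per step (hypothetical helper)."""
    steps = [("random.seed", lambda: random.seed(seed))]
    try:
        import numpy as np
        steps.append(("np.random.seed", lambda: np.random.seed(seed)))
    except ImportError:
        pass  # NumPy not installed; skip that step
    try:
        import torch
    except ImportError:
        return steps  # PyTorch not installed; nothing more to check
    steps.append(("torch.manual_seed", lambda: torch.manual_seed(seed)))
    if torch.cuda.is_available():
        steps.append(
            ("torch.cuda.manual_seed_all", lambda: torch.cuda.manual_seed_all(seed))
        )
    return steps


if __name__ == "__main__":
    passed, failed = run_steps(build_seed_steps(20220013))
    print("first failing step:", failed)
```

If every step passes when run this way, the failure probably comes from code that ran earlier in the original session rather than from the seeding itself.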

Jyun · June 6, 2023, 5:41am

I just import some packages and set the config.

configs
cfg = {
    'dataset_root': './Food-11',
    'save_dir': './outputs',
    'exp_name': 'simple_baseline',
    'batch_size': 128,
    'lr': 1e-3,
    'seed': 20220013,
    'loss_fn_type': 'KD',
    'weight_decay': 1e-5,
    'grad_norm_max': 10,
    'n_epochs': 200,
    'patience': 40,
}

However, I can run it on Colab but not on my computer.
Could this be a problem with my computer settings?

ptrblck · June 6, 2023, 5:46am

I don't know how this config is used, but assuming the environment fails directly during seeding, without any previous code being executed, I would suggest deleting the current environment and reinstalling PyTorch into a new, empty virtual environment.
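
After reinstalling, a quick check along these lines can confirm that the new environment sees the GPU and can launch a trivial kernel. This is only a sketch under stated assumptions: `cuda_sanity_report` is a hypothetical helper name, and it checks device index 0, which may not match every setup.

```python
def cuda_sanity_report():
    """Collect basic facts about the installed PyTorch build and the GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # Assumption: device 0 is the GPU of interest.
        report["device_name"] = torch.cuda.get_device_name(0)
        # Tiny allocation and kernel; synchronize so any error surfaces here.
        x = torch.ones(4, device="cuda")
        total = (x * 2).sum()
        torch.cuda.synchronize()
        report["smoke_test_sum"] = total.item()
    return report


if __name__ == "__main__":
    for key, value in cuda_sanity_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False or the smoke test raises, the problem is more likely in the driver or PyTorch installation than in the training script.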