Debugging variables

secrettoad · August 3, 2023, 1:47am

Hello,

I am vectorizing a set of documents using a torch.nn module that is part of the huggingface transformers library and getting an error that asks me to set CUDA_LAUNCH_BLOCKING and TORCH_USE_CUDA_DSA in order to get a coherent stack trace.

I cannot find any documentation on exactly where and when these variables must be set to enable debugging. Can CUDA_LAUNCH_BLOCKING be set after torch is installed but before it is imported?
What value must TORCH_USE_CUDA_DSA be set to, and can it also be set post-installation?

Thank you in advance for your help.

ptrblck · August 3, 2023, 2:39am

These are environment variables, which should be set before you launch your Python script. You could also try setting them inside your code via the os module, but I would strongly recommend setting them in your terminal via CUDA_LAUNCH_BLOCKING=1 python script.py args.
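For the in-code route mentioned above, a minimal sketch might look like the following. It assumes the variable is only picked up when CUDA initializes, so it has to be set at the very top of the script, before torch is imported. The helper name enable_blocking_launches is hypothetical and not part of PyTorch.

```python
import os
import sys


def enable_blocking_launches():
    """Set CUDA_LAUNCH_BLOCKING=1 for the current process.

    Returns False if torch was already imported, in which case the
    setting may come too late to have any effect (an assumption; the
    terminal approach avoids the question entirely).
    """
    torch_already_imported = "torch" in sys.modules
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return not torch_already_imported


set_early = enable_blocking_launches()
if not set_early:
    print("warning: torch was imported before CUDA_LAUNCH_BLOCKING was set",
          file=sys.stderr)

# Import torch (and transformers) only after this point.
```

Setting the variable in the terminal remains the safer option, since it does not depend on import order inside the script.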

secrettoad · August 3, 2023, 3:00pm

Thanks for your reply. Do I understand correctly that TORCH_USE_CUDA_DSA can be set after CUDA and torch are installed, just before the Python process starts?

Also, should it be set to 1 or to true?

Thanks again

ptrblck · August 3, 2023, 4:50pm

No, TORCH_USE_CUDA_DSA is an env variable used during the build and has no effect at runtime. To enable it, you would need to build PyTorch from source with this env variable set.

secrettoad · August 3, 2023, 6:05pm

Okay, makes sense. Thank you.

Last question: should the env var be set to 1 or some other value?

ptrblck · August 3, 2023, 11:18pm

Both env variables should be set to 1 to enable them.
From my previous post:

CUDA_LAUNCH_BLOCKING=1 python script.py args
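
If the command cannot be prefixed in a shell (for example when the script is started from another Python program), roughly the same effect can be sketched with a small launcher that passes the variable to a child process. The function name run_with_blocking_launches is hypothetical; only the standard library is used.

```python
import os
import subprocess
import sys


def run_with_blocking_launches(args):
    """Run `python <args>` with CUDA_LAUNCH_BLOCKING=1 in the child's environment."""
    # Copy the current environment and add the variable for the child only;
    # the parent process's environment is left unchanged.
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    completed = subprocess.run([sys.executable, *args], env=env)
    return completed.returncode


# Example (hypothetical script name and arguments):
# run_with_blocking_launches(["script.py", "args"])
```

Because the variable is in place before the child interpreter starts, the import order inside script.py does not matter here.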