PyTorch 2.0 with CUDA 11.8: training fails after package versions changed (is CUDA 11.7 more stable?)

My custom-built model trains fine with CUDA 11.8 and these package versions:

Today I created a new conda environment and followed the instructions to install PyTorch 2.0 with CUDA 11.8. I noticed that the installed versions are now different:

My model no longer trains in the new environment. Training now fails with this error, which I did not get before:

"message": "backend='inductor' raised:\nCalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpypr9vj84/main.c', '-O3', '-I/SD5/people/s1208875/miniforge3/envs/torch_test/lib/python3.9/site-packages/triton/common/…/third_party/cuda/include', '-I/SD5/people/s1208875/miniforge3/envs/torch_test/include/python3.9', '-I/tmp/tmpypr9vj84', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmpypr9vj84/', '-L/lib64', '-L/lib64']' returned non-zero exit status 1.\n\nSet TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information\n\n\nYou can suppress this exception and fall back to eager by setting:\n import torch._dynamo\n torch._dynamo.config.suppress_errors = True\n"
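The message itself names two follow-ups: turning on verbose Dynamo logging, and suppressing the exception so training falls back to eager mode. A minimal sketch of both is below. The helper name `configure_dynamo_debug` is hypothetical, and setting the environment variables before `torch` is imported is an assumption made to be safe. Note that the fallback only hides the failed compile step; it does not fix it.

```python
import os


def configure_dynamo_debug(fallback_to_eager=True):
    """Apply the settings suggested in the error message.

    Returns True if torch._dynamo could be imported, False otherwise.
    """
    # Variable names and values are taken from the error message.
    # They are set before importing torch so they are visible at import time.
    os.environ["TORCH_LOGS"] = "+dynamo"
    os.environ["TORCHDYNAMO_VERBOSE"] = "1"

    try:
        import torch._dynamo
    except ImportError:
        # torch is missing, or too old to ship torch._dynamo.
        return False

    if fallback_to_eager:
        # Quoted from the error message: suppress the backend error
        # and run the model in eager mode instead.
        torch._dynamo.config.suppress_errors = True
    return True
```

Calling `configure_dynamo_debug()` at the very top of the training script, before any other `torch` import, should make the next run print more detail about the failing `gcc` call.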

This is a problem if other people need to reproduce my CUDA 11.8 conda environment. I read in another post that CUDA 11.7 is more stable, so I will try installing that instead.
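To see exactly what changed between the working and the new environment, one option is to record the installed versions in each and compare them. This is only a sketch: `installed_versions` and `diff_versions` are hypothetical helpers, and the package names passed in are whatever you choose to track.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def diff_versions(old, new):
    """Return {name: (old_version, new_version)} for every entry that differs."""
    changed = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changed[name] = (old.get(name), new.get(name))
    return changed
```

Run `installed_versions(["torch", "triton"])` in each environment, save the two results, and pass them to `diff_versions`. Depending on how PyTorch was installed, Triton may be registered under a different distribution name, so a `None` for it does not necessarily mean it is missing.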

Any thoughts as to what happened with CUDA 11.8?

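The paths in the error point to Triton, not CUDA: the command that fails is gcc compiling a temporary main.c against the CUDA headers bundled with Triton and linking it with -lcuda. I would start by looking into the Triton installation and that gcc step rather than the CUDA version.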
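A quick way to check the pieces that the failing gcc command depends on is sketched below. It is an assumption-laden diagnostic, not a known fix: `check_build_prereqs` is a hypothetical helper, and the default library directory is simply the `-L/lib64` seen in the error. The linker also searches its own default directories, which this sketch does not cover.

```python
import os
import shutil
import sysconfig


def check_build_prereqs(lib_dirs=("/lib64",)):
    """Report whether the inputs of the failing gcc command look available."""
    report = {}

    # The command in the error invokes gcc directly.
    report["gcc"] = shutil.which("gcc")

    # The command passes the environment's Python include directory via -I.
    include_dir = sysconfig.get_paths()["include"]
    report["python_h"] = os.path.isfile(os.path.join(include_dir, "Python.h"))

    # -lcuda makes the linker look for a file named libcuda.so
    # (a versioned libcuda.so.1 alone does not satisfy it).
    report["libcuda_dirs"] = [
        d for d in lib_dirs if os.path.isfile(os.path.join(d, "libcuda.so"))
    ]
    return report
```

If `gcc` is `None`, `python_h` is `False`, or `libcuda_dirs` is empty, that entry is a reasonable first suspect. Re-running the gcc command from the error by hand should show the actual compiler or linker message.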