PyTorch Nightly with CUDA 12.8 on NVIDIA 5070 Ti: RuntimeError: CUDA error: too many resources requested for launch

I recently set up a workstation with an NVIDIA 5070 and installed CUDA 12.8 on it. When I try to train my model (which runs fine on my NVIDIA 3090 with CUDA 12.2), I get the following error: RuntimeError: CUDA error: too many resources requested for launch

Here is the full traceback:
Traceback (most recent call last):---------------------------------------| 0.00% [0/1414 00:00<?]
File "/home/praveenbenedict/Projects/Authenta/authenta-generated/notebooks/modelling/Gen/train.py", line 94, in <module>
trainer.train()
File "/home/praveenbenedict/Projects/Authenta/authenta-generated/notebooks/modelling/Gen/trainer/trainer.py", line 650, in train
self.train_augmenter(epochs=self.config['training']['settings']['epochs']['adversarial']['augmenter'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/praveenbenedict/Projects/Authenta/authenta-generated/notebooks/modelling/Gen/trainer/trainer.py", line 535, in train_augmenter
loss.backward()
File "/home/praveenbenedict/miniconda3/envs/authenta-generated/lib/python3.11/site-packages/torch/_tensor.py", line 648, in backward
torch.autograd.backward(
File "/home/praveenbenedict/miniconda3/envs/authenta-generated/lib/python3.11/site-packages/torch/autograd/__init__.py", line 353, in backward
_engine_run_backward(
File "/home/praveenbenedict/miniconda3/envs/authenta-generated/lib/python3.11/site-packages/torch/autograd/graph.py", line 824, in _engine_run_backward
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: too many resources requested for launch

I've installed the PyTorch nightly build that supports CUDA 12.8. Is this a temporary issue with PyTorch, or is there something else I am missing?

Could you try to narrow down which kernel fails to launch via cuda-gdb?
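
One way to complement cuda-gdb when narrowing this down is to make kernel launches synchronous, so that a failure is reported at the launch that caused it rather than at some later API call. The sketch below is a minimal launcher, assuming the training script is started as `python train.py` as in this thread. The helper names `blocking_env` and `run_blocking` are hypothetical; the only real knob used is the `CUDA_LAUNCH_BLOCKING=1` environment variable.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # CUDA_LAUNCH_BLOCKING=1 makes each kernel launch block until it finishes,
    # so an error surfaces at the launch that caused it.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_blocking(script, *args):
    """Run `script` with the current interpreter and return its exit code."""
    cmd = [sys.executable, script, *args]
    return subprocess.run(cmd, env=blocking_env()).returncode
```

Calling `run_blocking("train.py")` from the project directory would rerun training with blocking launches. Because the failure is raised inside `loss.backward()`, the Python traceback will probably still end there; enabling `torch.autograd.set_detect_anomaly(True)` in the training script may additionally print the forward-pass traceback of the operation whose backward failed.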

Thanks, @ptrblck. Here is what I got when I ran it under cuda-gdb:

(cuda-gdb) run train.py
Starting program: /home/praveenbenedict/miniconda3/envs/authenta-generated-image-detection/bin/python train.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff4fff6c0 (LWP 2712)]
[New Thread 0x7ffff27fe6c0 (LWP 2713)]
[New Thread 0x7fffefffd6c0 (LWP 2714)]
[New Thread 0x7fffed7fc6c0 (LWP 2715)]
[New Thread 0x7fffeaffb6c0 (LWP 2716)]
[New Thread 0x7fffe87fa6c0 (LWP 2717)]
[New Thread 0x7fffe5ff96c0 (LWP 2718)]
[New Thread 0x7fffe57f86c0 (LWP 2719)]
[New Thread 0x7fffe0ff76c0 (LWP 2720)]
[New Thread 0x7fffde7f66c0 (LWP 2721)]
[New Thread 0x7fffddff56c0 (LWP 2722)]
[New Thread 0x7fffdb7f46c0 (LWP 2723)]
[New Thread 0x7fffd8ff36c0 (LWP 2724)]
[New Thread 0x7fffd47f26c0 (LWP 2725)]
[New Thread 0x7fffd3ff16c0 (LWP 2726)]
[New Thread 0x7fffd17f06c0 (LWP 2727)]
[New Thread 0x7fffccfef6c0 (LWP 2728)]
[New Thread 0x7fffcc7ee6c0 (LWP 2729)]
[New Thread 0x7fffc7fed6c0 (LWP 2730)]
[New Thread 0x7fffc57ec6c0 (LWP 2731)]
[New Thread 0x7fffc2feb6c0 (LWP 2732)]
[New Thread 0x7fffc07ea6c0 (LWP 2733)]
[New Thread 0x7fffbdfe96c0 (LWP 2734)]
[New Thread 0x7fffbd7e86c0 (LWP 2735)]
[New Thread 0x7fffb8fe76c0 (LWP 2736)]
[New Thread 0x7fffb67e66c0 (LWP 2737)]
[New Thread 0x7fffb3fe56c0 (LWP 2738)]
[New Thread 0x7ffe674996c0 (LWP 2739)]
[New Thread 0x7ffe66c986c0 (LWP 2740)]
[New Thread 0x7ffe664976c0 (LWP 2741)]
[New Thread 0x7ffe65c966c0 (LWP 2742)]
[New Thread 0x7ffe654956c0 (LWP 2743)]
[New Thread 0x7ffe64c946c0 (LWP 2744)]
[New Thread 0x7ffe644936c0 (LWP 2745)]
[New Thread 0x7ffe63c926c0 (LWP 2746)]
[New Thread 0x7ffe634916c0 (LWP 2747)]
[New Thread 0x7ffe62c906c0 (LWP 2748)]
[New Thread 0x7ffe6248f6c0 (LWP 2749)]
[New Thread 0x7ffe61c8e6c0 (LWP 2750)]
[New Thread 0x7ffe6148d6c0 (LWP 2751)]
[New Thread 0x7ffe60c8c6c0 (LWP 2752)]
[New Thread 0x7ffe6048b6c0 (LWP 2753)]
[New Thread 0x7ffe5fc8a6c0 (LWP 2754)]
[New Thread 0x7ffe5f4896c0 (LWP 2755)]
[New Thread 0x7ffe5ec886c0 (LWP 2756)]
[New Thread 0x7ffe5e4876c0 (LWP 2757)]
[New Thread 0x7ffe5dc866c0 (LWP 2758)]
[New Thread 0x7ffe5d4856c0 (LWP 2759)]
[New Thread 0x7ffe5cc846c0 (LWP 2760)]
[New Thread 0x7ffe5c4836c0 (LWP 2761)]
[New Thread 0x7ffe5bc826c0 (LWP 2762)]
[New Thread 0x7ffe5b4816c0 (LWP 2763)]
[New Thread 0x7ffe5ac806c0 (LWP 2764)]
[New Thread 0x7ffe5a47f6c0 (LWP 2765)]
Number of Fake images; Train: 45223 Valid: 12691
Number of Real images; Train: 95319 Valid: 10993
Real Images – train – Train: 95319 – Fake: 45223
Real Images – valid – Train: 10993 – Fake: 12691
Number of Fake images; Train: 45223 Valid: 12691
Number of Real images; Train: 95319 Valid: 10993
[New Thread 0x7ffe51dff6c0 (LWP 2767)]
[New Thread 0x7ffe50b6e6c0 (LWP 2768)]
[Thread 0x7ffe5d4856c0 (LWP 2759) exited]
[Thread 0x7ffe5b4816c0 (LWP 2763) exited]
[Thread 0x7ffe5ac806c0 (LWP 2764) exited]
[Thread 0x7ffe5a47f6c0 (LWP 2765) exited]
[Thread 0x7ffe5bc826c0 (LWP 2762) exited]
[Thread 0x7ffe5c4836c0 (LWP 2761) exited]
[Thread 0x7ffe5cc846c0 (LWP 2760) exited]
[Thread 0x7ffe5dc866c0 (LWP 2758) exited]
[Thread 0x7ffe5e4876c0 (LWP 2757) exited]
[Thread 0x7ffe5ec886c0 (LWP 2756) exited]
[Thread 0x7ffe5f4896c0 (LWP 2755) exited]
[Thread 0x7ffe5fc8a6c0 (LWP 2754) exited]
[Thread 0x7ffe6048b6c0 (LWP 2753) exited]
[Thread 0x7ffe60c8c6c0 (LWP 2752) exited]
[Thread 0x7ffe6148d6c0 (LWP 2751) exited]
[Thread 0x7ffe61c8e6c0 (LWP 2750) exited]
[Thread 0x7ffe6248f6c0 (LWP 2749) exited]
[Thread 0x7ffe62c906c0 (LWP 2748) exited]
[Thread 0x7ffe634916c0 (LWP 2747) exited]
[Thread 0x7ffe63c926c0 (LWP 2746) exited]
[Thread 0x7ffe644936c0 (LWP 2745) exited]
[Thread 0x7ffe64c946c0 (LWP 2744) exited]
[Thread 0x7ffe654956c0 (LWP 2743) exited]
[Thread 0x7ffe65c966c0 (LWP 2742) exited]
[Thread 0x7ffe664976c0 (LWP 2741) exited]
[Thread 0x7ffe66c986c0 (LWP 2740) exited]
[Thread 0x7ffe674996c0 (LWP 2739) exited]
[Thread 0x7ffff27fe6c0 (LWP 2713) exited]
[Thread 0x7fffefffd6c0 (LWP 2714) exited]
[Thread 0x7fffe57f86c0 (LWP 2719) exited]
[Thread 0x7fffed7fc6c0 (LWP 2715) exited]
[Thread 0x7fffeaffb6c0 (LWP 2716) exited]
[Thread 0x7fffcc7ee6c0 (LWP 2729) exited]
[Thread 0x7fffdb7f46c0 (LWP 2723) exited]
[Thread 0x7fffddff56c0 (LWP 2722) exited]
[Thread 0x7ffff4fff6c0 (LWP 2712) exited]
[Thread 0x7fffd3ff16c0 (LWP 2726) exited]
[Thread 0x7fffbd7e86c0 (LWP 2735) exited]
[Thread 0x7fffd47f26c0 (LWP 2725) exited]
[Thread 0x7fffb8fe76c0 (LWP 2736) exited]
[Thread 0x7fffb67e66c0 (LWP 2737) exited]
[Thread 0x7fffbdfe96c0 (LWP 2734) exited]
[Thread 0x7fffb3fe56c0 (LWP 2738) exited]
[Thread 0x7fffc2feb6c0 (LWP 2732) exited]
[Thread 0x7fffc07ea6c0 (LWP 2733) exited]
[Thread 0x7fffc57ec6c0 (LWP 2731) exited]
[Thread 0x7fffc7fed6c0 (LWP 2730) exited]
[Thread 0x7fffccfef6c0 (LWP 2728) exited]
[Thread 0x7fffd17f06c0 (LWP 2727) exited]
[Thread 0x7fffd8ff36c0 (LWP 2724) exited]
[Thread 0x7fffde7f66c0 (LWP 2721) exited]
[Thread 0x7fffe0ff76c0 (LWP 2720) exited]
[Thread 0x7fffe5ff96c0 (LWP 2718) exited]
[Thread 0x7fffe87fa6c0 (LWP 2717) exited]
[Detaching after fork from child process 2769]
[New Thread 0x7fffb3fe56c0 (LWP 2778)]
[Thread 0x7fffb3fe56c0 (LWP 2778) exited]
[New Thread 0x7fffb3fe56c0 (LWP 2779)]
[New Thread 0x7fffb67e66c0 (LWP 2780)]
[2025-04-15 10:59:32] [INFO] => Using checkpoint from run id: 13_04_2025_08_14_51 epoch id: 0
[2025-04-15 10:59:32] [INFO] => Loading checkpoint
[2025-04-15 10:59:32] [INFO] => Current Teacher model metrics - Accuracy: {'accuracy': 0.9007581159202742, 'real_accuracy': 0.9065225290697675, 'fake_accuracy': 0.8949937027707808, 'f1': 0.894299447464175}
[2025-04-15 10:59:32] [INFO] => Best Teacher model metrics- Accuracy: {'accuracy': 0.9007581159202742, 'real_accuracy': 0.9065225290697675, 'fake_accuracy': 0.8949937027707808, 'f1': 0.894299447464175}
[2025-04-15 10:59:32] [INFO] => Best Pipeline model metrics- Accuracy: {'accuracy': tensor(0.8988), 'real_accuracy': tensor(0.9122), 'fake_accuracy': tensor(0.8853), 'f1': 0.8919182083739046}
[2025-04-15 10:59:32] [INFO] => Loaded weights for Teacher Network from best checkpoint
[2025-04-15 10:59:32] [INFO] => Loaded the weights for student
[2025-04-15 10:59:32] [INFO] => Loaded the weights for augmenter
[2025-04-15 10:59:32] [INFO] => Loaded weights for Classifier Network from best checkpoint
[2025-04-15 10:59:32] [INFO] => Using weighted loss for teacher with a weight of 0.7372192144393921
[2025-04-15 10:59:32] [INFO] => Using weighted loss for Classifier with a weight of 0.7372192144393921
[2025-04-15 10:59:32] [INFO] => Adversarial Training…
[2025-04-15 10:59:32] [INFO] => Freezing CLIPFeatureExtractor
[2025-04-15 10:59:32] [INFO] => Freezing TeacherNetworkTransformer
[2025-04-15 10:59:32] [INFO] => Freezing StudentNetworkTransformer
[2025-04-15 10:59:32] [INFO] => Unfreezing FeatureAugmenterTransformer
[Detaching after fork from child process 2781]---------------| 0.00% [0/1414 00:00<?]
[Detaching after fork from child process 2782]
[Detaching after fork from child process 2783]
[Detaching after fork from child process 2784]
[New Thread 0x7fffb8fe76c0 (LWP 2785)]
[New Thread 0x7fffbd7e86c0 (LWP 2786)]
[New Thread 0x7fffe87fa6c0 (LWP 2787)]
[New Thread 0x7fffe0ff76c0 (LWP 2788)]
[New Thread 0x7fffde7f66c0 (LWP 2789)]
[New Thread 0x7fffddff56c0 (LWP 2790)]
[New Thread 0x7fffdb7f46c0 (LWP 2791)]
[New Thread 0x7fffd8ff36c0 (LWP 2792)]
[New Thread 0x7fffd47f26c0 (LWP 2793)]
[New Thread 0x7fffd3ff16c0 (LWP 2794)]
[New Thread 0x7fffd17f06c0 (LWP 2795)]
[New Thread 0x7fffccfef6c0 (LWP 2796)]
[New Thread 0x7fffcc7ee6c0 (LWP 2797)]
[New Thread 0x7fffc7fed6c0 (LWP 2798)]
[New Thread 0x7fffc57ec6c0 (LWP 2799)]
[New Thread 0x7fffc2feb6c0 (LWP 2800)]
[New Thread 0x7fffc07ea6c0 (LWP 2801)]
[Thread 0x7fffb3fe56c0 (LWP 2779) exited]
[New Thread 0x7fffb3fe56c0 (LWP 2810)]
warning: Cuda API error detected: cudaLaunchKernel returned (0x2bd)

warning: Cuda API error detected: cudaGetLastError returned (0x2bd)

[Thread 0x7fffc2feb6c0 (LWP 2800) exited]
[Thread 0x7fffc57ec6c0 (LWP 2799) exited]
[Thread 0x7fffc07ea6c0 (LWP 2801) exited]
[Thread 0x7fffc7fed6c0 (LWP 2798) exited]
Traceback (most recent call last):
File "/home/praveenbenedict/Projects/Authenta/authenta-generated-image-detection/notebooks/modelling/GenDet/train.py", line 94, in <module>
trainer.train()
File "/home/praveenbenedict/Projects/Authenta/authenta-generated-image-detection/notebooks/modelling/GenDet/trainer/trainer.py", line 650, in train
self.train_augmenter(epochs=self.config['training']['settings']['epochs']['adversarial']['augmenter'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/praveenbenedict/Projects/Authenta/authenta-generated-image-detection/notebooks/modelling/GenDet/trainer/trainer.py", line 535, in train_augmenter
loss.backward()
File "/home/praveenbenedict/miniconda3/envs/authenta-generated-image-detection/lib/python3.11/site-packages/torch/_tensor.py", line 648, in backward
torch.autograd.backward(
File "/home/praveenbenedict/miniconda3/envs/authenta-generated-image-detection/lib/python3.11/site-packages/torch/autograd/__init__.py", line 353, in backward
_engine_run_backward(
File "/home/praveenbenedict/miniconda3/envs/authenta-generated-image-detection/lib/python3.11/site-packages/torch/autograd/graph.py", line 824, in _engine_run_backward
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: too many resources requested for launch
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

[Thread 0x7fffb3fe56c0 (LWP 2810) exited]
[Thread 0x7ffe50b6e6c0 (LWP 2768) exited]
[Thread 0x7fffccfef6c0 (LWP 2796) exited]
[Thread 0x7fffd17f06c0 (LWP 2795) exited]
[Thread 0x7fffd3ff16c0 (LWP 2794) exited]
[Thread 0x7fffd47f26c0 (LWP 2793) exited]
[Thread 0x7fffd8ff36c0 (LWP 2792) exited]
[Thread 0x7fffdb7f46c0 (LWP 2791) exited]
[Thread 0x7fffddff56c0 (LWP 2790) exited]
[Thread 0x7fffde7f66c0 (LWP 2789) exited]
[Thread 0x7fffe0ff76c0 (LWP 2788) exited]
[Thread 0x7fffe87fa6c0 (LWP 2787) exited]
[Thread 0x7fffbd7e86c0 (LWP 2786) exited]
[Thread 0x7fffb8fe76c0 (LWP 2785) exited]
[Thread 0x7fffb67e66c0 (LWP 2780) exited]
[Thread 0x7ffe51dff6c0 (LWP 2767) exited]
[Thread 0x7ffff7ca9740 (LWP 2711) exited]
[Thread 0x7fffcc7ee6c0 (LWP 2797) exited]
[New process 2711]
[Inferior 1 (process 2711) exited with code 01]
(cuda-gdb) run train.py
Starting program: /home/praveenbenedict/miniconda3/envs/authenta-generated-image-detection/bin/python train.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff4fff6c0 (LWP 2953)]
[New Thread 0x7ffff47fe6c0 (LWP 2954)]
[New Thread 0x7fffefffd6c0 (LWP 2955)]
[New Thread 0x7fffed7fc6c0 (LWP 2956)]
[New Thread 0x7fffecffb6c0 (LWP 2957)]
[New Thread 0x7fffe87fa6c0 (LWP 2958)]
[New Thread 0x7fffe7ff96c0 (LWP 2959)]
[New Thread 0x7fffe37f86c0 (LWP 2960)]
[New Thread 0x7fffe2ff76c0 (LWP 2961)]
[New Thread 0x7fffde7f66c0 (LWP 2962)]
[New Thread 0x7fffddff56c0 (LWP 2963)]
[New Thread 0x7fffd97f46c0 (LWP 2964)]
[New Thread 0x7fffd6ff36c0 (LWP 2965)]
[New Thread 0x7fffd47f26c0 (LWP 2966)]
[New Thread 0x7fffd3ff16c0 (LWP 2967)]
[New Thread 0x7fffcf7f06c0 (LWP 2968)]
[New Thread 0x7fffccfef6c0 (LWP 2969)]
[New Thread 0x7fffca7ee6c0 (LWP 2970)]
[New Thread 0x7fffc9fed6c0 (LWP 2971)]
[New Thread 0x7fffc57ec6c0 (LWP 2972)]
[New Thread 0x7fffc2feb6c0 (LWP 2973)]
[New Thread 0x7fffc27ea6c0 (LWP 2974)]
[New Thread 0x7fffbdfe96c0 (LWP 2975)]
[New Thread 0x7fffbb7e86c0 (LWP 2976)]
[New Thread 0x7fffbafe76c0 (LWP 2977)]
[New Thread 0x7fffb67e66c0 (LWP 2978)]
[New Thread 0x7fffb3fe56c0 (LWP 2979)]
[New Thread 0x7ffe674996c0 (LWP 2982)]
[New Thread 0x7ffe66c986c0 (LWP 2983)]
[New Thread 0x7ffe664976c0 (LWP 2984)]
[New Thread 0x7ffe65c966c0 (LWP 2985)]
[New Thread 0x7ffe654956c0 (LWP 2986)]
[New Thread 0x7ffe64c946c0 (LWP 2987)]
[New Thread 0x7ffe644936c0 (LWP 2988)]
[New Thread 0x7ffe63c926c0 (LWP 2989)]
[New Thread 0x7ffe634916c0 (LWP 2990)]
[New Thread 0x7ffe62c906c0 (LWP 2991)]
[New Thread 0x7ffe6248f6c0 (LWP 2992)]
[New Thread 0x7ffe61c8e6c0 (LWP 2993)]
[New Thread 0x7ffe6148d6c0 (LWP 2994)]
[New Thread 0x7ffe60c8c6c0 (LWP 2995)]
[New Thread 0x7ffe6048b6c0 (LWP 2996)]
[New Thread 0x7ffe5fc8a6c0 (LWP 2997)]
[New Thread 0x7ffe5f4896c0 (LWP 2998)]
[New Thread 0x7ffe5ec886c0 (LWP 2999)]
[New Thread 0x7ffe5e4876c0 (LWP 3000)]
[New Thread 0x7ffe5dc866c0 (LWP 3001)]
[New Thread 0x7ffe5d4856c0 (LWP 3002)]
[New Thread 0x7ffe5cc846c0 (LWP 3003)]
[New Thread 0x7ffe5c4836c0 (LWP 3004)]
[New Thread 0x7ffe5bc826c0 (LWP 3005)]
[New Thread 0x7ffe5b4816c0 (LWP 3006)]
[New Thread 0x7ffe5ac806c0 (LWP 3007)]
[New Thread 0x7ffe5a47f6c0 (LWP 3008)]
Number of Fake images; Train: 45223 Valid: 12691
Number of Real images; Train: 95319 Valid: 10993
Real Images – train – Train: 95319 – Fake: 45223
Real Images – valid – Train: 10993 – Fake: 12691
Number of Fake images; Train: 45223 Valid: 12691
Number of Real images; Train: 95319 Valid: 10993
[New Thread 0x7ffe51dff6c0 (LWP 3010)]
[New Thread 0x7ffe50b6e6c0 (LWP 3011)]
[Thread 0x7ffe644936c0 (LWP 2988) exited]
[Thread 0x7ffe634916c0 (LWP 2990) exited]
[Thread 0x7ffe64c946c0 (LWP 2987) exited]
[Thread 0x7ffe62c906c0 (LWP 2991) exited]
[Thread 0x7ffe654956c0 (LWP 2986) exited]
[Thread 0x7ffe65c966c0 (LWP 2985) exited]
[Thread 0x7ffe664976c0 (LWP 2984) exited]
[Thread 0x7ffe61c8e6c0 (LWP 2993) exited]
[Thread 0x7ffe6148d6c0 (LWP 2994) exited]
[Thread 0x7ffe66c986c0 (LWP 2983) exited]
[Thread 0x7ffe674996c0 (LWP 2982) exited]
[Thread 0x7ffe5f4896c0 (LWP 2998) exited]
[Thread 0x7ffe5dc866c0 (LWP 3001) exited]
[Thread 0x7ffe5d4856c0 (LWP 3002) exited]
[Thread 0x7ffe5cc846c0 (LWP 3003) exited]
[Thread 0x7ffe5c4836c0 (LWP 3004) exited]
[Thread 0x7ffe5e4876c0 (LWP 3000) exited]
[Thread 0x7ffe5ec886c0 (LWP 2999) exited]
[Thread 0x7ffe5b4816c0 (LWP 3006) exited]
[Thread 0x7ffe5fc8a6c0 (LWP 2997) exited]
[Thread 0x7ffe6048b6c0 (LWP 2996) exited]
[Thread 0x7ffe60c8c6c0 (LWP 2995) exited]
[Thread 0x7ffe6248f6c0 (LWP 2992) exited]
[Thread 0x7ffe63c926c0 (LWP 2989) exited]
[Thread 0x7ffe5a47f6c0 (LWP 3008) exited]
[Thread 0x7ffe5ac806c0 (LWP 3007) exited]
[Thread 0x7ffe5bc826c0 (LWP 3005) exited]
[Thread 0x7ffff47fe6c0 (LWP 2954) exited]
[Thread 0x7fffe37f86c0 (LWP 2960) exited]
[Thread 0x7fffddff56c0 (LWP 2963) exited]
[Thread 0x7fffe2ff76c0 (LWP 2961) exited]
[Thread 0x7fffd47f26c0 (LWP 2966) exited]
[Thread 0x7fffe7ff96c0 (LWP 2959) exited]
[Thread 0x7fffd6ff36c0 (LWP 2965) exited]
[Thread 0x7fffccfef6c0 (LWP 2969) exited]
[Thread 0x7fffe87fa6c0 (LWP 2958) exited]
[Thread 0x7fffd97f46c0 (LWP 2964) exited]
[Thread 0x7fffca7ee6c0 (LWP 2970) exited]
[Thread 0x7fffde7f66c0 (LWP 2962) exited]
[Thread 0x7fffecffb6c0 (LWP 2957) exited]
[Thread 0x7fffd3ff16c0 (LWP 2967) exited]
[Thread 0x7fffbdfe96c0 (LWP 2975) exited]
[Thread 0x7fffc9fed6c0 (LWP 2971) exited]
[Thread 0x7fffed7fc6c0 (LWP 2956) exited]
[Thread 0x7fffefffd6c0 (LWP 2955) exited]
[Thread 0x7ffff4fff6c0 (LWP 2953) exited]
[Thread 0x7fffb67e66c0 (LWP 2978) exited]
[Thread 0x7fffb3fe56c0 (LWP 2979) exited]
[Thread 0x7fffbafe76c0 (LWP 2977) exited]
[Thread 0x7fffbb7e86c0 (LWP 2976) exited]
[Thread 0x7fffc27ea6c0 (LWP 2974) exited]
[Thread 0x7fffc2feb6c0 (LWP 2973) exited]
[Thread 0x7fffc57ec6c0 (LWP 2972) exited]
[Thread 0x7fffcf7f06c0 (LWP 2968) exited]
[Detaching after fork from child process 3012]
[New Thread 0x7fffb3fe56c0 (LWP 3021)]
[Thread 0x7fffb3fe56c0 (LWP 3021) exited]
[New Thread 0x7fffb3fe56c0 (LWP 3022)]
[New Thread 0x7fffb67e66c0 (LWP 3023)]
[2025-04-15 11:03:28] [INFO] => Using checkpoint from run id: 13_04_2025_08_14_51 epoch id: 0
[2025-04-15 11:03:28] [INFO] => Loading checkpoint
[2025-04-15 11:03:28] [INFO] => Current Teacher model metrics - Accuracy: {'accuracy': 0.9007581159202742, 'real_accuracy': 0.9065225290697675, 'fake_accuracy': 0.8949937027707808, 'f1': 0.894299447464175}
[2025-04-15 11:03:28] [INFO] => Best Teacher model metrics- Accuracy: {'accuracy': 0.9007581159202742, 'real_accuracy': 0.9065225290697675, 'fake_accuracy': 0.8949937027707808, 'f1': 0.894299447464175}
[2025-04-15 11:03:28] [INFO] => Best Pipeline model metrics- Accuracy: {'accuracy': tensor(0.8988), 'real_accuracy': tensor(0.9122), 'fake_accuracy': tensor(0.8853), 'f1': 0.8919182083739046}
[2025-04-15 11:03:28] [INFO] => Loaded weights for Teacher Network from best checkpoint
[2025-04-15 11:03:28] [INFO] => Loaded the weights for student
[2025-04-15 11:03:28] [INFO] => Loaded the weights for augmenter
[2025-04-15 11:03:28] [INFO] => Loaded weights for Classifier Network from best checkpoint
[2025-04-15 11:03:28] [INFO] => Using weighted loss for teacher with a weight of 0.7372192144393921
[2025-04-15 11:03:28] [INFO] => Using weighted loss for Classifier with a weight of 0.7372192144393921
[2025-04-15 11:03:28] [INFO] => Adversarial Training…
[2025-04-15 11:03:28] [INFO] => Freezing CLIPFeatureExtractor
[2025-04-15 11:03:28] [INFO] => Freezing TeacherNetworkTransformer
[2025-04-15 11:03:28] [INFO] => Freezing StudentNetworkTransformer
[2025-04-15 11:03:28] [INFO] => Unfreezing FeatureAugmenterTransformer
[Detaching after fork from child process 3024]---------------| 0.00% [0/1414 00:00<?]
[Detaching after fork from child process 3025]
[Detaching after fork from child process 3026]
[Detaching after fork from child process 3027]
[New Thread 0x7fffbafe76c0 (LWP 3028)]
[New Thread 0x7fffbb7e86c0 (LWP 3029)]
[New Thread 0x7fffe85fa6c0 (LWP 3030)]
[New Thread 0x7fffe37f86c0 (LWP 3031)]
[New Thread 0x7fffe2ff76c0 (LWP 3032)]
[New Thread 0x7fffde7f66c0 (LWP 3033)]
[New Thread 0x7fffddff56c0 (LWP 3034)]
[New Thread 0x7fffd97f46c0 (LWP 3035)]
[New Thread 0x7fffd6ff36c0 (LWP 3036)]
[New Thread 0x7fffd47f26c0 (LWP 3037)]
[New Thread 0x7fffd3ff16c0 (LWP 3038)]
[New Thread 0x7fffcf7f06c0 (LWP 3039)]
[New Thread 0x7fffccfef6c0 (LWP 3040)]
[New Thread 0x7fffca7ee6c0 (LWP 3041)]
[New Thread 0x7fffc9fed6c0 (LWP 3042)]
[New Thread 0x7fffc57ec6c0 (LWP 3043)]
[New Thread 0x7fffc2feb6c0 (LWP 3044)]
[Thread 0x7fffb3fe56c0 (LWP 3022) exited]
[New Thread 0x7fffb3fe56c0 (LWP 3053)]
warning: Cuda API error detected: cudaLaunchKernel returned (0x2bd)

warning: Cuda API error detected: cudaGetLastError returned (0x2bd)

[Thread 0x7fffc57ec6c0 (LWP 3043) exited]
[Thread 0x7fffc2feb6c0 (LWP 3044) exited]
[Thread 0x7fffc9fed6c0 (LWP 3042) exited]
[Thread 0x7fffca7ee6c0 (LWP 3041) exited]
Traceback (most recent call last):
File "/home/praveenbenedict/Projects/Authenta/authenta-generated-image-detection/notebooks/modelling/GenDet/train.py", line 96, in <module>
trainer.train()
File "/home/praveenbenedict/Projects/Authenta/authenta-generated-image-detection/notebooks/modelling/GenDet/trainer/trainer.py", line 650, in train
self.train_augmenter(epochs=self.config['training']['settings']['epochs']['adversarial']['augmenter'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/praveenbenedict/Projects/Authenta/authenta-generated-image-detection/notebooks/modelling/GenDet/trainer/trainer.py", line 535, in train_augmenter
loss.backward()
File "/home/praveenbenedict/miniconda3/envs/authenta-generated-image-detection/lib/python3.11/site-packages/torch/_tensor.py", line 648, in backward
torch.autograd.backward(
File "/home/praveenbenedict/miniconda3/envs/authenta-generated-image-detection/lib/python3.11/site-packages/torch/autograd/__init__.py", line 353, in backward
_engine_run_backward(
File "/home/praveenbenedict/miniconda3/envs/authenta-generated-image-detection/lib/python3.11/site-packages/torch/autograd/graph.py", line 824, in _engine_run_backward
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: too many resources requested for launch
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

[Thread 0x7fffb3fe56c0 (LWP 3053) exited]
[Thread 0x7ffe50b6e6c0 (LWP 3011) exited]
[Thread 0x7fffccfef6c0 (LWP 3040) exited]
[Thread 0x7fffcf7f06c0 (LWP 3039) exited]
[Thread 0x7fffd3ff16c0 (LWP 3038) exited]
[Thread 0x7fffd47f26c0 (LWP 3037) exited]
[Thread 0x7fffd6ff36c0 (LWP 3036) exited]
[Thread 0x7fffd97f46c0 (LWP 3035) exited]
[Thread 0x7fffddff56c0 (LWP 3034) exited]
[Thread 0x7fffde7f66c0 (LWP 3033) exited]
[Thread 0x7fffe2ff76c0 (LWP 3032) exited]
[Thread 0x7fffe37f86c0 (LWP 3031) exited]
[Thread 0x7fffe85fa6c0 (LWP 3030) exited]
[Thread 0x7fffbb7e86c0 (LWP 3029) exited]
[Thread 0x7fffbafe76c0 (LWP 3028) exited]
[Thread 0x7fffb67e66c0 (LWP 3023) exited]
[Thread 0x7ffe51dff6c0 (LWP 3010) exited]
[Inferior 1 (process 2952) exited with code 01]

This is my first time using cuda-gdb, so I am not sure whether this is what you were expecting. Please let me know if I need to provide anything else.

Thanks!
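
For reference, the `0x2bd` value in the cuda-gdb warnings above is the numeric CUDA runtime error code. Below is a minimal sketch for decoding it, assuming a small hand-copied subset of the CUDA runtime's `cudaError_t` enum rather than the full list; `decode_cuda_error` and `CUDA_ERROR_NAMES` are hypothetical names used only for illustration.

```python
# Hand-copied subset of the CUDA runtime cudaError_t enum (not exhaustive).
CUDA_ERROR_NAMES = {
    0: "cudaSuccess",
    1: "cudaErrorInvalidValue",
    2: "cudaErrorMemoryAllocation",
    700: "cudaErrorIllegalAddress",
    701: "cudaErrorLaunchOutOfResources",
}


def decode_cuda_error(code):
    """Map a hex string such as '0x2bd', or an int such as 701, to (value, name)."""
    # Strings are treated as hexadecimal, matching how cuda-gdb prints the code.
    value = int(code, 16) if isinstance(code, str) else int(code)
    return value, CUDA_ERROR_NAMES.get(value, "unknown (not in this subset)")


print(decode_cuda_error("0x2bd"))  # (701, 'cudaErrorLaunchOutOfResources')
```

So `0x2bd` is decimal 701, `cudaErrorLaunchOutOfResources`, which is the same condition as the "too many resources requested for launch" message. The cuda-gdb output therefore confirms the error at `cudaLaunchKernel`, but it does not yet show which kernel was being launched.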