How to enable `TORCH_USE_CUDA_DSA`

Traceback (most recent call last):
  File "D:\Vikas\Deepvanet\Deepvaner\demo_me.py", line 140, in <module>
    demo()
  File "D:\Vikas\Deepvanet\Deepvaner\demo_me.py", line 128, in demo
    train(modal=args.modal, dataset=args.dataset, epoch=args.epoch, lr=args.learn_rate, use_gpu=use_gpu,
  File "D:\Vikas\Deepvanet\Deepvaner\train_me.py", line 115, in train
    input = (data[0].float().to(device), data[1].float().to(device))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I am relatively new to deep learning and I am trying to enable TORCH_USE_CUDA_DSA on a Windows PC. I have the following piece of code in my script, which I believed would enable device-side assertions, but it does not.

import os

os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
os.environ['TORCH_USE_CUDA_DSA'] = "1"

However, CUDA still gives me an asynchronous stack trace, namely the one shown above. Any help will be appreciated.

Launch the script with blocking launches by exporting this env variable in your terminal and rerun your code to see which line of code failed. If you are stuck, feel free to post a minimal and executable code snippet reproducing the issue.

Do you mean something like this?

>> set CUDA_LAUNCH_BLOCKING = 1
>> SET TORCH_USE_CUDA_DSA = 1
>> python demo_me.py

TORCH_USE_CUDA_DSA won't have any effect on the runtime unless you build PyTorch with this env variable. I'm not using Windows, but I guess set should work (export would be the right approach on Linux).
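Also note that CUDA_LAUNCH_BLOCKING has to be in the environment before the CUDA context is created, i.e. before the first CUDA call. A minimal sketch on the Python side (the tensor op at the end is just a placeholder to trigger a CUDA launch):

import os

# Must be set before importing torch (or at least before any CUDA call),
# otherwise the already-initialized CUDA context will ignore it.
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

import torch

x = torch.randn(8, device='cuda')  # kernels now launch synchronously, so the stack trace points at the failing op
print(x.sum())

TORCH_USE_CUDA_DSA, on the other hand, is a build-time flag, so setting it at runtime (via os.environ, set, or export) won't change anything for a prebuilt binary.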

C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Loss.cu:106: block: [0,0,0], thread: [32,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Loss.cu:106: block: [0,0,0], thread: [33,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Loss.cu:106: block: [0,0,0], thread: [34,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Loss.cu:106: block: [0,0,0], thread: [35,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Loss.cu:106: block: [0,0,0], thread: [36,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Loss.cu:106: block: [0,0,0], thread: [37,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Loss.cu:106: block: [0,0,0], thread: [38,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Loss.cu:106: block: [0,0,0], thread: [39,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Loss.cu:106: block: [0,0,0], thread: [40,0,0] Assertion `input_val >= zero && input_val <= one` failed.


... (some hundred similar lines omitted)

CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I had this in my code:

import os

os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
os.environ['TORCH_USE_CUDA_DSA'] = "1"

and also ran this in the terminal:

>> set CUDA_LAUNCH_BLOCKING = 1
>> set TORCH_USE_CUDA_DSA = 1

(inside the conda env I was running in), but it nonetheless returned the asynchronous stack trace.

This is the actual error that I am facing. Is this the vanishing or exploding gradient problem?

Train accuracy, train loss, and validation accuracy for the last epochs are as follows:

Epoch: train11| train loss: 0.11601685239549946| val accuracy: 0.9291666746139526
Epoch: train12| train accuracy: 0.956944465637207
Epoch: train12| train loss: 0.13100543859250405| val accuracy: 0.9583333134651184
Epoch: train13| train accuracy: 0.9550926089286804
Epoch: train13| train loss: 0.14594609443755707| val accuracy: 0.8374999761581421
Epoch: train14| train accuracy: 0.9574074149131775
Epoch: train14| train loss: 0.14393691696664865| val accuracy: 0.925000011920929
Epoch: train15| train accuracy: 0.9643518328666687
Epoch: train15| train loss: 0.10682554904590634| val accuracy: 0.9375
Epoch: train16| train accuracy: 0.9745370149612427
Epoch: train16| train loss: 0.07701346902724573| val accuracy: 0.9458333253860474
Epoch: train17| train accuracy: 0.9597222208976746
Epoch: train17| train loss: 0.12430122270084479| val accuracy: 0.887499988079071
Epoch: train18| train accuracy: 0.9624999761581421
Epoch: train18| train loss: 0.10855092442430118| val accuracy: 0.9458333253860474
Epoch: train19| train accuracy: 0.9675925970077515
Epoch: train19| train loss: 0.09032832853057807| val accuracy: 0.9583333134651184
Epoch: train20| train accuracy: 0.9782407283782959
Epoch: train20| train loss: 0.06491927988827227| val accuracy: 0.9708333611488342
Epoch: train21| train accuracy: 0.9814814925193787
Epoch: train21| train loss: 0.05672524696873391| val accuracy: 0.987500011920929
Epoch: train22| train accuracy: 0.9800925850868225
Epoch: train22| train loss: 0.06026066580842084| val accuracy: 0.949999988079071
Epoch: train23| train accuracy: 0.9791666865348816
Epoch: train23| train loss: 0.0585806009852711| val accuracy: 0.9750000238418579
Epoch: train24| train accuracy: 0.9837962985038757
Epoch: train24| train loss: 0.056918492759851835| val accuracy: 0.9458333253860474
Epoch: train25| train accuracy: 0.9731481671333313
Epoch: train25| train loss: 0.09047737471101917| val accuracy: 0.9750000238418579
Epoch: train26| train accuracy: 0.9694444537162781
Epoch: train26| train loss: 0.0833502056159298| val accuracy: 0.9708333611488342
Epoch: train27| train accuracy: 0.9791666865348816
Epoch: train27| train loss: 0.0535214664414525| val accuracy: 0.9750000238418579

Please let me know if you need any more information.

No, it's not a gradient issue. The failing assertion (`input_val >= zero && input_val <= one` in Loss.cu) means the loss is receiving values outside the expected [0, 1] range.

If you have trouble isolating it, feel free to post a minimal and executable code snippet reproducing the issue.
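Once launches are blocking, a quick way to isolate the bad batch is to check the tensors right before the loss call. A rough sketch, assuming a BCE-style loss and that output and target are the tensors passed to it (the helper name is made up):

import torch

def check_loss_inputs(output, target):
    # Move to CPU so these checks cannot themselves trip another device-side assert.
    out = output.detach().float().cpu()
    tgt = target.detach().float().cpu()
    if not torch.isfinite(out).all():
        raise RuntimeError("non-finite values in the model output")
    if out.min().item() < 0 or out.max().item() > 1:
        raise RuntimeError(f"model output outside [0, 1]: min={out.min().item()}, max={out.max().item()}")
    if tgt.min().item() < 0 or tgt.max().item() > 1:
        raise RuntimeError(f"targets outside [0, 1]: min={tgt.min().item()}, max={tgt.max().item()}")

# in the training loop, right before the loss:
# check_loss_inputs(output, target)
# loss = criterion(output, target)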

This error does not always occur at the same point. The model is defined as follows:

class DeepVANet(nn.Module):
    def __init__(self, bio_input_size=32, face_feature_size=16, bio_feature_size=64, pretrain=True):
        super(DeepVANet, self).__init__()
        # face branch: produces face_feature_size (16) features per sample
        self.face_feature_extractor = FaceFeatureExtractor(feature_size=face_feature_size, pretrain=pretrain)

        # bio branch: Transformer1d over the bio signals
        self.bio_feature_extractor = Transformer1d(
            bio_input_size,
            n_classes=64,  # must match bio_feature_size used by the classifier below
            n_length=128,
            d_model=32,
            nhead=8,
            dim_feedforward=128,
            dropout=0.1,
            activation='relu'
        )

        # fusion head: concatenated features -> single sigmoid output for binary classification
        self.classifier = nn.Sequential(
            nn.Linear(face_feature_size + bio_feature_size, 50),
            nn.ReLU(inplace=True),
            nn.Linear(50, 20),
            nn.ReLU(inplace=True),
            nn.Linear(20, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        # x is a tuple: x[0] = face images, x[1] = bio signals
        img_features = self.face_feature_extractor(x[0])
        bio_features = self.bio_feature_extractor(x[1])
        features = torch.cat([img_features, bio_features.float()], dim=1)
        output = self.classifier(features)
        output = output.squeeze(-1)
        return output

BioFeatureExtractor (the signals are passed through a CNN and then an LSTM, giving me 16 features) and bio_feature_extractor (which gives me 64 features) are two different models; I am trying to fuse them to classify either 0 or 1.

I have run them both separately and they work perfectly fine; this problem only arises when I try to fuse the two models.

The model can be found at GitHub - vvikasreddy/Deepvaner

Let me know if you need more info…

Setting up your entire project without executable code that reproduces the issue would take too much time on our side and is not guaranteed to fail. Take a look at this post to see how your code can be adapted so we can use it for debugging.
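For reference, a minimal executable snippet only needs random tensors with the real shapes, roughly like this (everything here is a placeholder; 80 = 16 face + 64 bio features, matching your classifier input):

import torch
import torch.nn as nn

# stand-in for the fused model: same classification head as in DeepVANet
model = nn.Sequential(nn.Linear(80, 1), nn.Sigmoid()).cuda()
criterion = nn.BCELoss()

x = torch.randn(16, 80, device='cuda')                  # random "fused features"
y = torch.randint(0, 2, (16,), device='cuda').float()   # random binary targets

out = model(x).squeeze(-1)
loss = criterion(out, y)
loss.backward()
print(loss.item())

If a snippet like this, fed with your real feature extractors and a few real (or randomly generated) batches, reproduces the assert, we can debug it.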

Figured it out: because my inputs were not normalized, the values sometimes went out of bounds, which triggered the assert. After normalizing the inputs the problem is solved.
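For anyone who hits the same assert: roughly what I changed was standardizing the bio signals before feeding them to the network, something like this (a sketch; the exact normalization depends on your data):

def normalize(signal):
    # per-sample standardization keeps the network inputs bounded
    mean = signal.mean(dim=-1, keepdim=True)
    std = signal.std(dim=-1, keepdim=True).clamp_min(1e-8)
    return (signal - mean) / std

With the raw, unnormalized signals the activations could blow up to NaN, and a NaN coming out of the final Sigmoid is presumably what was tripping the `input_val >= zero && input_val <= one` assert in the loss.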