I’m currently trying to understand transformer models better, and I’m doing that by training a transformer model from scratch using the PyTorch tutorials. The tutorials I’ve been following are from the text section of the PyTorch tutorials, specifically “Language Translation with nn.Transformer and torchtext” and “Preprocess Custom Text Dataset Using Torchtext”.
I’ve modified the code a bit so that it would work, but it seems I’ve still missed something, because I get this error:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
To try to debug it, I set CUDA_LAUNCH_BLOCKING=1 and TORCH_USE_CUDA_DSA=1, and I also tried switching the device to the CPU.
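For reference, this is roughly how I set those flags (a sketch, not my exact notebook code; as far as I understand, the environment variables only take effect if they are set before torch is imported):

```python
import os

# These must be set BEFORE importing torch, otherwise they have no effect
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
os.environ["TORCH_USE_CUDA_DSA"] = "1"

import torch

# Running on CPU often turns the opaque device-side assert into a
# readable Python error (e.g. an IndexError from an embedding lookup)
device = torch.device("cpu")
```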
I also made sure that the number of columns in each source–target pair is the same by printing their shapes after padding, and I checked the numericalized tokens to make sure that none of them exceed the vocabulary size.
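The check I did looks something like this (a sketch with dummy data; `vocab_size` and the batch tensor stand in for the values from my notebook):

```python
import torch

def check_token_ids(batch: torch.Tensor, vocab_size: int, name: str) -> None:
    # Every index passed to nn.Embedding must lie in [0, vocab_size)
    lo, hi = int(batch.min()), int(batch.max())
    assert lo >= 0, f"{name}: negative token id {lo}"
    assert hi < vocab_size, f"{name}: token id {hi} >= vocab size {vocab_size}"

# dummy padded batch standing in for my real numericalized data
src = torch.tensor([[2, 5, 7, 1, 1],
                    [2, 9, 3, 4, 1]])
check_token_ids(src, vocab_size=10, name="src")
```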
Below is the link to the Google Colab notebook where I run the code. The data is the eng-fr set from the Tatoeba Project (the tab-delimited bilingual sentence pairs, “Good for Anki and Similar Flashcard Applications”).
It would be great if someone could tell me what is causing the error and how I could fix it.