CUDA error: device-side assert triggered only on my device, but code works on other devices

Hi, I just got a CUDA-compatible GPU and am running some LLM training code. It works fine on my friend's device (and on mine when running on the CPU only), but on my GPU I get the error below (with CUDA_LAUNCH_BLOCKING=1) after several iterations; the number of successful iterations differs from run to run. The code I am running is gpt.py from karpathy/ng-video-lecture on GitHub (master branch).
I also ran a version where the input data is identical in every iteration. Even then it runs for a few iterations and then fails at either line 164 or line 165:
164 tok_emb = self.token_embedding_table(idx) # (B,T,C)
165 pos_emb = self.position_embedding_table(torch.arange(T, device=device)) # (T,C)

Additionally, I checked the min and max values of the input tensors and they are as expected (between 0 and vocab_size-1).
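Both failing lines are embedding lookups, each with its own bound: `token_embedding_table` has `vocab_size` rows, while `position_embedding_table` has `block_size` rows and is indexed with `torch.arange(T)`, so `T` must not exceed `block_size`. Checking the token ids only covers the first condition. A minimal sketch that checks both on the CPU, assuming the variable names from gpt.py; `check_embedding_inputs` is a hypothetical helper, not part of the original script:

```python
import torch

def check_embedding_inputs(idx: torch.Tensor, vocab_size: int, block_size: int) -> None:
    """Raise a readable error if either embedding lookup in gpt.py would be out of range.

    Hypothetical helper; idx is the (B, T) tensor of token ids passed to forward().
    """
    idx_cpu = idx.detach().cpu()  # check on the CPU so a failure is an ordinary Python exception
    lo, hi = int(idx_cpu.min()), int(idx_cpu.max())
    # token_embedding_table has vocab_size rows, so valid ids are 0 .. vocab_size - 1
    if lo < 0 or hi >= vocab_size:
        raise ValueError(f"token ids span [{lo}, {hi}], but vocab_size is {vocab_size}")
    # position_embedding_table is indexed with torch.arange(T), so T - 1 must be < block_size
    T = idx_cpu.shape[1]
    if T > block_size:
        raise ValueError(f"sequence length {T} exceeds block_size {block_size}")
```

Calling it as the first statement of `forward()` should turn a violation of either condition into a Python exception raised before the lookup runs.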

I am using Python 3.10.0 and installed PyTorch with the following command (from the PyTorch website):
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
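Since the same script runs on another device and on the CPU, it is worth recording exactly which PyTorch build and GPU are in use. A small sketch using standard attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.get_device_name()`, `torch.cuda.get_device_capability()`); `describe_torch_setup` is a hypothetical name:

```python
import platform
import torch

def describe_torch_setup() -> dict:
    """Collect version and device details worth posting alongside a CUDA error report."""
    info = {
        "python": platform.python_version(),
        "torch": torch.__version__,
        # e.g. '11.8' for the cu118 wheel, None for a CPU-only build
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        dev = torch.cuda.current_device()
        info["device_name"] = torch.cuda.get_device_name(dev)
        info["capability"] = torch.cuda.get_device_capability(dev)
        info["cudnn"] = torch.backends.cudnn.version()
    return info

if __name__ == "__main__":
    for key, value in describe_torch_setup().items():
        print(f"{key}: {value}")
```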

I would appreciate any help with this. I have looked through other solutions but cannot find one that works for me.
Error:

C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [372,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
[... the same assertion repeats for threads [65,0,0] through [126,0,0] ...]
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [372,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "C:\Users\erik3\OneDrive\Desktop\Beyond-Books\tt.py", line 239, in <module>
    losses = estimate_loss()
  File "C:\Users\erik3\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\erik3\OneDrive\Desktop\Beyond-Books\tt.py", line 64, in estimate_loss
    logits, loss = model(X, Y)
  File "C:\Users\erik3\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\erik3\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\erik3\OneDrive\Desktop\Beyond-Books\tt.py", line 185, in forward
    pos_emb = self.position_embedding_table(torch.arange(T, device=device)) # (T,C)
  File "C:\Users\erik3\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\erik3\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\erik3\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\sparse.py", line 163, in forward
    return F.embedding(
  File "C:\Users\erik3\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\functional.py", line 2264, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
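The `srcIndex < srcSelectDimSize` assertion in Indexing.cu is the GPU-side bounds check for an indexed gather; for an embedding it means some index was not smaller than the number of rows in the table. On the CPU the same mistake raises an ordinary `IndexError` at the offending call, which is why switching to the commented-out `#device="cpu"` line in the posted script is a useful cross-check. A minimal sketch with a deliberately bad index; `lookup_on_cpu` and the table sizes are illustrative only:

```python
import torch
import torch.nn as nn

def lookup_on_cpu(num_embeddings: int, indices: list) -> torch.Tensor:
    """Run an embedding lookup on the CPU, where a bad index fails with a Python IndexError."""
    table = nn.Embedding(num_embeddings, 4)  # small embedding dim; only the row count matters here
    return table(torch.tensor(indices, dtype=torch.long))

if __name__ == "__main__":
    print(lookup_on_cpu(65, [0, 64]).shape)  # valid: the largest index 64 is < 65 rows
    try:
        lookup_on_cpu(65, [0, 65])  # 65 is one past the last row
    except IndexError as err:
        print("IndexError:", err)
```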

The indexing error is often caused by invalid indices in an embedding input, so print the inputs in each iteration to narrow down which batch fails.
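One way to follow this advice without editing `forward()` is to attach a forward pre-hook to every `nn.Embedding`, so each batch is checked just before its lookup runs. This is a sketch assuming a standard `nn.Module` model; `add_embedding_checks` is a hypothetical helper built on the real `register_forward_pre_hook` API:

```python
import torch
import torch.nn as nn

def add_embedding_checks(model: nn.Module) -> list:
    """Attach a pre-hook to every nn.Embedding that validates indices before the lookup.

    Returns the hook handles; call handle.remove() on each to disable the checks.
    """
    handles = []
    for name, module in model.named_modules():
        if not isinstance(module, nn.Embedding):
            continue

        def check(mod, args, name=name):
            idx = args[0].detach().cpu()  # host copy, so min/max are ordinary values
            if idx.numel() == 0:
                return None
            lo, hi = int(idx.min()), int(idx.max())
            if lo < 0 or hi >= mod.num_embeddings:
                raise IndexError(
                    f"{name}: indices span [{lo}, {hi}], "
                    f"but the table has {mod.num_embeddings} rows"
                )
            return None  # leave the inputs unchanged

        handles.append(module.register_forward_pre_hook(check))
    return handles
```

For the posted script, calling `add_embedding_checks(model)` right after `model = GPTLanguageModel()` would cover both `token_embedding_table` and `position_embedding_table`. Copying every batch to the host slows training, so this is meant for debugging only.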

Thank you for the reply. To narrow down the issue, I used a batch size of 1 and fed the exact same tensor in every iteration. My full code is below. In get_batch(), the randint parameters have been changed so that it always takes the same chunk of the input text, and in the model's forward() function the tensor being embedded is printed every iteration, so the same tensor is printed each time. The output is pasted below; it is truncated, because several iterations completed successfully before the error and each printed the same tensor. (The min and max values of each input tensor are also printed and stay the same in every iteration.)

I am struggling to understand the cause, since the inputs are exactly the same and the error seems to occur at random. I also noticed that with a smaller batch size and input size, the error occurs later, i.e. more iterations succeed before it happens. I am very confused.
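Because the inputs are identical every iteration, the same lookup should give bit-identical results on a healthy device. A rough sketch of that check in isolation from the rest of the model: it repeats one embedding lookup with a fixed table and fixed, valid indices and counts mismatches against the first result. `repeat_lookup_check` is hypothetical, and its defaults simply mirror `block_size = 1000` and `n_embd = 384` from the posted script. If this alone produces mismatches or an assert on the GPU, that would suggest looking at the GPU, driver, or PyTorch build rather than the training code:

```python
import torch
import torch.nn as nn

def repeat_lookup_check(device: str, num_embeddings: int = 1000, dim: int = 384,
                        seq_len: int = 1000, repeats: int = 200) -> int:
    """Repeat one embedding lookup with fixed inputs and count results that differ from the first."""
    torch.manual_seed(1337)
    table = nn.Embedding(num_embeddings, dim).to(device)
    # fixed, valid indices, like torch.arange(T) for the position table
    idx = torch.arange(seq_len, device=device) % num_embeddings
    mismatches = 0
    with torch.no_grad():
        reference = table(idx)
        for _ in range(repeats):
            if not torch.equal(table(idx), reference):
                mismatches += 1
    if device.startswith("cuda"):
        torch.cuda.synchronize()  # surface any pending asynchronous CUDA error here
    return mismatches

if __name__ == "__main__":
    dev = "cuda" if torch.cuda.is_available() else "cpu"
    print(dev, "mismatches:", repeat_lookup_check(dev))
```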

Code:

import torch
import torch.nn as nn
from torch.nn import functional as F

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# hyperparameters
batch_size = 1 # how many independent sequences will we process in parallel?
block_size = 1000 # what is the maximum context length for predictions?
max_iters = 5000
eval_interval = 500
learning_rate = 3e-4
device = 'cuda' if torch.cuda.is_available() else 'cpu'
#device="cpu"
eval_iters = 200
n_embd = 384
n_head = 6
n_layer = 6
dropout = 0.2
# ------------

torch.manual_seed(1337)

# wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt
with open('input.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# here are all the unique characters that occur in this text
chars = sorted(list(set(text)))
vocab_size = len(chars)
print(vocab_size)
# create a mapping from characters to integers
stoi = { ch:i for i,ch in enumerate(chars) }
itos = { i:ch for i,ch in enumerate(chars) }
encode = lambda s: [stoi[c] for c in s] # encoder: take a string, output a list of integers
decode = lambda l: ''.join([itos[i] for i in l]) # decoder: take a list of integers, output a string

# Train and test splits
data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.9*len(data)) # first 90% will be train, rest val
train_data = data[:n]
val_data = data[n:]

# data loading
def get_batch(split):
    # generate a small batch of data of inputs x and targets y
    data = train_data if split == 'train' else val_data
    ix = torch.randint(1, (batch_size,))  # debugging: randint(1, ...) always returns 0, so every batch is the first block_size tokens
    x = torch.stack([data[i:i+block_size] for i in ix])
    y = torch.stack([data[i+1:i+block_size+1] for i in ix])
    x, y = x.to(device), y.to(device)
    return x, y

@torch.no_grad()
def estimate_loss():
    out = {}
    model.eval()
    for split in ['train', 'val']:
        losses = torch.zeros(eval_iters)
        for k in range(eval_iters):
            X, Y = get_batch(split)
            logits, loss = model(X, Y)
            losses[k] = loss.item()
        out[split] = losses.mean()
    model.train()
    return out

class Head(nn.Module):
    """ one head of self-attention """

    def __init__(self, head_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))

        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # input of size (batch, time-step, channels)
        # output of size (batch, time-step, head size)
        B,T,C = x.shape
        #print(max([max([max(i) for i in x[j]]) for j in range(len(x))]))
        k = self.key(x)   # (B,T,hs)
        q = self.query(x) # (B,T,hs)
        # compute attention scores ("affinities")
        wei = q @ k.transpose(-2,-1) * k.shape[-1]**-0.5 # (B, T, hs) @ (B, hs, T) -> (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf')) # (B, T, T)
        wei = F.softmax(wei, dim=-1) # (B, T, T)
        wei = self.dropout(wei)
        # perform the weighted aggregation of the values
        v = self.value(x) # (B,T,hs)
        out = wei @ v # (B, T, T) @ (B, T, hs) -> (B, T, hs)
        return out

class MultiHeadAttention(nn.Module):
    """ multiple heads of self-attention in parallel """

    def __init__(self, num_heads, head_size):
        super().__init__()
        self.heads = nn.ModuleList([Head(head_size) for _ in range(num_heads)])
        self.proj = nn.Linear(head_size * num_heads, n_embd)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        out = torch.cat([h(x) for h in self.heads], dim=-1)
        out = self.dropout(self.proj(out))
        return out

class FeedFoward(nn.Module):
    """ a simple linear layer followed by a non-linearity """

    def __init__(self, n_embd):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.net(x)

class Block(nn.Module):
    """ Transformer block: communication followed by computation """

    def __init__(self, n_embd, n_head):
        # n_embd: embedding dimension, n_head: the number of heads we'd like
        super().__init__()
        head_size = n_embd // n_head
        self.sa = MultiHeadAttention(n_head, head_size)
        self.ffwd = FeedFoward(n_embd)
        self.ln1 = nn.LayerNorm(n_embd)
        self.ln2 = nn.LayerNorm(n_embd)

    def forward(self, x):
        x = x + self.sa(self.ln1(x))
        x = x + self.ffwd(self.ln2(x))
        return x

class GPTLanguageModel(nn.Module):

    def __init__(self):
        super().__init__()
        # each token directly reads off the logits for the next token from a lookup table
        self.token_embedding_table = nn.Embedding(vocab_size, n_embd)
        self.position_embedding_table = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(*[Block(n_embd, n_head=n_head) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd) # final layer norm
        self.lm_head = nn.Linear(n_embd, vocab_size)

        # better init, not covered in the original GPT video, but important, will cover in followup video
        self.apply(self._init_weights)

    def _init_weights(self, module):
        if isinstance(module, nn.Linear):
            torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)
            if module.bias is not None:
                torch.nn.init.zeros_(module.bias)
        elif isinstance(module, nn.Embedding):
            torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)

    def forward(self, idx, targets=None):
        print(idx[0].max())  # debug: largest token id in the first sequence
        print(idx[0].min())  # debug: smallest token id in the first sequence
        B, T = idx.shape
        print(idx[0].tolist())
        # idx and targets are both (B,T) tensor of integers
        tok_emb = self.token_embedding_table(idx) # (B,T,C)
        pos_emb = self.position_embedding_table(torch.arange(T, device=device)) # (T,C)
        x = tok_emb + pos_emb # (B,T,C)

        x = self.blocks(x) # (B,T,C)
        x = self.ln_f(x) # (B,T,C)
        logits = self.lm_head(x) # (B,T,vocab_size)

        if targets is None:
            loss = None
        else:
            B, T, C = logits.shape
            logits = logits.view(B*T, C)
            targets = targets.view(B*T)
            loss = F.cross_entropy(logits, targets)

        return logits, loss

    def generate(self, idx, max_new_tokens):
        # idx is (B, T) array of indices in the current context
        for _ in range(max_new_tokens):
            # crop idx to the last block_size tokens
            idx_cond = idx[:, -block_size:]
            # get the predictions
            logits, loss = self(idx_cond)
            # focus only on the last time step
            logits = logits[:, -1, :] # becomes (B, C)
            # apply softmax to get probabilities
            probs = F.softmax(logits, dim=-1) # (B, C)
            # sample from the distribution
            idx_next = torch.multinomial(probs, num_samples=1) # (B, 1)
            # append sampled index to the running sequence
            idx = torch.cat((idx, idx_next), dim=1) # (B, T+1)
        return idx

model = GPTLanguageModel()
m = model.to(device)
# print the number of parameters in the model
print(sum(p.numel() for p in m.parameters())/1e6, 'M parameters')

# create a PyTorch optimizer
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
for iter in range(max_iters):

    # every once in a while evaluate the loss on train and val sets
    if iter % eval_interval == 0 or iter == max_iters - 1:
        losses = estimate_loss()
        print(f"step {iter}: train loss {losses['train']:.4f}, val loss {losses['val']:.4f}")
    # sample a batch of data
    xb, yb = get_batch('train')

    # evaluate the loss
    logits, loss = model(xb, yb)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()

# generate from the model
context = torch.zeros((1, 1), dtype=torch.long, device=device)
print(decode(m.generate(context, max_new_tokens=500)[0].tolist()))
#open('more.txt', 'w').write(decode(m.generate(context, max_new_tokens=10000)[0].tolist()))

Output (truncated) and error:

... 43, 1, 61, 53, 56, 42, 6, 1, 45, 53, 53, 42, 1, 41, 47, 58, 47, 64, 43, 52, 57, 8, 0, 0, 18, 47, 56, 57, 58, 1, 15, 47, 58, 47, 64, 43, 52, 10, 0, 35, 43, 1, 39, 56, 43, 1, 39, 41, 41, 53, 59, 52, 58, 43, 42, 1, 54, 53, 53, 56, 1, 41, 47, 58, 47, 64, 43, 52, 57, 6, 1, 58, 46, 43, 1, 54, 39, 58, 56, 47, 41, 47, 39, 52, 57, 1, 45, 53, 53, 42, 8, 0, 35, 46, 39, 58, 1, 39, 59, 58, 46, 53, 56, 47, 58, 63, 1, 57, 59, 56, 44, 43, 47, 58, 57, 1, 53, 52, 1, 61, 53, 59, 50, 42, 1, 56, 43, 50, 47, 43, 60, 43, 1, 59, 57, 10, 1, 47, 44, 1, 58, 46, 43, 63, 0, 61, 53, 59, 50, 42, 1, 63, 47, 43, 50, 42, 1, 59, 57, 1, 40, 59, 58, 1, 58, 46, 43, 1, 57, 59, 54, 43, 56, 44, 50, 59, 47, 58, 63, 6, 1, 61, 46, 47, 50, 43, 1, 47, 58, 1, 61, 43, 56, 43, 0, 61, 46, 53, 50, 43, 57, 53, 51, 43, 6, 1, 61, 43, 1, 51, 47, 45, 46, 58, 1, 45, 59, 43, 57, 57, 1, 58, 46, 43, 63, 1, 56, 43, 50, 47, 43, 60, 43, 42, 1, 59, 57, 1, 46, 59, 51, 39, 52, 43, 50, 63, 11, 0, 40, 59, 58, 1, 58, 46, 43, 63, 1, 58, 46, 47, 52, 49, 1, 61, 43, 1, 39, 56, 43, 1, 58, 53, 53, 1, 42, 43, 39, 56, 10, 1, 58, 46, 43, 1, 50, 43, 39, 52, 52, 43, 57, 57, 1, 58, 46, 39, 58, 0, 39, 44, 44, 50, 47, 41, 58, 57, 1, 59, 57, 6, 1, 58, 46, 43, 1, 53, 40, 48, 43, 41, 58, 1, 53, 44, 1, 53, 59, 56, 1, 51, 47, 57, 43, 56, 63, 6, 1, 47, 57, 1, 39, 57, 1, 39, 52, 0, 47, 52, 60, 43, 52, 58, 53, 56, 63, 1, 58, 53, 1, 54, 39, 56, 58, 47, 41, 59, 50, 39, 56, 47, 57, 43, 1, 58, 46, 43, 47, 56, 1, 39, 40, 59, 52, 42, 39, 52, 41, 43, 11, 1, 53, 59, 56, 0, 57, 59, 44, 44, 43, 56, 39, 52, 41, 43, 1, 47, 57, 1, 39, 1, 45, 39, 47, 52, 1, 58, 53, 1, 58, 46, 43, 51, 1, 24, 43, 58, 1, 59, 57, 1, 56, 43, 60, 43, 52, 45, 43, 1, 58, 46, 47, 57, 1, 61, 47, 58, 46, 0, 53, 59, 56, 1, 54, 47, 49, 43, 57, 6, 1, 43, 56, 43, 1, 61, 43, 1, 40, 43, 41, 53, 51, 43, 1, 56, 39, 49, 43, 57, 10, 1, 44, 53, 56, 1, 58, 46, 43, 1, 45, 53, 42, 57, 1, 49, 52, 53, 61, 1, 21, 0, 57, 54, 43, 39, 49, 1, 58, 46, 47, 57, 1, 47, 52, 1, 46, 59, 52, 45, 43, 56, 1, 44, 53, 56, 
1, 40, 56, 43, 39, 42, 6, 1, 52, 53, 58, 1, 47, 52, 1, 58, 46, 47, 56, 57, 58, 1, 44, 53, 56, 1, 56, 43, 60, 43, 52, 45, 43, 8, 0, 0]
tensor(64, device='cuda:0')
tensor(0, device='cuda:0')
[18, 47, 56, 57, 58, 1, 15, 47, 58, 47, 64, 43, 52, 10, 0, 14, 43, 44, 53, 56, 43, 1, 61, 43, 1, 54, 56, 53, 41, 43, 43, 42, 1, 39, 52, 63, 1, 44, 59, 56, 58, 46, 43, 56, 6, 1, 46, 43, 39, 56, 1, 51, 43, 1, 57, 54, 43, 39, 49, 8, 0, 0, 13, 50, 50, 10, 0, 31, 54, 43, 39, 49, 6, 1, 57, 54, 43, 39, 49, 8, 0, 0, 18, 47, 56, 57, 58, 1, 15, 47, 58, 47, 64, 43, 52, 10, 0, 37, 53, 59, 1, 39, 56, 43, 1, 39, 50, 50, 1, 56, 43, 57, 53, 50, 60, 43, 42, 1, 56, 39, 58, 46, 43, 56, 1, 58, 53, 1, 42, 47, 43, 1, 58, 46, 39, 52, 1, 58, 53, 1, 44, 39, 51, 47, 57, 46, 12, 0, 0, 13, 50, 50, 10, 0, 30, 43, 57, 53, 50, 60, 43, 42, 8, 1, 56, 43, 57, 53, 50, 60, 43, 42, 8, 0, 0, 18, 47, 56, 57, 58, 1, 15, 47, 58, 47, 64, 43, 52, 10, 0, 18, 47, 56, 57, 58, 6, 1, 63, 53, 59, 1, 49, 52, 53, 61, 1, 15, 39, 47, 59, 57, 1, 25, 39, 56, 41, 47, 59, 57, 1, 47, 57, 1, 41, 46, 47, 43, 44, 1, 43, 52, 43, 51, 63, 1, 58, 53, 1, 58, 46, 43, 1, 54, 43, 53, 54, 50, 43, 8, 0, 0, 13, 50, 50, 10, 0, 35, 43, 1, 49, 52, 53, 61, 5, 58, 6, 1, 61, 43, 1, 49, 52, 53, 61, 5, 58, 8, 0, 0, 18, 47, 56, 57, 58, 1, 15, 47, 58, 47, 64, 43, 52, 10, 0, 24, 43, 58, 1, 59, 57, 1, 49, 47, 50, 50, 1, 46, 47, 51, 6, 1, 39, 52, 42, 1, 61, 43, 5, 50, 50, 1, 46, 39, 60, 43, 1, 41, 53, 56, 52, 1, 39, 58, 1, 53, 59, 56, 1, 53, 61, 52, 1, 54, 56, 47, 41, 43, 8, 0, 21, 57, 5, 58, 1, 39, 1, 60, 43, 56, 42, 47, 41, 58, 12, 0, 0, 13, 50, 50, 10, 0, 26, 53, 1, 51, 53, 56, 43, 1, 58, 39, 50, 49, 47, 52, 45, 1, 53, 52, 5, 58, 11, 1, 50, 43, 58, 1, 47, 58, 1, 40, 43, 1, 42, 53, 52, 43, 10, 1, 39, 61, 39, 63, 6, 1, 39, 61, 39, 63, 2, 0, 0, 31, 43, 41, 53, 52, 42, 1, 15, 47, 58, 47, 64, 43, 52, 10, 0, 27, 52, 43, 1, 61, 53, 56, 42, 6, 1, 45, 53, 53, 42, 1, 41, 47, 58, 47, 64, 43, 52, 57, 8, 0, 0, 18, 47, 56, 57, 58, 1, 15, 47, 58, 47, 64, 43, 52, 10, 0, 35, 43, 1, 39, 56, 43, 1, 39, 41, 41, 53, 59, 52, 58, 43, 42, 1, 54, 53, 53, 56, 1, 41, 47, 58, 47, 64, 43, 52, 57, 6, 1, 58, 46, 43, 1, 54, 39, 58, 56, 47, 41, 47, 39, 52, 57, 1, 45, 53, 53, 
42, 8, 0, 35, 46, 39, 58, 1, 39, 59, 58, 46, 53, 56, 47, 58, 63, 1, 57, 59, 56, 44, 43, 47, 58, 57, 1, 53, 52, 1, 61, 53, 59, 50, 42, 1, 56, 43, 50, 47, 43, 60, 43, 1, 59, 57, 10, 1, 47, 44, 1, 58, 46, 43, 63, 0, 61, 53, 59, 50, 42, 1, 63, 47, 43, 50, 42, 1, 59, 57, 1, 40, 59, 58, 1, 58, 46, 43, 1, 57, 59, 54, 43, 56, 44, 50, 59, 47, 58, 63, 6, 1, 61, 46, 47, 50, 43, 1, 47, 58, 1, 61, 43, 56, 43, 0, 61, 46, 53, 50, 43, 57, 53, 51, 43, 6, 1, 61, 43, 1, 51, 47, 45, 46, 58, 1, 45, 59, 43, 57, 57, 1, 58, 46, 43, 63, 1, 56, 43, 50, 47, 43, 60, 43, 42, 1, 59, 57, 1, 46, 59, 51, 39, 52, 43, 50, 63, 11, 0, 40, 59, 58, 1, 58, 46, 43, 63, 1, 58, 46, 47, 52, 49, 1, 61, 43, 1, 39, 56, 43, 1, 58, 53, 53, 1, 42, 43, 39, 56, 10, 1, 58, 46, 43, 1, 50, 43, 39, 52, 52, 43, 57, 57, 1, 58, 46, 39, 58, 0, 39, 44, 44, 50, 47, 41, 58, 57, 1, 59, 57, 6, 1, 58, 46, 43, 1, 53, 40, 48, 43, 41, 58, 1, 53, 44, 1, 53, 59, 56, 1, 51, 47, 57, 43, 56, 63, 6, 1, 47, 57, 1, 39, 57, 1, 39, 52, 0, 47, 52, 60, 43, 52, 58, 53, 56, 63, 1, 58, 53, 1, 54, 39, 56, 58, 47, 41, 59, 50, 39, 56, 47, 57, 43, 1, 58, 46, 43, 47, 56, 1, 39, 40, 59, 52, 42, 39, 52, 41, 43, 11, 1, 53, 59, 56, 0, 57, 59, 44, 44, 43, 56, 39, 52, 41, 43, 1, 47, 57, 1, 39, 1, 45, 39, 47, 52, 1, 58, 53, 1, 58, 46, 43, 51, 1, 24, 43, 58, 1, 59, 57, 1, 56, 43, 60, 43, 52, 45, 43, 1, 58, 46, 47, 57, 1, 61, 47, 58, 46, 0, 53, 59, 56, 1, 54, 47, 49, 43, 57, 6, 1, 43, 56, 43, 1, 61, 43, 1, 40, 43, 41, 53, 51, 43, 1, 56, 39, 49, 43, 57, 10, 1, 44, 53, 56, 1, 58, 46, 43, 1, 45, 53, 42, 57, 1, 49, 52, 53, 61, 1, 21, 0, 57, 54, 43, 39, 49, 1, 58, 46, 47, 57, 1, 47, 52, 1, 46, 59, 52, 45, 43, 56, 1, 44, 53, 56, 1, 40, 56, 43, 39, 42, 6, 1, 52, 53, 58, 1, 47, 52, 1, 58, 46, 47, 56, 57, 58, 1, 44, 53, 56, 1, 56, 43, 60, 43, 52, 45, 43, 8, 0, 0]
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [372,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
[... the identical assertion from Indexing.cu:1289 repeats for every other thread of block [372,0,0], threads 0 to 127 in total ...]
Traceback (most recent call last):
  File "c:\Users\erik3\OneDrive\Desktop\Beyond-Books\tt.py", line 224, in <module>
    logits, loss = model(xb, yb)
  File "C:\Users\erik3\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\erik3\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "c:\Users\erik3\OneDrive\Desktop\Beyond-Books\tt.py", line 173, in forward
    pos_emb = self.position_embedding_table(torch.arange(T, device=device)) # (T,C)
  File "C:\Users\erik3\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\erik3\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\erik3\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\sparse.py", line 163, in forward
    return F.embedding(
  File "C:\Users\erik3\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\functional.py", line 2264, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
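
The assertion `srcIndex < srcSelectDimSize` in `Indexing.cu` is a bounds check in the GPU index-select kernel, which `nn.Embedding` uses for its lookup. It typically fires when an index falls outside `[0, num_embeddings)`. A minimal debugging sketch follows. It assumes a hypothetical helper, `checked_embedding`, which is not part of PyTorch. The helper validates the indices immediately before the lookup, so a bad index raises a readable Python error instead of a device-side assert:

```python
import torch
import torch.nn as nn


def checked_embedding(emb: nn.Embedding, idx: torch.Tensor) -> torch.Tensor:
    """Hypothetical debugging helper: look up idx in emb, but raise a readable
    IndexError instead of a device-side assert when an index is out of range.

    Debugging aid only: .item() forces a device sync on every call.
    """
    lo = int(idx.min().item())
    hi = int(idx.max().item())
    if lo < 0 or hi >= emb.num_embeddings:
        raise IndexError(
            f"embedding index out of range: min={lo}, max={hi}, "
            f"num_embeddings={emb.num_embeddings}"
        )
    return emb(idx)
```

While debugging, `forward` could call `checked_embedding(self.token_embedding_table, idx)` and `checked_embedding(self.position_embedding_table, torch.arange(T, device=device))`. `torch.arange(T)` can only exceed the position table when `T > block_size`. So if the assert on the position lookup still fires after this check passes, that points toward the setup rather than the code.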

Thanks for sharing the code! I’m unable to reproduce the issue using a recent nightly binary (torch==2.4.0.dev20240506+cu124) and can successfully run all iterations. Which PyTorch version are you using? If it’s an older one, could you update and rerun your code?
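
The version question is easier to answer with the exact numbers. The sketch below uses a hypothetical `environment_report` helper built only from standard `torch` calls. It collects the details that usually matter for CUDA errors. For a fuller built-in report, `python -m torch.utils.collect_env` prints the complete environment.

```python
import sys

import torch


def environment_report() -> dict:
    """Hypothetical helper: collect version details commonly requested for CUDA bug reports."""
    report = {
        "python": sys.version.split()[0],
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)
        report["capability"] = torch.cuda.get_device_capability(0)
        report["cudnn"] = torch.backends.cudnn.version()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```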

Yes, I am not sure how to reproduce it myself; as I said, the same code works fine on a different device. I tried different PyTorch versions: previously I used 2.3, and I just tried 2.4, but I still get the same issue. Could it be an issue with CUDA, or even with the hardware? Would it be worth uninstalling everything and starting over? If so, how would you recommend I do that?

I assume you are seeing the issue with the posted code snippet, or are you only able to reproduce it in the real end-to-end code?

I am seeing the issue when I run the code snippet I shared, the original code from the GitHub repo I linked, and my own LLM model with similar code, all of which work fine on other devices. What I meant was that I am unsure how to reproduce the error on a different device: I have seen the code work properly elsewhere, and only now that I am running it on my new laptop do I get this issue.
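
One way to separate the setup from the training code is to loop only the token and position embedding lookups, feed them random but valid indices, and check whether the assert still appears. This is a sketch. The helper name is hypothetical, and the default sizes are assumptions meant to mirror `gpt.py` (`batch_size=64`, `block_size=256`, `n_embd=384`, and a 65-character vocabulary). Adjust them to match your run.

```python
from typing import Optional

import torch
import torch.nn as nn


def embedding_repro(iters: int = 1000, batch_size: int = 64, block_size: int = 256,
                    vocab_size: int = 65, n_embd: int = 384,
                    device: Optional[str] = None) -> int:
    """Hypothetical minimal repro: run only the two embedding lookups repeatedly.

    Returns the number of completed iterations; raises on the first failure.
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    torch.manual_seed(0)
    tok_table = nn.Embedding(vocab_size, n_embd).to(device)
    pos_table = nn.Embedding(block_size, n_embd).to(device)
    with torch.no_grad():
        for step in range(iters):
            try:
                # Indices are always in [0, vocab_size), so valid by construction.
                idx = torch.randint(0, vocab_size, (batch_size, block_size), device=device)
                tok_emb = tok_table(idx)                                       # (B,T,C)
                pos_emb = pos_table(torch.arange(block_size, device=device))  # (T,C)
                # .item() waits for the kernels to finish, so a failure is tied to this step.
                (tok_emb + pos_emb).sum().item()
            except RuntimeError as err:
                raise RuntimeError(f"embedding repro failed at iteration {step}") from err
    return iters
```

Run it with `CUDA_LAUNCH_BLOCKING=1` as before. Suppose it also fails after a varying number of iterations on this laptop, but not on another machine. In that case, the model code is very unlikely to be the cause.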

Adding some additional information: I tried running another Python script, which works fine on other devices but gives an error on mine. As with the error above, the code behaves slightly differently each time I run it, but it always ends in some CUDA error. This time I ran a convolutional neural network. Again, the code works fine on other devices, and it does not use nn.Embedding at all. I am very confident the code is not the issue, and I tried multiple PyTorch versions with the same result. Does this at least narrow down the source of the issue somewhat?
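
Unrelated models, including one without `nn.Embedding`, fail in different ways on each run. A quick read-back check on GPU memory is one way to test the setup directly. The sketch below is a rough, hypothetical check, not a substitute for a proper memory test. It copies a known byte pattern and its inverse to the device, passes the data through a trivial kernel, copies it back, and counts the bytes that changed. The buffer size and round count are assumptions.

```python
from typing import Optional

import torch


def memory_pattern_check(n_bytes: int = 64 * 1024**2, rounds: int = 10,
                         device: Optional[str] = None) -> int:
    """Hypothetical sanity check: return the number of bytes corrupted on a device round trip.

    n_bytes is rounded down to a multiple of 256.
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    # Reference pattern on the CPU: the bytes 0..255 repeated.
    base = torch.arange(256, dtype=torch.int16).to(torch.uint8)
    pattern = base.repeat(n_bytes // 256)
    mismatches = 0
    for r in range(rounds):
        # Alternate the pattern with its bitwise inverse so each bit is tested as 0 and 1.
        ref = pattern if r % 2 == 0 else torch.bitwise_not(pattern)
        on_device = ref.to(device)
        # Route the data through a simple device kernel as well as the copies.
        processed = (on_device.to(torch.int16) + 1 - 1).to(torch.uint8)
        mismatches += int((processed.cpu() != ref).sum().item())
    return mismatches
```

A nonzero count, or a CUDA error raised inside the loop, would support the hardware/driver explanation. A zero count does not prove the GPU is healthy.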

Yes, it narrows the source down to your setup. You could run a few stress tests to check whether the software/hardware setup on this laptop is working properly.

Right, do you have any recommendations, links, or ideas I could try?

I don’t have any specific tools in mind; you could search for stress tests you are comfortable with.
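
Here is one do-it-yourself option. It is hedged and not a replacement for dedicated GPU stress-testing tools. The idea is to run the same matrix multiplication many times on the GPU and compare every result against a CPU float64 reference. On healthy hardware, each iteration should agree within float32 tolerance. A faulty GPU or driver tends to produce sporadic mismatches or errors instead. The helper name, matrix size, iteration count, and tolerances are all assumptions. TF32 is switched off on CUDA so that reduced-precision matmuls are not mistaken for faults.

```python
from typing import Optional

import torch


def matmul_consistency_check(iters: int = 200, size: int = 2048,
                             device: Optional[str] = None,
                             rtol: float = 1e-3, atol: float = 1e-2) -> int:
    """Hypothetical stress check: count iterations whose device matmul disagrees with a CPU reference."""
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    if torch.device(device).type == "cuda":
        # Keep full FP32 matmuls so only genuine faults exceed the tolerance.
        torch.backends.cuda.matmul.allow_tf32 = False
    gen = torch.Generator().manual_seed(0)
    a = torch.randn(size, size, generator=gen)
    b = torch.randn(size, size, generator=gen)
    # Reference computed once on the CPU in float64.
    reference = (a.double() @ b.double()).float()
    a_dev, b_dev = a.to(device), b.to(device)
    failures = 0
    for _ in range(iters):
        result = (a_dev @ b_dev).cpu()  # .cpu() also waits for the kernel to finish
        if not torch.allclose(result, reference, rtol=rtol, atol=atol):
            failures += 1
    return failures
```

Any nonzero return value, or a CUDA error during the loop, on a machine where the same script passes elsewhere would point at the laptop's GPU, driver, or power/thermal setup rather than at PyTorch.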