I was greeted by this error when I tried to run the training, from this line:
outputs = model(inputIDs, labels=labels)
This is the traceback:
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [100,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [101,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [102,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [103,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [104,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [105,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [106,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [107,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [108,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [109,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [110,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [111,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [112,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [113,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [114,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [115,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [116,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [117,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [118,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [119,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [122,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [123,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [124,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [162,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [67,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [68,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [69,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [70,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [71,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [72,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [73,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [74,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [75,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [76,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [77,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [78,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [79,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [80,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [81,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [82,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [83,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [85,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [86,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [87,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [88,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1289: block: [165,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
File "c:\Users\chenp\Documents\AICrowd\amazon-kdd-cup-2024-starter-kit\models\retrievalModel.py", line 65, in <module>
outputs = model(inputIDs, labels=labels)
File "C:\Users\chenp\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\chenp\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\chenp\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 1305, in forward
transformer_outputs = self.transformer(
File "C:\Users\chenp\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\chenp\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\chenp\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 1070, in forward
hidden_states = inputs_embeds + position_embeds
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
The JSON files:
{"output_field": "XXXX"}
{"output_field": "YYYY"}
{"output_field": "ZZZZ"}
{"input_field": "XXXX"}
{"input_field": "YYYY"}
{"input_field": "ZZZZ"}
My code below:
from typing import List, Union
import random
import os
import transformers as trf
import trl as tr
import datasets as ds
import pandas as pd
import torch
from tqdm import tqdm
import torch.nn as nn
import torch.optim as optim
# Files format:
# {"input_field": "XXXX"}
# {"input_field": "YYYY"}
# {"input_field": "ZZZZ"}
# {"output_field": "XXXX"}
# {"output_field": "YYYY"}
# {"output_field": "ZZZZ"}
trainFName="development.json"
trainDF=pd.read_json(trainFName, lines=True)
evalFName="eval.json"
evalDF=pd.read_json(evalFName, lines=True)
inputFieldDF=trainDF["input_field"]
outputFieldDF=evalDF["output_field"]
trainData=inputFieldDF.tolist()
evalData=inputFieldDF.tolist()
model_name="./models/distilgpt2"
config=trf.AutoConfig.from_pretrained(model_name+"/config.json")
tokenizer = trf.GPT2Tokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
model = trf.GPT2LMHeadModel.from_pretrained(model_name, config=config)
def tokenize_data(trainData):
max_length = 512 # Adjust as needed
return tokenizer(trainData, padding="max_length", truncation=True, max_length=max_length, return_tensors='pt')
trainEncoded = tokenize_data(trainData)
evalEncoded = tokenize_data(evalData)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
# Configure training
epochs = 4
batchSize = 2 # Adjust based on your GPU memory
model.train()
for epoch in range(epochs):
# Training loop
for i in range(0, len(trainEncoded['input_ids']), batchSize):
inputIDs = trainEncoded['input_ids'][i:i+batchSize].to(device)
labels = trainEncoded['input_ids'][i:i+batchSize].to(device)
optimizer.zero_grad()
outputs = model(inputIDs, labels=labels)
loss = outputs.loss
loss.backward()
optimizer.step()
if i % 10 == 0: # Adjust print frequency based on dataset size
print(f"Epoch: {epoch}, Step: {i}, Loss: {loss.item()}")
# Evaluation loop
model.eval()
eval_loss = 0
eval_steps = 0
for i in range(0, len(evalEncoded['input_ids']), batchSize):
with torch.no_grad():
inputIDs = evalEncoded['input_ids'][i:i+batchSize].to(device)
labels = evalEncoded['input_ids'][i:i+batchSize].to(device)
outputs = model(inputIDs, labels=labels)
eval_loss += outputs.loss.item()
eval_steps += 1
avg_eval_loss = eval_loss / eval_steps
print(f"Epoch: {epoch}, Evaluation Loss: {avg_eval_loss}")
How exactly can I fix this?