Hello, I’m new to ML. I’m training a causal transformer and trying to compute the cross-entropy loss between the logits and the target. However, I am getting the following error:
IndexError                                Traceback (most recent call last)
Cell In[15], line 38
     35 print(target.shape)
     37 # IndexError: Target 1476 is out of bounds.
---> 38 torch.nn.functional.cross_entropy(logits, target, ignore_index=len(logits)-1)

File /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:3014, in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   3012 if size_average is not None or reduce is not None:
   3013     reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 3014 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)

IndexError: Target 1476 is out of bounds.
I’m not sure where exactly the number 1476 is coming from. One guess is that it may be one of the token ids (the vocab_size of the T5 tokenizer is in the tens of thousands). But in that case, I am unsure why the last dimension of my logits is 1024 rather than the vocab size.
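One way to test that guess is to compare the target ids against the size of the last dimension of the logits. `torch.nn.functional.cross_entropy` treats that dimension as the number of classes and expects every target to be a class index in `[0, C)` or equal to `ignore_index`. The sketch below does that check in plain Python. The helper name `find_out_of_range_targets` and the sample ids are hypothetical, chosen only to mirror the shapes and the error above.

```python
def find_out_of_range_targets(target_ids, num_classes, ignore_index=-100):
    """Return (position, id) pairs for targets that are not valid class indices."""
    return [
        (pos, tid)
        for pos, tid in enumerate(target_ids)
        if tid != ignore_index and not (0 <= tid < num_classes)
    ]


# Hypothetical target ids; 1476 is the value reported in the error message.
sample_targets = [3, 71, 1476, 1023, 12]

# 1024 is the last dimension of the tensor being passed as logits.
bad = find_out_of_range_targets(sample_targets, num_classes=1024)
print(bad)  # [(2, 1476)]
```

If the real target tensor contains ids at or above 1024, that would be consistent with the guess that 1476 is a token id. It would also suggest that the tensor passed as logits has a hidden-size last dimension rather than a vocabulary-sized one. `T5Model` is the bare encoder-decoder without a language-modeling head, so its `last_hidden_state` is a hidden state rather than a score per vocabulary entry.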
This is the code I am running:
from transformers import T5Tokenizer, T5Model, AdamW
CONTEXT_LENGTH = 512
tokenizer = T5Tokenizer.from_pretrained("t5-large", model_max_length=CONTEXT_LENGTH)
model = T5Model.from_pretrained("t5-large")
model.train()
optimizer = AdamW(model.parameters(), lr=1e-5)
# Specifying where exactly we apply weight decay
no_decay = ["bias", "LayerNorm.weight"]
optimizer_grouped_parameters = [
    {'params': [param for name, param in model.named_parameters() if not any(nd in name for nd in no_decay)], 'weight_decay': 0.01},
    {'params': [param for name, param in model.named_parameters() if any(nd in name for nd in no_decay)], 'weight_decay': 0.0}
]
optimizer = AdamW(optimizer_grouped_parameters, lr=1e-5)
import torch
# TODO: Figure out weird tokenization errors w/ \, later
dummy_test = f'Q: A board game spinner is divided into three parts labeled $A$, $B$ and $C$. The probability of the spinner landing on $A$ is $frac13$ and the probability of the spinner landing on $B$ is $frac512$. What is the probability of the spinner landing on $C$? Express your answer as a common fraction. A: The spinner is guaranteed to land on exactly one of the three regions, so we know that the sum of the probabilities of it landing in each region will be 1. If we let the probability of it landing in region $C$ be $x$, we then have the equation $1 = frac512+frac13+x$, from which we have $x=boxedfrac14$.'
#dummy_test = 'Hello World'
def do_we_need_to_batch(string):
    input_ids = tokenizer(string, truncation=False).input_ids
    return len(input_ids) >= CONTEXT_LENGTH
tokenized_string = tokenizer(dummy_test, truncation=False).input_ids
input = tokenized_string[:-1]
input = tokenizer.decode(input, truncation=False)
input = tokenizer(input, return_tensors='pt').input_ids
target = tokenized_string[1:]
target = tokenizer.decode(target, truncation=False)
target = tokenizer(target, return_tensors='pt').input_ids
logits = model(input_ids=input, decoder_input_ids=input).last_hidden_state
print(logits.shape)
print(target.shape)
B, T, C = logits.shape
logits = logits.view(B*T, C)
target = target.view(B*T)
print(logits.shape)
print(target.shape)
# IndexError: Target 1476 is out of bounds.
torch.nn.functional.cross_entropy(logits, target, ignore_index=len(logits)-1)
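
The input/target shift above goes through a decode and re-tokenize round trip, which may not give back the same ids or the same length. A minimal sketch of the same next-token shift done directly on the id list is shown below. The function name `shift_for_next_token` and the example ids are hypothetical. Converting the two lists to tensors is left out so the block runs without any dependencies.

```python
def shift_for_next_token(token_ids):
    """Split one id sequence into (inputs, targets) offset by one position."""
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form an input/target pair")
    inputs = token_ids[:-1]   # every token except the last
    targets = token_ids[1:]   # every token except the first
    return inputs, targets


# Hypothetical ids standing in for tokenizer(...).input_ids.
example_ids = [101, 7, 42, 9, 1]
inputs, targets = shift_for_next_token(example_ids)

print(inputs)   # [101, 7, 42, 9]
print(targets)  # [7, 42, 9, 1]
```

Both lists have the same length, so position i in `targets` holds the token that follows position i in `inputs`.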