IndexError: Target is out of bounds in cross_entropy

Hello, I’m a newbie to ML. I’m training a causal transformer and trying to compute the cross-entropy loss between the logits and the target. However, I am getting the following error:


IndexError                                Traceback (most recent call last)
Cell In[15], line 38
     35 print(target.shape)
     37 # IndexError: Target 1476 is out of bounds.
---> 38 torch.nn.functional.cross_entropy(logits, target, ignore_index=len(logits)-1)

File /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:3014, in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   3012 if size_average is not None or reduce is not None:
   3013     reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 3014 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)

IndexError: Target 1476 is out of bounds.

Not sure where the number 1476 is coming from. One guess is that it is one of the token IDs (the vocab size of the T5 tokenizer is in the tens of thousands). But in that case I am unsure why the last dimension of my logits is 1024 rather than the vocab size.
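For context, here is a minimal sketch (with made-up shapes, not my real tensors) of the situation that seems to trigger this error: cross_entropy treats the last dimension of the logits as the number of classes, so any target value greater than or equal to that dimension is out of bounds.

import torch
import torch.nn.functional as F

# Made-up shapes for illustration: 10 positions, 1024 "classes"
fake_logits = torch.randn(10, 1024)
fake_target = torch.full((10,), 1476, dtype=torch.long)  # token id larger than the class dimension

# Raises "IndexError: Target 1476 is out of bounds." because every target value
# must lie in [0, num_classes), where num_classes = fake_logits.size(-1) = 1024
F.cross_entropy(fake_logits, fake_target)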

This is the code I am running:

from transformers import T5Tokenizer, T5Model, AdamW

CONTEXT_LENGTH = 512

tokenizer = T5Tokenizer.from_pretrained("t5-large", model_max_length=CONTEXT_LENGTH)
model = T5Model.from_pretrained("t5-large")

model.train()
optimizer = AdamW(model.parameters(), lr=1e-5)

# Specifying where exactly we apply weight decay
no_decay = ["bias", "LayerNorm.weight"]

optimizer_grouped_parameters = [
    {'params': [param for name, param in model.named_parameters() if not any(nd in name for nd in no_decay)], 'weight_decay': 0.01},
    {'params': [param for name, param in model.named_parameters() if any(nd in name for nd in no_decay)], 'weight_decay': 0.0}
]
optimizer = AdamW(optimizer_grouped_parameters, lr=1e-5)  # replaces the optimizer defined above

import torch

# TODO: Figure out weird tokenization errors w/ \, later
dummy_test = f'Q: A board game spinner is divided into three parts labeled $A$, $B$  and $C$. The probability of the spinner landing on $A$ is $frac13$ and the probability of the spinner landing on $B$ is $frac512$.  What is the probability of the spinner landing on $C$? Express your answer as a common fraction. A: The spinner is guaranteed to land on exactly one of the three regions, so we know that the sum of the probabilities of it landing in each region will be 1. If we let the probability of it landing in region $C$ be $x$, we then have the equation $1 = frac512+frac13+x$, from which we have $x=boxedfrac14$.'
#dummy_test = 'Hello World'

def do_we_need_to_batch(string):
    input_ids = tokenizer(string, truncation=False).input_ids
    return len(input_ids) >= CONTEXT_LENGTH

tokenized_string = tokenizer(dummy_test, truncation=False).input_ids

# Shift by one token for next-token prediction: inputs are tokens[:-1], targets are tokens[1:]
input = tokenized_string[:-1]
input = tokenizer.decode(input, truncation=False)
input = tokenizer(input, return_tensors='pt').input_ids

target = tokenized_string[1:]
target = tokenizer.decode(target, truncation=False)
target = tokenizer(target, return_tensors='pt').input_ids

logits = model(input_ids=input, decoder_input_ids=input).last_hidden_state

print(logits.shape)
print(target.shape)

B, T, C = logits.shape
logits = logits.view(B*T, C)
target = target.view(B*T)

print(logits.shape)
print(target.shape)

# IndexError: Target 1476 is out of bounds.
torch.nn.functional.cross_entropy(logits, target, ignore_index=len(logits)-1)

Answering my own question: last_hidden_state is not the logits tensor. It’s the last hidden state before the unembedding matrix (LM head) is applied, which is why its final dimension is the model’s hidden size (1024 for t5-large) rather than the vocab size. T5Model does not expose an unembedding matrix, so I switched over to T5ForConditionalGeneration and everything just sort of works.
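For anyone hitting the same thing, here is a rough sketch of the fix (shapes and the target construction are simplified and not exactly my training code):

import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-large", model_max_length=512)
model = T5ForConditionalGeneration.from_pretrained("t5-large")

input_ids = tokenizer("Hello World", return_tensors="pt").input_ids

# T5ForConditionalGeneration applies the LM head (unembedding), so .logits has
# shape (batch, seq_len, vocab_size) instead of (batch, seq_len, d_model=1024)
out = model(input_ids=input_ids, decoder_input_ids=input_ids)
logits = out.logits
print(logits.shape)  # last dim is the vocab size (~32k), so token ids like 1476 are valid class indices

B, T, C = logits.shape
target = input_ids  # simplified; in practice build shifted labels as in the original code
loss = torch.nn.functional.cross_entropy(
    logits.view(B * T, C),
    target.view(B * T),
    ignore_index=tokenizer.pad_token_id,  # ignore padding, rather than len(logits)-1
)
print(loss)

Alternatively, passing labels=... to T5ForConditionalGeneration lets the model shift the decoder inputs and compute the cross-entropy loss internally.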