Question about how the target is transformed from an embedding index to a vector / one-hot array

I’m trying to understand how and where target indices are supposed to be converted, both in the context of a language model and in the context of embeddings for categorical variables in general. As I understand it, the dataloaders of language models pass an integer that is the index within the embedding layer for the target word, but I haven’t been able to figure out where (or whether) the embedding is actually called anywhere for the target.

How and where does the target get transformed? Surely the crit function (ostensibly F.cross_entropy) can’t take the index of a word and compare it directly to the output of the model? It seems like there’s some sort of implicit conversion to a vector or a one-hot encoding going on here, but I don’t see where that’s happening in the code.
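For concreteness, this is the kind of call I’m puzzling over (shapes and numbers below are made up, just to illustrate):

```python
import torch
import torch.nn.functional as F

vocab_size, batch = 10, 4                         # made-up sizes
logits = torch.randn(batch, vocab_size)           # model output: one score per vocab word
targets = torch.randint(0, vocab_size, (batch,))  # plain embedding-row indices, e.g. tensor([3, 7, 0, 9])

# This call works, and I don't see any one-hot conversion anywhere:
loss = F.cross_entropy(logits, targets)
```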

I’m guessing I’m missing a key component here. Should the forward pass also be returning the detached output of the target embedding, so that (if weights are tied) it goes through the transposed embedding weights and a softmax to predict the word/class?

That feels like the way to do it, but I haven’t seen any examples that work that way, and I don’t understand how the target in other examples gets transformed from an index into something that can be compared with the model output.

CrossEntropyLoss computes the loss from the index targets directly; it never materializes a one-hot encoding. Because a one-hot target has a single 1, the cross-entropy sum collapses to the negative log-probability of the target class, so the implementation can simply index into the log-softmax of the logits. This saves memory and is a bit more efficient.
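A quick sanity check makes the equivalence concrete (a minimal sketch; the indexed form mirrors what the library does conceptually, not its literal implementation):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)           # (batch, num_classes) raw model outputs
targets = torch.tensor([3, 7, 0, 9])  # class indices, one per row

log_probs = F.log_softmax(logits, dim=1)

# One-hot version: dot each row with its one-hot target, then average.
one_hot = F.one_hot(targets, num_classes=10).float()
loss_onehot = -(one_hot * log_probs).sum(dim=1).mean()

# Index version: just pick out the target log-prob per row.
loss_indexed = -log_probs[torch.arange(4), targets].mean()

print(torch.allclose(loss_onehot, loss_indexed))                       # True
print(torch.allclose(loss_indexed, F.cross_entropy(logits, targets)))  # True
```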

Interesting. Thanks for the insight.

So if I want to use this for multiple embeddings together, I’d need to write a custom loss function that handles that then? Makes sense. I’ll take a look at the implementation of cross_entropy.
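Something like this is what I’m imagining (just a sketch; `multi_cat_loss` and the head shapes are my own made-up names and assumptions):

```python
import torch
import torch.nn.functional as F

def multi_cat_loss(outputs, targets, weights=None):
    """Sketch: cross-entropy summed over several categorical heads.

    outputs: list of (batch, n_classes_i) logit tensors, one per categorical variable
    targets: list of (batch,) index tensors, aligned with outputs
    weights: optional per-head scaling factors
    """
    if weights is None:
        weights = [1.0] * len(outputs)
    return sum(w * F.cross_entropy(out, tgt)
               for w, out, tgt in zip(weights, outputs, targets))

# e.g. two heads: a 5-class and a 12-class categorical (hypothetical shapes)
outs = [torch.randn(8, 5), torch.randn(8, 12)]
tgts = [torch.randint(0, 5, (8,)), torch.randint(0, 12, (8,))]
loss = multi_cat_loss(outs, tgts)
```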