I’m trying to understand how and where target indexes are supposed to get converted, both in the context of a language model and in the context of embeddings for categorical variables in general. To my understanding, the dataloaders of language models pass an integer that is the index of the target word within the embedding layer, but I haven’t been able to figure out where (or if) the embedding layer is actually called for the target.
How/where does the target get transformed? Surely the crit function (ostensibly F.cross_entropy) can’t take the raw index of a word and compare it directly to the output of the model? It seems like there’s some sort of implicit conversion to a vector or a one-hot encoding going on here, but I don’t see where that’s happening in the code.
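For reference, here is a minimal pure-Python sketch of what F.cross_entropy does under the hood (it combines log_softmax and NLL loss); the integer target is used purely as an index into the log-probability vector, so no one-hot encoding or target embedding is ever materialized:

```python
import math

def cross_entropy(logits, target_index):
    # F.cross_entropy = log_softmax followed by NLL loss:
    # the integer target just indexes into the log-probability
    # vector -- no one-hot vector, no embedding lookup.
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_sum_exp for x in logits]
    return -log_probs[target_index]

# vocabulary of 4 "words"; the model output is a vector of raw scores
logits = [2.0, 0.5, -1.0, 0.1]
loss = cross_entropy(logits, target_index=0)  # target word has index 0
```

So the comparison between an index and a model output is not a trick elsewhere in the code; it is the definition of the loss itself.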
I’m guessing I’m missing a key component here. Should the forward pass also be returning the detached output of the target embedding, so that (if weights are tied) it goes through the transposed embedding weights and a softmax to predict the word/class?
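To make the tied-weights part of my question concrete, here is a minimal sketch (toy numbers, not from any real model) of how I understand the transposed embedding matrix producing one logit per vocabulary entry from the model's hidden state:

```python
# tied weights: the decoder reuses the embedding matrix transposed,
# so logits = hidden @ E^T, i.e. one score per vocabulary entry
embedding = [[0.1, 0.2],   # word 0
             [0.3, -0.1],  # word 1
             [-0.2, 0.4]]  # word 2
hidden = [0.5, 1.0]  # model's hidden state, same dim as the embeddings

# dot product of the hidden state with each word's embedding row
logits = [sum(h * e for h, e in zip(hidden, row)) for row in embedding]
# len(logits) == vocab size; these logits are what the loss sees,
# alongside the integer index of the target word
```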
I feel like that’s the way to do it, but I haven’t seen any examples that do it that way, and I don’t understand how the target in other examples is transformed from an index into something that can be compared with the model output.