Hey! Thanks so much for your response. I super appreciate it. I’m not sure I explained my model well, so I apologize for any confusion.

I’m building a word-level LSTM RNN for text generation (I’ve already built the char-level version and am hoping to compare the convergence rates of the two).

The corpus is so large, though, that running the word-level model with one-hot encoding isn’t feasible on my GPU (it crashes). I’d like to use `nn.Embedding` to create dense vectors with a fixed, reasonable dimension instead (I arbitrarily picked 100 per word), which makes it possible to run the model.
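
To make that concrete, here’s a minimal sketch of the embedding step as I understand it. The vocabulary size and batch shape are made-up toy values, not my real corpus; only the dimension of 100 is what I’m actually using.

```python
import torch
import torch.nn as nn

vocab_size = 10_000  # hypothetical vocabulary size, not my real corpus
embed_dim = 100      # the dimension I picked arbitrarily

# Lookup table: one trainable 100-dim row per word index
embedding = nn.Embedding(vocab_size, embed_dim)

# A toy batch of 2 sequences with 5 word indices each (integer ids, not one-hot)
word_ids = torch.randint(0, vocab_size, (2, 5))

vectors = embedding(word_ids)
print(vectors.shape)  # torch.Size([2, 5, 100])
```

So each word costs 100 floats per time step instead of a `vocab_size`-long one-hot vector.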

I’m having problems when I get to back-prop, though. That’s where I’m getting stuck in my understanding of what’s going on.

It’s my understanding that if I’m feeding in a sequence of word-embedded vectors, one per word, the loss function will compare the model’s output prediction with the word-embedded target, and then I’ll call `loss.backward()` for back-prop. Does that make it clearer why I’m using word embeddings for the target? Eventually, when sampling, I’ll convert the model’s predicted embedding vector back to the word it represents.
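
In case it helps, this is roughly the setup I have in mind, as a rough sketch rather than my actual code. The toy sizes, the `proj` layer, the choice of `nn.MSELoss`, detaching the target, and the cosine-similarity lookup for sampling are all assumptions I made just to get something runnable; I’m not sure they’re the right choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, embed_dim, hidden_dim = 50, 100, 64  # toy sizes, hypothetical

embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
proj = nn.Linear(hidden_dim, embed_dim)  # assumption: map hidden state back to embedding space
loss_fn = nn.MSELoss()                   # assumption: regression-style loss between vectors

input_ids = torch.randint(0, vocab_size, (4, 10))   # (batch, seq_len)
target_ids = torch.randint(0, vocab_size, (4, 10))  # next-word indices

out, _ = lstm(embedding(input_ids))      # (4, 10, hidden_dim)
pred = proj(out)                         # (4, 10, embed_dim)
target = embedding(target_ids).detach()  # embedded target, treated as a constant here

loss = loss_fn(pred, target)
loss.backward()

# Sampling: find the row of the embedding table closest to a predicted vector
with torch.no_grad():
    last = pred[0, -1]  # prediction at the final time step of the first sequence
    sims = torch.cosine_similarity(embedding.weight, last.unsqueeze(0), dim=1)
    predicted_word_id = int(sims.argmax())
```

Here `predicted_word_id` is the index I’d then look up in my vocabulary to get the actual word.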

Please let me know if this clears anything up. Again, I’m super grateful for your help with this.