If you don’t need to preprocess the data, you could directly push the tensors to the GPU inside your Dataset or you could create the batch directly from the data via indexing.
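Here is a minimal sketch of the indexing approach. It assumes the whole dataset fits in device memory and needs no per-sample preprocessing. The tensors are moved to the device once, and each batch is gathered with a shuffled index tensor instead of going through a `DataLoader`'s per-sample collation. `features`, `labels` and `iterate_batches` are hypothetical names, not part of any library.

```python
import torch

# Assumption: the whole dataset fits on the device and needs no per-sample preprocessing.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical data: 1000 samples with 16 features each, plus binary labels.
features = torch.randn(1000, 16).to(device)
labels = torch.randint(0, 2, (1000,)).to(device)

def iterate_batches(x, y, batch_size, shuffle=True):
    """Yield (x_batch, y_batch) pairs by indexing the device tensors directly."""
    n = x.size(0)
    # Build the index order on the same device, so indexing needs no host-device copy.
    if shuffle:
        order = torch.randperm(n, device=x.device)
    else:
        order = torch.arange(n, device=x.device)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

for xb, yb in iterate_batches(features, labels, batch_size=128):
    pass  # the forward/backward pass would go here
```

The last batch is simply smaller when the dataset size is not a multiple of `batch_size`.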
The loops might slow down your code. Whether you can get rid of them depends, of course, on your use case. I don’t know how `embeddings` is defined or whether you could use a single layer for it.
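One possible way to replace the loop with a single operation needs an assumption: each embedding is a bias-free linear map applied to its own one-hot block, which is what multiplying a one-hot vector by an embedding matrix amounts to. Under that assumption, the per-categorical weight matrices can be arranged block-diagonally so that one matrix product computes every embedding at once. The sizes and the names `cardinalities`, `embed_dims`, `loop_forward` and `fused_forward` are hypothetical. The sketch needs a PyTorch version that provides `torch.block_diag`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes: three categoricals with 10, 20 and 5 categories,
# each mapped to its own embedding dimension.
cardinalities = [10, 20, 5]
embed_dims = [4, 6, 3]

# One bias-free linear layer per categorical acts as its embedding matrix on a one-hot input.
per_feature = nn.ModuleList(
    [nn.Linear(c, d, bias=False) for c, d in zip(cardinalities, embed_dims)]
)

def loop_forward(one_hot):
    """Loop version: slice out each one-hot block and embed it separately."""
    chunks = torch.split(one_hot, cardinalities, dim=1)
    return torch.cat([layer(chunk) for layer, chunk in zip(per_feature, chunks)], dim=1)

def fused_forward(one_hot):
    """Single-matmul version using the same weights arranged block-diagonally."""
    # nn.Linear stores its weight as (out, in), so transpose each block to (in, out).
    weight = torch.block_diag(*[layer.weight.t() for layer in per_feature])
    return one_hot @ weight  # (batch, sum(cardinalities)) @ (sum(cardinalities), sum(embed_dims))

# Build a batch of 8 concatenated one-hot rows and compare both versions.
idx = [torch.randint(0, c, (8,)) for c in cardinalities]
one_hot = torch.cat([F.one_hot(i, c).float() for i, c in zip(idx, cardinalities)], dim=1)
print(torch.allclose(loop_forward(one_hot), fused_forward(one_hot)))
```

The fused weight is mostly zeros, so this trades extra memory and multiply-adds for fewer kernel launches. Whether that pays off depends on the number and size of the categoricals, so it is worth benchmarking.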
I cannot get rid of the for-loops because my one-hot categoricals have different sizes. (I explain in another post why I need one-hot encodings rather than Long indices.) One might have 10 categories and another 20, so each has its own embedding matrix.
I am concerned that, with such small index ranges, data locality will be lost. If so, I should pre-slice the matrix by categorical so that each one is stored in its own matrix.
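The locality concern can be made concrete with an assumed layout: the one-hot blocks are concatenated column-wise in one row-major `(N, total)` tensor. Slicing out one categorical then gives a strided view whose rows are `total` elements apart, not a dense tensor. A sketch of pre-slicing under that assumption copies each block once with `.contiguous()` before training. The sizes and names below are hypothetical.

```python
import torch
import torch.nn.functional as F

# Hypothetical layout: three categoricals with 10, 20 and 5 categories, one-hot
# encoded and concatenated along dim 1 into a single (N, 35) tensor.
cardinalities = [10, 20, 5]
n_samples = 1000
indices = [torch.randint(0, c, (n_samples,)) for c in cardinalities]
data = torch.cat([F.one_hot(i, c).float() for i, c in zip(indices, cardinalities)], dim=1)

# Column slices are views into `data`: each row of a block is contiguous, but
# consecutive rows are sum(cardinalities) elements apart in memory.
views = torch.split(data, cardinalities, dim=1)
print([v.is_contiguous() for v in views])  # [False, False, False]

# Pre-slicing once, before training, copies each categorical into its own dense (N, c) tensor.
presplit = [v.contiguous() for v in views]
print([t.is_contiguous() for t in presplit])  # [True, True, True]
print([tuple(t.shape) for t in presplit])  # [(1000, 10), (1000, 20), (1000, 5)]
```

Whether this measurably speeds things up depends on the ops that consume the slices, since some kernels may copy strided inputs internally anyway. It is worth timing both versions on real data. Dropping the original concatenated tensor afterwards avoids holding the data twice.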