I have a 3D tensor of names that comes out of an LSTM, of shape (batch size x name length x embedding size)
I’ve been reshaping it to 2D to put it through a linear layer, since a linear layer requires (batch size, linear dimension size), using the following
y0 = output.contiguous().view(-1, output.size(-1))
this flattens output to (batch size * name length, embedding size)
then once I put it through the linear layer (reusing the name y0 for the linear layer’s output, which is now (batch size * name length, number of possible characters)) I do this
y = y0.contiguous().view(output.size(0), -1, y0.size(-1))
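
For reference, here’s a minimal self-contained sketch of the full round trip as I understand it (the sizes, the fc layer, and all the names here are made up by me for illustration):

import torch
import torch.nn as nn

batch_size, name_len, embed_size, n_chars = 4, 10, 32, 57  # made-up example sizes
output = torch.randn(batch_size, name_len, embed_size)     # stand-in for the LSTM output
fc = nn.Linear(embed_size, n_chars)                        # placeholder linear layer

y0 = output.contiguous().view(-1, output.size(-1))         # (batch_size * name_len, embed_size)
y0 = fc(y0)                                                # (batch_size * name_len, n_chars)
y = y0.contiguous().view(output.size(0), -1, y0.size(-1))  # (batch_size, name_len, n_chars)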
But I’m not really sure whether the cells of y line up correctly with the cells of output, and I worry this is messing up my learning: with a batch size of 1 the model generates proper names, but any larger batch size generates nonsense.
So what I mean exactly is:
output = (batch size, name length, embed size)
y = (batch size, name length, number of possible characters)
I need to make sure y[i,j,:] is the linear-transformed version of output[i,j,:].
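
One quick way to check that (using the made-up names from the sketch above):

i, j = 2, 5  # arbitrary batch element and character position
assert torch.allclose(y[i, j, :], fc(output[i, j, :]), atol=1e-6)

If the reshape is doing what I hope, view only reinterprets the memory layout without moving elements, so output[i, j, :] becomes row i * name_len + j of the flattened tensor and comes back to position [i, j, :] after the second view, and this assert should pass.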
The target tensor is of shape (name length), holding the correct character index at each position, because I’m using cross entropy. So I need to ensure that every fiber of y corresponds to the same index of output.
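
For the loss itself, assuming the targets are batched as a (batch_size, name_len) tensor of character indices (again reusing my made-up names), I’m flattening both sides the same way:

criterion = nn.CrossEntropyLoss()
targets = torch.randint(0, n_chars, (batch_size, name_len))  # placeholder targets
loss = criterion(y.view(-1, n_chars), targets.view(-1))      # logits (N, C) vs. targets (N,)

Because both are flattened with the same row-major view, row k of the logits should line up with entry k of the targets.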