I started learning NLP on my own. Initially I worked on movie review sentiment classification, and it works fine. Next I started working on text generation from Shakespeare's text, but it is not training at all. Every epoch I print the predicted results, and they are the same every time.
All about dimensions
I took the entire corpus and divided it into fixed-length chunks of, say, 100 characters. Each character is replaced by its index in a dictionary, so an input/target pair looks like this:
input:
[38, 32, 38, 50, 56, 28, 5, 41, 38, 5, 67, 52, 33, 4, 32, 67, 52, 67, 8, 28, 62, 66, 38, 59, 71, 59, 50, 28, 31, 44, 27, 28, 66, 56, 15, 50, 50, 28, 28, 38, 50, 28, 53, 80, 37, 57, 1, 28, 38, 42]
target:
[32, 38, 50, 56, 28, 5, 41, 38, 5, 67, 52, 33, 4, 32, 67, 52, 67, 8, 28, 62, 66, 38, 59, 71, 59, 50, 28, 31, 44, 27, 28, 66, 56, 15, 50, 50, 28, 28, 38, 50, 28, 53, 80, 37, 57, 1, 28, 38, 42, 82]
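A minimal sketch of how such pairs can be built, assuming text is the corpus string and char2idx is the character-to-index dictionary (both are mine, not shown above):

chunk_len = 100
encoded = [char2idx[c] for c in text]  # entire corpus as indices
inputs, targets = [], []
for i in range(0, len(encoded) - chunk_len, chunk_len):
    inputs.append(encoded[i : i + chunk_len])           # characters i .. i+99
    targets.append(encoded[i + 1 : i + chunk_len + 1])  # same window shifted by one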
Let us say the batch size is 32 and the fixed length is 100, so the input to the model has shape [32, 100]. To work with batches, the respective time steps of each sequence have to line up along the first dimension, so I apply a transpose and the shape becomes [100, 32]. I then send it through the nn.Embedding layer (the output shape becomes [100, 32, emb_dim]) and then through the LSTM layer (the output becomes [100, 32, hidden_dim]).
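A quick shape check of that flow, assuming a vocabulary size of 83 (the indices in my example go up to 82) and emb_dim = hidden_dim = 50:

import torch
import torch.nn as nn

x = torch.randint(0, 83, (32, 100))   # [batch_size, fixed_length]
x = x.transpose(0, 1)                 # [100, 32]
emb = nn.Embedding(83, 50)
lstm = nn.LSTM(50, 50, num_layers=2)
e = emb(x)                            # [100, 32, 50]
out, (hn, cn) = lstm(e)               # [100, 32, 50]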
If you have any doubts, please take a look at my model.
Model implementation
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    def __init__(self, n_hidden=50, n_layers=2, drop_prob=0.5, vocab_dim=50,
                 emb_size=50, batch_size=64, device="cuda"):
        super().__init__()
        self.drop_prob = drop_prob
        self.n_layers = n_layers
        self.n_hidden = n_hidden
        self.batch_size = batch_size
        self.device = device
        self.embedding = nn.Embedding(vocab_dim, emb_size)
        self.lstm = nn.LSTM(emb_size, n_hidden, n_layers, dropout=self.drop_prob)

    def init_hidden(self):
        """Set initial hidden states."""
        h0 = torch.zeros(self.n_layers, self.batch_size, self.n_hidden)
        c0 = torch.zeros(self.n_layers, self.batch_size, self.n_hidden)
        h0 = h0.to(self.device)
        c0 = c0.to(self.device)
        return h0, c0

    def apply_rnn(self, embedding_out):
        # embedding_out: [fixed_length, batch_size, emb_size]
        activations, (hn, cn) = self.lstm(embedding_out, self.init_hidden())
        return activations  # [fixed_length, batch_size, n_hidden]

    def forward(self, inputs, return_activations=False):
        self.batch_size = len(inputs)
        inputs = torch.LongTensor(inputs).to(self.device)
        inputs = inputs.transpose(0, 1)  # [batch_size, fixed_length] -> [fixed_length, batch_size]
        # Get embeddings
        embedding_out = self.embedding(inputs)
        activations = self.apply_rnn(embedding_out)
        out = torch.sigmoid(activations)
        return out  # [fixed_length, batch_size, n_hidden]
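For reference, a hypothetical instantiation to confirm those shapes (vocab_dim=83 is my assumption based on the example indices above):

model = CharRNN(n_hidden=50, n_layers=2, vocab_dim=83, emb_size=50, device="cuda").to("cuda")
batch = torch.randint(0, 83, (32, 100)).tolist()  # 32 sequences of fixed length 100
out = model(batch)
print(out.shape)  # torch.Size([100, 32, 50])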
Now our output is of shape [100, 32, hidden_dim], while the target is of shape [32, 100]. To apply cross entropy, I first have to transpose the output and then permute it: after transposing, the shape becomes [32, 100, hidden_dim], and after permuting it becomes [32, hidden_dim, 100]. After that I compute the loss.
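That layout is what nn.CrossEntropyLoss expects for sequence data: scores of shape [batch, classes, length] with targets of shape [batch, length]. A standalone check with the sizes used above:

import torch
import torch.nn as nn

scores = torch.randn(32, 50, 100)          # [batch, classes, length]
targets = torch.randint(0, 50, (32, 100))  # [batch, length]
loss = nn.CrossEntropyLoss()(scores, targets)
print(loss.item())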
Here the hidden_dim-sized vector is the output the model produces per step; I treat it like a one-hot vector by sending it through a sigmoid and picking the most probable character from it.
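A minimal sketch of that decoding step, assuming idx2char is my index-to-character dictionary and out is the model output of shape [100, 32, hidden_dim] (the sigmoid is already applied inside forward()):

pred_ids = out.argmax(dim=-1)  # [100, 32]: most probable index at each step
# Decode the first sequence of the batch back to characters
predicted_text = ''.join(idx2char[i] for i in pred_ids[:, 0].tolist())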
Here is my training code:
import torch.optim as optim
from tqdm import tqdm_notebook

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=0.1,
)

def train_epoch(model, optimizer, train_loader):
    model.train()
    total_loss = total = 0
    progress_bar = tqdm_notebook(train_loader, desc='Training', leave=False)
    for inputs, targets in progress_bar:
        targets = torch.LongTensor(targets)
        target = targets.to(device)  # targets shape [batch_size, fixed_length]
        # target = target.transpose(0, 1)
        # Clean old gradients
        optimizer.zero_grad()
        # Forward pass
        output = model(inputs)  # inputs shape [batch_size, fixed_length], output shape [fixed_length, batch_size, hidden_dim]
        outputs = output.transpose(0, 1)  # outputs shape [batch_size, fixed_length, hidden_dim]
        outputs = outputs.permute(0, 2, 1)  # outputs shape [batch_size, hidden_dim, fixed_length]
        loss = criterion(outputs, target)
        # Backward pass, compute gradients
        loss.backward()
        # Take a step in the right direction
        optimizer.step()
        # scheduler.step()
        # Record metrics
        total_loss += loss.item()
        total += len(target)
        progress_bar.set_description(
            f'train_loss: {loss:.2e}'
            f'\tavg_loss: {total_loss/total:.2e}\n',
        )
    return total_loss / total
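For context, the per-epoch logs below come from a driver loop along these lines (n_epochs is assumed; the valid_loss in the logs comes from an analogous evaluation pass not shown here):

for epoch in range(1, n_epochs + 1):
    train_loss = train_epoch(model, optimizer, train_loader)
    print(f'epoch # {epoch} train_loss: {train_loss:.2e}')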
I also printed results during my 9th and 10th epochs. Please take a look at them and see what went wrong.
predicted_text=
c&aeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
Parameter containing:
tensor([[-0.0848, 1.7742, -0.7627, ..., 0.2818, 0.3630, 0.6283],
[ 0.1355, -0.4772, 0.0499, ..., 0.6572, -1.6990, 0.6295],
[ 1.3463, -1.8206, -0.1466, ..., -0.4201, 1.1724, -1.2711],
...,
[ 0.7217, 0.2917, -0.4138, ..., -0.9130, -0.3257, 0.7373],
[ 1.9201, 1.0811, 0.0864, ..., -1.5404, -0.4448, -1.3606],
[-0.3362, 0.4130, 0.4206, ..., 1.8701, 1.0428, -0.9026]],
device='cuda:0', requires_grad=True)
Gradient containing:
tensor([[-6.0252e-05, 2.3833e-05, -9.2437e-05, ..., -4.4102e-05,
6.2251e-05, 3.9563e-05],
[-1.2471e-04, 1.6429e-04, -1.6832e-06, ..., -4.8308e-05,
2.9626e-05, 6.5522e-05],
[-2.6410e-05, 6.6710e-05, -7.8888e-05, ..., -1.4033e-05,
-2.6348e-05, 2.2501e-05],
...,
[-8.3676e-06, -4.0126e-05, -5.0642e-05, ..., 1.2275e-05,
-3.3493e-05, 1.0394e-06],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]], device='cuda:0')
epoch # 9 train_loss: 1.33e-01 valid_loss: 2.17e-01
predicted_text=
c&aeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
Parameter containing:
tensor([[-0.0848, 1.7740, -0.7626, ..., 0.2818, 0.3629, 0.6283],
[ 0.1355, -0.4778, 0.0503, ..., 0.6571, -1.6992, 0.6297],
[ 1.3463, -1.8207, -0.1465, ..., -0.4201, 1.1724, -1.2711],
...,
[ 0.7217, 0.2913, -0.4136, ..., -0.9129, -0.3258, 0.7374],
[ 1.9201, 1.0810, 0.0864, ..., -1.5403, -0.4448, -1.3605],
[-0.3362, 0.4130, 0.4206, ..., 1.8701, 1.0428, -0.9026]],
device='cuda:0', requires_grad=True)
Gradient Containing:
tensor([[-5.8078e-05, 3.2291e-05, -8.6524e-05, ..., -4.5590e-05,
7.4338e-05, 3.1892e-05],
[-1.0505e-04, 1.4412e-04, 7.3077e-06, ..., -4.6723e-05,
3.1852e-05, 5.1921e-05],
[-2.7566e-05, 6.2100e-05, -7.5454e-05, ..., -1.2124e-05,
-2.9602e-05, 2.2030e-05],
...,
[-7.7157e-06, -3.8574e-05, -4.5693e-05, ..., 1.3392e-05,
-3.1313e-05, -9.7651e-07],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]], device='cuda:0')
epoch # 10 train_loss: 1.33e-01 valid_loss: 2.16e-01
Thank you.