Hi, I am getting this error. I have looked at other forums but nothing has worked, and I am not sure why it is happening in the first place.
Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
My code is as follows:

class LSTM(nn.Module):
    def __init__(self, embedding_dim, hidden_dim):
        # ... (rest of the model definition not shown)

for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0
The above answer is right. I think you want to use the last hidden state to do this; see the code below. Also, you might want to do gradient clipping before you call optimizer.step().
Actually, maybe not, but my interpretation was that you want to get a representation for the text (so use the last hidden state to represent the entire text whose sentiment, or something else, you want to predict). The output is for each state (time step). I'm guessing you don't want a prediction per time step unless you have a language model, just one per sentence or text, so use the last hidden state as the representation. No? He could also combine the hidden states in some way and then feed that to the forward layer, but I'm unsure.
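Something like this, as a minimal sketch (the class and variable names here are placeholders, not taken from your actual code, and the sizes are made up):

import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, src):
        embedded = self.embedding(src)            # (N, L, embedding_dim)
        lstm_out, (ht, ct) = self.lstm(embedded)  # ht: (1, N, hidden_dim)
        return self.fc(ht[-1])                    # classify from the last hidden state only

model = LSTMClassifier(vocab_size=1000, embedding_dim=128, hidden_dim=64, num_classes=2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randint(0, 1000, (8, 4))   # batch of 8 sequences, 4 tokens each
labels = torch.randint(0, 2, (8,))

# one training step: clip gradients after backward() and before step()
optimizer.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()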
Aha right. So this line: lstm_out, (ht, ct) = self.lstm(src) returns the full sequence AND the last hidden and cell states, per batch. If it's batch_first you have lstm_out[:, -1, :] == ht[-1]. So I'm thinking he wants to use the last step's hidden state. PyTorch always returns (the full sequence of hidden states, (last hidden, last cell)) ... batch_first controls whether you get N x L x D or L x N x D, where L is the length in time and N is the batch size.
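A quick way to check that in isolation (a toy example; the dimensions are made up):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=128, hidden_size=64, batch_first=True)
src = torch.randn(8, 5, 128)     # (N=8, L=5, D=128) because batch_first=True

lstm_out, (ht, ct) = lstm(src)
print(lstm_out.shape)            # torch.Size([8, 5, 64])  -> one hidden state per time step
print(ht.shape)                  # torch.Size([1, 8, 64])  -> (num_layers * num_directions, N, H)
print(torch.allclose(lstm_out[:, -1, :], ht[-1]))   # True: last time step == last hidden state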
Hi, could you please explain this? You said I should use the output of the last time step; right now I am just using the entire lstm_out as the input to my linear layer. Should I be using something different? How would this change if bidirectional=True? Thank you all for your helpful replies!
This is tricky, and you are using bidirectional=False (the default), but here is my info on this. The resource is the LSTM page of the PyTorch 1.13 documentation, which says: "For bidirectional LSTMs, h_n is not equivalent to the last element of output; the former contains the final forward and reverse hidden states, while the latter contains the final forward hidden state and the initial reverse hidden state."
Here is an example.
Imagine no batches, so dimensions are L X D for everything.
Basically, imagine you do sentiment analysis and you have a sentence that tokenizes to [1, 2, 3, 4], and you embed this into a 4 x 128 tensor. If you use a forward (unidirectional) RNN, you should use the last hidden state. This is hidden, or output[-1, :].
If you use a bidirectional RNN, you'd probably want to feed hidden and NOT output to your softmax. output in this case is [(h1_forward, h1_backward), (h2_forward, h2_backward), (h3_forward, h3_backward), (h4_forward, h4_backward)]. But the backward RNN starts at step 4, so (h4_forward, h4_backward) contains the forward RNN's encoding, while its second element has almost no information (yet) as far as the backward RNN is concerned.
An "encoding" of the sentence is thus (h4_forward, h1_backward), the result of the forward RNN's pass and the backward RNN's pass. This is probably what you'd like to feed to the classifier head.
I.e., when you have a bidirectional RNN, the last element of output IS NOT the same as the returned hidden state.
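A minimal sketch of how you could pull out that (h4_forward, h1_backward) pair, assuming a single-layer bidirectional LSTM with batch_first=True (concatenating the two directions at the end is just one common choice):

import torch
import torch.nn as nn

hidden_dim = 64
lstm = nn.LSTM(input_size=128, hidden_size=hidden_dim, batch_first=True, bidirectional=True)
src = torch.randn(8, 4, 128)                  # (N=8, L=4 tokens, D=128)

output, (ht, ct) = lstm(src)                  # output: (8, 4, 2*hidden_dim), ht: (2, 8, hidden_dim)

# ht[-2] is the forward direction's final hidden state (after token 4),
# ht[-1] is the backward direction's final hidden state (after token 1).
sentence_repr = torch.cat([ht[-2], ht[-1]], dim=1)    # (8, 2*hidden_dim)

# Equivalently, from output: the forward half of the last step and the backward half of the first step.
fwd_last = output[:, -1, :hidden_dim]
bwd_last = output[:, 0, hidden_dim:]
assert torch.allclose(sentence_repr, torch.cat([fwd_last, bwd_last], dim=1))

sentence_repr is then what you would feed to the linear/classifier head in the bidirectional case.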