Loss.backward() - IndexError: select(): index 1 out of range for tensor of size [1, 32, 100] at dimension 0

I have created the following NN using the PyTorch API (for NLP multi-class classification):

class MultiClassClassifer(nn.Module):
  #define all the layers used in model
  def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
    
    #Constructor
    super(MultiClassClassifer, self).__init__()

    #embedding layer
    self.embedding = nn.Embedding(vocab_size, embedding_dim)

    #dense layer
    self.hiddenLayer = nn.Linear(embedding_dim, hidden_dim)

    #Batch normalization layer
    self.batchnorm = nn.BatchNorm1d(hidden_dim)

    #output layer
    self.output = nn.Linear(hidden_dim, output_dim)

    #activation layer
    self.act = nn.Softmax(dim=1) #2d-tensor

    #initialize weights of embedding layer
    self.init_weights()

  def init_weights(self):

    initrange = 1.0
    
    self.embedding.weight.data.uniform_(-initrange, initrange)
  
  def forward(self, text, text_lengths):

    embedded = self.embedding(text)
    embedded = torch.mean(embedded, dim=1, keepdim=True)
    print(embedded.shape)

    #packed sequence
    packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths, batch_first=True)

    tensor, batch_size = packed_embedded[0], packed_embedded[1]
    print(tensor.shape)

    hidden_1 = self.batchnorm(self.hiddenLayer(tensor))
    print(hidden_1.shape)

    output = self.act(self.output(hidden_1))
    print(output.shape)

    return output

Instantiating the model

INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
HIDDEN_DIM = 64
OUTPUT_DIM = 3

model = MultiClassClassifer(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM)

To train the model I have created the following train() method:

def train(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    
    model.train()
    
    for batch in iterator:
        
        optimizer.zero_grad()

        text, text_lengths = batch.text_normalized_tweet
                
        predictions = model(text, text_lengths)
        
        loss = criterion(predictions, batch.label)
        print(loss)
        print(loss.item())

        loss.backward() #exception occurs here
      
        optimizer.step()
        
        epoch_loss += loss.item()
  
    return epoch_loss / len(iterator)

However, when I start the first epoch of training, I receive the following error at the loss.backward() step:

IndexError: select(): index 1 out of range for tensor of size [1, 32, 100] at dimension 0

Here is the output of the print() calls:

torch.Size([32, 1, 100])
torch.Size([32, 100])
torch.Size([32, 64])
torch.Size([32, 3])
tensor(1.1309, grad_fn=<NllLossBackward0>)
1.1309444904327393

Does the error have to do with the nn.Embedding layer of my network? And if so, how can I fix this exception?

You can find my Colab notebook here.

Hi,
I think the problem is with the pack_padded_sequence line.

I managed to reproduce the error with the following code:

import torch
import torch.nn as nn

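total = 256 * 56 * 4
x = torch.arange(0, total).view(256, 56, 4).float()
x.requires_grad = True
# every length is 57, but the padded sequence length (dim 1) is only 56
o = nn.utils.rnn.pack_padded_sequence(x, torch.tensor(256 * [57]), batch_first=True)
o[0].mean().backward()  # the error is raised here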

If you put any number greater than 56 in the lengths for the batch elements, it raises an error.
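
One way to catch this earlier is to validate the lengths before packing. This is only a minimal sketch, assuming a batch-first padded tensor; the helper name check_lengths is hypothetical and not part of PyTorch.

```python
import torch
import torch.nn as nn


def check_lengths(padded, lengths):
    """Raise early if the lengths cannot match a batch-first padded tensor.

    Assumes `padded` is shaped (batch, time, features).
    """
    lengths = torch.as_tensor(lengths)
    batch, max_time = padded.shape[0], padded.shape[1]
    if lengths.numel() != batch:
        raise ValueError(f"expected {batch} lengths, got {lengths.numel()}")
    if int(lengths.max()) > max_time:
        raise ValueError(
            f"longest length {int(lengths.max())} exceeds time dimension {max_time}"
        )
    if int(lengths.min()) < 1:
        raise ValueError("every length must be at least 1")


x = torch.zeros(256, 56, 4, requires_grad=True)
good = torch.tensor(256 * [55])
check_lengths(x, good)  # valid lengths pass silently
packed = nn.utils.rnn.pack_padded_sequence(x, good, batch_first=True)
packed.data.mean().backward()  # backward completes with valid lengths
```

Calling check_lengths(x, torch.tensor(256 * [57])) raises a ValueError before the packed tensor is ever built, so the mistake shows up at the packing step rather than inside backward().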

Indeed, since you reproduced the error, rnn.pack_padded_sequence() is where the problem lies.

The thing is that I want to train my own embeddings, which is why I used the embedding layer. Afterwards, I want to train a second model using pre-trained GloVe embeddings. To train custom embeddings with the nn.Embedding layer, is it necessary to use rnn.pack_padded_sequence(), or is there an alternative?

¯\_(ツ)_/ ¯ i know nothing about it

It’s OK, I hope someone will find a workaround, because I don’t know how to debug this either. I am coming from TensorFlow and Keras, so I am a total newbie with PyTorch.

mMagmer, do you know the solution to the error raised by o[0].mean().backward()?
Since you tracked it down, maybe you also know a workaround. I appreciate your help :)

I think you’re not supposed to have any length greater than the original sequence length.
Change 57 to 55:

import torch
import torch.nn as nn

total = 256 * 56 * 4
x = torch.arange(0, total).view(256, 56, 4).float()
x.requires_grad = True
# every length is 55, which fits within the padded sequence length of 56
o = nn.utils.rnn.pack_padded_sequence(x, torch.tensor(256 * [55]), batch_first=True)
o[0].mean().backward()

The error, at least from the reproduction above, is pretty clear. pack_padded_sequence takes two args: input and lengths. The first is BxTxH (because you are using batch_first=True); the second is [length]*B, i.e., a B-dim vector. You are asking it to use a length of 57 when the real sequence length is 56.

The torch.mean(embedded, dim=1, keepdim=True) line is weird. Do you really want to take the mean over the tokens (dim=1) before putting the result into an RNN?

In any case, the pack_padded_sequence call is going to be wrong, because batch_first means that from 32x1x100 you have a batch of size 1, so the second arg should be a vector of dim 1, and I expect that text_lengths does not have that dimension.

mMagmer, how can I specify the value of torch.tensor(256*[55])? In my case I use text_lengths, which is a fixed number per batch iteration. For example, rnn.pack_padded_sequence(32, 13,…) (batch 1), rnn.pack_padded_sequence(32, 7,…) (batch 2). As you can see, each batch has a different number of tokens per sentence.

I’m not an expert in NLP,
but based on the name of the function, pack_padded_sequence, I think you should first pad the sequences so they all have the same length.
See this link.
The input to pack_padded_sequence is padded beforehand.
Again, I know nothing about NLP.
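
The pad-then-pack order can be sketched roughly as follows. The toy sequences and their sizes are made up for illustration; pad_sequence, pack_padded_sequence and pad_packed_sequence are the standard utilities in torch.nn.utils.rnn.

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

# Three toy "sentences" of different lengths; each token is a 4-dim vector.
sequences = [torch.randn(5, 4), torch.randn(3, 4), torch.randn(2, 4)]
lengths = torch.tensor([seq.shape[0] for seq in sequences])

# Pad to a common length: the result is (batch=3, time=5, features=4).
padded = pad_sequence(sequences, batch_first=True)

# Pack using the true lengths; none of them may exceed padded.shape[1].
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

# Unpack again to confirm that the round trip keeps shapes and lengths.
unpacked, unpacked_lengths = pad_packed_sequence(packed, batch_first=True)
print(padded.shape, packed.data.shape, unpacked.shape)
```

The packed data holds one row per real token (5 + 3 + 2 here), so the padding positions never reach the layers that consume it.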

Hmm, isn’t that strange? I mean, I have a training iterator that yields batches of different shapes: the first batch has shape [32, 13], the next has shape [32, 7], the next has shape [32, 11], etc. So, based on the solutions I followed that use rnn.pack_padded_sequence(), packed_embedded was a tensor of shape [32x11, 100], [32x13, 100], [32x7, 100], etc. That is why I used the .mean() method to bring the tensor dimensions down to 32x1 (averaged). This is the solution I followed. Do you know any alternative solution for multi-class text classification with my own embeddings? I would appreciate your help.
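
One possible alternative, sketched here under assumptions rather than taken from the thread, is to skip pack_padded_sequence entirely and average the token embeddings with a mask built from text_lengths. The class name MeanPoolClassifier and the pad_idx argument are hypothetical, the input is assumed to be batch-first token ids of shape (batch, time), and the model returns raw logits on the assumption that the criterion is nn.CrossEntropyLoss.

```python
import torch
import torch.nn as nn


class MeanPoolClassifier(nn.Module):
    """Hypothetical variant of the model above without pack_padded_sequence."""

    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        self.hidden = nn.Linear(embedding_dim, hidden_dim)
        self.batchnorm = nn.BatchNorm1d(hidden_dim)
        self.output = nn.Linear(hidden_dim, output_dim)

    def forward(self, text, text_lengths):
        # text: (batch, time) token ids; text_lengths: (batch,) true lengths.
        lengths = text_lengths.to(text.device)
        embedded = self.embedding(text)  # (batch, time, emb)
        positions = torch.arange(text.shape[1], device=text.device)
        mask = positions.unsqueeze(0) < lengths.unsqueeze(1)  # (batch, time)
        # Sum only the real tokens, then divide by the true length.
        summed = (embedded * mask.unsqueeze(-1)).sum(dim=1)
        pooled = summed / lengths.unsqueeze(1).clamp(min=1)
        hidden = self.batchnorm(self.hidden(pooled))
        return self.output(hidden)  # raw logits, shape (batch, output_dim)


# Toy usage with random data (sizes chosen only for illustration).
model = MeanPoolClassifier(vocab_size=50, embedding_dim=100, hidden_dim=64, output_dim=3)
text = torch.randint(1, 50, (32, 13))
text_lengths = torch.randint(1, 14, (32,))
logits = model(text, text_lengths)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 3, (32,)))
loss.backward()
```

Because the mask handles the padding, every batch can keep its own padded length (13, 7, 11, and so on) and the embedding weights still receive gradients.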

@sagsriv
Do you have any solution for this line, based on my implementation?

packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths, batch_first=True)

I would really appreciate any help :)

@sagsriv This is not true. batch_first=False would throw the error you describe. With batch_first=True, the network understands that batch_size=32, not 1.
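
A small check of how batch_first=True reads a (32, 1, 100) tensor, as a sketch with random data. The tensor is treated as batch=32 and time=1, so lengths needs 32 entries and none may exceed 1. Since the averaged embeddings have only one time step, passing the original text_lengths (13, 7, ...) would exceed it, which may be what triggers the IndexError in backward().

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Same shape as the averaged embeddings in the question: (32, 1, 100).
embedded = torch.randn(32, 1, 100, requires_grad=True)

# With batch_first=True this is batch=32, time=1, so every length must be 1.
lengths = torch.ones(32, dtype=torch.int64)
packed = pack_padded_sequence(embedded, lengths, batch_first=True)

print(packed.data.shape)   # torch.Size([32, 100])
print(packed.batch_sizes)  # tensor([32])
packed.data.mean().backward()  # completes without the IndexError
```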