TypeError: forward() missing 1 required positional argument: 'target' when using AdaptiveLogSoftmaxWithLoss

Akib_Sadmanee · July 30, 2020, 6:31am

I am trying to build a next-word prediction model with PyTorch in Google Colab. As my vocabulary size is over 1.5 million, I am using PyTorch's AdaptiveLogSoftmaxWithLoss module to reduce RAM consumption.

The simple BiLSTM model definition is as follows:

class BLSTM(nn.Module):
    def __init__(self, emb_size, hidden_size, num_layers, vocab_size, cutoffs):
        super(BLSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.blstm = nn.LSTM(emb_size, hidden_size, num_layers, batch_first=True, bidirectional=True)
        self.fc = nn.AdaptiveLogSoftmaxWithLoss(hidden_size*2, vocab_size, cutoffs)
        # self.fc = nn.Linear(hidden_size*2, vocab_size)
    
    def forward(self, x):
        # Set initial states
        h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size)
        c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size)

        # Forward propagate LSTM
        embed = nn.Embedding(vocab_size, emb_size)
        out, _ = self.blstm(embed(x), (h0, c0))  # out: tensor of shape (batch_size, seq_length, hidden_size*2)
        
        # Decode the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        
        return out

The model and loss function are called as follows:

model = BLSTM(emb_size, hidden_size, num_layers, vocab_size, cutoffs)

# Loss and optimizer
criterion = nn.NLLLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

Inside the epoch the loss is calculated as follows:

        inputs = x[0].to(device)
        targets = x[1].to(device)

        # Forward pass
        outputs = model(inputs)
        outputs = outputs.to(device)
        loss = criterion(outputs, targets)
        print(loss.item())
        
        #Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

This is the complete error trace I am getting after running the epoch loop:

TypeError                                 Traceback (most recent call last)
<ipython-input-33-51e15380f8c7> in <module>()
      8 
      9         # Forward pass
---> 10         outputs = model(inputs)
     11         outputs = outputs.to(device)
     12         loss = criterion(outputs, targets)

2 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

TypeError: forward() missing 1 required positional argument: 'target'

I tried the same code with a simple nn.Linear() layer and it runs fine. But when I replace the Linear layer with AdaptiveLogSoftmaxWithLoss, I get the error shown above.

Unity05 · August 3, 2020, 9:00am

Hi @Akib_Sadmanee

As AdaptiveLogSoftmaxWithLoss's name suggests and its implementation (https://pytorch.org/docs/master/_modules/torch/nn/modules/adaptive.html#AdaptiveLogSoftmaxWithLoss) shows, it computes the loss itself, so it also requires the target argument.

Regards,
Unity05

Akib_Sadmanee · August 3, 2020, 9:43am

Hello @Unity05
Thank you for your reply. Can you help me with how to pass the target? Below is the class signature:
torch.nn.AdaptiveLogSoftmaxWithLoss(in_features: int, n_classes: int, cutoffs: Sequence[int], div_value: float = 4.0, head_bias: bool = False)
I don’t see any parameter that takes the targets tensor.

Unity05 · August 3, 2020, 9:49am

Hi @Akib_Sadmanee,

Indeed, the constructor doesn’t take target as an argument.

def __init__(
        self,
        in_features: int,
        n_classes: int,
        cutoffs: Sequence[int],
        div_value: float = 4.,
        head_bias: bool = False
    ) -> None:

Therefore, your initialization is right.

However, AdaptiveLogSoftmaxWithLoss's forward() method (which you call from your own forward() method) expects target as an argument, as you can see by checking its implementation.

 def forward(self, input: Tensor, target: Tensor) -> _ASMoutput:

I hope this helped.

Regards,
Unity05
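To make the constructor/forward distinction concrete, here is a minimal sketch. The sizes, cutoffs, and random tensors are illustrative assumptions, not values from this thread. It also shows the module's log_prob() and predict() methods, which take only the input and can be used at inference time when no targets are available.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only (assumptions, not taken from the thread).
in_features, n_classes, cutoffs = 16, 100, [10, 50]
batch_size = 4

# The constructor only describes the layer; no targets are involved here.
asm = nn.AdaptiveLogSoftmaxWithLoss(in_features, n_classes, cutoffs)

features = torch.randn(batch_size, in_features)       # stand-in for the last LSTM output
targets = torch.randint(0, n_classes, (batch_size,))  # class indices, shape (N,)

# Training-time call: forward() needs both the features and the targets.
result = asm(features, targets)

# Inference-time calls that need only the features.
log_probs = asm.log_prob(features)  # log-probabilities over all classes, shape (N, n_classes)
predicted = asm.predict(features)   # most likely class per example, shape (N,)
```

Calling asm(features) without targets reproduces the TypeError from the original question.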

vdw · August 3, 2020, 9:54am

AdaptiveLogSoftmaxWithLoss returns a loss, so you do not need a separate NLLLoss. I have never used it, but I assume you need to change your forward() method to

def forward(self, x, targets):
        # Set initial states
        h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size)
        c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size)

        # Forward propagate LSTM
        embed = nn.Embedding(vocab_size, emb_size)
        out, _ = self.blstm(embed(x), (h0, c0))  # out: tensor of shape (batch_size, seq_length, hidden_size*2)
        
        # Decode the hidden state of the last time step
        out = self.fc(out[:, -1, :], targets)       
        return out

Note that out is a NamedTuple with output and loss fields:

    * output is a Tensor of size N containing the computed target log probabilities for each example
    * loss is a scalar representing the computed negative log likelihood loss
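The two fields can be inspected directly. The sketch below uses small made-up sizes (assumptions for illustration) and checks that loss is the batch mean of the negated output values, which matches the docstring description above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only (assumptions, not taken from the thread).
asm = nn.AdaptiveLogSoftmaxWithLoss(in_features=16, n_classes=20, cutoffs=[5, 10])
features = torch.randn(3, 16)
targets = torch.tensor([1, 7, 15])  # one target in the head and one in each tail cluster

result = asm(features, targets)

print(result.output.shape)  # one target log-probability per example
print(result.loss.shape)    # zero-dimensional (scalar) tensor

# The loss is the mean negative log-likelihood over the batch.
manual_loss = -result.output.mean()

# The same per-example values can be read off the full log-probability table.
gathered = asm.log_prob(features).gather(1, targets.unsqueeze(1)).squeeze(1)
```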

And you need to change your training loop to

# Forward pass
outputs = model(inputs, targets)  # NamedTuple with output and loss fields
loss = outputs.loss  # the loss is already computed inside the model
# loss = criterion(outputs, targets)  # no longer needed
print(loss.item())

Akib_Sadmanee · August 3, 2020, 10:16am

Thanks @vdw. It’s working now. For clarification,
I didn’t find any optimizer in the AdaptiveLogSoftmaxWithLoss. I still need to run the optimizer, right?
And for backpropagation, I assume now I need to call outputs.loss.backward().
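For reference, one full training step could look like the sketch below. AdaptiveLogSoftmaxWithLoss only computes the loss, so the optimizer is still created and stepped as before, and backpropagation starts from outputs.loss. The sizes and random batch are illustrative assumptions. The sketch also departs from the code posted above in two ways: the embedding is created once in __init__ so its weights are registered with the optimizer and trained, and the explicit zero initial states are dropped because nn.LSTM defaults to zeros.

```python
import torch
import torch.nn as nn


class BLSTM(nn.Module):
    def __init__(self, emb_size, hidden_size, num_layers, vocab_size, cutoffs):
        super().__init__()
        # Created once here (not inside forward) so the weights are trained.
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.blstm = nn.LSTM(emb_size, hidden_size, num_layers,
                             batch_first=True, bidirectional=True)
        self.fc = nn.AdaptiveLogSoftmaxWithLoss(hidden_size * 2, vocab_size, cutoffs)

    def forward(self, x, targets):
        # Initial hidden and cell states default to zeros when omitted.
        out, _ = self.blstm(self.embed(x))  # (batch, seq_len, hidden_size * 2)
        # Use the last time step; returns a NamedTuple with output and loss.
        return self.fc(out[:, -1, :], targets)


torch.manual_seed(0)

# Illustrative sizes only (assumptions, not taken from the thread).
vocab_size, emb_size, hidden_size, num_layers = 1000, 32, 16, 1
model = BLSTM(emb_size, hidden_size, num_layers, vocab_size, cutoffs=[100, 500])
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

inputs = torch.randint(0, vocab_size, (8, 5))  # (batch, seq_len) token ids
targets = torch.randint(0, vocab_size, (8,))   # next-word ids, shape (batch,)

# Forward pass: the loss comes straight from the model output.
outputs = model(inputs, targets)
loss = outputs.loss

# Backward and optimize, exactly as with any other loss tensor.
optimizer.zero_grad()
loss.backward()
optimizer.step()
```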

Akib_Sadmanee · August 3, 2020, 10:19am

Thanks. This solution works. I tried the sample mentioned above by vdw which does the same thing you mentioned.