When and why do we use Contiguous()?

Shisho_Sama · June 11, 2019, 5:10am

I have seen code such as this:

import torch.nn as nn

class lstm_char(nn.Module):
    def __init__(self, unique_input_tokens, inputsize, hiddensize, outputsize, num_layer, dropout_prob=0.2):
        super().__init__()

        self.hiddensize = hiddensize
        self.num_layer = num_layer
        self.int2char = dict(enumerate(unique_input_tokens))
        self.char2int = {char: i for (i, char) in self.int2char.items()}

        self.lstm = nn.LSTM(input_size=inputsize,
                             hidden_size = hiddensize,
                             num_layers = num_layer,
                             batch_first=True,  # input/output tensors are shaped (batch, seq, feature)
                             dropout=dropout_prob)

        # element-wise dropout; nn.Dropout2d expects (N, C, H, W) inputs, not (batch, seq, hidden)
        self.dropout = nn.Dropout(dropout_prob)
        self.fc = nn.Linear(hiddensize, outputsize)
    
    def forward(self, x, hidden):
        """
        hidden contains hiddenstate and cellstate
        """
        output, hidden = self.lstm(x, hidden)

        output = self.dropout(output)
        # merge batch and sequence dims: (batch, seq, hidden) -> (batch*seq, hidden)
        output = output.contiguous().view(-1, self.hiddensize)
        output = self.fc(output)
        return output, hidden

But why would someone want to do that? What would they gain by reshaping, for example, a tensor of shape (3, 5, 83), i.e. (batch, sequence, features), into a tensor of shape (15, 83)?

Wouldn't we want the output to be (3, 5, 83)? We have 3 input samples, so we need 3 outputs as well, right?
Can someone please explain what the coder had in mind by calling contiguous() before the fc layer?
Thank you very much in advance.

jerinphilip · June 11, 2019, 5:27am

If I remember correctly, this pattern is typical of older code. Earlier versions of nn.Linear perhaps didn't support multidimensional inputs (*, *, H), so a common hack was to reshape (T, B, H) into (T×B, H) before passing it through the Linear layer. As long as the last dimension is H, this makes no difference to Linear: each H-dimensional vector is transformed independently, and thanks to PyTorch's dynamic graph, backprop flows through the view(...) just as well. Frameworks like Keras had a TimeDistributedDense layer for this case. After the linear layer, you can get back the (T, B, out_features) layout with another view(...).
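A minimal sketch of that older pattern, assuming a current PyTorch install; the sizes T, B, H and OUT are made-up illustration values, not taken from the question's model. It flattens (T, B, H) to (T×B, H), applies nn.Linear, and views the result back to (T, B, OUT).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes for illustration: time steps, batch size, hidden size, output size.
T, B, H, OUT = 5, 3, 8, 4

fc = nn.Linear(H, OUT)
x = torch.randn(T, B, H)

# Old-style "time distributed" linear: merge the leading dims into one batch dim...
flat = x.contiguous().view(-1, H)        # (T*B, H)
y_flat = fc(flat)                        # (T*B, OUT)
# ...then split them back out again.
y = y_flat.view(T, B, OUT)               # (T, B, OUT)

# Each H-vector is transformed independently, so entry (t, b) of the result
# equals fc applied to x[t, b] on its own.
assert torch.allclose(y[2, 1], fc(x[2, 1]), atol=1e-6)
print(flat.shape, y_flat.shape, y.shape)
```

The row order survives the round trip because view(-1, H) and view(T, B, OUT) both use the same row-major ordering, so row t*B + b of the flat tensor always maps back to position (t, b).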

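The current nn.Linear docs state that it accepts inputs of shape (*, H_in), with any number of leading dimensions, so the contiguous().view(...) is no longer required. However, the rest of your example may contain code that relies on the flattened (batch*seq, outputsize) shape this model returns, such as a loss function that expects a 2D (N, C) input.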
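A sketch of the modern equivalent, under the same assumptions (current PyTorch, hypothetical sizes, batch_first layout as in the question): nn.Linear is applied directly to the 3D tensor, and flattening happens only where a consumer needs 2D input. nn.CrossEntropyLoss is used here as an example of such a consumer, since it expects class scores of shape (N, C); the random tensors stand in for a real LSTM output and real targets.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: batch, sequence length, hidden size, number of classes.
B, T, H, C = 3, 5, 8, 83

fc = nn.Linear(H, C)
lstm_out = torch.randn(B, T, H)          # stands in for a batch_first LSTM output

# nn.Linear operates on the last dimension and keeps the leading ones.
logits = fc(lstm_out)                    # (B, T, C)

# Same numbers as the old flatten-then-Linear approach.
old_style = fc(lstm_out.contiguous().view(-1, H)).view(B, T, C)
assert torch.allclose(logits, old_style, atol=1e-6)

# Flatten only where a 2D (N, C) input is required, e.g. for the loss.
targets = torch.randint(0, C, (B, T))    # placeholder class indices
loss = nn.CrossEntropyLoss()(logits.view(-1, C), targets.view(-1))
print(logits.shape, loss.item())
```

Keeping the (batch, seq, classes) shape until the loss makes it easier to inspect per-sample, per-step predictions, which is what the question expected the model to return.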

As for contiguous(), it is typically called because view(...) can throw an error when the tensor is not contiguous in memory. Operations like transpose(...) or permute(...) (and view(...) itself) are lazy: they only change the tensor's metadata (its shape and strides), not the underlying storage, so after a transpose or permute the elements are no longer laid out in row-major order. view(...) cannot reinterpret such a layout without copying, so it refuses. Calling contiguous() copies the data into a new, contiguous block of memory, as expected by view(...) and other ops (it is a no-op if the tensor is already contiguous).
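A small sketch of the failure and the fix, assuming a current PyTorch version. The exact error message varies between releases, so the code only checks that a RuntimeError is raised. The (3, 5, 8) shape is a made-up example; transposing it mimics switching between (seq, batch, hidden) and (batch, seq, hidden) layouts. The last line uses reshape(), a standard PyTorch method that returns a view when possible and copies otherwise.

```python
import torch

x = torch.arange(3 * 5 * 8, dtype=torch.float32).view(3, 5, 8)
print(x.is_contiguous(), x.stride())          # freshly created: contiguous

# transpose only swaps shape/stride metadata; the storage is shared and unchanged.
xt = x.transpose(0, 1)                        # shape (5, 3, 8)
print(xt.is_contiguous(), xt.stride())
print(xt.data_ptr() == x.data_ptr())          # same underlying memory

# view() cannot merge the first two dims of this layout without copying.
try:
    xt.view(-1, 8)
    view_failed = False
except RuntimeError:
    view_failed = True

# contiguous() copies into a new row-major block; now view() works.
xc = xt.contiguous()
flat = xc.view(-1, 8)                         # shape (15, 8)
print(view_failed, xc.is_contiguous(), flat.shape)

# reshape() combines both steps: it views when it can and copies when it must.
assert torch.equal(xt.reshape(-1, 8), flat)
```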

Shisho_Sama · June 11, 2019, 5:32am

Thank you very much, sir.
It all makes sense now; it's greatly appreciated.