RNN: for many to many classification task

pp18 · March 25, 2018, 5:14pm

I could not find anywhere how to perform many-to-many classification task in pytorch.
To give details I have a time-series sequence where each timestep is labeled either 0 or 1. For example, if I have input size of [256x64x4]: 256: Batch size, 64: Sequence-length, 4: Feature size (Assume that data is structured batch-first) then the output size is [256x64x1]. I have written the following code for forward-propagation (not sure If I am procedding fine).

class RNNS2S(nn.Module):
    """docstring for RNNS2S"""

    def __init__(self, input_dimensions=4, hidden_dimensions=512, num_classes=2, num_layers=2,
                 dropout=0.0, batch_first=True, bidirectional=False, seq_len=64):
        super(RNNS2S, self).__init__()
        self.input_dimensions = input_dimensions
        self.hidden_dimensions = hidden_dimensions

        self.num_classes = num_classes

        self.num_layers = num_layers
        self.dropout = dropout

        self.batch_first = batch_first

        self.bidirectional = bidirectional
        self.seq_len = seq_len

        # GRU Layer
        self.gru = nn.GRU(input_size=self.input_dimensions, hidden_size=self.hidden_dimensions,
                          num_layers=self.num_layers, batch_first=self.batch_first,
                          dropout=self.dropout, bidirectional=self.bidirectional)


        # Fully connected for full seq output i.e. linear layer of fcs.
        # This will give me an output of sequence size more precisely: batch_size*seq_len*num_classes
       
        # self.fcs = [] # Can not create array of linear layer like this will give error.
        # for i in range(seq_len):
        #    self.fcs[i] = nn.Linear(in_features=self.hidden_dimensions, out_features=self.num_classes)
        self.fc = nn.Linear(in_features=self.hidden_dimensions, out_features=self.num_classes)


    def forward(self, X):
        assert len(X.size()) == 3, '[GRU]: Input dimension must be of length 3 i.e. [MxSxN]' # M: Batch Size(if batch first), S: Seq Lenght, N: Number of features

        batch_index = 0 if self.batch_first else 1
        num_direction = 2 if self.bidirectional else 1


        # Hidden state in first seq of the GRU
        h_0 = Variable(self.num_layers * num_direction, torch.zeros(X.size(batch_index), self.hidden_dimensions))

        output_gru, h_n = self.gru(X, h_0)

        # Note that it is batch-first
        # Now each individual in the sequence will have prediction label.
        # fc_output = []  # Can not store the output like this. It is an error.
        # for i in range(self.seq_len):
            # fc_output[i] = self.fcs[i](output_gru[:, i, :])

        fc_output = [self.fcs[i](output_gru[:, i, :]) for i in range(self.seq_len)]
        return fc_output  # output will be batch_size*num_classes for each entry

Assuming that above is correct how do I write loss function which sums over the cross-entropy of each time-step in sequence (as can be more clear from the following figure) and then take the average over batch-size. Moreover according to the following figure:

We can see that the weight w_y and b_y should be shared in the final layer.

My overall doubt is writing the loss function according to the figure and making sure that weight is shared (Regarding weight sharing I am sure that this is a requirement. I know that in the input layer it should be shared but not sure about the output layer). I don’t know if it is already in the framework or I have to find a way around it.

Any help will be highly appreciated.
Thank you.

jpeg729 · March 26, 2018, 9:49am

You are using a separate Linear layer with different parameters for each timestep.

Here is what I would do

class RNNS2S(nn.Module):
    def __init__(self, ...):
        ...
        self.fc = nn.Linear(in_features=self.hidden_dimensions, out_features=self.num_classes)

    def forward(self, X):
        ...
        output_gru, h_n = self.gru(X, h_0)
        # output_gru has shape (batch_size, seq_len, hidden_dimensions)
        
        # nn.Linear operates on the last dimension of its input
        # i.e. for each slice [i, j, :] of gru_output it produces a vector of size num_classes
        fc_output = self.fc(output_gru)
        # fc_output will be batch_size*seq_len*num_classes
        return fc_output

Now, I can produce input_data of shape (batches, timesteps, 4) along with targets of shape (batches, timesteps, 1) and if I feed input_data into the model then I will get output of shape (batches, timesteps, 1), i.e. one prediction for each timestep of each sample in the batch.

Then I can use any of the standard loss functions in PyTorch such as CrossEntropyLoss to compare fc_output to my targets.

pp18 · March 26, 2018, 5:52pm

@jpeg729
Thank you very much for your help. Only after looking at the solution one realizes that it is not very hard if one dabbles more.

I guess following will complete the answer to my question since documentation of CrossEntropyLoss has the restriction saying input has to be a 2D Tensor of size (minibatch, C) for the problem at hand output is 3D Tensor. Unless there is a direct way to do it, rather than writing custom loss function.

class CrossEntropyS2SLoss(nn.Module):
    """docstring for CrossEntropyS2SLoss"""
    def __init__(self, size_average=True):
        super(CrossEntropyS2SLoss, self).__init__()
        self.size_average = size_average

    def forward(self, output, y):
        batch_size = y.size(0)
        loss_func = nn.CrossEntropyLoss(size_average=self.size_average)

        # Loss from one sequence
        loss = loss_func(output[0, :, :].contiguous(), y[0, :, :].contiguous().view(-1)) # output will be batch_size*seq_len*num_classes an y will be batch_size*seq_len*1
        for i in range(1, batch_size):
            loss += loss_func(output[i, :, :].contiguous(), y[i, :, :].contiguous().view(-1)) #

        return loss

jpeg729 · March 26, 2018, 5:54pm

I missed that detail. You can use .view to make it 2d.

2d_variable = 3d_variable(-1, 3d_tensor.size(2))

pp18 · March 26, 2018, 8:17pm

what I wrote in the custom-loss function should also work, correct?

jpeg729 · March 26, 2018, 8:36pm

At a glance there is nothing wrong with it. I would expect the .view method to be faster than the python for loop.

pp18 · March 26, 2018, 11:10pm

Thanks for clarifying that. Thank you very much for all your help.