I could not find anywhere how to perform a many-to-many classification task in PyTorch.
To give details: I have a time-series sequence where each time step is labeled either 0 or 1. For example, if the input has size [256x64x4] (256: batch size, 64: sequence length, 4: feature size; assume the data is batch-first), then the output size is [256x64x1]. I have written the following code for forward propagation (I am not sure whether I am proceeding correctly).
    import torch
    import torch.nn as nn


    class RNNS2S(nn.Module):
        """docstring for RNNS2S"""

        def __init__(self, input_dimensions=4, hidden_dimensions=512, num_classes=2,
                     num_layers=2, dropout=0.0, batch_first=True, bidirectional=False,
                     seq_len=64):
            super(RNNS2S, self).__init__()
            self.input_dimensions = input_dimensions
            self.hidden_dimensions = hidden_dimensions
            self.num_classes = num_classes
            self.num_layers = num_layers
            self.dropout = dropout
            self.batch_first = batch_first
            self.bidirectional = bidirectional
            self.seq_len = seq_len
            num_direction = 2 if self.bidirectional else 1

            # GRU Layer
            self.gru = nn.GRU(input_size=self.input_dimensions,
                              hidden_size=self.hidden_dimensions,
                              num_layers=self.num_layers,
                              batch_first=self.batch_first,
                              dropout=self.dropout,
                              bidirectional=self.bidirectional)

            # Fully connected for full seq output i.e. linear layer of fcs.
            # This will give me an output of sequence size, more precisely: batch_size*seq_len*num_classes
            # self.fcs =  # Can not create array of linear layer like this will give error.
            # for i in range(seq_len):
            #     self.fcs[i] = nn.Linear(in_features=self.hidden_dimensions, out_features=self.num_classes)
            # A bidirectional GRU outputs hidden_dimensions * 2 features per time step.
            self.fc = nn.Linear(in_features=self.hidden_dimensions * num_direction,
                                out_features=self.num_classes)

        def forward(self, X):
            assert len(X.size()) == 3, '[GRU]: Input dimension must be of length 3 i.e. [MxSxN]'
            # M: Batch size (if batch first), S: Seq length, N: Number of features
            batch_index = 0 if self.batch_first else 1
            num_direction = 2 if self.bidirectional else 1

            # Initial hidden state of the GRU: (num_layers * num_direction, batch, hidden)
            h_0 = torch.zeros(self.num_layers * num_direction, X.size(batch_index),
                              self.hidden_dimensions, device=X.device, dtype=X.dtype)
            output_gru, h_n = self.gru(X, h_0)  # Note that it is batch-first

            # Now each individual in the sequence will have a prediction label.
            # fc_output =  # Can not store the output like this. It is an error.
            # for i in range(self.seq_len):
            #     fc_output[i] = self.fcs[i](output_gru[:, i, :])
            # Only self.fc is defined above, so the same layer is applied at every time step.
            fc_output = [self.fc(output_gru[:, i, :]) for i in range(self.seq_len)]
            return fc_output  # output will be batch_size*num_classes for each entry
Assuming that the above is correct, how do I write a loss function that sums the cross-entropy over each time step in the sequence (as the following figure should make clearer) and then takes the average over the batch? Moreover, according to the following figure:
We can see that the weights w_y and b_y should be shared in the final layer.
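For what it's worth, here is a minimal sketch of how sharing w_y and b_y across time steps could look, assuming a single `nn.Linear` is used for the output layer. `nn.Linear` acts on the last dimension of its input, so calling it once on the whole GRU output should give the same result as looping over time steps with that same layer. The tensor sizes and the random `gru_out` stand-in below are illustrative assumptions, not values from the model above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions, smaller than in the post).
batch_size, seq_len, hidden, num_classes = 8, 64, 16, 2

# One linear layer = one shared (w_y, b_y) for every time step.
fc = nn.Linear(hidden, num_classes)

# Stand-in for the batch-first GRU output: (batch, seq_len, hidden).
gru_out = torch.randn(batch_size, seq_len, hidden)

# Apply the layer to all time steps at once: (batch, seq_len, num_classes).
logits_all = fc(gru_out)

# Apply the same layer step by step and stack along the time axis.
logits_loop = torch.stack(
    [fc(gru_out[:, t, :]) for t in range(seq_len)], dim=1
)

# Both routes use the same parameters, so they should agree numerically.
same = torch.allclose(logits_all, logits_loop, atol=1e-5)
print(logits_all.shape, same)
```

Because there is only one `fc` module, there is only one weight matrix and one bias to learn, which is what sharing in the final layer amounts to.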
My overall doubt is how to write the loss function according to the figure while making sure that the weights are shared. (Regarding weight sharing, I am not sure that this is a requirement: I know the weights should be shared in the input layer, but I am not sure about the output layer.) I don't know whether this is already in the framework or whether I have to find a way around it.
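As a rough sketch of the loss described above (cross-entropy summed over time steps, then averaged over the batch), one option is to compute the per-step losses with `torch.nn.functional.cross_entropy` using `reduction="none"` and reduce them by hand. The helper name `sequence_cross_entropy` is hypothetical, and the sketch assumes logits of shape (batch, seq_len, num_classes) and integer class labels of shape (batch, seq_len); labels stored as [256x64x1] would first need something like `targets.squeeze(-1).long()`.

```python
import torch
import torch.nn.functional as F


def sequence_cross_entropy(logits, targets):
    """Sum cross-entropy over time steps, then average over the batch.

    logits:  (batch, seq_len, num_classes) raw scores (no softmax applied)
    targets: (batch, seq_len) integer class indices
    """
    batch_size, seq_len, num_classes = logits.shape
    # Flatten batch and time so every time step is one classification example.
    per_step = F.cross_entropy(
        logits.reshape(-1, num_classes),
        targets.reshape(-1),
        reduction="none",
    )
    # Restore (batch, seq_len), sum over time, average over the batch.
    return per_step.view(batch_size, seq_len).sum(dim=1).mean()


torch.manual_seed(0)
# Random stand-ins for model output and labels (illustrative sizes).
logits = torch.randn(4, 64, 2, requires_grad=True)
targets = torch.randint(0, 2, (4, 64))

loss = sequence_cross_entropy(logits, targets)
loss.backward()  # gradients flow back through every time step
print(loss.item())
```

If a plain average over all time steps is acceptable instead, the default `reduction="mean"` on the flattened tensors gives the same quantity divided by the sequence length.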
Any help will be highly appreciated.