Gradients vanishing or exploding?

Saswat · October 18, 2022, 1:14pm

I am dealing with a classification problem (5000 classes).

My network has 3 MLP layers followed by 1 LSTM block.

import torch.nn as nn

class Model(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, output_size, num_selected, layer_size=1):
        super(Model, self).__init__()
        self.layer_size = layer_size
        self.hidden_size = hidden_dim

        self.fc = nn.Sequential(
            nn.Linear(4000, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, embedding_dim),
            nn.ReLU(),
        )

        self.lstm = nn.LSTM(embedding_dim, hidden_dim, bidirectional=True, num_layers=layer_size)
        self.hidden2tag = nn.Linear(2*hidden_dim, output_size)

    def forward(self, input_vector):
        embeddings = self.fc(input_vector)
        lstm_output, _ = self.lstm(embeddings)
        logits = self.hidden2tag(lstm_output)
        return logits
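
For anyone reproducing this, here is a minimal shape check of the model above. The values `embedding_dim=128`, `hidden_dim=64` and `num_selected=0` are assumptions chosen only for illustration; they are not given in the post. Because `nn.LSTM` defaults to `batch_first=False`, the sketch feeds the input as `(seq_len, batch, features)`.

```python
import torch
import torch.nn as nn


class Model(nn.Module):
    """Same layers as in the post: 3 Linear+ReLU layers, then a bidirectional LSTM."""

    def __init__(self, embedding_dim, hidden_dim, output_size, num_selected, layer_size=1):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(4000, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, embedding_dim),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, bidirectional=True, num_layers=layer_size)
        # Bidirectional LSTM concatenates both directions, hence 2 * hidden_dim.
        self.hidden2tag = nn.Linear(2 * hidden_dim, output_size)

    def forward(self, input_vector):
        embeddings = self.fc(input_vector)
        lstm_output, _ = self.lstm(embeddings)
        return self.hidden2tag(lstm_output)


# Assumed sizes, for illustration only.
model = Model(embedding_dim=128, hidden_dim=64, output_size=5000, num_selected=0)

# (seq_len, batch, features): nn.LSTM uses batch_first=False by default.
x = torch.randn(10, 2, 4000)
logits = model(x)
print(logits.shape)  # torch.Size([10, 2, 5000])
```

If the real data is laid out as `(batch, seq_len, features)`, the LSTM would need `batch_first=True`; the post does not say which layout is used.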

This is the gradient flow observed. Are my gradients exploding in the Linear layers and vanishing in the LSTM? How do I bring uniformity to this flow?
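
One way to put numbers on the gradient flow is to print the gradient norm of every parameter after a backward pass. The sketch below is an assumption-laden stand-in, not the code that produced the plot: `grad_norms` and `TinyModel` are hypothetical names, and the layer sizes are shrunk so it runs quickly.

```python
import math
from typing import Dict

import torch
import torch.nn as nn
import torch.nn.functional as F


def grad_norms(model: nn.Module) -> Dict[str, float]:
    """Return the L2 norm of each parameter's gradient (call after backward())."""
    return {
        name: p.grad.norm().item()
        for name, p in model.named_parameters()
        if p.grad is not None
    }


class TinyModel(nn.Module):
    """Hypothetical scaled-down version of the MLP + bidirectional LSTM model."""

    def __init__(self, in_dim=40, embedding_dim=16, hidden_dim=8, num_classes=5):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(in_dim, 32), nn.ReLU(),
            nn.Linear(32, embedding_dim), nn.ReLU(),
        )
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, bidirectional=True)
        self.hidden2tag = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):
        out, _ = self.lstm(self.fc(x))
        return self.hidden2tag(out)


torch.manual_seed(0)
model = TinyModel()
x = torch.randn(7, 3, 40)               # (seq_len, batch, features)
targets = torch.randint(0, 5, (7, 3))   # one class label per time step

logits = model(x)
# cross_entropy expects (N, C) logits and (N,) targets, so flatten time and batch.
loss = F.cross_entropy(logits.reshape(-1, 5), targets.reshape(-1))
loss.backward()

norms = grad_norms(model)
for name, value in norms.items():
    print(f"{name:30s} {value:.3e}")
```

Comparing the printed norms for the `fc.*` and `lstm.*` parameters across several training steps shows whether the gap between them is persistent or only appears early in training.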

Performance also suffers: the network always predicts the same single class.
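
To confirm the single-class behaviour, the predicted classes can be counted over a batch. This is a small sketch with made-up logits; `prediction_histogram` is a hypothetical helper name, and in practice `num_classes` would be 5000.

```python
import torch


def prediction_histogram(logits: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Count how often each class is the argmax over the last dimension."""
    preds = logits.argmax(dim=-1).reshape(-1)
    return torch.bincount(preds, minlength=num_classes)


# Made-up logits for 3 samples and 3 classes; class 1 wins every time.
logits = torch.tensor([
    [0.1, 2.0, 0.3],
    [0.0, 1.5, 0.2],
    [0.4, 0.9, 0.1],
])
counts = prediction_histogram(logits, num_classes=3)
print(counts)  # tensor([0, 3, 0])
```

A histogram with all its mass on one class, as here, matches the collapse described above.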