Influence of Unused FFN on Model Accuracy in PyTorch

I am encountering a peculiar issue with my PyTorch model where the presence of an initialized but unused feed-forward network (FFN) affects the model's accuracy. Specifically, when the FFN is initialized in my CRS_A class but not used in the forward pass, my model's accuracy is higher than when I completely remove (or comment out) the FFN initialization. The FFN is defined as follows in my model's constructor:

import torch
import torch.nn as nn
import torch.nn.functional as F


class CRS_A(nn.Module):
    def __init__(self, modal_x, modal_y, hid_dim=128, d_ff=512, dropout_rate=0.1):
        super(CRS_A, self).__init__()

        self.cross_attention = CrossAttention(modal_y, modal_x, hid_dim)  # custom module defined elsewhere
        self.ffn = nn.Sequential(
            nn.Conv1d(modal_x, d_ff, kernel_size=1),
            nn.GELU(),
            nn.Dropout(dropout_rate),
            nn.Conv1d(d_ff, 128, kernel_size=1),
            nn.Dropout(dropout_rate),
        )
        self.norm = nn.LayerNorm(modal_x)

        self.linear1 = nn.Conv1d(1024, 512, kernel_size=1)
        self.linear2 = nn.Conv1d(512, 300, kernel_size=1)
        self.dropout1 = nn.Dropout(0.1)
        self.dropout2 = nn.Dropout(0.1)

    def forward(self, x, y, adj):
        x = x + self.cross_attention(y, x, adj)       # torch.Size([5, 67, 1024])
        x = self.norm(x).permute(0, 2, 1)
        x = self.dropout1(F.gelu(self.linear1(x)))    # torch.Size([5, 512, 67])
        x_e = self.dropout2(F.gelu(self.linear2(x)))  # torch.Size([5, 300, 67])

        return x_e, x

As you can see, self.ffn is never used in the forward pass. Despite this, removing or commenting out its initialization leads to a noticeable drop in accuracy.

Could this be due to some form of implicit regularization, or is there another explanation for this behavior? Has anyone encountered a similar situation, and how did you address it? Any insights or explanations would be greatly appreciated.

Initializing modules calls into the pseudorandom number generator and thus changes its state, which affects every random operation that follows (the parameter initialization of later layers, data shuffling, dropout masks, etc.) and therefore the overall training. You should be able to see a similar effect by simply changing the random seed and rerunning the training procedure.
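As a minimal sketch of what happens (using arbitrary layer sizes, not your model): constructing one extra module consumes draws from the global PRNG, so every layer created afterwards receives different initial weights.

import torch
import torch.nn as nn

torch.manual_seed(0)
_unused = nn.Conv1d(1024, 512, kernel_size=1)  # constructing it draws random numbers for its init
with_extra = nn.Linear(4, 4).weight.clone()

torch.manual_seed(0)
# same seed, but without the extra module this time
without_extra = nn.Linear(4, 4).weight.clone()

print(torch.equal(with_extra, without_extra))  # False: the later layer got different initial weights

Everything else that uses the global PRNG afterwards (data shuffling, dropout masks, etc.) diverges in the same way, which is why the gap you observe behaves like a seed change.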

Thank you for your response. Yes, you are right: if I change the seed or rerun the training, I can see the accuracy change even with the FFN initialization in place. What is the recommended solution here?

You could change some hyperparameters, such as the learning rate or the parameter initialization, and check whether this stabilizes the training.
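If you want the layers you actually train to receive the same initial weights regardless of whether the unused FFN is constructed, one option is to re-initialize each submodule with its own deterministic seed after building the model. A rough sketch: the weight init below mirrors PyTorch's default for Conv1d/Linear, while the zero bias, the seeding-by-name scheme, and the constructor arguments are simplifications/placeholders of my own.

import math
import zlib
import torch
import torch.nn as nn

def reinit_parameters(model, base_seed=0):
    # Seed the PRNG per submodule (keyed by the module's name) so each layer's
    # initial weights depend only on base_seed and its own name, not on which
    # other modules happen to be constructed in __init__.
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv1d, nn.Linear)):
            torch.manual_seed(base_seed + zlib.crc32(name.encode()))
            nn.init.kaiming_uniform_(module.weight, a=math.sqrt(5))  # PyTorch's default weight scheme
            if module.bias is not None:
                nn.init.zeros_(module.bias)  # simplification; the default draws the bias from a uniform range

model = CRS_A(modal_x=1024, modal_y=300)  # placeholder sizes, adjust to your data
reinit_parameters(model)
torch.manual_seed(0)  # also reset the global RNG before training (shuffling, dropout masks, etc.)

With something like this in place, adding or removing self.ffn should no longer change the initial weights of the layers you use, and any remaining accuracy difference between the two variants is run-to-run noise you can quantify by averaging over several seeds.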