I am encountering a peculiar issue with my PyTorch model: the presence of an initialized but unused feed-forward network (FFN) affects the model's accuracy. Specifically, when the FFN is initialized in my CRS_A class but not used in the forward pass, accuracy is higher than when I remove (or comment out) the FFN initialization entirely. The FFN is defined in my model's constructor as follows:
import torch.nn as nn
import torch.nn.functional as F

# CrossAttention is defined elsewhere in my code.
class CRS_A(nn.Module):
    def __init__(self, modal_x, modal_y, hid_dim=128, d_ff=512, dropout_rate=0.1):
        super(CRS_A, self).__init__()
        self.cross_attention = CrossAttention(modal_y, modal_x, hid_dim)
        # Initialized here but never called in forward()
        self.ffn = nn.Sequential(
            nn.Conv1d(modal_x, d_ff, kernel_size=1),
            nn.GELU(),
            nn.Dropout(dropout_rate),
            nn.Conv1d(d_ff, 128, kernel_size=1),
            nn.Dropout(dropout_rate),
        )
        self.norm = nn.LayerNorm(modal_x)
        self.linear1 = nn.Conv1d(1024, 512, kernel_size=1)
        self.linear2 = nn.Conv1d(512, 300, kernel_size=1)
        self.dropout1 = nn.Dropout(0.1)
        self.dropout2 = nn.Dropout(0.1)

    def forward(self, x, y, adj):
        x = x + self.cross_attention(y, x, adj)  # torch.Size([5, 67, 1024])
        x = self.norm(x).permute(0, 2, 1)
        x = self.dropout1(F.gelu(self.linear1(x)))  # torch.Size([5, 512, 67])
        x_e = self.dropout2(F.gelu(self.linear2(x)))  # torch.Size([5, 300, 67])
        return x_e, x
As you can see, self.ffn is not used in the forward pass. Despite this, removing or commenting out the FFN's initialization leads to a noticeable drop in accuracy.
Could this be due to some form of implicit regularization, or is there another explanation for this behavior? Has anyone encountered a similar situation, and how did you address it? Any insights or explanations would be greatly appreciated.
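One explanation that could be checked, offered as a hypothesis and not a confirmed diagnosis: building the FFN draws numbers from PyTorch's global random number generator, so every layer constructed after it (here norm, linear1, linear2) would start from different initial weights even under the same seed. The sketch below only illustrates that mechanism. `build` is a hypothetical helper, and a single Conv1d stands in for linear1; it is not the actual model.

```python
import torch
import torch.nn as nn


def build(with_unused_ffn: bool, seed: int = 0) -> nn.Conv1d:
    """Hypothetical helper: build a stand-in for linear1 under a fixed seed,
    optionally constructing an unused layer before it."""
    torch.manual_seed(seed)
    if with_unused_ffn:
        # Never used afterwards, but its weight initialization
        # still consumes values from the global RNG.
        _unused = nn.Conv1d(1024, 512, kernel_size=1)
    return nn.Conv1d(1024, 512, kernel_size=1)


with_ffn = build(with_unused_ffn=True)
without_ffn = build(with_unused_ffn=False)
same_again = build(with_unused_ffn=False)

# Same seed, but the extra layer shifts the RNG state before linear1 is created.
weights_differ = not torch.equal(with_ffn.weight, without_ffn.weight)
# Same seed and same construction order give identical initial weights.
weights_repeat = torch.equal(without_ffn.weight, same_again.weight)

print("initial weights differ:", weights_differ)
print("same construction reproduces:", weights_repeat)
```

If this is what is happening, the accuracy gap would be seed-to-seed variance and not an effect of the FFN itself. Repeating both variants over several seeds and comparing the spread of accuracies would be one way to tell.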