VAE is just predicting the mode of each input channel

Hello, I have a set of ~1300 binary features that I’m trying to embed using a fully-connected VAE. I’ve noticed that the model, regardless of normalization, dropout, activation functions, etc., stops learning once it hits a prediction accuracy of 76%. I checked, and that value corresponds exactly to predicting each feature by its mode.
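For reference, here is a minimal sketch of that baseline check (the synthetic `X` is just a stand-in for my real feature matrix; the names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the real data: rows of 1282 binary features, each with its
# own base rate (in practice X would be loaded from disk instead).
p = rng.uniform(0.05, 0.95, size=1282)
X = (rng.random((10_000, 1282)) < p).astype(np.float32)

# Per-feature mode: 1 if the feature is 1 in at least half the rows, else 0.
mode = (X.mean(axis=0) >= 0.5).astype(X.dtype)

# Accuracy of always predicting each feature's mode, averaged over all cells.
baseline_acc = (X == mode).mean()
print(f"mode baseline accuracy: {baseline_acc:.4f}")
```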

I’ve even expanded the size of this model (and its latent space) to rule out a capacity bottleneck; I would expect a model this large to fit the data essentially perfectly.

FcVae(
  (encode_list): ModuleList(
    (0-2): 3 x FCBlock(
      (fc_block): Sequential(
        (0): Linear(in_features=1282, out_features=1282, bias=True)
        (1): Identity()
        (2): LeakyReLU(negative_slope=0.02, inplace=True)
      )
    )
  )
  (encode_fc_mean): FCBlock(
    (fc_block): Sequential(
      (0): Linear(in_features=1282, out_features=1282, bias=True)
    )
  )
  (encode_fc_log_var): FCBlock(
    (fc_block): Sequential(
      (0): Linear(in_features=1282, out_features=1282, bias=True)
    )
  )
  (decode_list): ModuleList(
    (0-2): 3 x FCBlock(
      (fc_block): Sequential(
        (0): Linear(in_features=1282, out_features=1282, bias=True)
        (1): Identity()
        (2): LeakyReLU(negative_slope=0.02, inplace=True)
      )
    )
    (3): FCBlock(
      (fc_block): Sequential(
        (0): Linear(in_features=1282, out_features=1282, bias=True)
      )
    )
  )
)
[Network Embed] Total number of parameters : 14.803 M
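The internals of FcBlock aren’t shown in the printout, so here is a minimal sketch of the forward pass I’m describing, with the reparameterization trick made explicit (the class and attribute names are mine):

```python
import torch
import torch.nn as nn

class FcVaeSketch(nn.Module):
    """Minimal stand-in for the FcVae printed above (1282-d input and latent)."""

    def __init__(self, dim=1282, n_hidden=3):
        super().__init__()

        def block():
            return nn.Sequential(nn.Linear(dim, dim),
                                 nn.LeakyReLU(0.02, inplace=True))

        self.encoder = nn.Sequential(*[block() for _ in range(n_hidden)])
        self.fc_mean = nn.Linear(dim, dim)
        self.fc_log_var = nn.Linear(dim, dim)
        # The final decoder layer has no activation: it outputs logits for BCE.
        self.decoder = nn.Sequential(*[block() for _ in range(n_hidden)],
                                     nn.Linear(dim, dim))

    def forward(self, x):
        h = self.encoder(x)
        mean, log_var = self.fc_mean(h), self.fc_log_var(h)
        # Reparameterization trick: z = mean + sigma * eps, eps ~ N(0, I).
        z = mean + torch.exp(0.5 * log_var) * torch.randn_like(mean)
        return self.decoder(z), mean, log_var
```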

I should mention that the loss I’m using is BCE, optimized with Adam.
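Concretely, the objective is the usual BCE reconstruction term plus the KL term against the unit-Gaussian prior; a sketch (the function name and kl_weight are mine, and the decoder is assumed to output logits):

```python
import torch
import torch.nn.functional as F

def vae_loss(logits, x, mean, log_var, kl_weight=1.0):
    # Reconstruction: BCE between decoder logits and the 0/1 targets.
    recon = F.binary_cross_entropy_with_logits(logits, x, reduction="sum")
    # Analytic KL between q(z|x) = N(mean, diag(sigma^2)) and N(0, I).
    kl = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())
    return recon + kl_weight * kl
```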

I think I figured it out: I was going a bit too hard on the regularization (weight_decay was 0.01). When I dropped weight_decay to 0.0001, the model started to recover per-feature variance.
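In other words, the only change was in the optimizer construction (the learning rate here is a placeholder, and model stands in for the FcVae above):

```python
import torch
import torch.nn as nn

model = nn.Linear(1282, 1282)  # placeholder for the FcVae above

# Before: this much L2 regularization pushed the decoder toward the
# constant per-feature base rates, i.e. the mode solution.
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.01)

# After: much weaker weight decay lets the decoder keep per-feature variance.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```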