Limiting the number of activated output neurons in an autoencoder

Hi everyone,

I am implementing an autoencoder. It has a symmetric structure:

  • Input layer: 100 neurons
  • Hidden layer 1: 40 neurons
  • Hidden layer 2: 20 neurons
  • Hidden layer 3 (encoder output layer, i.e. the bottleneck): 4 neurons
  • Hidden layer 4: 20 neurons
  • Hidden layer 5: 40 neurons
  • Output layer: 100 neurons

I am using the ReLU activation function and the Adam optimizer with weight decay.
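
For reference, here is a minimal sketch of this setup in PyTorch (the layer sizes are the ones listed above; the class name, learning rate and weight decay value are placeholders, not my actual ones):

import torch
import torch.nn as nn
import torch.optim as optim

class SymmetricAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(100, 40), nn.ReLU(),
            nn.Linear(40, 20), nn.ReLU(),
            nn.Linear(20, 4), nn.ReLU(),   # hidden layer 3 == bottleneck
        )
        self.decoder = nn.Sequential(
            nn.Linear(4, 20), nn.ReLU(),
            nn.Linear(20, 40), nn.ReLU(),
            nn.Linear(40, 100),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = SymmetricAE()
# Adam with weight decay (lr and weight_decay are placeholder values)
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)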

What I am trying to implement is a mechanism that limits the number of activated neurons in the bottleneck layer (hidden layer 3): I would like only 1 neuron out of 4 to be activated. To do that, I compute the following term, either for every batch or once for the entire training dataset (in that case before the actual training starts; I still have to tune this choice), and add it to the loss function. Once that is done, I perform backpropagation (a sketch of that training step is at the end of the post).

import torch
import torch.nn.functional as F  # F.relu is used in sparse_loss() below

def kl_divergence(rho, rho_hat):
    # rho_hat: bottleneck activations, shape (batch_size, 4)
    # count which neurons are activated (ReLU output > 0) and average over the batch
    rho_hat = (rho_hat > 0).float().mean(dim=0)
    # keep the frequencies away from exactly 0 or 1 so the logs below stay finite
    rho_hat = torch.clamp(rho_hat, min=1e-6, max=1.0 - 1e-6)
    # broadcast the target sparsity rho to the same shape as rho_hat
    rho = torch.full_like(rho_hat, rho)
    # return the KL divergence between the target Bernoulli(rho) and the observed
    # activation frequencies Bernoulli(rho_hat), summed over the 4 bottleneck units
    return torch.sum(rho * torch.log(rho / rho_hat)
                     + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).item()
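
For example, calling it on a fake batch of bottleneck activations gives one scalar penalty per batch (the batch size of 8 and rho = 0.25, i.e. 1 activated neuron out of 4, are just example values):

rho = 0.25                                        # target: 1 activated neuron out of 4
fake_bottleneck = torch.relu(torch.randn(8, 4))   # fake ReLU activations, shape (batch, 4)
penalty = kl_divergence(rho, fake_bottleneck)     # a plain Python float
print(penalty)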
 
# define the sparse loss function
# Note that the calculations in sparse_loss() happen layer-wise: we iterate through the model_children
# list and forward the input through each child layer. The bottleneck activations are passed to
# kl_divergence(), which turns them into the mean activation frequencies rho_hat. Finally, sparse_loss()
# returns the total sparsity loss.
def sparse_loss(rho, x, feat_name):
    values = x
    loss = 0
    if feat_name=="SNR":
        for i in range(len(model_children_SNR)):
            values = model_children_SNR[i](values)
            values = F.relu(values)
            # the bottleneck activations (reached at i == 2) will be our rho_hat
            if i==2:
                # this is the batch_loss
                loss += kl_divergence(rho, values)
                break
        return loss
    elif feat_name=="bsr":
        for i in range(len(model_children_bsr)):
            values = model_children_bsr[i](values)
            # the bottleneck activations (reached at i == 2) will be our rho_hat
            if i==2:
                loss += kl_divergence(rho, values)
                break
        return loss
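
And this is roughly how the term gets added to the reconstruction loss before backpropagation. It is only a sketch: dataloader, model and optimizer stand for my data loader, autoencoder and Adam optimizer, BETA is a placeholder weight for the penalty, and model_children_SNR is assumed to already hold the encoder layers (e.g. taken from list(model.children())):

import torch.nn as nn

criterion = nn.MSELoss()
RHO = 0.25     # target sparsity: 1 activated neuron out of 4
BETA = 0.1     # weight of the sparsity term (placeholder value)

for batch in dataloader:                                    # batch has shape (batch_size, 100)
    optimizer.zero_grad()
    reconstruction = model(batch)
    loss = criterion(reconstruction, batch)                 # reconstruction loss
    loss = loss + BETA * sparse_loss(RHO, batch, "SNR")     # add the sparsity penalty
    loss.backward()
    optimizer.step()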