# Limiting the number of activated output neurons in autoencoder

Hi everyone,

I am implementing an autoencoder. It has a symmetric structure:

• Input layer: 100 neurons
• Hidden layer 1: 40 neurons
• Hidden layer 2: 20 neurons
• Hidden layer 3 (encoder output, i.e. the bottleneck): 4 neurons
• Hidden layer 4: 20 neurons
• Hidden layer 5: 40 neurons
• Output layer: 100 neurons
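The layer sizes above can be sketched as a PyTorch module (a minimal sketch; the class name `SparseAE` and the exact module layout are my own choices, not code from my project):

```python
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    """Symmetric 100-40-20-4-20-40-100 autoencoder with ReLU activations."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(100, 40), nn.ReLU(),
            nn.Linear(40, 20), nn.ReLU(),
            nn.Linear(20, 4), nn.ReLU(),   # hidden layer 3: the 4-neuron bottleneck
        )
        self.decoder = nn.Sequential(
            nn.Linear(4, 20), nn.ReLU(),
            nn.Linear(20, 40), nn.ReLU(),
            nn.Linear(40, 100),
        )

    def forward(self, x):
        z = self.encoder(x)            # bottleneck code, shape (batch, 4)
        return self.decoder(z), z

model = SparseAE()
out, code = model(torch.randn(8, 100))  # batch of 8 inputs of size 100
```

Returning the bottleneck code alongside the reconstruction makes it easy to apply a sparsity penalty to it later.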

I am using the ReLU activation function and the Adam optimizer with weight decay.

What I am trying to implement is a mechanism that limits the number of activated neurons in the bottleneck layer (hidden layer 3): ideally only 1 of its 4 neurons should be active. To that end, for every batch, or alternatively for the entire training dataset before the actual training (I still have to tune this choice), I compute the following term and add it to the loss function; then I perform backpropagation.
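For reference, the standard differentiable form of this penalty (the one my commented-out sigmoid line hints at) uses mean sigmoid activations instead of a hard count; `beta`, the MSE criterion, and the tensors below are placeholders of mine, not my actual pipeline:

```python
import torch
import torch.nn.functional as F

def kl_sparsity(rho, activations, eps=1e-6):
    # mean activation per bottleneck unit, squashed to (0, 1) with a sigmoid
    # so it can be read as a firing probability; clamped so the logs stay finite
    rho_hat = torch.sigmoid(activations).mean(dim=0).clamp(eps, 1 - eps)
    return torch.sum(rho * torch.log(rho / rho_hat)
                     + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat)))

# per-batch: reconstruction loss plus the weighted sparsity term
recon = torch.randn(8, 100)     # stand-in for the decoder output
target = torch.randn(8, 100)    # stand-in for the input batch
code = torch.randn(8, 4)        # stand-in for the bottleneck activations
rho, beta = 0.25, 1e-3          # 1 active neuron out of 4 -> rho = 0.25
loss = F.mse_loss(recon, target) + beta * kl_sparsity(rho, code)
```

Because every operation here is differentiable, the penalty actually pushes gradients back into the encoder, unlike a hard activation count.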

```python
def kl_divergence(rho, rho_hat):
    # rho_hat: bottleneck activations, shape (batch_size, 4)
    # Count the activated neurons: binarise each activation, then average
    # over the batch to get the fraction of samples each neuron fires on.
    # NOTE: the hard threshold and the .item() call below both break the
    # gradient, so this term changes the reported loss but contributes no
    # gradient; a sigmoid-based relaxation (see the commented idea below)
    # is the usual differentiable alternative.
    rho_hat = (rho_hat > 0).float().mean(dim=0)
    # clamp away from 0 and 1 so the logs below stay finite
    rho_hat = rho_hat.clamp(1e-6, 1 - 1e-6)
    # rho_hat = torch.mean(torch.sigmoid(rho_hat), 1)  # sigmoid variant: treat
    # the mean activations as probabilities instead of hard-thresholding them
    # make rho the same shape (and device) as rho_hat so we can compute the KL
    rho = torch.full_like(rho_hat, rho)
    # return the KL divergence between the target rate rho and observed rho_hat
    return torch.sum(rho * torch.log(rho / rho_hat)
                     + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).item()


# Define the sparse loss function. The calculations happen layer-wise in
# sparse_loss(): we iterate through the model_children list, feed the batch
# forward, and once we reach the bottleneck (layer index 2) we pass its
# activations to kl_divergence() as rho_hat and return the resulting penalty.
def sparse_loss(rho, x, feat_name):
    values = x
    loss = 0
    if feat_name == "SNR":
        for i in range(len(model_children_SNR)):
            values = F.relu(model_children_SNR[i](values))
            if i == 2:  # bottleneck layer: its activations are our rho_hat
                loss += kl_divergence(rho, values)  # this is the batch loss
                break
        return loss
    elif feat_name == "bsr":
        for i in range(len(model_children_bsr)):
            values = model_children_bsr[i](values)
            if i == 2:
                loss += kl_divergence(rho, values)
                break
        return loss
```
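For completeness, here is a sketch of a training step that folds the returned penalty into the total loss before backpropagation. The model layout, `beta`, and the MSE criterion are stand-ins of mine, and `penalty` marks where `sparse_loss` would plug in; note that since `kl_divergence` returns a plain Python float (via `.item()`), the penalty shifts the loss value without adding any gradient:

```python
import torch
import torch.nn as nn

# stand-in model and optimizer mirroring the setup described above
model = nn.Sequential(
    nn.Linear(100, 40), nn.ReLU(),
    nn.Linear(40, 20), nn.ReLU(),
    nn.Linear(20, 4), nn.ReLU(),
    nn.Linear(4, 20), nn.ReLU(),
    nn.Linear(20, 40), nn.ReLU(),
    nn.Linear(40, 100),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
criterion = nn.MSELoss()

x = torch.randn(8, 100)          # one batch of inputs
optimizer.zero_grad()
recon = model(x)
mse = criterion(recon, x)
beta, rho = 1e-3, 0.25           # 1 of 4 bottleneck neurons active
penalty = 0.0                    # stand-in for sparse_loss(rho, x, "SNR")
loss = mse + beta * penalty      # total loss for this batch
loss.backward()
optimizer.step()
```

Whether the penalty is computed per batch (as here) or once over the whole dataset before training is exactly the option I still have to tune.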