I would like to add a sparsity regularization term to the encodings of my VAE. The idea is to build an activation vector that is 1 at every index where the encoding is non-zero, then take the mean over the batch dimension to get a per-unit activation probability, and finally compute a pointwise Kullback-Leibler divergence between that probability and the desired sparsity probability.
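Concretely, on a toy batch (shapes and values assumed purely for illustration) the computation I have in mind looks like this:

```
import torch
import torch.nn.functional as F

# Assumed toy batch: 3 encodings with 2 latent units each.
enc = torch.tensor([[0.0, 2.0],
                    [1.0, 0.0],
                    [3.0, 4.0]])

# Activation vector: 1 wherever the encoding is non-zero.
activations = (enc != 0).float()   # [[0, 1], [1, 0], [1, 1]]

# Mean over the batch dimension gives a per-unit "activation probability".
mean = activations.mean(dim=0)     # tensor([0.6667, 0.6667])

# Pointwise KL divergence to the desired sparsity probability.
sparsity = 0.05
target = sparsity * torch.ones_like(mean)
reg = F.kl_div(mean.log(), target, reduction="sum")
```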

However, the output of the function has no `grad_fn` attribute, so I assume it does not propagate the gradient back. Here is a minimal working example with the function I use for the regularization.

```
import torch
import torch.nn.functional as F

def sparsity_regularizer(enc, sparsity=.05):
    # 1 wherever the encoding is non-zero
    activations = torch.zeros(size=enc.size())
    activations[torch.nonzero(enc, as_tuple=True)] = 1
    # mean activation along the batch dimension
    mean = torch.mean(activations, dim=0)
    reg = -F.kl_div(mean.log(), sparsity * torch.ones(size=mean.size()), reduction="sum")
    return reg

enc = torch.Tensor([0, 2])
enc.requires_grad = True
out = sparsity_regularizer(enc)
print(out)
```

What do I need to change?