I wonder whether it’s possible in general to *learn* the variance of some distribution family, say a Gaussian. Many applications require smoothing discrete data by adding a small amount of noise. For example, Mutual Information Neural Estimation (MINE) works better if the added noise decreases over time. I could set up a schedule for that, but I’m not guaranteed to pick optimal values for the variance. What I want is to *learn* the noise variance.

```
import torch

scale = torch.nn.Parameter(torch.full((), 0.2))  # initialize std to 0.2
sampler = torch.distributions.normal.Normal(loc=0.0, scale=scale)  # constructing this actually works...
... # obtain discrete y_label
y_noise = y_label + sampler.sample()  # ...but sample() is detached, so no gradient reaches scale
```
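
A sketch of one way this can work (my own toy setup, not from the question): use `rsample()`, which applies the reparameterization trick so gradients flow back to the parameter, and learn the log of the std so the scale stays positive. The objective and all names here are illustrative placeholders; a real objective (e.g. the MINE loss) would replace the toy squared-error term.

```python
import torch

# Learn log_std instead of std, so std = exp(log_std) is always positive.
log_std = torch.nn.Parameter(torch.log(torch.full((), 0.2)))  # std starts at 0.2
opt = torch.optim.SGD([log_std], lr=0.1)

y_label = torch.zeros(64)  # stand-in for the discrete labels

for _ in range(100):
    sampler = torch.distributions.Normal(loc=0.0, scale=log_std.exp())
    # rsample() draws eps ~ N(0, 1) and returns eps * scale, keeping the
    # graph intact; sample() would return a detached tensor.
    y_noise = y_label + sampler.rsample(y_label.shape)
    # Toy objective: penalize noisy outputs far from the labels, which
    # drives the learned std toward zero.
    loss = (y_noise - y_label).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(log_std.exp().item())  # the learned std, smaller than the 0.2 init
```

With this parameterization the optimizer can freely move `log_std` in either direction while the actual scale passed to `Normal` never violates its positivity constraint.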