Learnable variance

I wonder whether it’s possible, in general, to learn the variance of a distribution from some parametric family, say a Gaussian. Many applications require smoothing discrete data by adding a small amount of noise. For example, Mutual Information Neural Estimation (MINE) works better if we add noise whose magnitude decreases over time. I could set up a decay schedule for that, but a hand-tuned schedule is not guaranteed to hit optimal variance values. What I really want is to learn the noise variance.

scale = torch.nn.Parameter(torch.full((), 0.2))  # initialize std to 0.2
sampler = torch.distributions.normal.Normal(loc=0.0, scale=scale)  # this construction is fine
... # obtain discrete y_label
# .sample() detaches the computation graph, so no gradient reaches `scale`;
# .rsample() uses the reparameterization trick and keeps sampling differentiable
y_noise = y_label + sampler.rsample()
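To check that gradients really do flow into the scale through rsample(), here is a minimal self-contained sketch. The objective is a toy one I made up for illustration (push the empirical std of the sampled noise toward a hypothetical target of 0.5); a real application would plug the noise into its own loss, e.g. the MINE objective. A softplus keeps the learned std positive, which a raw Parameter does not guarantee.

```python
import torch

torch.manual_seed(0)

# Unconstrained parameter; softplus maps it to a positive std.
raw_scale = torch.nn.Parameter(torch.tensor(-1.0))
opt = torch.optim.Adam([raw_scale], lr=0.05)

target_std = 0.5  # hypothetical target, stands in for a real training signal

for step in range(300):
    scale = torch.nn.functional.softplus(raw_scale)
    sampler = torch.distributions.Normal(loc=0.0, scale=scale)
    noise = sampler.rsample((4096,))        # rsample keeps the graph intact
    loss = (noise.std() - target_std) ** 2  # toy objective: match a target std
    opt.zero_grad()
    loss.backward()                         # gradient reaches raw_scale
    opt.step()

learned_std = torch.nn.functional.softplus(raw_scale).item()
print(learned_std)  # close to 0.5
```

With `.sample()` in place of `.rsample()` the backward pass would raise an error (no graph connects the loss to `raw_scale`), which is exactly the trap the snippet above runs into.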