Distribution Implementations

We didn’t plan on adding that, but it seems like a useful thing to have. It’s not going to be a priority for us, but if someone wants to send a PR, then we’ll be happy to merge it in.

I would love to contribute as well. However, I am still getting familiar with the infrastructure of PyTorch. It would be great if someone from the dev team could provide a template implementation (e.g., a Normal distribution), ideally one for a continuous and one for a discrete distribution, as a reference for best practice in PyTorch.

@junpenglao do you want to add that to autograd or just as a tensor operation?

Add that to autograd, for sure.
I had a look at the StochasticFunction class; however, I don't think it is a good way to implement this, as the reward in backward might not always be applied. Maybe the best way is just to wrap it as an autograd.Function?

I don’t think you need it to be a StochasticFunction, because there’s no sampling; the function is entirely deterministic. In fact, the basic pdfs can be computed using the functions already available in autograd (exp, sqrt, and similar). Just define a plain Python function:

def normal_pdf(x, mean, std):
    # compute the pdf here using x, mean and std Variables

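For instance, a minimal filled-in sketch of that stub, assuming x, mean and std are autograd Variables (the code below is illustrative, not an existing PyTorch API):

```python
import math

import torch
from torch.autograd import Variable

def normal_pdf(x, mean, std):
    # Gaussian density built only from differentiable autograd ops,
    # so gradients flow back to x, mean and std automatically
    return torch.exp(-(x - mean) ** 2 / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

x = Variable(torch.randn(5), requires_grad=True)
mean = Variable(torch.zeros(5), requires_grad=True)
std = Variable(torch.ones(5), requires_grad=True)
normal_pdf(x, mean, std).sum().backward()  # populates x.grad, mean.grad and std.grad
```
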
Actually, I do want to sample from them, as I am interested in using these distributions for Bayesian inference. In that case, maybe it is better to wrap it in a Function class?

How do you compute the gradients in that case? Right now StochasticFunctions will use REINFORCE.

A distribution could be implemented as a class containing one Function “self.pdf” and a method sample()

class Distribution:
    def __init__(self):
        self.pdf = None  # an autograd Function computing the density

    def sample(self):
        # use self.pdf inside a Monte Carlo algorithm to draw samples
        raise NotImplementedError

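One possible way to flesh that sketch out: a hypothetical Distribution that stores a pdf callable and draws samples by simple rejection sampling (the constructor arguments, the support interval and the bound M are all assumptions made for illustration):

```python
import math

import torch

class Distribution(object):
    def __init__(self, pdf, support=(-8.0, 8.0), bound=1.0):
        self.pdf = pdf              # density as a callable on tensors
        self.lo, self.hi = support  # interval the sampler proposes from
        self.bound = bound          # M such that pdf(x) <= M on the support

    def sample(self, n):
        # rejection sampling: propose x uniformly on [lo, hi],
        # accept x whenever u ~ U(0, M) falls below pdf(x)
        out = []
        while len(out) < n:
            x = torch.rand(n) * (self.hi - self.lo) + self.lo
            u = torch.rand(n) * self.bound
            out.extend(x[u < self.pdf(x)].tolist())
        return torch.Tensor(out[:n])

# e.g. a standard normal density on plain tensors
std_normal = Distribution(lambda x: torch.exp(-x ** 2 / 2) / math.sqrt(2 * math.pi))
draws = std_normal.sample(1000)
```
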
(By the way, how do you get Python syntax highlighting when posting code in this forum?)


When opening the code block, append “python” right after the opening backticks:

```python
code here
```


Ah ok! Thanks :sweat_smile:

@junpenglao did you start something concerning distributions? I am starting a project involving stochastic networks, so I am going to implement a kind of distribution library with tools for estimation (assuming a tensor follows a distribution) and sampling.

My idea is to create one object, distrib.Distribution (e.g. distrib.Bernoulli()), and two functions:
distrib.estimation(tensor x, distribution A) = estimation of the parameters of distribution A from sample x
distrib.sample(distribution A) = tensor x ~ A

For example, this would be an implementation of an RBM with nv visible units and nh hidden units:

W = torch.rand(nv, nh); a = torch.rand(nh); b = torch.rand(nv)
net_forward = nn.Sequential(nn.Linear(weight=W, bias=a), nn.Sigmoid())
net_backward = nn.Sequential(nn.Linear(weight=W, bias=b), nn.Softmax())

# training:
for input in training_set:
    v1 = input # from training set

    x = net_forward(v1)
    H1 = distrib.estimation(x, distrib.Bernoulli())
    h1 = distrib.sample(H1)

    y = net_backward(h1)
    V2 = distrib.estimation(y, distrib.Multinomial())
    v2 = distrib.sample(V2)

    x = net_forward(v2)
    H2 = distrib.estimation(x, distrib.Bernoulli())
    h2 = distrib.sample(H2)

    # gradient descent step (v1.h1 stands for the outer product of v1 and h1):
    W -= lr * (v1.h1 - v2.h2)
    a -= lr * (h1 - h2)
    b -= lr * (v1 - v2)
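
Coming back to the two proposed helpers, a minimal sketch of what they might look like for the Bernoulli case (the class and the reading of the activations as the distribution parameters are assumptions, not an existing API):

```python
import torch

class Bernoulli(object):
    def __init__(self, probs=None):
        self.probs = probs  # per-unit success probabilities

def estimation(x, dist):
    # estimate the parameters of the chosen family from the activations x;
    # for Bernoulli units the sigmoid outputs are taken directly as probabilities
    return Bernoulli(probs=x)

def sample(dist):
    # draw a 0/1 tensor using the stored probabilities
    return torch.bernoulli(dist.probs)

# usage mirroring the hidden-unit step of the RBM loop above
H1 = estimation(torch.sigmoid(torch.randn(5)), Bernoulli())
h1 = sample(H1)
```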

I had implemented a Normal distribution as a torch.autograd.Function:

import torch
from torch.autograd import Variable,Function
import torch.optim as optim
from scipy import stats
import numpy as np

dtype = torch.FloatTensor
def get_tau_sd(tau=None, sd=None):
    """
    Function port from PyMC3

    Find precision and standard deviation
    .. math::
        \tau = \frac{1}{\sigma^2}
    Parameters
    ----------
    tau : array-like, optional
    sd : array-like, optional
    Results
    -------
    Returns tuple (tau, sd)
    Notes
    -----
    If neither tau nor sd is provided, returns (1., 1.)
    """
    if tau is None:
        if sd is None:
            sd = 1.
            tau = 1.
        else:
            tau = sd**-2.

    else:
        if sd is not None:
            raise ValueError("Can't pass both tau and sd")
        else:
            sd = tau**-.5

    # cast tau and sd to float in a way that works for both np.arrays
    # and pure python
    tau = 1. * tau
    sd = 1. * sd

    return (tau, sd)

class Normal(Function):
    
    def __init__(self, mu=0, sd=None, tau=None, **kwargs):
        """Construct Normal distributions with mean and stddev `loc` and `scale`.
        The parameters `loc` and `scale` must be shaped in a way that supports
        broadcasting (e.g. `loc + scale` is a valid operation).
        Args:
          mu: Floating point tensor; the means of the distribution(s).
          sd: Floating point tensor; the stddevs of the distribution(s).
            Must contain only positive values.
            Raises:
          TypeError: if `loc` and `scale` have different `dtype`.
        """
        tau, sd = get_tau_sd(tau=tau, sd=sd)
        self.sd = sd
        self.tau = tau

        self.mean = self.median = self.mode = self.mu = mu
        self.variance = 1. / self.tau

        super(Normal, self).__init__(**kwargs)

    def random(self, size=None):
        standard_normal = torch.randn(size)
        sd = self.sd
        mu = self.mu
        
        muvec = mu.repeat(size)
        sdvec = sd.repeat(size)
        
        return standard_normal.mul_(sdvec).add_(muvec)

    def logp(self, value):
        tau = self.tau
        mu = self.mu
        
        muvec = mu.repeat(value.size(0), 1)
        tauvec = tau.repeat(value.size(0), 1)
        
        return (-tauvec * (value - muvec)**2 + torch.log(tauvec / np.pi / 2.)) / 2.
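
For what it's worth, a quick usage sketch of this class, assuming mu and sd are passed in as 1-element tensors so that the .repeat calls work:

```python
mu = torch.Tensor([0.0])
sd = torch.Tensor([1.0])
dist = Normal(mu=mu, sd=sd)

draws = dist.random(size=1000)       # 1000 draws from N(0, 1)
logp = dist.logp(torch.randn(5, 1))  # log-density of five values
```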

I think it is not easy to implement a distrib.estimation(tensor x, distribution A) function, especially if the distribution contains multiple parameters; in most cases you need to formulate the backward gradient carefully to make it work.

It would be lovely to have a distribution class, even with just simple random number generation and a log-likelihood function. But currently some crucial functions are still missing for a fast implementation.


I think a distribution could be implemented as a collection of static functions, as below:

class Distribution:

    @staticmethod
    def pdf(*args):
        raise NotImplementedError

    @staticmethod
    def logpdf(*args):
        raise NotImplementedError

    @staticmethod
    def cdf(*args):
        raise NotImplementedError

    @staticmethod
    def qtl(*args):
        raise NotImplementedError

    @staticmethod
    def rnd(*args):
        raise NotImplementedError

import math

import torch

class Gauss(Distribution):
    # scalar version
    @staticmethod
    def logpdf(x, mu, sigma):
        return -0.5 * (x - mu) * (x - mu) / (sigma * sigma) - 0.5 * torch.log(2 * math.pi * sigma * sigma)

and it could be used like this:

Gauss.logpdf(x, mu, sigma)
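
For instance, assuming the Gauss class above and x, mu, sigma wrapped as autograd Variables, the log-density stays differentiable:

```python
import torch
from torch.autograd import Variable

x = Variable(torch.randn(10))
mu = Variable(torch.zeros(10), requires_grad=True)
sigma = Variable(torch.ones(10), requires_grad=True)

loglik = Gauss.logpdf(x, mu, sigma).sum()  # total log-likelihood of the batch
loglik.backward()                          # fills in mu.grad and sigma.grad
```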

Hey guys, maybe it would be a good idea to have a look at Edward, which was built with Bayesian inference in mind. It uses TensorFlow as its backend, but I guess most of the ideas for defining RVs and distributions would be the same. Maybe translating and switching Edward's backend from TensorFlow to PyTorch would not be extremely difficult, and you wouldn't have to start from scratch. I think it would be nice to see something equivalent to pytorch.distributed, but this time pytorch.bayesian :wink:


torch-distributions from DeepMind is also a good reference, which could be even easier to transfer.

@chenyuntc that’s true I forgot about that :slight_smile:

I forgot to update this thread, but there is some initial work going on at: https://github.com/stepelu/ptstat

Any advances on that? Highly interested.

A quick status update: http://pytorch.org/docs/master/distributions.html
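
For reference, a quick sketch of the kind of API documented there (argument and method names may differ between versions):

```python
import torch
from torch.distributions import Normal

dist = Normal(torch.zeros(3), torch.ones(3))  # a batch of three standard normals
samples = dist.sample()                       # one draw per distribution
logp = dist.log_prob(samples)                 # log-density of the samples under dist
```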


My team just open sourced a distributions library that we’re hoping to move upstream into PyTorch: https://github.com/uber/pyro/blob/dev/pyro/distributions
