Bayesian computation in PyTorch

Hi all,

Just discovered PyTorch yesterday; the dynamic graph idea is simply amazing! I am wondering if anybody is developing (or plans to develop) a Bayesian computation package in PyTorch? Something like PyMC3 (Theano) or Edward (TensorFlow). I think the dynamic nature of PyTorch would be perfect for Dirichlet processes, mixture models, Sequential Monte Carlo, etc.


From keeping tabs on the community, I’m not aware of anyone working on a PyMC3 / Edward -like package. Would love to see how far you get; let us know how it goes :slight_smile:

I just recently started to work on this in my spare time.
My plan is to:

  • implement a selected set of inference algorithms, variational as well as MCMC (both traditional and scalable versions), separately from anything neural-network related
  • make this framework easily applicable to the existing nn module
  • implement basic functionality for common statistical distributions

I will post an update when I have something to show.

@stepelu great! let us know if you need any help.

I would love to contribute. Did you start a repository somewhere?
For the implementation of the basic probability distributions, would it be possible to just wrap the SciPy functions?

Not yet; I’ll post the link here when I do.
And unfortunately that approach wouldn’t work for CUDA.

I see, thanks for the info - better to implement them from scratch then.
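For what it’s worth, a from-scratch version need not be much code. Here’s a rough sketch of a normal log-pdf built only from tensor ops (the function name is mine, not part of any package); since it avoids NumPy/SciPy round-trips, the same code should run unchanged on CPU or CUDA tensors:

```python
import math
import torch

def normal_log_pdf(x, mean, std):
    # Elementwise log-density of N(mean, std^2), using only tensor ops,
    # so it works for CPU tensors and (with .cuda() inputs) CUDA tensors.
    return (-0.5 * math.log(2 * math.pi) - torch.log(std)
            - (x - mean) ** 2 / (2 * std ** 2))

lp = normal_log_pdf(torch.zeros(3), torch.zeros(3), torch.ones(3))
# standard normal at 0: log(1 / sqrt(2 * pi)) ~= -0.9189
```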

Just bumping this old thread to see if anyone’s working on variational inference using PyTorch?

Been using Edward recently to do deep VI, and it’s great apart from having the usual TensorFlow disadvantages :frowning:

The folks at Uber may be building out an Edward-like thing for PyTorch.
Noah Goodman is at Uber (he built ).


Wow, Noah Goodman sold out? Never thought that would happen. Guess every man has his price!

Then again, I guess Uber will give him an army of engineers at his disposal :smile: Hope they open-source it!

Now that you mention Uber: yes, I remember they’ve been working on Bayesian optimization for their massive routing problems for a long time. I see they signed up Zoubin Ghahramani as head scientist too.

I think principled Bayesian computation overcomes many of the deficiencies of deep learning, and vice versa. Check out Shakir’s NIPS 2016 slides.

Hey @smth,

please post here / let me know when Uber or anyone else makes some sort of black-box variational inference engine public for PyTorch.

TensorFlow is driving me nuts; once you’ve used PyTorch it’s painful to go back to TF!

It doesn’t look too unfriendly to port over some of the code from Edward.

For example, the main KL inference routines are well written and there isn’t much TF dependency; see
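For anyone curious what such a port might boil down to: here’s a rough, self-contained sketch of a reparameterized KL/ELBO training loop in plain PyTorch (toy Gaussian model, made-up names; this is not Edward’s actual code):

```python
import math
import torch

torch.manual_seed(0)

def log_normal(x, mean, std):
    return (-0.5 * math.log(2 * math.pi) - torch.log(std)
            - (x - mean) ** 2 / (2 * std ** 2))

# Toy model: p(mu) = N(0, 1), p(x | mu) = N(mu, 1); fit q(mu) = N(m, s).
data = torch.tensor([0.8, 1.2, 1.0, 0.9, 1.1])

m = torch.zeros(1, requires_grad=True)       # variational mean
log_s = torch.zeros(1, requires_grad=True)   # variational log-std
opt = torch.optim.Adam([m, log_s], lr=0.05)

for step in range(500):
    opt.zero_grad()
    s = log_s.exp()
    mu = m + s * torch.randn(1)              # reparameterized sample from q
    log_lik = log_normal(data, mu, torch.ones(1)).sum()
    log_prior = log_normal(mu, torch.zeros(1), torch.ones(1)).sum()
    log_q = log_normal(mu, m, s).sum()
    loss = -(log_lik + log_prior - log_q)    # negative single-sample ELBO
    loss.backward()
    opt.step()

# m should end up near the analytic posterior mean, sum(data) / (n + 1) = 5/6.
```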

There is another Bayesian generative modeling library, ZhuSuan, built on TF.


Hi @yinhao,

thank you very much! I haven’t seen this library before; it looks very up to date and useful.

If you are working with Gaussian processes, another very useful library is GPflow.

So it seems there are now three variational inference libraries built on TensorFlow by three different research groups (the Blei Lab, the Tsinghua Machine Learning Group, and the various GPflow contributors)?

I guess it’s now only a matter of time before something is available in PyTorch :slight_smile:

Kind regards,


I started working on different topics shortly after sending my last message, so I haven’t made much progress yet.

Now, however, I am back on a project which involves generative models and inference so I expect I’ll have more time to be working on this.

I created SciLua before, so I can capitalize on that and start with statistical distributions, followed by HMC (basic and NUTS) and then variational methods.

Before proceeding there are a few preliminary points I’d like to discuss:

What is the plan for stochastic graphs?

For unbiased gradients, pathwise-type estimators come for free.

Other than that, my understanding is that the currently supported approach is via reinforce().

I also found where the score-type estimators are implemented for some distributions.

However, both of these are “local” approaches, and it doesn’t seem to me that the current framework would allow for the automatic implementation of unbiased gradient estimators in more complex cases, say Example 2 in Section 2.3 of the stochastic computation graphs paper, without modifications.
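To make the distinction between the two local estimators concrete, here’s a small toy check of both on d/dθ E_{x∼N(θ,1)}[x²] = 2θ (illustration only, not a proposed API):

```python
import torch

torch.manual_seed(0)
theta = torch.tensor([1.5], requires_grad=True)
n = 50000

# Pathwise estimator: reparameterize x = theta + eps and let autograd
# differentiate straight through the samples.
x = theta + torch.randn(n)
pathwise = torch.autograd.grad((x ** 2).mean(), theta)[0]

# Score-function (REINFORCE-style) estimator: samples are detached, and
# f(x) is weighted by the gradient of log N(x; theta, 1) w.r.t. theta.
x = (theta + torch.randn(n)).detach()
log_prob = -0.5 * (x - theta) ** 2   # log-density up to an additive constant
score = torch.autograd.grad((x ** 2 * log_prob).mean(), theta)[0]

# Both should be close to 2 * theta = 3; the score estimator is noisier.
```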

Classes for statistical distributions?

Assuming that there is a plan to fully support stochastic graphs in the future, it would make sense to implement distributions as classes instead of separate functions (Normal().log_pdf() vs normal_log_pdf()). Parameters would be passed to the constructor instead of being passed to every member function call.
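To illustrate the class-based option (sketch only, with hypothetical names):

```python
import math
import torch

class Normal:
    # Parameters are bound once in the constructor and reused by
    # every method, rather than being repeated at each call site.
    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def log_pdf(self, x):
        return (-0.5 * math.log(2 * math.pi) - torch.log(self.std)
                - (x - self.mean) ** 2 / (2 * self.std ** 2))

    def sample(self, n):
        return self.mean + self.std * torch.randn(n, *self.mean.size())

d = Normal(torch.zeros(2), torch.ones(2))
lp = d.log_pdf(torch.zeros(2))   # instead of normal_log_pdf(x, mean, std)
```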

Separate gradient estimators?

I would keep the logic related to gradient computations separate, via an option passed to the constructor or a wrapping class (Normal(gradient_estimator=PathwiseEstimator()) or PathwiseEstimator(Normal())), to retain flexibility, as there are many different ways to produce such estimators.

This can introduce some issues, as tensors are not currently promoted to autograd Variables, but I assume this will be done in the future.
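The wrapping-class variant could look something like this (again just a sketch with invented names):

```python
import torch

class Normal:
    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

class PathwiseEstimator:
    # Wraps a distribution whose sampler is reparameterizable; the sample
    # stays in the autograd graph, so gradients reach the parameters.
    def __init__(self, dist):
        self.dist = dist

    def sample(self, n):
        return self.dist.mean + self.dist.std * torch.randn(n)

mean = torch.tensor([0.0], requires_grad=True)
est = PathwiseEstimator(Normal(mean, torch.ones(1)))
x = est.sample(100)
grad = torch.autograd.grad(x.sum(), mean)[0]   # each sample contributes 1
```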

A cuda named argument would be passed to the constructors of the statistical classes to specify that operations like random number generation are done on the GPU.

Assumptions on data shapes?

It might be beneficial to assume [batch, random_variable.size()] dimensions: the first dimension indexes iid samples, which get averaged over in log-likelihood computations, and it leaves room for generating multiple samples.
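Concretely, that shape convention would work out like this (sketch, reusing a hand-rolled normal log-pdf):

```python
import math
import torch

def normal_log_pdf(x, mean, std):
    return (-0.5 * math.log(2 * math.pi) - torch.log(std)
            - (x - mean) ** 2 / (2 * std ** 2))

# Dimension 0 indexes iid samples; the remaining dims are the variable's shape.
batch = torch.randn(32, 4)                  # 32 iid draws of a 4-dim variable
lp = normal_log_pdf(batch, torch.zeros(4), torch.ones(4))
per_sample = lp.sum(dim=1)                  # joint log-density of each draw
avg_loglik = per_sample.mean()              # averaged over the iid batch dim
```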

Basically, I’m looking for the PyTorch core team’s comments / design suggestions on the points mentioned above, and on the ones I failed to consider!


I am interested in contributing. I’m currently looking for a good package to test VI algorithms on big models, and PyTorch’s dynamic graph would make that a lot easier than working in Theano or TensorFlow.


I started putting something online at ; for now it’s just some statistical distributions and a small number of related functions.

I’ll also implement the Gamma, Beta and Dirichlet distributions next week, except for sampling, which would need a C/CUDA implementation (and is not trivial to do CUDA-efficiently and in a numerically stable way).

I am also working on MCMC: Metropolis-RW, Langevin-RW, HMC, NUTS.
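For reference, the simplest of those samplers, random-walk Metropolis, fits in a few lines of PyTorch (the toy target and step size below are just for illustration):

```python
import torch

torch.manual_seed(0)

def log_target(x):
    # Unnormalized log-density of N(2, 1); stands in for any target.
    return -0.5 * (x - 2.0) ** 2

def metropolis_rw(log_p, x0, n_steps, step_size=0.5):
    samples = []
    x, lp = x0, log_p(x0)
    for _ in range(n_steps):
        prop = x + step_size * torch.randn_like(x)   # symmetric proposal
        lp_prop = log_p(prop)
        # Accept with probability min(1, p(prop) / p(x)).
        if torch.rand(1).item() < torch.exp(lp_prop - lp).item():
            x, lp = prop, lp_prop
        samples.append(x)
    return torch.stack(samples)

chain = metropolis_rw(log_target, torch.zeros(1), 5000)
# After burn-in, the chain mean should be close to the target mean of 2.
```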

Contributions are welcome, for instance:

  • statistical distributions that are not yet implemented (let me know if you plan to work on any of the above)
  • missing KL-divergence implementations
  • mean and variance methods

Is there any timeline for issue #274? It would allow me to remove all these Variable() calls, and it would facilitate the MCMC code I’m writing too.


I’m also highly interested in contributing! I have already contributed to Edward and Turing by implementing various MCMC methods.


Any advances? Highly interested too.


Hi Bayes folks, it’s great to see people interested in this approach! We recently released a probabilistic programming language built on top of PyTorch, called Pyro. Hopefully this will be of use / interest!


For those with an inclination towards GPflow and PyTorch, I did some work on a port at . Of course, it is much more modest in ambition than Pyro.

Best regards