I started working on different topics shortly after sending my last message, so I haven't made much progress yet.

Now, however, I am back on a project which involves generative models and inference, so I expect I'll have more time to work on this.

Since I created scilua, I can capitalize on that and start with statistical distributions, followed by HMC (basic and NUTS) and then variational methods.

Before proceeding there are a few preliminary points I’d like to discuss:

**What is the plan for stochastic graphs?**

For unbiased gradients, pathwise-type estimators come for free.

Other than that, my understanding is that the currently supported approach is via `reinforce()`.

I also found stochastic.py where the score-type estimators are implemented for some distributions.

However, both of these are "local" approaches, and it doesn't seem to me that the current framework would allow for the automatic implementation of unbiased gradient estimators for more complex cases, e.g. example 2 in section 2.3 of stochastic computation graphs, without modifications.
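As a toy illustration of the two "local" estimator types, here is a sketch (plain numpy, all names illustrative, not an existing API) of the pathwise and score-function estimators for the same expectation gradient:

```python
import numpy as np

# Toy comparison of the two "local" unbiased gradient estimators for
# d/dmu E_{x ~ N(mu, 1)}[f(x)], with f(x) = x**2 (true gradient: 2*mu).
rng = np.random.default_rng(0)
mu, n = 1.0, 200_000
eps = rng.standard_normal(n)
x = mu + eps  # reparameterized sample: x depends on mu explicitly

# Pathwise estimator: differentiate through the sample.
# d/dmu f(mu + eps) = f'(x) * dx/dmu = 2*x * 1
pathwise = (2.0 * x).mean()

# Score-function (REINFORCE) estimator: f(x) * d/dmu log p(x; mu).
# For N(mu, 1), d/dmu log p(x; mu) = x - mu.
score = (x**2 * (x - mu)).mean()

print(pathwise, score)  # both should be close to the true gradient, 2.0
```

Both are unbiased, but the score-function version typically has much higher variance, which is one reason to keep the estimator choice explicit.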

**Classes for statistical distributions?**

Assuming that there is a plan to fully support stochastic graphs in the future, it would make sense to implement distributions as classes instead of separate functions (`Normal().log_pdf()` vs `normal_log_pdf()`). Parameters would be passed to the constructor instead of to every member function call.
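A minimal sketch of what that class-based interface could look like (names are hypothetical, numpy stands in for tensors):

```python
import math
import numpy as np

class Normal:
    # Parameters are bound once in the constructor...
    def __init__(self, mean=0.0, std=1.0):
        self.mean, self.std = mean, std

    # ...so member functions only take data.
    def log_pdf(self, x):
        z = (x - self.mean) / self.std
        return -0.5 * z**2 - math.log(self.std) - 0.5 * math.log(2 * math.pi)

    def sample(self, n):
        return self.mean + self.std * np.random.randn(n)

# Compare: Normal(0.0, 1.0).log_pdf(x) vs normal_log_pdf(x, 0.0, 1.0)
lp = Normal().log_pdf(0.0)
```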

**Separate gradient estimators?**

I would keep the logic related to gradient computation separate, via an option passed to the constructor or a wrapping class (`Normal(gradient_estimator=PathwiseEstimator())` or `PathwiseEstimator(Normal())`), to retain flexibility, as there are many different ways to produce such estimators.

This can introduce some issues as tensors are not currently promoted to autograd variables but I assume this will be done in the future.
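The wrapping-class option could be sketched like this (all names hypothetical): the distribution only stores parameters, while the estimator decides how samples are produced so that the sampling path stays differentiable.

```python
import numpy as np

class Normal:
    def __init__(self, mean, std):
        self.mean, self.std = mean, std

class PathwiseEstimator:
    # Wraps a distribution and owns the sampling strategy.
    def __init__(self, dist):
        self.dist = dist

    def sample(self, n):
        # Reparameterization: x = mean + std * eps, so gradients could
        # flow through mean and std once tensors are autograd variables.
        eps = np.random.default_rng(0).standard_normal(n)
        return self.dist.mean + self.dist.std * eps

samples = PathwiseEstimator(Normal(2.0, 0.5)).sample(100_000)
```

Swapping in a `ScoreEstimator` wrapper would then change the gradient strategy without touching the distribution class itself.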

A `cuda` named argument would be passed to the constructor of statistical classes to specify that operations like random number generation are done on the GPU.

**Assumptions on data shapes?**

It might be beneficial to assume `[batch, random_variable.size()]` dimensions: the first dimension indexes i.i.d. samples, which get averaged over in log-likelihood computations, and it leaves room to generate multiple samples.
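Under that shape convention, a log-likelihood computation would sum over the event dimension and average over the batch dimension, e.g. (function name hypothetical):

```python
import numpy as np

def normal_log_likelihood(x, mean=0.0, std=1.0):
    # x has shape [batch, size]: batch indexes iid samples.
    z = (x - mean) / std
    log_p = -0.5 * z**2 - np.log(std) - 0.5 * np.log(2 * np.pi)
    # Sum log-probs over the event dims, average over the iid batch dim.
    return log_p.sum(axis=1).mean()

x = np.zeros((4, 3))  # 4 iid samples of a 3-dimensional variable
ll = normal_log_likelihood(x)
```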

Basically, I'm looking for the PyTorch core team's comments / design suggestions on the points I mentioned above, and on any I failed to consider!