I started working on different topics shortly after sending my last message, so I haven't made much progress yet.

Now, however, I am back on a project which involves generative models and inference, so I expect I'll have more time to work on this.

I created scilua before, so I can capitalize on that and start with statistical distributions, followed by HMC (basic and NUTS) and then variational methods.

Before proceeding, there are a few preliminary points I'd like to discuss:

**What is the plan for stochastic graphs?**

For unbiased gradients, pathwise-type estimators come for free.

Other than that, my understanding is that the currently supported approach is via `reinforce()`.

I also found `stochastic.py`, where the score-type estimators are implemented for some distributions.

However, both of these are "local" approaches, and it doesn't seem to me that the current framework would allow for the automatic implementation of unbiased gradient estimators for more complex cases (say, example 2 in section 2.3 of the stochastic computation graphs paper) without modifications.
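To make the distinction between the two estimator families concrete, here is a minimal Monte Carlo sketch (plain Python, not the PyTorch API) comparing them on d/dμ E[f(x)] for x ~ Normal(μ, σ²) with f(x) = x², whose true value is 2μ:

```python
import random

def pathwise_grad(mu, sigma, n, rng):
    # Pathwise/reparameterization estimator: write x = mu + sigma * eps and
    # differentiate through the sample, so the estimator is f'(x) = 2 * x here.
    return sum(2 * (mu + sigma * rng.gauss(0.0, 1.0)) for _ in range(n)) / n

def reinforce_grad(mu, sigma, n, rng):
    # Score-function (REINFORCE) estimator: f(x) * d/dmu log p(x | mu, sigma),
    # where d/dmu log p = (x - mu) / sigma**2. No derivative of f is needed,
    # which is why it applies more broadly, at the cost of higher variance.
    total = 0.0
    for _ in range(n):
        x = mu + sigma * rng.gauss(0.0, 1.0)
        total += (x ** 2) * (x - mu) / sigma ** 2
    return total / n
```

With μ = 1 and σ = 1, both estimates converge to 2, but the score-function estimate fluctuates noticeably more for the same sample budget.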

**Classes for statistical distributions?**

Assuming that there is a plan to fully support stochastic graphs in the future, it would make sense to implement distributions as classes instead of separate methods (`Normal().log_pdf()` vs `normal_log_pdf()`). Parameters would be passed to the constructor instead of to every member function call.
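A minimal sketch of the class-based API suggested above (the names and signatures are illustrative, not an existing PyTorch interface): parameters live in the constructor, so member calls only take data.

```python
import math
import random

class Normal:
    def __init__(self, mean, std):
        # Distribution parameters are bound once, at construction time.
        self.mean = mean
        self.std = std

    def log_pdf(self, x):
        # log N(x | mean, std^2)
        z = (x - self.mean) / self.std
        return -0.5 * z * z - math.log(self.std) - 0.5 * math.log(2 * math.pi)

    def sample(self, rng):
        return rng.gauss(self.mean, self.std)

# Member calls no longer repeat the parameters:
ll = Normal(0.0, 1.0).log_pdf(0.0)
```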

**Separate gradient estimators?**

I would keep the logic related to gradient computations separate, via an option passed to the constructor or a wrapping class (`Normal(gradient_estimator=PathwiseEstimator())` or `PathwiseEstimator(Normal())`), to retain flexibility, as there are many different ways to produce such estimators.
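A hypothetical sketch of the wrapping-class variant (all names here are illustrative assumptions, not a real API): the estimator class decides *how* a sample is produced, and hence how gradients would flow through it, while the distribution only knows its own parameters and density.

```python
import random

class Normal:
    def __init__(self, mean, std):
        self.mean, self.std = mean, std

class PathwiseEstimator:
    """Sample via reparameterization: x = mean + std * eps.
    In an autograd setting, gradients w.r.t. mean/std flow through the sample."""
    def __init__(self, dist):
        self.dist = dist

    def sample(self, rng):
        eps = rng.gauss(0.0, 1.0)
        return self.dist.mean + self.dist.std * eps

class ScoreEstimator:
    """Sample directly; gradients would instead come from a REINFORCE-style
    surrogate term built from the distribution's log-density."""
    def __init__(self, dist):
        self.dist = dist

    def sample(self, rng):
        return rng.gauss(self.dist.mean, self.dist.std)

# Both wrappers expose the same sampling interface:
x = PathwiseEstimator(Normal(0.0, 1.0)).sample(random.Random(0))
```

Swapping estimators then changes only the wrapper, not any code that consumes the samples.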

This can introduce some issues as tensors are not currently promoted to autograd variables but I assume this will be done in the future.

A `cuda` named argument would be passed to the constructor of the statistical classes to specify that operations like random number generation are done on the GPU.

**Assumptions on data shapes?**

It might be beneficial to assume `[batch, random_variable.size()]` dimensions: the first dimension contains i.i.d. samples, meaning it gets averaged over in log-likelihood computations, and it leaves room to generate multiple samples.
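A small sketch of that `[batch, event]` convention (plain Python lists stand in for tensors, and scalars stand in for the event dimension; names are illustrative): `log_pdf` accepts a batch of samples and the log-likelihood averages over the leading i.i.d. dimension.

```python
import math

def normal_log_pdf(x, mean, std):
    # log N(x | mean, std^2) for a single event.
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2 * math.pi)

def batched_log_likelihood(batch, mean, std):
    # `batch` indexes the leading i.i.d. dimension; the result is the
    # average log-likelihood over that dimension.
    return sum(normal_log_pdf(x, mean, std) for x in batch) / len(batch)
```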

Basically, I'm looking for the PyTorch core team's comments / design suggestions on the points I mentioned above, and on any I failed to consider!