Implementing a Bayesian CNN in PyTorch

I want to implement a Bayesian CNN from scratch in PyTorch. To write a custom modification of dropout, I looked up its implementation. I followed the function called in the definition of dropout3d, but I could not find where the apply() function is defined.
Could anyone help me? Also I’m new to this forum. Thanks.


def dropout3d(input, p=0.5, training=False, inplace=False):
    return _functions.dropout.FeatureDropout.apply(input, p, training, inplace)


dropout3d calls a method of _functions.dropout.FeatureDropout which inherits from Dropout which has a forward method.

According to the docs on extending PyTorch you implement a custom function by creating a class with a forward method, and you use it by calling the apply method. Therefore, when dropout3d calls _functions.dropout.FeatureDropout.apply, we can assume that Dropout.forward gets run.

Therefore to write a custom version of dropout3d, you could copy the Dropout class from the above link and modify its forward method.
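As a rough illustration of that pattern, here is a minimal sketch of a custom dropout written as a torch.autograd.Function. The class name ElementwiseDropout and its argument order are made up for this example and are not PyTorch's own dropout code. The point is only that apply() is inherited from torch.autograd.Function and ends up calling the forward method you define, which is why there is no separate apply() to find in the dropout source.

```python
import torch


class ElementwiseDropout(torch.autograd.Function):
    """Hypothetical custom dropout; not the library's implementation."""

    @staticmethod
    def forward(ctx, input, p, training):
        # Assumes 0 <= p < 1 so the rescaling below is well defined.
        if training and p > 0.0:
            # Keep each element with probability 1 - p, rescale the survivors.
            keep = torch.bernoulli(torch.full_like(input, 1.0 - p))
            mask = keep / (1.0 - p)
        else:
            # Dropout disabled: the mask is all ones, so this is the identity.
            mask = torch.ones_like(input)
        ctx.save_for_backward(mask)
        return input * mask

    @staticmethod
    def backward(ctx, grad_output):
        (mask,) = ctx.saved_tensors
        # One gradient per forward argument; p and training get None.
        return grad_output * mask, None, None


x = torch.randn(2, 4, 8, 8, requires_grad=True)
y = ElementwiseDropout.apply(x, 0.5, True)  # apply() runs forward()
y.sum().backward()                          # backward() fills x.grad
```

To change how the mask is sampled (for example per channel instead of per element), only the lines that build the mask in forward would need to change.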

That’s right. But the thing is, I want to implement dropout after each kernel filter. Currently the dropout layer works over the entire previous layer, but according to the paper "Bayesian CNN with approximate Bernoulli variational inference", we’ll need to implement dropout after each filter during convolution.

So for that purpose I’ll need to modify the dropout implementation itself which is why I wanted to know where the apply() function would be available in order to change that.


I thought that each filter of a CNN produced its own channel in the output, which means that doing dropout after each individual filter is equivalent to doing dropout after all of the filters.

The paper itself seems to support my interpretation…
From page 2 of that paper…

Our model is implemented by performing dropout after convolution layers.

The sentence appears exactly as quoted here, with no mention of any special implementation of dropout.

In section 2.2 the description of the dropout they use seems pretty standard to me.

From page 5

Implementing our Bayesian CNN is therefore as simple as using dropout after every convolution layer before pooling.

Again, no mention of any particular need to implement a custom dropout for this case.

As far as I can tell, model training is done as usual, but they say that predictions at test time should be produced using “Monte-Carlo dropout” which I haven’t yet managed to understand. But then again Bayesian stuff isn’t my thing and maybe I have understood it all backwards.

Thanks but the following is taken from the paper in section 5:

“In existing literature, however, dropout is used in CNNs only after inner-product layers – equivalent
to approximately integrating these alone. Here we wish to integrate over the kernels of the CNN as
well. Thus implementing a Bayesian CNN we apply dropout after all convolution layers as well as
inner-product layers.”

As stated, integration is done over the kernels of the CNN. Do I have the right idea? If so, how would I go about implementing it?

I understood that to mean that dropout is usually applied only after fully connected layers but not directly after CNN layers because test error increases in that case.

As I understand it, the output of each kernel of a CNN is equivalent to exactly one channel of the CNN layer output. The paper wants to integrate over the kernels of the CNN, and they suggest using dropout as an approximate method. So I suppose you could loop over the channels of the output applying dropout to each channel individually - that would apply dropout to each kernel of the CNN individually, but that approach is mathematically equivalent to applying dropout to the entire layer output in one go. The last but one paragraph of section 5 that explains how to apply dropout to the CNN layer seems to support my interpretation.
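The claim that one kernel produces exactly one output channel can be checked directly. The sketch below uses arbitrary layer sizes chosen for illustration. It convolves with a single filter taken from a layer and compares the result with the matching channel of the full output, then applies element-wise dropout (nn.Dropout) once per channel and once to the whole output. The two dropout results use different random masks, so they are not numerically identical, but in both cases every element is independently either zeroed or rescaled by 1 / (1 - p).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Sizes are arbitrary, picked only for the demonstration.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
x = torch.randn(1, 3, 16, 16)
out = conv(x)  # shape (1, 8, 14, 14): one channel per filter

# Convolving with filter k alone reproduces channel k of the full output.
k = 5
single = F.conv2d(x, conv.weight[k:k + 1], conv.bias[k:k + 1])
same_channel = torch.allclose(single[:, 0], out[:, k], atol=1e-5)

# Element-wise dropout applied channel by channel ...
drop = nn.Dropout(p=0.5)  # a freshly built module is in training mode
per_channel = torch.stack(
    [drop(out[:, c]) for c in range(out.shape[1])], dim=1
)
# ... and applied to the whole layer output in one call.
whole = drop(out)
```

Note that nn.Dropout2d behaves differently: it zeroes entire channels rather than individual elements, so it is not interchangeable with the loop above.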

This is my summary of the paper as I understand it. I could be wrong, but I am fairly sure I have understood it correctly.

In training apply standard dropout after every CNN and fully connected layer of the network. Train with the usual softmax loss combined with L2 regularisation.
When making predictions at test time, don’t turn off dropout as we usually do, instead make several runs through the network with dropout turned on, thus collecting several different predictions for the same input. The average of the predictions can be used as the predicted value. The spread of the predictions can be used as a measure of uncertainty. This is what they call Monte-Carlo dropout.
The number of runs needed for optimal test performance must be determined by testing. Their tests suggest that 20 runs through the network generally produces a significant decrease in test error, and that more than 100 runs is generally unnecessary.
The prediction runs can be efficiently run in parallel by making a batch containing the same image repeated many times.
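The summary above could be sketched roughly as follows. This is only an illustration under assumptions: the SmallDropoutCNN architecture, its layer sizes, the 28x28 single-channel input, and the helper name mc_dropout_predict are all invented for this example and are not taken from the paper. Dropout is called with training=True so that it stays active at prediction time, and the same image is repeated along the batch dimension so that all stochastic passes run in one forward call.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallDropoutCNN(nn.Module):
    """Hypothetical toy network: dropout after every conv and hidden FC layer."""

    def __init__(self, num_classes=10, p=0.5):
        super().__init__()
        self.p = p
        self.conv1 = nn.Conv2d(1, 8, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(16 * 7 * 7, 64)
        self.fc2 = nn.Linear(64, num_classes)

    def forward(self, x):
        # training=True keeps dropout on even after model.eval().
        x = F.dropout(self.conv1(x), self.p, training=True)
        x = F.relu(F.max_pool2d(x, 2))  # dropout comes before pooling
        x = F.dropout(self.conv2(x), self.p, training=True)
        x = F.relu(F.max_pool2d(x, 2))
        x = x.flatten(1)
        x = F.relu(F.dropout(self.fc1(x), self.p, training=True))
        return self.fc2(x)


@torch.no_grad()
def mc_dropout_predict(model, image, n_samples=20):
    # Repeat one (C, H, W) image so each row gets its own dropout mask.
    batch = image.unsqueeze(0).repeat(n_samples, 1, 1, 1)
    probs = F.softmax(model(batch), dim=1)
    # Mean is the prediction; standard deviation is a rough uncertainty.
    return probs.mean(dim=0), probs.std(dim=0)


model = SmallDropoutCNN().eval()
image = torch.randn(1, 28, 28)  # stand-in for a real grayscale image
mean_probs, spread = mc_dropout_predict(model, image, n_samples=20)
predicted_class = mean_probs.argmax().item()
```

The value n_samples=20 mirrors the number of runs mentioned above; in practice it would be tuned by testing.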


Isn’t Monte-Carlo dropout just applying dropout in the forward pass at inference time? Something like: for each input, run the network n times with dropout applied, and then average the results.

I think that is basically what I said.

Yep. Sorry, missed your last post when you said that, and replied on your earlier post.

What libraries could I use to implement a Bayesian CNN?

I’m not sure what Bayesian CNNs are, but have a look at BoTorch, which provides a library for Bayesian Optimization. Would that fit your use case or are you looking for some specific model architectures?

Seems to have been answered in the double post.


This is a great recommendation. Thanks a lot!

Hi!
I am searching for an approach to implement Bayesian deep learning. I found two methods: Bayes by Backprop and dropout. I’ve read that optimising any neural network with dropout is equivalent to a form of approximate Bayesian inference, and that a network trained with dropout is already a Bayesian neural network.
Could you please confirm these statements if you have any idea?
Thank you for your time.

One of the results on dropout as approximate Bayesian inference is by Gal and Ghahramani. Yarin Gal has a blog post on it with an interactive demo and all.
