Using Dropout in reinforcement learning gym environment


I’m using Ray (rllib) and dropout with a gym environment and a fully connected forward network.

Since I’m using evolution strategies, which injects a different set of parameter noise into the model for each rollout worker and corresponding mini-batch, it would be helpful to know: does nn.Dropout disable a different set of neurons for each environment step? Or is the same mask fixed when the network is built during worker initialization and reused across the mini-batch?

RLlib just wraps the Torch model; it doesn’t change any of the underlying Torch code.

When your model is run in model.train() mode, dropout zeroes out a random subset of the layer’s activations (not its parameters), each with probability p, and scales the surviving activations by 1/(1-p). A fresh random mask is drawn every time data passes through the model, so a different set of neurons is disabled on each forward pass. In model.eval() mode, dropout is a no-op and the input passes through unchanged.
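You can verify this behavior directly with a standalone nn.Dropout layer (a minimal sketch; the p=0.5 value and tensor size are just illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()   # training mode: dropout is active
y1 = drop(x)   # a fresh random mask is drawn on this call...
y2 = drop(x)   # ...and a different one on this call

# Surviving entries are scaled by 1/(1-p) = 2.0, dropped entries are 0.0
print(torch.equal(y1, y2))        # masks differ between forward passes

drop.eval()    # eval mode: dropout is a no-op
print(torch.equal(drop(x), x))    # input passes through unchanged
```

The same holds inside a full model: calling model.train() or model.eval() toggles every dropout layer it contains.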

On a side note, dropout is generally beneficial when you have a fixed training set that the model may overfit to; it helps compensate for “holes” or “excesses” in the dataset. In reinforcement learning, however, the training data is generated on the fly, so there is far less risk of overfitting: the data is sampled from the distribution of possible environment states, with little repetition. So you generally won’t see dropout layers applied in reinforcement learning models.

thank you for the response, definitely very helpful!