Deep Deterministic Policy Gradient implementation

I added this line because I tried every combination of eval and train, and learning works only when I put eval everywhere. But that probably means that batch norm is not being used at all.
I also tried removing the batchnorm layers altogether, and that enables learning as well.
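To make the train/eval difference concrete, here is a minimal sketch on a standalone `nn.BatchNorm1d` layer. The layer size, batch shape, and input scaling are arbitrary choices for illustration, not taken from the actual actor or critic networks. In train mode the layer normalizes with the current batch statistics and updates its running estimates; in eval mode it normalizes with the running estimates instead. If a layer is never put in train mode, the running estimates should stay at their initial values (mean 0, variance 1), which would make the normalization step close to an identity.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical standalone layer; 3 features chosen only for illustration.
bn = nn.BatchNorm1d(3)

# A batch whose mean and variance are clearly not 0 and 1.
x = torch.randn(8, 3) * 5.0 + 2.0

bn.train()
out_train = bn(x)  # uses this batch's mean/var and updates the running stats

bn.eval()
out_eval = bn(x)   # uses running_mean / running_var instead of batch stats

# In train mode each feature of the output is centered on zero.
train_mean_abs = out_train.mean(dim=0).abs().max().item()

# The two modes give different outputs for the same input.
modes_differ = not torch.allclose(out_train, out_eval)

print("max |mean| in train mode:", train_mean_abs)
print("train and eval outputs differ:", modes_differ)
print("running_mean after one train step:", bn.running_mean)
```

Printing `running_mean` and `running_var` of the real layers after a few updates is a quick way to check whether they ever move away from their initial values.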
The Keras model probably also has a slight bug, as it always keeps the batchnorm layers in evaluation mode. Surprisingly, though, when I put them in training mode, learning is not affected.
I am thoroughly confused right now, and I will probably go carefully through the implementations to debug what’s going on, just like you suggested in your first post. I think I need to debug my autograd chain. Is there a way to see it as a graph?
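One way to inspect the chain without extra tooling is to walk it by hand: every tensor produced by a differentiable operation has a `grad_fn`, and each `grad_fn` exposes its inputs through `next_functions`. The sketch below is only an illustration under assumptions: `walk_graph` is a hypothetical helper name, and the tiny `loss` expression stands in for the real actor/critic loss.

```python
import torch


def walk_graph(fn, depth=0, lines=None):
    """Recursively collect the names of autograd nodes reachable from fn."""
    if lines is None:
        lines = []
    if fn is None:
        # Inputs that do not require grad show up as None.
        return lines
    lines.append("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk_graph(next_fn, depth + 1, lines)
    return lines


# Toy stand-in for a real loss: one weight matrix, one input batch.
w = torch.randn(3, 2, requires_grad=True)
x = torch.randn(4, 3)
loss = (x @ w).tanh().sum()

graph_lines = walk_graph(loss.grad_fn)
print("\n".join(graph_lines))
```

Leaf parameters appear as `AccumulateGrad` nodes, so a parameter that never shows up in the printout is not connected to the loss. Third-party packages such as torchviz can render the same structure as an image, if a picture is easier to read than the indented text.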