I am not sure whether this is the right forum for questions about the correctness of an algorithm implementation, but since my implementation is in PyTorch, I am giving it a try.

I tried implementing the policy gradient A2C algorithm in PyTorch, using this and this as references. I also followed the OpenAI Baselines TensorFlow implementation.

After spending more than a week trying to figure out why the agents are not learning even after 5-6 hours of training, I have finally given up. The lack of progress is itself suspicious, because with a vectorized A2C implementation the agents should start learning within 30 minutes.

I am using the PongNoFrameskip-v4 vectorized environment and have the implementation here. I would really appreciate it if someone could either help me identify any obvious errors or point me to a more appropriate forum for this type of question.

Thanks