SAC doesn't converge in gym Mountain Car environment

I implemented SAC according to this paper. I tested it on some Gym environments (Half Cheetah and Inverted Pendulum) and got very decent results, so the algorithm should be working. However, when I try it on the Mountain Car env, I can't get it to learn. Sometimes I get better results, with average reward above 0, but no matter what, the model keeps getting stuck on a plateau after ~100 gradient steps:


My params: [screenshot of hyperparameters]
What can be the cause? Since this is a sparse-reward environment, I thought I'd need a bigger batch size. I've tried batch sizes from 256 to 8000 but got similar results, so it must be something else.

Check if you can get that environment working with another SAC implementation, like Stable Baselines3 or CleanRL.
If one of them works, then the problem is in your code.
The reparameterization trick is a very important concept that must be implemented correctly.
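For reference, here is a minimal sketch of the reparameterization trick for SAC's squashed (tanh) Gaussian policy. This is NumPy for illustration only; in a real implementation the same math runs on autograd tensors (e.g. `Normal(mean, std).rsample()` in PyTorch) so gradients flow through `mean` and `log_std`:

```python
import numpy as np

def reparameterized_sample(mean, log_std, rng):
    """Sample a squashed-Gaussian action via the reparameterization trick."""
    std = np.exp(log_std)
    eps = rng.standard_normal(mean.shape)   # noise is sampled, not the action
    pre_tanh = mean + std * eps             # deterministic in mean/std -> differentiable
    action = np.tanh(pre_tanh)              # squash to [-1, 1]
    # Gaussian log-density of pre_tanh, plus the tanh change-of-variables
    # correction from the SAC paper (appendix C)
    log_prob = (-0.5 * (eps ** 2 + np.log(2 * np.pi)) - log_std).sum(-1)
    log_prob -= np.log(1.0 - action ** 2 + 1e-6).sum(-1)
    return action, log_prob

rng = np.random.default_rng(0)
a, lp = reparameterized_sample(np.zeros(3), np.zeros(3), rng)
```

The key point is that the randomness lives in `eps`, not in the action itself, so the actor loss can backpropagate through `mean` and `log_std`. Sampling the action directly from the distribution (without `rsample`-style reparameterization) silently breaks the policy gradient.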
Ah! I discovered that the entropy term matters A LOT. Try implementing automatic entropy tuning as in the CleanRL code above. Setting that value manually is a pain in the a**.
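A minimal sketch of what automatic entropy tuning does, assuming the CleanRL-style alpha loss (the learning rate here and the usual target entropy of `-dim(action_space)` are common defaults, not values from this thread):

```python
import numpy as np

def update_log_alpha(log_alpha, log_probs, target_entropy, lr=1e-3):
    """One gradient-descent step on the alpha loss
       loss = mean(-exp(log_alpha) * (log_pi + target_entropy)),
       i.e. d(loss)/d(log_alpha) = -exp(log_alpha) * mean(log_pi + target_entropy)."""
    grad = -np.exp(log_alpha) * np.mean(log_probs + target_entropy)
    return log_alpha - lr * grad

# If the policy is too deterministic (log_pi high, entropy below target),
# alpha grows, which increases the entropy bonus and pushes the actor
# back toward exploration; if it is too random, alpha shrinks.
log_alpha = 0.0
for _ in range(100):
    log_alpha = update_log_alpha(log_alpha,
                                 log_probs=np.array([-0.1, -0.2]),
                                 target_entropy=1.0)
```

This feedback loop is exactly why manual tuning is painful: the right alpha changes as the policy improves, and a fixed value that works early in training is usually wrong later.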