SAC doesn't converge in gym Mountain Car environment

I implemented SAC according to this paper. I tested it on some gym environments (Half Cheetah and Inverted Pendulum) and got decent results, so the algorithm should be working. However, when I try it on the Mountain Car environment, I can’t get it to learn. Sometimes I get better results, with average reward above 0, but no matter what, the model gets stuck on a plateau after ~100 gradient steps.


My params: [image: hyperparameter settings]
What can be the cause? Since this is a sparse-reward environment, I thought I’d need a bigger batch size. I’ve tried batch sizes from 256 to 8000 but got similar results, so it must be something else.
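For context on the sparse reward, here is a minimal sketch of the per-step reward, assuming the environment is gym's `MountainCarContinuous-v0` (the variant is an assumption, as are the helper names `step_reward` and `episode_return`, which are hypothetical and not part of gym). Under that assumption, each step costs `0.1 * action**2` and reaching the goal adds 100, so an agent that never reaches the flag can score at most 0, and it does so by barely moving.

```python
from typing import Sequence


def step_reward(action: float, reached_goal: bool) -> float:
    """Reward for one step, assuming MountainCarContinuous-v0 semantics."""
    # +100 only on the step where the goal is reached, otherwise 0.
    reward = 100.0 if reached_goal else 0.0
    # Quadratic action penalty applied on every step.
    return reward - 0.1 * action ** 2


def episode_return(actions: Sequence[float], reached_goal_at_end: bool) -> float:
    """Undiscounted return of an episode given the actions taken.

    The goal, if reached at all, is assumed to be reached on the last step.
    """
    total = 0.0
    last = len(actions) - 1
    for i, a in enumerate(actions):
        total += step_reward(a, reached_goal_at_end and i == last)
    return total


if __name__ == "__main__":
    # Doing nothing for a full episode: no penalty, no goal bonus.
    print(episode_return([0.0] * 999, reached_goal_at_end=False))
    # Full throttle without reaching the goal: penalty on every step.
    print(episode_return([1.0] * 999, reached_goal_at_end=False))
    # Full throttle for 100 steps, reaching the goal on the last one.
    print(episode_return([1.0] * 100, reached_goal_at_end=True))
```

If this matches the environment in use, an average reward hovering around 0 would be consistent with the policy having settled on near-zero actions rather than on reaching the goal.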