CarRacing not learning with A2C in torchrl

I am trying to train an A2C agent on the Gym environment CarRacing-v2 using torchrl. I have mostly followed the torchrl example, but even after about 400 episodes it doesn’t give good results. Am I collecting the collector’s data correctly? I don’t know why it’s not learning properly. Is the network too basic?

Here is the code on Colab

I’d like to know whether this is the right way to build an actor-critic method for this environment. I would also really appreciate any suggestions for improving the network…