Difference between the two code snippets below (REINFORCE)

I used Adam with lr = 1e-3
First one is CPU

Seems to work. You may need to train a bit longer if you allow non-deterministic behavior, but in general it should still converge.


My concern is mostly with the values. Since the seed is the same, I thought the values would also be the same. Does it have to do with data types?
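One thing that may be relevant to the question, independent of the seed: floating-point addition is not associative, so the same numbers summed in a different order can give slightly different results. Whether this is the cause of the mismatch in this thread is an assumption, not something confirmed here. The sketch below uses plain Python floats (no PyTorch), and the helper `sum_in_order` is a hypothetical name introduced only for illustration.

```python
import random


def sum_in_order(values):
    """Accumulate values one by one, in the order given (hypothetical helper)."""
    total = 0.0
    for v in values:
        total += v
    return total


# Same three numbers, different grouping: the results differ in the last bit.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)  # False

# Same seed, same values, but summed in opposite orders.
random.seed(0)
values = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
forward = sum_in_order(values)
backward = sum_in_order(reversed(values))

# The two totals agree closely but are not guaranteed to be bit-identical.
print(forward, backward, abs(forward - backward))
```

If something like this is what is happening, a fixed seed would not guarantee identical values across two runs that reduce in different orders; small differences can then grow over many training steps.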