I used Adam with lr = 1e-3
The first one is CPU.
Seems to work. You may need to train a bit longer if you allow non-deterministic behavior, but in general it should converge too.
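As a rough illustration of why non-deterministic behavior can change the exact values without hurting convergence, here is a minimal pure-Python sketch. It only shows that floating-point addition depends on the order of accumulation. The helper `sum_in_order` is a hypothetical name made up for this example, and this is not what any particular backend does internally.

```python
import math

def sum_in_order(values, order):
    """Accumulate values in the given index order (hypothetical helper)."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

# Identical inputs for both "runs"; only the accumulation order differs.
values = [0.1, 0.2, 0.3]

left_to_right = sum_in_order(values, [0, 1, 2])  # (0.1 + 0.2) + 0.3
right_to_left = sum_in_order(values, [2, 1, 0])  # (0.3 + 0.2) + 0.1

# The two sums differ in the last bits but agree to within rounding error.
print(left_to_right == right_to_left)
print(math.isclose(left_to_right, right_to_left))
```

If a parallel reduction accumulates in a different order from run to run or from device to device, small differences like this can appear and then grow over many training steps, even when the seed and the inputs are identical.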
My concern is mostly with the values. Since the seed is the same, I thought the values would also be the same. Does it have to do with data types?