I’m using the Adam optimizer for a simple SNN XOR test app. Although “loss.backward()” computes valid gradient values, “optimizer.step()” is not applying the parameter updates I expected, i.e. “gradient * learning_rate”. The updates are much larger than the gradients themselves. On the first batch update, the magnitude is 0.1 (the learning rate) for every non-zero gradient; subsequent updates are larger than the gradient values and sometimes in the opposite direction. Has anyone else seen this behavior?
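For reference, here is a minimal standalone sketch (pure Python, not my SNN code; the function name is my own) of Adam’s first update computed by hand from the standard formulas. After bias correction at t = 1, the update works out to lr * g / (|g| + eps), which is roughly ±lr regardless of the gradient’s magnitude:

```python
import math

def adam_first_step(g, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Compute the first Adam parameter update for a scalar gradient g."""
    m = (1 - b1) * g            # first moment after step 1 (m0 = 0)
    v = (1 - b2) * g * g        # second moment after step 1 (v0 = 0)
    m_hat = m / (1 - b1 ** 1)   # bias-corrected: equals g
    v_hat = v / (1 - b2 ** 1)   # bias-corrected: equals g*g
    return lr * m_hat / (math.sqrt(v_hat) + eps)

# The update magnitude is ~0.1 (the lr) no matter how big or small g is:
for g in (1e-4, 0.5, 3.0, -2.0):
    print(f"g = {g:+.4f}  ->  update = {adam_first_step(g):+.6f}")
```

This matches what I’m seeing: the step size is set by the learning rate and the moment estimates, not by “gradient * learning_rate” directly.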