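It looks like you are using different learning rates for the two models: lr=0.01 for the Keras model and lr=0.001 for the PyTorch model. That tenfold difference is most likely the main cause of the different convergence rates, so set both to the same value before comparing them again.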
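To see why the learning rate alone can explain the gap, here is a toy sketch. It is not your model: it assumes plain gradient descent on f(x) = x**2, and `gradient_descent` is a hypothetical helper written only for this illustration. Both runs use the same starting point and the same number of steps. Only the learning rate differs.

```python
def gradient_descent(lr, steps, x0=5.0):
    """Minimise f(x) = x**2 with plain gradient descent and return the final x.

    Toy example only (hypothetical helper): each step multiplies x by (1 - 2*lr),
    so a larger learning rate shrinks x, and therefore the loss, faster.
    """
    x = x0
    for _ in range(steps):
        grad = 2.0 * x   # derivative of x**2
        x -= lr * grad   # standard gradient descent update
    return x


# Same start, same number of steps; only the learning rate changes.
for lr in (0.01, 0.001):
    x = gradient_descent(lr, steps=100)
    print(f"lr={lr}: x after 100 steps = {x:.4f}, loss = {x * x:.4f}")
```

With lr=0.01 the loss after 100 steps is far lower than with lr=0.001. Real networks and adaptive optimizers such as Adam behave less simply, but the effect points the same way. For a fair Keras vs. PyTorch comparison, pass the same value to both optimizers (for example `learning_rate=` in Keras and `lr=` in `torch.optim`).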