I am using the version that is written on this thread, and is giving NaNs after the first iteration. Typically, when I got Nans in the past it was either an error in differentiation or a large training rate. Both of these cannot be this time because it is automatic differentiation and I am using extremely small training rates (3e-7 in Adam, while typically I use 3e-4).
Anyway, I will have to read the documentation to find how to print the gradients, which might give me an idea on what is going wrong. And then, if I have problems, I will make an another thread.
Thanks for everything!