I wrote some code that gives me NaN when the learning rate is 0.1, but it works fine with a learning rate of 0.01. I am not sure whether the problem is the learning rate itself or a bug in my code. I have inspected the code many times and couldn't find any bug, but I am still suspicious and unsure.
Can a large learning rate really cause NaN by itself? I am using double tensors in my code.
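For context on why this can happen without any bug: if the step size is too large relative to the curvature of the loss, each update overshoots and the parameters grow geometrically until they overflow to inf, and operations like inf - inf then produce NaN. Here is a minimal toy sketch (plain Python, not your actual model or code) of 1-D gradient descent on f(x) = 50x², where the gradient is 100x; the function, initial point, and learning rates are illustrative assumptions:

```python
import math

def gradient_descent(lr, steps=400, x=1.0):
    """Toy 1-D gradient descent on f(x) = 50 * x**2 (gradient: 100 * x)."""
    for _ in range(steps):
        grad = 100.0 * x
        # With lr = 0.1 the update is x * (1 - 10) = -9x, so |x| grows
        # by 9x per step, overflows to inf, and inf - inf yields NaN.
        # With lr = 0.001 the update is 0.9x, which contracts toward 0.
        x = x - lr * grad
        if math.isnan(x):
            return float("nan")
    return x
```

The same mechanism applies in double precision; doubles just take a few more steps to overflow than single-precision floats. So yes, a learning rate that is too large can by itself produce NaN even in correct code, though it is still worth checking for other common causes (log or division by zero, exploding gradients from a particular batch, etc.).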