Getting NaN in backward

Well, actually I am training a ladder network, which is quite complex, so I am not sure I can provide a simple example where it happens. Maybe it needs a huge graph, because with a minimal example everything works.

The only division by zero I have in my code is in batch normalization; however, I prevent this with an epsilon, which is now fixed to 1. I print the cost after the first forward pass and it is around 190, and I do not see any variables at NaN. The NaN appears only after calling backward.
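To illustrate what I mean (a minimal PyTorch sketch, not my actual ladder code): the forward pass can be perfectly finite while backward still produces NaN, because the derivative of sqrt is infinite at zero, so the epsilon only protects you if it sits inside the sqrt of the variance:

```python
import torch

x = torch.zeros(4, 3, requires_grad=True)

# Epsilon added OUTSIDE the sqrt: forward is finite (loss == 0),
# but d(sqrt(var))/d(var) is infinite at var == 0, so backward
# produces NaN even though nothing looks wrong before it.
var = x.var(dim=0, unbiased=False)
loss = ((x - x.mean(dim=0)) / (torch.sqrt(var) + 1.0)).sum()
loss.backward()
print(x.grad)  # contains nan

# Epsilon added INSIDE the sqrt: the gradient stays finite.
x2 = torch.zeros(4, 3, requires_grad=True)
var2 = x2.var(dim=0, unbiased=False)
loss2 = ((x2 - x2.mean(dim=0)) / torch.sqrt(var2 + 1.0)).sum()
loss2.backward()
print(x2.grad)  # all zeros, no nan
```

So a pattern like this could explain why the cost looks fine yet backward returns NaN.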

When I change the initial value of these variables to, say, 0.00000000001 instead of 0, I get my ladder network under 0.70.
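In code the change is just the initialization (again a sketch; the parameter name and shape are placeholders, not my real ones):

```python
import torch
import torch.nn as nn

# Initializing exactly at 0 can park a parameter right where some op
# (sqrt, log, division) has an infinite derivative; a tiny offset avoids it.
beta = nn.Parameter(torch.zeros(128))                    # original init: exact zeros
beta_shifted = nn.Parameter(torch.full((128,), 1e-11))  # shifted init that trains fine
```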

I tried to replicate it with a simple example but I cannot get one. Maybe I could save the model parameters and send them.
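Something along these lines (sketch with a stand-in module; 'ladder.pt' is a placeholder filename):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10)  # stand-in for my ladder network

# Snapshot the parameters right before the failing backward call.
torch.save(model.state_dict(), 'ladder.pt')

# Whoever receives the file can restore them into the same architecture.
model.load_state_dict(torch.load('ladder.pt'))
```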

Thanks.