Thank you for your reply.
You are right, I did that on purpose because I am trying to reproduce a paper that describes the network this way. However, I tried adding a non-linearity between them, but unfortunately it didn't fix the NaN error.
While debugging the code, I noticed that the NaN first appears in the model's weights right after I call optimizer().
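Since the NaN only shows up after the optimizer step, one way to narrow it down is to check the gradients just before the update and the weights just after it. If the gradients are already NaN/inf, the problem is in the forward/backward pass (for example an exploding loss). If only the updated weights are bad, the step itself is too large. Below is a minimal, framework-agnostic sketch in plain Python. The function names are hypothetical, and plain SGD stands in for whatever optimizer the model actually uses. The same checks can be applied to the framework's own parameter and gradient tensors.

```python
import math


def all_finite(values):
    """Return True if every number in `values` is finite (not NaN or +/-inf)."""
    return all(math.isfinite(v) for v in values)


def sgd_step_with_checks(weights, grads, lr):
    """Hypothetical helper: one plain SGD update with NaN/inf checks around it.

    Raises ValueError naming the stage where the first non-finite value
    appears, so you can tell whether it comes from the gradients or from
    the update itself.
    """
    if not all_finite(weights):
        raise ValueError("weights are already non-finite before the step")
    if not all_finite(grads):
        # NaN/inf produced upstream, in the forward or backward pass.
        raise ValueError("gradients are non-finite before the update")
    new_weights = [w - lr * g for w, g in zip(weights, grads)]
    if not all_finite(new_weights):
        # Finite gradients, but the step overflowed: the learning rate is
        # likely too large, or the gradients need clipping.
        raise ValueError("weights became non-finite after the update")
    return new_weights


# Usage: a normal step passes; a huge step overflows and is reported.
w = sgd_step_with_checks([0.5, -1.0], [0.1, 0.2], lr=0.1)
print(w)
```

If the error points at the gradients, lowering the learning rate alone probably won't help, and it is worth inspecting the loss for log(0) or division by zero. If it points at the update, a smaller learning rate or gradient clipping is the usual first thing to try.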