Questions about scaling of input and output when calculating gradients

Hi all,

I am trying to implement a PINN for Fick’s law that looks like this:

dc/dt = r^-2 * d/dr (r^2 * D * dc/dr)

r and t are inputs to the network and are min-max scaled between 0 and 1 (Sobol sampled).
I also normalize my outputs when calculating the training loss and the boundary losses. What else do I need to consider regarding the scaling? The second-derivative calculation and the multiplication by r^2, r^-2, and D (non-scaled) are kind of confusing me.
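
To make the question concrete, here is a minimal sketch of how I understand the chain rule would carry the scaling into the residual. It assumes plain min-max scaling, r_hat = (r - r_min) / (r_max - r_min), and likewise for t and c, so that each derivative with respect to a scaled variable picks up a constant factor. All ranges, the value of D, and the function names are made up for illustration. An analytic solution stands in for the network, and central differences stand in for autograd, so the script runs without any ML library.

```python
import math

# All numbers below are made-up illustration values, not taken from my setup.
R_MIN, R_MAX = 0.1, 1.0   # physical radius range
T_MIN, T_MAX = 0.0, 2.0   # physical time range
C_MIN, C_MAX = -1.0, 3.0  # range used to min-max scale the output
D = 0.05                  # constant diffusivity in physical units
K = 2.0                   # wavenumber of the analytic test solution

DR, DT, DC = R_MAX - R_MIN, T_MAX - T_MIN, C_MAX - C_MIN


def c_exact(r, t):
    # Analytic solution of dc/dt = r^-2 d/dr (r^2 D dc/dr) for constant D.
    return math.exp(-D * K * K * t) * math.sin(K * r) / r


def c_hat(r_hat, t_hat):
    # Stand-in for the network: scaled inputs in [0, 1] -> scaled output.
    r = R_MIN + DR * r_hat
    t = T_MIN + DT * t_hat
    return (c_exact(r, t) - C_MIN) / DC


def residual(r_hat, t_hat, h=1e-3):
    # Central differences in the scaled coordinates replace autograd here.
    def flux(rh):
        # r^2 * D * dc/dr, with dc/dr = (DC / DR) * dc_hat/dr_hat
        r = R_MIN + DR * rh
        dchat_drhat = (c_hat(rh + h, t_hat) - c_hat(rh - h, t_hat)) / (2 * h)
        return r * r * D * (DC / DR) * dchat_drhat

    r = R_MIN + DR * r_hat  # r^2 and r^-2 use the physical radius
    # dc/dt = (DC / DT) * dc_hat/dt_hat
    dc_dt = (DC / DT) * (c_hat(r_hat, t_hat + h) - c_hat(r_hat, t_hat - h)) / (2 * h)
    # d(flux)/dr = (1 / DR) * d(flux)/dr_hat
    dflux_dr = (flux(r_hat + h) - flux(r_hat - h)) / (2 * h) / DR
    return dc_dt - dflux_dr / (r * r)


if __name__ == "__main__":
    for point in [(0.2, 0.1), (0.5, 0.5), (0.9, 0.8)]:
        print(point, residual(*point))
```

If this reasoning is right, the residual stays close to zero only when the factors DC / DT, DC / DR, and 1 / DR are included and r is unscaled back to physical units before forming r^2 and r^-2. In an actual PINN the finite differences would presumably be replaced by autograd derivatives with respect to the scaled inputs, with the same constant factors applied.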

Thank you for answering my question!