Help/suggestion with MeshGraphNets

I’m working with MeshGraphNets, predicting 3 outputs. With a single shared head, the error decreased for only one of the outputs, so I tried a weighted loss, then multi-component loss weighting, and finally GradNorm. When I ran GradNorm, I logged the gradient norms of the loss components (each now with its own output head), and they approached 0 after a few training steps; logging the raw gradients showed the same thing. I then tried multi-head outputs without GradNorm, but the gradients still go to 0 after the first few steps.

I normalize my inputs and outputs with z-score normalization, and I don’t use an activation at the decoder heads. One of my outputs has a mean close to 0, but the other two are skewed: they don’t have long tails, but they are multi-modal on either side of 0.

What could cause this? Has anyone explored a similar problem? I’d appreciate any suggestions or recommendations.
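One way to narrow this down is to log the gradient norm that each loss component induces on the shared trunk parameters, before any weighting is applied; if a per-head norm collapses to 0 here, the head itself (not GradNorm) is the problem. A minimal PyTorch sketch, assuming a toy trunk and three scalar heads (the sizes and module names are illustrative, not the MeshGraphNets code):

```python
import torch
import torch.nn as nn

# Toy setup: a shared trunk with three decoder heads. All sizes are
# illustrative assumptions, not the actual MeshGraphNets architecture.
torch.manual_seed(0)
trunk = nn.Sequential(nn.Linear(8, 32), nn.ReLU())
heads = nn.ModuleList([nn.Linear(32, 1) for _ in range(3)])

x = torch.randn(16, 8)
targets = [torch.randn(16, 1) for _ in range(3)]

z = trunk(x)
losses = [nn.functional.mse_loss(h(z), t) for h, t in zip(heads, targets)]

# Gradient norm of each loss component w.r.t. the shared trunk parameters.
shared_params = list(trunk.parameters())
grad_norms = []
for loss in losses:
    grads = torch.autograd.grad(loss, shared_params, retain_graph=True)
    grad_norms.append(torch.cat([g.flatten() for g in grads]).norm().item())

for i, gn in enumerate(grad_norms):
    print(f"head {i}: grad norm = {gn:.4f}")
```

Logging these per-head norms every few steps (e.g. to TensorBoard) shows whether all heads collapse together or only the skewed ones.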

i didn’t use activation at the decoder heads

Why not? Without an activation, the decoder outputs are unbounded. Have you tried adding one, e.g.:
self.head = nn.Sequential(nn.Linear(...), nn.Tanh())
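A runnable version of that idea, assuming a 128-dim latent and three scalar outputs (both sizes are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 128-dim node latent, one scalar per output head.
latent_dim = 128
heads = nn.ModuleList(
    [nn.Sequential(nn.Linear(latent_dim, 1), nn.Tanh()) for _ in range(3)]
)

z = torch.randn(4, latent_dim)   # fake batch of node latents
outputs = [h(z) for h in heads]  # each output lies in (-1, 1) due to Tanh
```

One caveat worth noting: Tanh saturates for large pre-activations, and saturation itself drives gradients toward zero. A bounded head also cannot fit z-scored targets that fall outside (-1, 1), so an output activation only makes sense if the targets are rescaled into its range.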

Also consider min-max normalization: z-score can compress the signal when the distribution is skewed.
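If you go that route, a min-max scaler into [-1, 1] pairs naturally with a Tanh head. A small sketch (the function names are made up; the key point is fitting the range on training data only and keeping the inverse around to read predictions back in physical units):

```python
import numpy as np

def minmax_fit(x, eps=1e-8):
    """Per-feature min and span, fit on training data only."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return lo, np.maximum(hi - lo, eps)  # eps guards constant features

def minmax_scale(x, lo, span):
    return 2.0 * (x - lo) / span - 1.0   # maps [lo, hi] -> [-1, 1]

def minmax_unscale(y, lo, span):
    return (y + 1.0) * span / 2.0 + lo   # inverse, for reading predictions

rng = np.random.default_rng(0)
train = rng.normal(loc=3.0, scale=0.5, size=(100, 3))  # toy data centered away from 0
lo, span = minmax_fit(train)
scaled = minmax_scale(train, lo, span)
```

Note that min-max scaling is sensitive to outliers (a single extreme value sets the range for everything), so it is worth checking the tails of each output first.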