Hand-crafted model degrades through training

So I was trying different activation functions and came across something weird. I hand-crafted this model, and I can get essentially 0% error when I manually assign values to the weights and biases of the network.
Then I train the network for many epochs (2500) and ask it to predict again. I expected that, since the original weights are almost perfect, training would just shave off the small remaining error, but instead the network diverges on the training data.
I don't think my training routine is the problem: this specific network with these specific inputs/outputs is the only example I have found with this behavior. Even with the same model, if I swap the activation functions from ReLU to Mish or SiLU, the network trains without an issue.
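
For reference, the manual assignment is nothing fancy, just copying values into the parameters under no_grad. A minimal sketch along these lines (the values below are placeholders, not my actual hand-crafted ones):

import torch
import torch.nn as nn

# First linear layer of the model printed below; the real model has three.
layer = nn.Linear(6, 2).double()

# Copy hand-crafted values into the parameters without tracking gradients.
# (Placeholder values here; my real ones bring the error to roughly 0%.)
with torch.no_grad():
    layer.weight.copy_(torch.full((2, 6), 0.5, dtype=torch.float64))
    layer.bias.copy_(torch.zeros(2, dtype=torch.float64))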

test(47,)..Submodel(
  (layers): Sequential(
    (0): Linear(in_features=6, out_features=2, bias=True)
    (1): ReLU()
    (2): Linear(in_features=2, out_features=3, bias=True)
    (3): ReLU()
    (4): Linear(in_features=3, out_features=6, bias=True)
    (output_activation): Mish()
    (output_scale): ScaleAndShift(*tensor([0.0200, 0.0700, 0.0700, 0.0200, 0.0200, 0.0200], dtype=torch.float64)+tensor([-1., -3., -3., -1., -1., -1.], dtype=torch.float64))
  )
)
input
[[[1.00000 1.00000 1.00000 1.00000 1.00000 1.00000]]

 [[0.00000 0.00000 0.00000 0.00000 0.00000 0.00000]]

 [[-1.00000 -1.00000 -1.00000 -1.00000 -1.00000 -1.00000]]]
prediction (with the hand-crafted weights):
[[2.00000 2.00500 2.00500 2.00000 2.00000 2.00000]
 [1.00000 1.02500 1.02500 1.00000 1.00000 1.00000]
 [0.00000 0.03100 0.03100 0.00000 0.00000 0.00000]]
score
0.9998082146798993
done test       2500(  2) :    0.0005 ->    0.6260: v   11.2685;t  12.9918 51
prediction (after 2500 epochs of training):
[[1.47839 1.55670 1.55706 1.47860 1.47862 1.47843]
 [1.47839 1.55670 1.55706 1.47860 1.47862 1.47843]
 [0.06020 -0.15821 -0.15911 0.05984 0.05971 0.06023]]
score
0.7444793404118504
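
In case anyone wants to reproduce this, here is a sketch of the model rebuilt from the printout above. ScaleAndShift is my own module; per its repr it is just an elementwise y = x * scale + shift on the output:

import torch
import torch.nn as nn

class ScaleAndShift(nn.Module):
    # Elementwise affine on the output: y = x * scale + shift.
    def __init__(self, scale, shift):
        super().__init__()
        self.register_buffer("scale", torch.tensor(scale, dtype=torch.float64))
        self.register_buffer("shift", torch.tensor(shift, dtype=torch.float64))

    def forward(self, x):
        return x * self.scale + self.shift

model = nn.Sequential(
    nn.Linear(6, 2),
    nn.ReLU(),   # training diverges with ReLU here; Mish/SiLU train fine
    nn.Linear(2, 3),
    nn.ReLU(),
    nn.Linear(3, 6),
    nn.Mish(),
    ScaleAndShift([0.02, 0.07, 0.07, 0.02, 0.02, 0.02],
                  [-1.0, -3.0, -3.0, -1.0, -1.0, -1.0]),
).double()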

Can anyone guess what’s going on here?