Why is only the second half of my neural net learning?

Everything in the second half of the network (green box) learns reliable weights, but everything in the first half (blue box) learns seemingly random weights. Backprop does update the first half's weights and gradients, yet those updates do not improve training, while the second half learns correctly.
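For reference, here is a minimal sketch of how one could quantify this per layer (the `Sequential` model and its layer split are placeholders, not the actual network): snapshot every parameter before an optimizer step, then compare gradient and update norms across the two halves.

```python
import torch

# Placeholder two-part model; the layer split stands in for the blue/green halves.
model = torch.nn.Sequential(
    torch.nn.Linear(32, 64),   # stands in for the blue-box half
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),   # stands in for the green-box half
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

x = torch.randn(16, 32)
y = torch.randint(0, 10, (16,))

before = {n: p.detach().clone() for n, p in model.named_parameters()}

opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()

# Per-parameter gradient and update norms: both halves should be nonzero if
# backprop reaches them; the question is whether the updates actually help.
for name, p in model.named_parameters():
    update = (p.detach() - before[name]).norm().item()
    grad = p.grad.norm().item() if p.grad is not None else 0.0
    print(f"{name}: grad_norm={grad:.4f} update_norm={update:.4f}")
```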

The issue seems to be that the transpose/view operation is somehow corrupting the `grad_fn` chain. What exactly is going wrong, and how can it be fixed?
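To probe that suspicion, here is a small sketch (assuming PyTorch; the shapes are made up). Autograd tracks both `transpose` and `view`, so `grad_fn` stays intact either way; the silent failure mode is calling `view` where a transpose is actually needed, which keeps the graph valid but scrambles the element order:

```python
import torch

x = torch.randn(2, 3, requires_grad=True)

t = x.transpose(0, 1)     # true transpose: shape (3, 2), elements reordered
v = x.view(3, 2)          # same shape, but elements kept in memory order

print(t.grad_fn)          # <TransposeBackward0 ...> - graph is intact
print(v.grad_fn)          # <ViewBackward0 ...>      - graph is intact too

# The graphs look fine, but the values differ: view is NOT a transpose,
# so a layer fed v would be training against shuffled features.
print(torch.equal(t, v))  # False in general

# view on a non-contiguous tensor fails loudly instead of silently:
try:
    t.view(2, 3)
except RuntimeError as e:
    print("view after transpose:", e)  # needs .contiguous() or .reshape()
```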

How do you check that the parameter updates to the first half of the model do not improve training, while the updates to the second half do?
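One hedged way to check this (the model, data, and first-half/second-half split below are all placeholders): train twice from the same initialization, once with the first half frozen, and compare the loss curves. If freezing the first half changes nothing, its updates are not contributing:

```python
import torch

def make_model():
    torch.manual_seed(0)  # same init both runs, for a fair comparison
    return torch.nn.Sequential(
        torch.nn.Linear(32, 64), torch.nn.ReLU(),  # placeholder "first half"
        torch.nn.Linear(64, 1),                    # placeholder "second half"
    )

def train_curve(model, freeze_first_half, steps=200):
    """Return the loss history, optionally freezing the first half."""
    for p in model[0].parameters():  # assumes model[0] holds the first half
        p.requires_grad_(not freeze_first_half)
    opt = torch.optim.SGD(
        (p for p in model.parameters() if p.requires_grad), lr=0.1
    )
    loss_fn = torch.nn.MSELoss()
    x, y = torch.randn(64, 32), torch.randn(64, 1)
    losses = []
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses

full = train_curve(make_model(), freeze_first_half=False)
frozen = train_curve(make_model(), freeze_first_half=True)
# If the full model's curve is no better than the frozen one,
# the first half's updates are not helping.
print(f"final loss: full={full[-1]:.4f} vs frozen first half={frozen[-1]:.4f}")
```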