PyTorch model struggles to learn with a large number of datapoints

I have a problem with my model: it struggles to learn when I increase the amount of training data.

My model relates to a physical system whose output is intended to follow its input. The system can also be described as a state-space model.

My NN input basically consists of the state-space input u that needs to be followed (6 values) and the current state of the system x (18 values). The output of the network is an internal intermediate state z, which is a time derivative of x and is used to update the current system state x. z is then passed through a second, pre-trained NN for further processing. The second NN's output w is subject to another matrix multiplication, whose result corresponds to the state-space model output y, which is supposed to follow the input. Therefore, the loss of the network that outputs z is the difference between u and y.
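Since I can't share the real code, here is a simplified, anonymized sketch of the pipeline as described above. All names, dimensions, the Euler-style state update, and the output matrix C are placeholders, not my actual implementation:

```python
import torch
import torch.nn as nn

STATE_DIM, INPUT_DIM, W_DIM = 18, 6, 6   # placeholder dimensions
DT = 0.01                                 # placeholder time step

net_z = nn.Sequential(                    # the network being trained: (x, u) -> z
    nn.Linear(STATE_DIM + INPUT_DIM, 64), nn.Tanh(),
    nn.Linear(64, STATE_DIM),
)
net_w = nn.Sequential(                    # stand-in for the pre-trained NN (frozen)
    nn.Linear(STATE_DIM, W_DIM),
)
for p in net_w.parameters():
    p.requires_grad_(False)

C = torch.randn(INPUT_DIM, W_DIM)         # placeholder output matrix

def step(x, u):
    z = net_z(torch.cat([x, u], dim=-1))  # intermediate state z = dx/dt
    x_next = x + DT * z                   # placeholder state update using z
    w = net_w(z)                          # pre-trained post-processing NN
    y = w @ C.T                           # matrix multiplication -> output y
    return x_next, y

x = torch.zeros(1, STATE_DIM)
u = torch.zeros(1, INPUT_DIM)
x, y = step(x, u)
loss = nn.functional.mse_loss(y, u)       # loss: difference between u and y
```

The real computations from z to y are more involved (especially for 3 of the 6 features), but the structure is the same.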

My state-space input u consists of 20000 datapoints, each with 6 features, as a timeseries. When I train my model to overfit on a single datapoint, this works without a problem, and for 10 datapoints it still works. However, as soon as I use a larger subset of e.g. 100 datapoints, the model fails to train for 3 of the 6 features. For those 3 features, the computations subsequent to the NN are more complicated than for the other 3.
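For reference, this is roughly how I run the subset-overfitting check (self-contained sketch with a toy model and random data standing in for my sensitive setup):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
u_series = torch.randn(20000, 6)          # stand-in for the real input timeseries
x_series = torch.randn(20000, 18)         # stand-in for the matching system states

def make_net():
    # toy stand-in for the real (x, u) -> prediction pipeline
    return nn.Sequential(nn.Linear(24, 64), nn.Tanh(), nn.Linear(64, 6))

final_losses = {}
for n in (1, 10, 100):                    # growing subsets, as in my experiments
    net = make_net()                      # fresh model per subset size
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    inp = torch.cat([x_series[:n], u_series[:n]], dim=-1)
    for _ in range(500):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(inp), u_series[:n])
        loss.backward()
        opt.step()
    final_losses[n] = loss.item()         # in my real setup, n=100 stalls
```

With my real model, the n=1 and n=10 runs converge, while n=100 stalls on the 3 harder features.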

However, the output transformation from z to y shouldn't be the problem: even if the implementation were not physically accurate, the model should still be able to learn according to the incorrectly implemented formulas. Nonetheless, I checked these formulas and the model multiple times, and they should be fine.

Since my model struggles when I use more training data, my suspicion is that the model complexity is not sufficient to capture the dependencies of the problem. However, even doubling the number of layers and the number of neurons per layer did not help in any way.

Unfortunately, my code includes sensitive data, so I am not able to share it. I hope my explanation is sufficient and someone has an idea of how I can get my model to learn on larger amounts of training data.