I’m trying to fit a neural network to a known function f whose input and output dimensions are both 6 (which I would like to scale to ~100 in the future). I use mean squared error as the loss with the Adam optimizer, and my target is an MSE of 1e-6. Since f is known and the sample space is given, there is effectively unlimited training data, so I draw 128 fresh uniform samples at each step.

In the first 1000 steps I see a steep decline in the loss, but after that the loss curve flattens into an L-like shape at about 5e-3 (the decrease is not visible on a log plot) for roughly 1e6 steps, until I stop training. I have tried learning rates from 1e-3 to 1e-5 and network sizes from 3 hidden layers of width 24 up to 9 hidden layers of width 48, yet the loss hits the same barrier at the 5e-3 level every time.

I’m not sure what else to try. While universal approximation holds ‘in theory’, is this kind of plateau a common phenomenon in deep learning?
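For reference, here is a minimal NumPy sketch of the kind of setup I mean. The target function, activation (tanh), initialization, and hyperparameters here are stand-ins for illustration, not my exact f or architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6                        # input and output dimension
sizes = [d, 24, 24, 24, d]   # 3 hidden layers of width 24

# Hypothetical smooth target standing in for the real (known) f.
W_true = rng.normal(size=(d, d)) / np.sqrt(d)
def f(x):
    return np.sin(x @ W_true)

# He-style initialization of (weight, bias) pairs.
params = [(rng.normal(size=(m, n)) * np.sqrt(2.0 / m), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return the activations of every layer; tanh hidden units, linear output."""
    acts = [x]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(np.tanh(z) if i < len(params) - 1 else z)
    return acts

def mse_and_grads(params, x, y):
    """MSE loss and its gradients via manual backpropagation."""
    acts = forward(params, x)
    pred = acts[-1]
    loss = np.mean((pred - y) ** 2)
    delta = 2.0 * (pred - y) / pred.size        # dL/d(output)
    grads = [None] * len(params)
    for i in reversed(range(len(params))):
        W, _ = params[i]
        grads[i] = (acts[i].T @ delta, delta.sum(axis=0))
        if i > 0:                                # backprop through tanh
            delta = (delta @ W.T) * (1.0 - acts[i] ** 2)
    return loss, grads

# Adam optimizer state (first and second moments per parameter).
m_state = [(np.zeros_like(W), np.zeros_like(b)) for W, b in params]
v_state = [(np.zeros_like(W), np.zeros_like(b)) for W, b in params]

def adam_step(params, grads, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    for i, ((W, b), (gW, gb)) in enumerate(zip(params, grads)):
        mW, mb = m_state[i]; vW, vb = v_state[i]
        mW = b1 * mW + (1 - b1) * gW;    mb = b1 * mb + (1 - b1) * gb
        vW = b2 * vW + (1 - b2) * gW**2; vb = b2 * vb + (1 - b2) * gb**2
        m_state[i], v_state[i] = (mW, mb), (vW, vb)
        c1, c2 = 1 - b1**t, 1 - b2**t            # bias correction
        params[i] = (W - lr * (mW / c1) / (np.sqrt(vW / c2) + eps),
                     b - lr * (mb / c1) / (np.sqrt(vb / c2) + eps))

losses = []
for t in range(1, 2001):
    x = rng.uniform(-1.0, 1.0, size=(128, d))   # 128 fresh uniform samples
    loss, grads = mse_and_grads(params, x, f(x))
    adam_step(params, grads, t)
    losses.append(loss)

print(f"step 1 loss: {losses[0]:.4f}, step 2000 loss: {losses[-1]:.6f}")
```

In my actual runs the loss behaves like this sketch early on (steep decline) and then stalls around 5e-3, far from the 1e-6 target.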