I am training a neural network on a set of data points that carries information about the derivatives of an unknown function f, given as

Train Data = {(x_i, f'(x_i))}, i = 1, ..., 100

Additionally, I have supplementary data available in the form of

{f_k}, k = 1, ..., 100^2.

For the latter set I do not have the corresponding $x_k$. My goal is to approximate the function $f$ with a neural network $N$.

My proposed approach involves two loss functions, defined as follows:

L_1 = Sum_i ||N'(x_i) - f'(x_i)||^2

L_2 = Sum_i min_k ||N(x_i) - f_k||^2

Is it then possible to define the loss function as

L = L_1 + L_2

and is it legitimate to follow this approach? I am unsure, since L_2 involves a closest-point search at each epoch.
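For reference, here is how I imagine the training loop would look. This is only a sketch in PyTorch: the network architecture, optimizer, learning rate, and the synthetic data are placeholders of my own choosing, not part of the actual problem. It computes N'(x_i) via autograd and implements the closest-point term in L_2 as a brute-force minimum over all f_k.

```python
import torch

# Placeholder data standing in for the real training sets
torch.manual_seed(0)
x = torch.linspace(0.0, 1.0, 100).unsqueeze(1)   # x_i, shape (100, 1)
df = torch.cos(x)                                # f'(x_i), e.g. if f = sin
f_vals = torch.sin(torch.rand(100 * 100, 1))     # unpaired samples {f_k}

# Small MLP as the surrogate N
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def loss_fn():
    # L_1: match the derivative N'(x_i) to f'(x_i) using autograd
    xr = x.clone().requires_grad_(True)
    out = net(xr)
    dN = torch.autograd.grad(out.sum(), xr, create_graph=True)[0]
    L1 = ((dN - df) ** 2).sum()

    # L_2: for each x_i, squared distance to the closest f_k
    # pairwise squared distances, shape (100, 100^2)
    d2 = (net(x) - f_vals.T) ** 2
    L2 = d2.min(dim=1).values.sum()
    return L1 + L2

# A few optimization steps, just to show the loop runs
for _ in range(5):
    opt.zero_grad()
    loss = loss_fn()
    loss.backward()
    opt.step()
```

Note that the min over k is taken inside the loss, so gradients only flow through the currently closest f_k for each x_i; the nearest-neighbor assignment itself changes between steps, which is exactly the part I am worried about.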