In deep neural networks, the final estimated function can be quite complicated, composed of many multiplications and activation functions. How do we know whether it is differentiable or not? We just add layers and change activation functions without knowing that.
If you are unsure whether a specific operation is differentiable, you could check if its output has a valid
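As a framework-independent illustration (an assumption of this sketch, not necessarily the check meant above), you can probe differentiability at a point numerically by comparing the left and right finite-difference slopes. The helper names `one_sided_slopes` and `looks_differentiable`, the step `h`, and the tolerance `tol` are all hypothetical choices; this is only a rough heuristic, not a proof.

```python
def one_sided_slopes(f, x, h=1e-6):
    """Return the (left, right) finite-difference slopes of f at x."""
    right = (f(x + h) - f(x)) / h
    left = (f(x) - f(x - h)) / h
    return left, right


def looks_differentiable(f, x, h=1e-6, tol=1e-3):
    """Heuristic: f looks differentiable at x if both slopes agree."""
    left, right = one_sided_slopes(f, x, h)
    return abs(left - right) <= tol


def relu(x):
    return max(0.0, x)


# ReLU is smooth away from 0 but has a kink exactly at 0.
print(looks_differentiable(relu, 1.0))
print(looks_differentiable(relu, 0.0))
```

A kink like the one in ReLU at 0 affects only a single point, which is why such activations are still usable with gradient methods in practice.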
Thanks, but I still have a problem.
Suppose that we have a differentiable loss function of degree 3 (loss = x^3). Here we cannot find the minimum of the function with gradient descent, because x^3 is unbounded below and has no minimum.
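To make the x^3 case concrete, here is a minimal sketch of plain gradient descent on it. The function `grad_descent`, the learning rate, and the step counts are illustrative assumptions. Starting from a positive x, the iterates stall near x = 0, where the gradient vanishes even though it is not a minimum; starting from a negative x, they keep moving toward minus infinity.

```python
def grad_descent(grad, x0, lr=0.01, steps=1000):
    """Plain gradient descent on a 1-D function, given its gradient."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


def cubic(x):
    return x ** 3


def cubic_grad(x):
    return 3 * x ** 2


# Positive start: the gradient shrinks and the iterate creeps toward 0.
x_pos = grad_descent(cubic_grad, x0=1.0, lr=0.01, steps=1000)

# Negative start: the loss keeps decreasing without bound.
# (steps is kept small on purpose; many more steps would overflow.)
x_neg = grad_descent(cubic_grad, x0=-0.1, lr=0.01, steps=200)

print(x_pos, cubic(x_pos))
print(x_neg, cubic(x_neg))
```

Common training losses such as squared error or cross-entropy are bounded below by zero, so this particular runaway behaviour does not occur with them.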
To explain further, suppose we are tuning a deep NN model to find a better model. We try adding layers, changing hyperparameters, changing the activation function, adding or removing neurons, and so on, without caring about what final function we end up with. We do not know what the shape of the loss function will be. Is it convex, concave, or neither? So how can we know that gradient descent will find the minimum of the loss function?
You could apply the function to toy (or artificially constructed) datasets with known solutions, and test whether descent methods bring you to these solutions.
If the function, plus some effort in tuning hyperparameters, can find the known solutions on these carefully constructed datasets, you might get a better understanding of your function's curvature and of how feasible it is to apply gradient methods to it.
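A minimal sketch of that idea, assuming the simplest possible model: data is generated from a known line (the values `TRUE_W` and `TRUE_B` are made up for this example), and gradient descent on the mean squared error is checked against that known answer. The function `fit_line` and its hyperparameters are illustrative only.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of mean((w * x + b - y) ** 2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy dataset with a known solution (noise-free on purpose).
TRUE_W, TRUE_B = 2.0, 1.0
xs = [i / 10 for i in range(-10, 11)]
ys = [TRUE_W * x + TRUE_B for x in xs]

w, b = fit_line(xs, ys)
print(w, b)  # should be close to TRUE_W and TRUE_B
```

The same pattern carries over to a real network: generate data from a known "teacher" function, train, and see whether the loss actually approaches zero.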
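Most of the time our loss function is nonconvex, so it may have local minima. But this is usually much less of a problem when we are working in a high-dimensional space: a point is a local minimum only if the loss curves upward along every parameter direction, so any single parameter along which the loss can still decrease lets us escape.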
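A rough way to see why, under a deliberately crude assumption: treat the curvature at a critical point as independent along each parameter direction, positive or negative with equal probability. Real loss surfaces are not like this, so the sketch below (with the hypothetical helper `fraction_of_minima`) only shows the trend: the more dimensions there are, the rarer it is for all directions to curve upward at once.

```python
import random


def fraction_of_minima(dim, trials=10000, seed=0):
    """Fraction of random critical points whose curvature is positive
    along every one of `dim` directions (toy model, random signs)."""
    rng = random.Random(seed)
    minima = 0
    for _ in range(trials):
        curvatures = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        if all(c > 0 for c in curvatures):
            minima += 1
    return minima / trials


fractions = {dim: fraction_of_minima(dim) for dim in (1, 2, 5, 10)}
for dim, frac in fractions.items():
    print(dim, frac)
```

In this toy model most critical points in high dimensions are saddle points rather than local minima, and a saddle point always has at least one direction of descent.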