Are there ways to combine a statistical learning model with a neural network? If we want to use backpropagation to train the model, do we have to implement the classical statistical model in PyTorch?
scikit-learn may be what you want.
But how do I get the gradient of the SVM if we use it as the last layer or as a block?
I see what you mean now. I have to admit I haven’t seen anyone do this before. As far as I know, an SVM can only adjust its hyperplane without changing the input features. In other words, it only assumes that its hyperplane is imperfect; it never assumes the input was wrong.
We can think about it another way: if the SVM had the ability to change the representation of a sample (what we call the features), we would not need a DNN at all.
Sorry for not being able to help you. Maybe you can try loss functions other than softmax.
The L1-SVM is not differentiable, but the L2-SVM is. If you want to use it, you will need to implement it and its gradient calculation (sec. 2.4 of the paper below) yourself.
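One point worth noting: if the L2-SVM (squared hinge) loss is written with ordinary PyTorch tensor ops, autograd computes the gradient for you, so no hand-derived backward pass is needed. A minimal sketch (the helper name `l2_svm_loss` and the toy shapes are my own, not from the paper):

```python
import torch

def l2_svm_loss(scores, targets, margin=1.0):
    """Squared-hinge (L2-SVM) loss for binary labels in {-1, +1}.

    All operations are differentiable, so autograd supplies the
    gradient automatically.
    """
    # hinge term max(0, margin - y * f(x)), then squared and averaged
    hinge = torch.clamp(margin - targets * scores, min=0)
    return (hinge ** 2).mean()

# Toy usage: a linear "SVM layer" on top of learned features.
feats = torch.randn(8, 16, requires_grad=True)   # e.g. output of a DNN
w = torch.randn(16, requires_grad=True)           # hypothetical SVM weights
b = torch.zeros(1, requires_grad=True)
y = torch.randint(0, 2, (8,)).float() * 2 - 1     # labels in {-1, +1}

loss = l2_svm_loss(feats @ w + b, y)
loss.backward()  # gradients flow into feats, w, and b
```

Because the gradient reaches `feats`, the loss can sit on top of any feature extractor and be trained end-to-end with backpropagation.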
There might be a substitute for the SVM: an SVM-like loss. This paper argues that minimizing its loss at the last layer is analogous to maximizing the margin of an SVM classifier (see the paper above, sec. 3.1, equation (3)). It looks like a hinge loss.
Sorry, I don’t know much about it, and the author did not give a theoretical analysis. You can try the loss if you want to.
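If a hinge-style loss at the last layer is what you're after, PyTorch already ships a multi-class hinge loss, `nn.MultiMarginLoss`, which may save you from implementing one (whether it matches the paper's exact formulation is something you'd have to check):

```python
import torch
import torch.nn as nn

# MultiMarginLoss is PyTorch's built-in multi-class hinge loss;
# p=2 gives the squared-hinge variant.
criterion = nn.MultiMarginLoss(p=2, margin=1.0)

scores = torch.randn(4, 10, requires_grad=True)  # logits from a linear layer
targets = torch.tensor([1, 0, 4, 9])             # class indices

loss = criterion(scores, targets)
loss.backward()  # autograd handles the gradient
```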
Citing the sklearn documentation for the stochastic gradient descent classifier (`SGDClassifier`) — you would have to use the hinge loss:
parameter: loss — Defaults to ‘hinge’, which gives a linear SVM.
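If a standalone (non-backprop) linear SVM is enough, a minimal sketch with sklearn (the toy dataset here is my own illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# A synthetic binary classification problem.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# loss="hinge" (the default) makes SGDClassifier fit a linear SVM by SGD.
clf = SGDClassifier(loss="hinge", random_state=0).fit(X, y)
print(clf.score(X, y))  # training accuracy
```

Note this trains the SVM on fixed features; it does not propagate gradients back into an upstream network the way a PyTorch loss would.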
The SVM's 0-1 loss is discontinuous at 0 (its slope there is infinite), so the hinge loss can be seen as a differentiable surrogate for it.
For further details see