In this case, how can I define the partial derivative of the output with respect to the network parameters?
Do I have to implement the whole chain rule myself?
Is there any method to get the partial derivative?
.....
# Input data
X = np.random.rand(numRows * numSentences * numWords, inputDim)
# Reshape to (totalBatches, numSentences, numWords, inputDim)
X = X.reshape(totalBatches, numSentences, numWords, inputDim)
# Create random binary labels Y
Y = np.random.randint(2, size=(numRows, 1))
for epoch in range(epochRange):
    lossVal = 0
    for curBatch in range(totalBatches):
        model.zero_grad()
        # A plain tensor is enough; wrapping it in Variable is no longer needed
        dataInput = torch.Tensor(X[curBatch])
        dataOutput = model(dataInput)
        loss = lossCalc(dataOutput, torch.Tensor(Y[curBatch]))
        loss.backward()
        # Accumulate a Python float so the autograd graph is not kept alive
        lossVal = lossVal + loss.item()
        optimizer.step()
    if epoch % 1 == 0:
        print("For epoch {}, the loss is {}".format(epoch, lossVal))
print("Model Training completed")
Or do you want to apply some operation to the gradients?
If this is the case, how will you know the expected value for comparison?
My example code just compares the input and the output.
I think that torch.autograd calculates the partial derivative of the output.
In this case
where u is the output of the network, f is the input image, and w is the parameter of the network.
But I don't know how to calculate it in the backward function of my custom loss.
Your custom loss only needs to compute the partial derivative wrt its input. The autograd engine will use the chain rule and the other gradient formulas implemented in PyTorch to compute the gradients wrt the weights.
If your loss is L and you compute out = your_fn(inp), what you are given as grad_output is dL/dout, and you should return dL/dinp. That is usually written as dL/dinp = dL/dout * dout/dinp, which is the formula you presented above.
The autograd will take care of computing dL/dw using your computed dL/dinp.
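To make this concrete, here is a minimal sketch, assuming a mean squared error loss written as a custom torch.autograd.Function. The names MyMSELoss, model, x and target are made up for illustration and are not from the original post. The backward only returns dL/dinp; autograd then produces dL/dw for the linear layer, and the result is compared against the built-in MSE loss.

```python
import torch


class MyMSELoss(torch.autograd.Function):
    """Hypothetical custom loss: out = mean((inp - target)^2)."""

    @staticmethod
    def forward(ctx, inp, target):
        # Keep what backward needs to compute the local derivative.
        ctx.save_for_backward(inp, target)
        return ((inp - target) ** 2).mean()

    @staticmethod
    def backward(ctx, grad_output):
        inp, target = ctx.saved_tensors
        # Local derivative dout/dinp of the mean squared error.
        local_grad = 2.0 * (inp - target) / inp.numel()
        # Chain rule: dL/dinp = dL/dout * dout/dinp.
        # The target needs no gradient, hence None.
        return grad_output * local_grad, None


torch.manual_seed(0)
model = torch.nn.Linear(4, 1)   # stand-in network with weights w
x = torch.randn(8, 4)
target = torch.randn(8, 1)

# Custom loss: only dL/dinp was written by hand; autograd computes dL/dw.
loss = MyMSELoss.apply(model(x), target)
loss.backward()
custom_grad = model.weight.grad.clone()

# Reference: the built-in MSE loss on the same data.
model.zero_grad()
torch.nn.functional.mse_loss(model(x), target).backward()
builtin_grad = model.weight.grad.clone()

grads_match = torch.allclose(custom_grad, builtin_grad, atol=1e-6)
print(grads_match)
```

If the two weight gradients agree, the hand-written backward is consistent with what autograd derives on its own. For a loss this simple you would normally not need a custom Function at all, since writing the forward with ordinary tensor operations is enough for autograd to differentiate it.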
Hi, how did you solve the problem? Would you please explain how the whole custom loss can be calculated, with a nice example?
Alban said we have to compute the derivative of the loss wrt the input (dL/dinp) when using autograd, but in the tutorial's custom ReLU function, why didn't we set grad_input[input > 0] = 1? Could you please give an example with an MSE loss, as you solved it? Thanks.
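On the ReLU part of the question, a minimal sketch may help. It assumes a custom ReLU written in the style of the tutorial; MyReLU, x and upstream are illustrative names. The local derivative dout/dinp is 1 where the input is positive and 0 elsewhere, so multiplying dL/dout by it just keeps grad_output where the input is positive and zeroes it elsewhere. Setting grad_input[input > 0] = 1 would instead throw away the incoming dL/dout.

```python
import torch


class MyReLU(torch.autograd.Function):
    """Illustrative custom ReLU, written in the style of the tutorial."""

    @staticmethod
    def forward(ctx, inp):
        ctx.save_for_backward(inp)
        return inp.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (inp,) = ctx.saved_tensors
        # Start from dL/dout and zero it where the input was negative.
        grad_input = grad_output.clone()
        grad_input[inp < 0] = 0
        return grad_input


x = torch.tensor([-2.0, -0.5, 1.0, 3.0], requires_grad=True)
upstream = torch.tensor([10.0, 20.0, 30.0, 40.0])  # plays the role of dL/dout

out = MyReLU.apply(x)
out.backward(upstream)

# Explicit chain rule: dL/dout * dout/dinp, with dout/dinp = 1 for inp > 0, else 0.
explicit = upstream * (x.detach() > 0).float()
same = torch.equal(x.grad, explicit)
print(x.grad, same)
```

Both ways of writing it give the same gradient for these inputs: the masking in the tutorial is the chain rule, with the multiplication by 1 left implicit.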