That helps, thanks a bunch!
I have a CNN architecture as follows:
Conv1: (3, 32, 5, 1, 0)
Conv2: (32, 64, 5, 1, 0)
Conv3: (64, 128, 5, 1, 0)
Conv4: (128, 256, 5, 1, 0)
The output layer is itself a convolutional layer:
Conv5: (256, 10, *, 1, 0)
All convolutional layers are custom `torch.autograd.Function`s, i.e., they have `forward` and `backward` defined explicitly.
I am using two loss functions:

```python
class my_Loss_func(torch.nn.Module):
    # __init__ ...
    # forward ... returns loss1
```

In the training loop:

```python
criterion1 = my_Loss_func()
criterion2 = torch.nn.CrossEntropyLoss()

loss1 = criterion1(output_of_conv4, labels)
loss2 = criterion2(final_output, labels)
loss = loss1 + loss2
```
In doing so, I think the backward pass would still execute, but incorrectly, because backpropagation happens twice through conv4: once for Loss1 and once for Loss2, since they are added. (So, will the update take place twice as well?)
What one wants is: conv4 should be updated first (only once, after backpropagating once), then conv3, then … conv1.
Your code snippet is a bit unclear, so I’m not completely sure what your use case is.
The backward call will calculate the gradients of both losses w.r.t. the parameters used to calculate these losses. If some parameters were used in both loss calculations, the gradients will be accumulated for these parameters.
Actually, Loss1 is a contrastive loss whose inputs are the features from conv4 and the labels. In addition to that, Loss2 is just the cross-entropy loss whose inputs are the output of the network after conv5 and the labels.
Task: Image classification
So, is it correct if I say:
1. The parameters of conv4 will get updated twice, once according to Loss1 and a second time according to Loss2?
2. The parameters of layers other than conv4 will also get updated according to Loss1?
I am guessing 1. should happen and 2. shouldn't. What's your opinion?
No, the parameters will get updated in the `optimizer.step()` call. The gradients of parameters of reused modules will get accumulated, if the corresponding computation graph uses them.
A small illustration of my last post:
Assuming your model architecture is:
```
input -> conv1 -> conv2 -> conv3 -> conv4 -> conv5 -> output -> loss2
                                         \-> conv4_output -> loss1
```
If this is the workflow of the loss calculations, then:
- loss1.backward() will accumulate gradients for the parameters in conv1 to conv4.
- loss2.backward() will accumulate gradients for the parameters in conv1 to conv5.
The same applies for the sum of both losses.
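A minimal sketch of this workflow (the layer shapes, 3x3 kernels, and the stand-in for the contrastive loss are all placeholders, not the original code): a shared conv stack up to conv4 feeds both losses, and a single backward call on the summed loss populates `.grad` for every layer on either path.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# hypothetical stand-in for the conv1..conv5 stack above
# (3x3 kernels and a small input keep the sketch short)
trunk = nn.Sequential(                      # conv1 .. conv4
    nn.Conv2d(3, 32, 3), nn.ReLU(),
    nn.Conv2d(32, 64, 3), nn.ReLU(),
    nn.Conv2d(64, 128, 3), nn.ReLU(),
    nn.Conv2d(128, 256, 3), nn.ReLU(),
)
head = nn.Conv2d(256, 10, 3)                # conv5 (acts as the output layer)

x = torch.randn(4, 3, 16, 16)
labels = torch.randint(0, 10, (4,))

feat = trunk(x)                             # conv4 output -> used by loss1
logits = head(feat).mean(dim=(2, 3))        # conv5 output -> used by loss2

loss1 = feat.pow(2).mean()                  # placeholder for the contrastive loss
loss2 = nn.CrossEntropyLoss()(logits, labels)
(loss1 + loss2).backward()                  # one backward pass for the summed loss

# conv1..conv4 receive gradients from both losses, conv5 only from loss2
print(all(p.grad is not None for p in trunk.parameters()))   # True
print(all(p.grad is not None for p in head.parameters()))    # True
```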
I am trying to create a custom loss function in a CNN for regression. The input is a binary image (600x600) in which the background is black and the foreground is white. The ground truth associated with each input is an image with color range from 0 to 255, normalized between 0 and 1.
x = input, ground truth = y, and predicted output = y_hat.
I tried to penalize the foreground via the custom loss function below, but it didn't improve the result. I am wondering whether my idea is right, and if so, what's wrong with my custom function?
```python
mse = nn.MSELoss(reduction='mean')

def criterion(y, y_hat, x, loss, weight=0.1):
    # x[:, 0:1] is the (binary) input image
    y_hat_modified = torch.where(x[:, 0:1] == 1, weight * y_hat, y_hat)
    return loss(y, y_hat_modified)
```
I created a topic for it and you can see more detailed info there.
custom loss function for regression in cnn
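One common alternative (a sketch, not from the thread; shapes and the `fg_weight` value are assumptions) is to compute the per-pixel error with `reduction='none'` and apply an explicit weight map, so foreground pixels can be up- or down-weighted directly rather than scaling the prediction itself:

```python
import torch
import torch.nn as nn

def weighted_mse(y_hat, y, x, fg_weight=10.0):
    # per-pixel squared error, no reduction yet
    per_pixel = nn.MSELoss(reduction='none')(y_hat, y)
    # weight map: fg_weight on foreground pixels (mask == 1), 1 elsewhere
    weights = 1.0 + (fg_weight - 1.0) * x[:, 0:1]
    return (weights * per_pixel).mean()

y_hat = torch.rand(2, 1, 8, 8, requires_grad=True)
y = torch.rand(2, 1, 8, 8)
x = (torch.rand(2, 1, 8, 8) > 0.5).float()   # binary input image as the mask
loss = weighted_mse(y_hat, y, x)
loss.backward()
```

With `fg_weight=1.0` this reduces to the plain mean MSE, which makes the weighting easy to sanity-check.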
Yes, that’s where I’m confused.
So, for all the parameters of conv1–conv4, will there be two gradient values stored in .grad, or only one value because of an override?
optimizer.step() updates all the parameters based on parameter.grad, so I'm unsure whether both gradients in .grad will be used for the update, or maybe they are added and then… I don't know.
There will be one .grad value containing the sum of the gradients calculated during the backward calls.
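A tiny illustrative sketch (not from the thread) showing that `.grad` holds a single accumulated tensor: calling backward on each loss separately adds into the same `.grad`, matching one backward call on the summed loss.

```python
import torch

w = torch.randn(3, requires_grad=True)

loss1 = (2.0 * w).sum()   # d(loss1)/dw = 2
loss2 = (3.0 * w).sum()   # d(loss2)/dw = 3

loss1.backward()
loss2.backward()
print(w.grad)             # tensor([5., 5., 5.]) -> gradients were accumulated

w.grad = None             # reset before the next comparison
loss1 = (2.0 * w).sum()
loss2 = (3.0 * w).sum()
(loss1 + loss2).backward()
print(w.grad)             # tensor([5., 5., 5.]) -> same accumulated result
```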
Alright , thanks for the explanation
So how is the backward method inherited for custom functions? In case I have something more complicated like this:
```python
from torch.nn.modules.loss import _Loss

class GaussianLoss(_Loss):
    def __init__(self, sigma=None, abs_loss=None):
        super(GaussianLoss, self).__init__()
        assert sigma is not None
        assert abs_loss is not None
        self.sigma = sigma

    def forward(self, d):
        gaussian_val = torch.exp((-d).div(self.sigma))
        return gaussian_val
```
In other words, does autograd know how to take the derivative of exp(-d/sigma) w.r.t. d (which is -(1/sigma) * exp(-d/sigma), btw)?
Yes, Autograd will be able to backpropagate through all PyTorch functions, if they have a valid .grad_fn (the majority of PyTorch ops have it, unless the operation is not differentiable).
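A quick sketch (reusing the computation from the GaussianLoss above) verifying that autograd's gradient matches the analytic derivative -(1/sigma) * exp(-d/sigma):

```python
import torch

sigma = 2.0
d = torch.tensor([0.5, 1.0, 2.0], requires_grad=True)

gaussian_val = torch.exp((-d) / sigma)   # same computation as GaussianLoss.forward
gaussian_val.sum().backward()

analytic = -(1.0 / sigma) * torch.exp(-d / sigma).detach()
print(torch.allclose(d.grad, analytic))  # True
```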
Thanks, so as long as I'm using the _Loss base class, it should work fine?
You don't need to use _Loss as the base class, but can use nn.Module instead.
Is this because `reduce` was deprecated in favor of `reduction`?
You are not using reduction anywhere in your code and just store the sigma value in the class, so I'm unsure why you would need the _Loss base class.
I guess I just assumed it'd be a loss function's base class that they all must inherit, since it's what I saw in the built-in losses.
You could still derive from _Loss, if you want to set the reduction parameter using the legacy checks as seen here. This would also mean that you would call
super(MSELoss, self).__init__(size_average, reduce, reduction)
in your __init__ method and could use self.reduction in the forward. However, if you don't need the reduction argument, you can just use nn.Module. In fact, you could even use a plain Python object, as no parameters or buffers are registered in your custom loss function.
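To illustrate the last point, a sketch (assuming the GaussianLoss from earlier in the thread): since no parameters or buffers are registered, even a plain function works, and autograd still provides the backward pass.

```python
import torch

# a plain function works just as well as a Module here,
# since the loss registers no parameters or buffers
def gaussian_loss(d, sigma=2.0):
    return torch.exp((-d) / sigma)

d = torch.tensor([1.0, 2.0], requires_grad=True)
gaussian_loss(d).sum().backward()
print(d.grad is not None)  # True
```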
Hi @ptrblck, could you answer some questions about a custom loss function? I use an autoencoder (built from CNNs) to reconstruct a signal, with input x and output y. I want to change the weights of the autoencoder, i.e., the weights in autoencoder.parameters(). I wrote a custom loss function using numpy and scipy, but I don't know how to write the backward function with respect to the weights of the autoencoder. Here is my loss function. If you know how to write it, please tell me; it is a great matter to me. Thank you!
```python
def forward(self, x, y, M, T):
    x = x.detach().numpy()
    x_std = np.std(x)
    x_mean = np.mean(x)
    x = (x - x_mean) / x_std
    y = y.detach().numpy()
    y_std = np.std(y)
    y_mean = np.mean(y)
    y = (y - y_mean) / y_std
    N = len(x)
    XmT = np.zeros((N, M + 1))
    YmT = np.zeros((N, M + 1))
    for m in range(M + 1):
        XmT[m * T:, m] = x[0:N - m * T]
    for m in range(M + 1):
        YmT[m * T:, m] = y[0:N - m * T]
    self.save_for_backward(x, y)
    ckx = np.sum(np.multiply(np.prod(XmT, 1), np.prod(XmT, 1))) / (np.sum(np.multiply(x, x))) ** (M + 1)
    cky = np.sum(np.multiply(np.prod(YmT, 1), np.prod(YmT, 1))) / (np.sum(np.multiply(y, y))) ** (M + 1)
    ckloss = 1 / (cky - ckx) ** 2
    x = torch.tensor(x)
    y = torch.tensor(y)
    loss = torch.Tensor(ckloss)
    return loss

# question ???
def backward(self, grad_loss):
    grad_output = grad_output.detach().numpy()
    x, y = self.saved_tensors
    x = x.numpy()
    y = y.numpy()
    grad_input = ...
    return torch.Tensor(grad_input)
```

```python
self.M = M
self.T = T

output = autoencoderlossFuction(x, y, self.M, self.T)
```
Based on the provided code snippet I think you could replace all numpy operations with their PyTorch equivalent, which would automatically create the backward pass for you so that you don’t have to manually implement it.
I didn't show all the code; some parts use scipy, so I must write the backward function manually.
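To illustrate the suggestion above: the numpy part of the posted loss can be rewritten with PyTorch ops (a sketch with assumed shapes and example values for M and T; `delay_embed` and `ck` are names I introduce here), so autograd builds the backward pass automatically and no manual backward is needed for this portion:

```python
import torch

torch.manual_seed(0)

def delay_embed(s, M, T):
    # build the N x (M+1) delay matrix with differentiable torch ops
    # (replaces the numpy XmT/YmT loops)
    N = s.shape[0]
    cols = [torch.cat([torch.zeros(m * T), s[0:N - m * T]]) for m in range(M + 1)]
    return torch.stack(cols, dim=1)

def ck(s, M, T):
    # normalize with torch ops instead of .detach().numpy()
    s = (s - s.mean()) / s.std()
    SmT = delay_embed(s, M, T)
    return (SmT.prod(dim=1) ** 2).sum() / s.pow(2).sum() ** (M + 1)

def ck_loss(x, y, M, T):
    # same quantity as the numpy version, but autograd provides the backward pass
    return 1.0 / (ck(y, M, T) - ck(x, M, T)) ** 2

x = torch.randn(64)                        # e.g. the input signal
y = torch.randn(64, requires_grad=True)    # e.g. the autoencoder output
loss = ck_loss(x, y, M=2, T=3)
loss.backward()                            # no manual backward needed
print(y.grad is not None)                  # True
```

Any remaining scipy steps would still need a custom `torch.autograd.Function` with a hand-written backward, but everything expressible in torch ops can stay in the graph like this.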