Here I just want to update NetG; NetD will not be updated. I think the code is correct for updating the generator.
Sorry, when I want to add losses together to get a combined loss and then call .backward(), should the losses be in the same range? For example, should both be between 0 and 1, or can they be different?
Losses can be generally in different ranges, but note that the gradient magnitudes would thus also be in different ranges. This could basically mask the “smaller loss” as the gradients would look like noise during the calculation.
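To make the point about gradient magnitudes concrete, here is a minimal sketch. The two losses are hypothetical stand-ins (not the GAN losses from this thread), chosen only so that one is about 1000 times larger than the other.

```python
import torch

# Toy parameter vector; both losses below are hypothetical stand-ins.
w = torch.zeros(5, requires_grad=True)

loss_small = ((w - 1.0) ** 2).mean()            # value around 1
loss_large = 1000.0 * ((w + 1.0) ** 2).mean()   # value around 1000

# Gradient of each loss on its own, then of the plain sum.
g_small = torch.autograd.grad(loss_small, w, retain_graph=True)[0]
g_large = torch.autograd.grad(loss_large, w, retain_graph=True)[0]
g_total = torch.autograd.grad(loss_small + loss_large, w)[0]

ratio = (g_large.norm() / g_small.norm()).item()
print(f"gradient norm ratio (large / small): {ratio:.1f}")
print("gradient of the sum:", g_total)
```

In this toy setup the gradient of the sum is almost identical to the gradient of the large loss alone, so the small loss barely influences the update. Scaling one of the terms by a weight before adding them is a common way to balance this.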
Do you think it is better to use
MeanSquareError=(Difference**2).mean()
##  ALL defined loss 
errG=errG1+MeanSquareError
##  back propagate
errG.backward()
# Update G
optimizerG.step()
or
MeanSquareError.backward()
errG1.backward()
optimizerG.step()
The result should be the same, but I would use the first approach (summing the losses), as the backward operation would only be called once and should thus be faster.
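A minimal sketch comparing the two approaches on a toy model. The `nn.Linear` model, the random data, and the helper `two_losses` are assumptions for illustration, not the generator from this thread. Because both losses share one forward pass here, the first of the two separate backward calls needs `retain_graph=True`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
net = nn.Linear(4, 1)          # hypothetical stand-in for the generator
x = torch.randn(8, 4)
target = torch.rand(8, 1)      # targets in [0, 1] so BCE is valid

def two_losses():
    # One forward pass feeding two different losses.
    out = torch.sigmoid(net(x))
    bce = F.binary_cross_entropy(out, target)
    mse = ((out - target) ** 2).mean()
    return bce, mse

# Approach 1: sum the losses, call backward once.
net.zero_grad()
bce, mse = two_losses()
(bce + mse).backward()
grad_sum = net.weight.grad.clone()

# Approach 2: call backward on each loss; gradients accumulate in .grad.
net.zero_grad()
bce, mse = two_losses()
mse.backward(retain_graph=True)  # keep the shared graph for the second call
bce.backward()
grad_separate = net.weight.grad.clone()

print(torch.allclose(grad_sum, grad_separate, atol=1e-6))
```

Both approaches leave the same values in `.grad`; the summed version just traverses the shared graph once instead of twice.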
Hi Ptrblck,
I tried to add a second, different loss function to the original one, as I said before, but it has no effect on the weight updates. I changed the second loss function, but nothing changes. Do you think there is anything wrong? I am running the code on GPU. The first loss is nn.BCELoss() and the second is L1. The result is the same as using just BCELoss; L1 or other losses have no effect.
netG = Generator994(ngpu,nz,ngf).to(device)
optimizerG = optim.Adam(netG.parameters(), lr=lr2, betas=(beta1, 0.999))
netG.zero_grad()
label.fill_(real_label)
label=label.to(device)
output = netD(fake).view(-1)
# Calculate G's loss based on this output
errG1 = criterion(output, label)
xxx=torch.histc(GaussyMask.squeeze(1).view(-1).cpu(),100, min=0, max=1, out=None)
ddGaussy=xxx/xxx.sum()
xxx1=torch.histc(fake.squeeze(1).view(-1).cpu(),100, min=0, max=1, out=None)
ddFake=xxx1/xxx1.sum()
MSECMBSS=abs(ddGaussy-ddFake).sum()
# Calculate gradients for G adding two losses
errG=errG1+MSECMBSS
errG.backward()
D_G_z2 = output.mean().item()
D_G_z22+=D_G_z2
# Update G
optimizerG.step()
@ptrblck Just wanted to appreciate your amazing patience and grace in answering even totally non-PyTorch questions!
(have benefitted from reading your answers on several occasions)
Hi,
Would I be able to use numpy operations first and only return the result as a tensor at the end?
I'm making a custom triplet loss function; here is my code:
class HardTripletLoss(nn.Module):
    def __init__(self, alpha=0.25):
        super(HardTripletLoss,self).__init__()
        self.alpha = alpha
    def forward(self, q1_vec, q2_vec):
        #..... some numpy operations with tensors
        l_full = torch.mean(l_1 + l_2)
        return l_full
Would I be able to call .backward() on it?
I could use torch operations in this function, but some operations like np.max or np.maximum, and a few others, are difficult to do with torch, and some of the useful torch functions are still unstable.
If possible, would you please give me an idea of how to use numpy operations in a custom loss?
Thanks
Also, I have tried to use l_full=torch.mean(l_1+l_2, requires_grad=True) to get a gradient, and when computing with q1_vec and q2_vec I first used .detach().numpy(). I tried this on a toy example; I don't know whether it worked or not, but it gave a gradient value for the .backward() method:
v1 = torch.tensor([[0.26726124, 0.53452248, 0.80178373],[0.5178918 , 0.57543534, 0.63297887]], requires_grad=True)
v2 = torch.tensor([[ 0.26726124, 0.53452248, 0.80178373],[0.5178918 , 0.57543534, 0.63297887]], requires_grad=True)
HardTripletLoss()(v1, v2).backward()
tensor(0.5509, grad_fn=<DivBackward0>)
No, Autograd won’t be able to track the numpy operations, so you would need to implement the backward pass manually via a custom autograd.Function as described here.
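A minimal sketch of such a custom autograd.Function, assuming the numpy step is an elementwise np.maximum(x, 0). The class name `NumpyClampMin` is hypothetical; the forward pass runs in numpy and the backward pass is written by hand.

```python
import numpy as np
import torch

class NumpyClampMin(torch.autograd.Function):
    """Computes max(x, 0) with NumPy; autograd cannot see inside forward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        x_np = x.detach().cpu().numpy()
        out_np = np.maximum(x_np, 0.0)       # the NumPy-only step
        return torch.from_numpy(out_np).to(x.device)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Hand-written derivative: 1 where x > 0, else 0.
        return grad_output * (x > 0).to(grad_output.dtype)

x = torch.tensor([-1.0, 0.5, 2.0], requires_grad=True)
loss = NumpyClampMin.apply(x).mean()
loss.backward()
print(x.grad)
```

For this particular operation torch.maximum or torch.clamp would already be differentiable, so the manual backward is only needed for steps that truly have no torch equivalent.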
I don’t understand the second code snippet, as l_full as well as the other tensors are not used in the last HardTripletLoss example.
Hi @ptrblck. I have a similar problem where I’m creating a custom loss function extending nn.Module. However, the results are absurd.
def forward(self, x):
    with torch.set_grad_enabled(True):
        time_step = torch.tensor(0.01)
        out = self._rk4_step1(self.function, x, 0, time_step)
    return out

def function(self, x, t):
    self.n = n = x.shape[1]//2
    qqd = x.requires_grad_(True)
    L = self._lagrangian(qqd).sum()
    J = grad(L, qqd, create_graph=True)[0]
    DL_q, DL_qd = J[:,:n], J[:,n:]
    DDL_qd = []
    for i in range(n):
        J_qd_i = DL_qd[:,i][:,None]
        H_i = grad(J_qd_i.sum(), qqd, create_graph=True)[0][:,:,None]
        DDL_qd.append(H_i)
    DDL_qd = torch.cat(DDL_qd, 2)
    DDL_qqd, DDL_qdqd = DDL_qd[:,:n,:], DDL_qd[:,n:,:]
    T = torch.einsum('ijk, ij -> ik', DDL_qqd, qqd[:,n:])
    qdd = torch.einsum('ijk, ij -> ik', DDL_qdqd.pinverse(), DL_q - T)
    return torch.cat([qqd[:,self.n:], qdd], 1)

def _lagrangian(self, qqd):
    x = F.softplus(self.fc1(qqd))
    x = F.softplus(self.fc2(x))
    # x = F.softplus(self.fc3(x))
    L = self.fc_last(x)
    return L

def _rk4_step1(self, f, x, t, h):
    # one step of Runge-Kutta integration
    k1 = torch.mul(f(x, t),h)
    k2 = torch.mul(f(x + k1/2, t + h/2),h)
    k3 = torch.mul(f(x + k2/2, t + h/2),h)
    k4 = torch.mul(f(x + k3, t + h),h)
    return x + 1/6 * (k1 + 2 * k2 + 2 * k3 + k4)
Is autograd able to track all gradients even though I’m calling the function multiple times in _rk4_step1?
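One way to check this kind of question is to run an RK4 step on a toy ODE whose gradient is known in closed form. The sketch below assumes dx/dt = -k*x with a learnable scalar k (hypothetical, much simpler than the Lagrangian model above) and compares the autograd gradient with the hand-derived one.

```python
import torch

def rk4_step(f, x, t, h):
    # One Runge-Kutta step; f is evaluated four times.
    k1 = h * f(x, t)
    k2 = h * f(x + k1 / 2, t + h / 2)
    k3 = h * f(x + k2 / 2, t + h / 2)
    k4 = h * f(x + k3, t + h)
    return x + (k1 + 2 * k2 + 2 * k3 + k4) / 6

# Learnable decay rate, used in every one of the four evaluations.
k = torch.tensor(2.0, dtype=torch.float64, requires_grad=True)

def decay(x, t):
    return -k * x

x0 = torch.tensor([1.0], dtype=torch.float64)
h = 0.01
x1 = rk4_step(decay, x0, 0.0, h)
x1.sum().backward()

# For this linear ODE, RK4 gives x1 = x0 * (1 - z + z^2/2 - z^3/6 + z^4/24)
# with z = k*h, so d(x1)/dk = x0 * h * (-1 + z - z^2/2 + z^3/6).
z = k.detach() * h
expected = (x0 * h * (-1 + z - z**2 / 2 + z**3 / 6)).sum()
print(k.grad.item(), expected.item())
```

In this sketch the two numbers agree, which shows that the contributions from all four evaluations of the same function are accumulated into one gradient. It does not exercise the nested grad(..., create_graph=True) calls from the post above.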
It might be a dumb question.
How does Autograd know how to compute the gradients of any arbitrary Loss function which is implemented using Tensor operations?
The derivatives.yaml file contains definitions for the backward passes for the implemented operations.
Autograd will track all operations in the forward pass and use these derivatives to compute the gradient of the loss w.r.t. the parameters.
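A small sketch of what this tracking looks like in practice, assuming a hand-written mean squared error built only from tensor operations. The tensor values are made up for illustration.

```python
import torch

y_hat = torch.tensor([0.2, 0.7, 1.5], requires_grad=True)
y = torch.tensor([0.0, 1.0, 1.0])

# A "custom loss" built only from tensor operations.
diff = y_hat - y
loss = (diff ** 2).mean()

# Each result remembers the operation that produced it.
print(diff.grad_fn)
print(loss.grad_fn)

loss.backward()

# The hand-derived gradient of mean((y_hat - y)^2) for comparison.
analytic = 2 * (y_hat.detach() - y) / y.numel()
print(y_hat.grad, analytic)
```

Each intermediate tensor carries a grad_fn node, and backward() walks these nodes in reverse while applying the per-operation derivative rules.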
That helps, thanks a bunch!
I have a CNN architecture as follows:
Forward:
Conv1: (3, 32, 5, 1, 0)
Conv2: (32, 64, 5, 1, 0)
Conv3: (64, 128, 5, 1, 0)
Conv4: (128, 256, 5, 1, 0)
The output layer is itself a convolutional layer:
Conv5: (256, 10, *, 1, 0)
All convolutional layers are customized with torch.autograd.Function, i.e. they have forward and backward defined in them.
I am using two loss functions:
class my_Loss_func(torch.nn.Module):
    init...
    forward...
        return loss1
In training loop:
…
…
Loss1 = my_Loss_func()(output of conv4, labels)
Loss2 = torch.nn.CrossEntropyLoss()(final output, labels)
loss = Loss1 + Loss2
loss.backward()
optimizer.step()
…
In doing so, I think the backward pass would still execute, but incorrectly, because backpropagation happens twice through conv4 (an override?): once for Loss1 and once for Loss2, since they are added. (So will the update take place twice as well?)
What one wants is that conv4 is updated first (only once, after backpropagating once), then conv3, then … conv1.
Your code snippet is a bit unclear, so I’m not completely sure what your use case is.
However, the backward call will calculate gradients of both losses w.r.t. the parameters used to calculate these losses.
If some parameters were used in both loss calculations, the gradient will be accumulated for these parameters.
Actually, Loss1 is contrastive loss whose inputs are features from conv4 and labels. In addition to that, Loss2 is just the cross entropy loss whose inputs are output of the network after conv5 and labels.
Task: Image classification
So, is it correct if I say:
1. The parameters of conv4 will get updated twice, once according to Loss1 and the second time according to Loss2?
2. The parameters of layers other than conv4 will also get updated according to Loss1?
I am guessing 1. should happen and 2. shouldn’t.
What is your opinion?
No, the parameters will get updated in the optimizer.step() call. The gradients of parameters of reused modules will get accumulated, if the corresponding computation graph uses them.
A small illustration of my last post:
Assuming your model architecture is:
input -> conv1 -> conv2 -> conv3 -> conv4 -> conv5 -> output -> loss2
                                          \-> conv4_output -> loss1
If this is the workflow of the loss calculations, then loss1.backward() will accumulate gradients for the parameters in conv1,2,3,4, while loss2.backward() will accumulate gradients for the parameters in conv1,2,3,4,5.
The same applies for the sum of both losses.
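This workflow can be checked on a small stand-in model. In the sketch below, `trunk` plays the role of conv1 to conv4 and `head` the role of conv5; both are plain linear layers, and `loss1` is a placeholder rather than a real contrastive loss. All of these names are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
trunk = nn.Linear(6, 4)   # stands in for conv1..conv4
head = nn.Linear(4, 3)    # stands in for conv5
x = torch.randn(5, 6)
labels = torch.randint(0, 3, (5,))

feat = trunk(x)                          # "conv4_output"
logits = head(feat)                      # final output
loss1 = feat.pow(2).mean()               # placeholder for the feature loss
loss2 = F.cross_entropy(logits, labels)

# Gradient of each loss w.r.t. the shared trunk, computed separately.
g1 = torch.autograd.grad(loss1, trunk.weight, retain_graph=True)[0]
g2 = torch.autograd.grad(loss2, trunk.weight, retain_graph=True)[0]

# loss1 never touches the head, so there is no gradient for it.
head_grad_from_loss1 = torch.autograd.grad(
    loss1, head.weight, retain_graph=True, allow_unused=True
)[0]
print(head_grad_from_loss1)  # None

# Backward on the sum: .grad holds ONE tensor per parameter.
(loss1 + loss2).backward()
print(torch.allclose(trunk.weight.grad, g1 + g2, atol=1e-6))
```

Each parameter ends up with a single .grad tensor that holds the sum of the contributions from both losses, and a later optimizer.step() would apply one update from that sum.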
Hello ptrblck,
I am trying to create a custom loss function in a CNN for regression. The input is a binary image (600x600) in which the background is black and the foreground is white. The ground truth associated with each input is an image with a color range from 0 to 255, which is normalized to between 0 and 1.
x = input, y = ground truth, and y_hat = predicted output.
I tried to penalize the foreground with the custom loss function below, but it didn’t improve the result. I am wondering whether my idea is right or not, and if yes, what’s wrong with my custom function?
mse = nn.MSELoss(reduction='mean')
def criterion(y, y_hat, x, loss, weight=0.1):
    y_hat_modified = torch.where(x[:,0:1]==1, weight*y_hat,y_hat) # x[:,0:1] is input
    return loss(y,y_hat_modified)
I created a topic for it and you can see more detailed info there.
custom loss function for regression in cnn
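For comparison, one common alternative (not taken from this thread) is to weight the per-pixel error instead of scaling the prediction. The sketch below assumes the goal is to make foreground errors count more; the function name `foreground_weighted_mse` and the `fg_weight` parameter are hypothetical.

```python
import torch

def foreground_weighted_mse(y_hat, y, x, fg_weight=10.0):
    # Weight map: fg_weight where the input mask is 1, otherwise 1.
    weights = torch.where(
        x[:, 0:1] == 1,
        torch.full_like(y, fg_weight),
        torch.ones_like(y),
    )
    # Weighted mean of the squared error; y_hat itself is left untouched.
    return (weights * (y_hat - y) ** 2).sum() / weights.sum()

# Tiny 2x2 example: the prediction is wrong only on the foreground pixels.
x = torch.tensor([[[[1.0, 0.0], [0.0, 1.0]]]])
y = torch.zeros_like(x)
y_hat = x.clone().requires_grad_(True)

weighted = foreground_weighted_mse(y_hat, y, x)
plain = ((y_hat - y) ** 2).mean()
print(weighted.item(), plain.item())
```

Here the prediction stays unchanged and only the contribution of each pixel to the loss is scaled, so the gradient on foreground pixels grows with `fg_weight`.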
Yes, that’s where I’m confused.
So, for all the parameters of conv1,2,3,4, will there be two gradient values stored in .grad, or only one value because of an override?
optimizer.step() updates all the parameters based on parameter.grad. So I am not sure whether both gradients in .grad will be used for the update, or maybe they are added and then… I don’t know.