You can pass the parameter in a list to the optimizer:
param = nn.Parameter(torch.randn(1))
model = MyModel()
optimizer = torch.optim.SGD(list(model.parameters()) + [param], lr=1e-3)
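A self-contained sketch of this approach. MyModel below is a hypothetical single-layer stand-in, since the thread doesn't show its definition. The last lines also show optimizer parameter groups, a standard torch.optim feature, in case the extra parameter should get its own learning rate.

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    # Hypothetical stand-in for the model in the question
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 1)

    def forward(self, x):
        return self.linear(x)

torch.manual_seed(0)
param = nn.Parameter(torch.randn(1))
model = MyModel()

# Concatenate the model parameters with the standalone parameter
optimizer = torch.optim.SGD(list(model.parameters()) + [param], lr=1e-3)

x = torch.randn(4, 2)
loss = (model(x) * param).pow(2).mean()  # param takes part in the loss
optimizer.zero_grad()
loss.backward()
optimizer.step()  # updates model.linear and param together

# Alternative: parameter groups, e.g. a separate learning rate for param
optimizer_groups = torch.optim.SGD(
    [{"params": model.parameters()}, {"params": [param], "lr": 1e-2}],
    lr=1e-3,
)
```

Groups without their own "lr" fall back to the default passed as the keyword argument.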
I see. You are so nice!
Hi @ptrblck,
I’m trying to make a custom loss function but am getting “RuntimeError: grad can be implicitly created only for scalar outputs” on the backward pass.
def dir_loss(output, target, data_fin):
    delts = -torch.mul((output - data_fin), (target - data_fin))
    return F.relu(delts)
I want the loss to be zero when negative and some value when not. Any suggestions?
The error is raised if you call .backward() on a tensor with more than a single element. You would then either have to specify the gradient manually or reduce the tensor beforehand, as seen here:
loss = torch.randn(2, 2, requires_grad=True)
loss.backward()
# > RuntimeError: grad can be implicitly created only for scalar outputs
# reduce
loss.mean().backward() # works
# manual gradient
loss.backward(torch.ones_like(loss)) # works
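Applied to the dir_loss from the question, a minimal fix is to reduce the elementwise loss to a scalar before calling .backward(). The sketch below assumes a mean reduction is what's wanted (a sum would work as well); the small tensors are made-up inputs for illustration.

```python
import torch
import torch.nn.functional as F

def dir_loss(output, target, data_fin):
    # Positive where output and target lie on opposite sides of data_fin, zero or negative otherwise
    delts = -torch.mul(output - data_fin, target - data_fin)
    # relu zeroes the non-positive entries; mean() reduces to a scalar for backward()
    return F.relu(delts).mean()

output = torch.tensor([1.0, -1.0], requires_grad=True)
target = torch.tensor([1.0, 1.0])
data_fin = torch.tensor([0.0, 0.0])

loss = dir_loss(output, target, data_fin)
loss.backward()
print(loss.item(), output.grad)  # 0.5 tensor([ 0.0000, -0.5000])
```

Only the second element has the "wrong" direction, so it is the only one that contributes to the loss and receives a gradient.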
Hi again @ptrblck,
I’ve figured I might be better off using an if statement as follows:
def dir_loss(outputs, targets, fin_val):
    loss = torch.mean(torch.where((outputs - fin_val) * (targets - fin_val) < 0, 10 * (outputs - targets)**2, 0.01 * (outputs - targets)**2))
    return loss
I’m trying to get the loss to penalize the network when the direction is not the same and reward when they are, similar to this paper: Exploring the Impact of Magnitude - and Direction-based Loss Function on the Profitability using Predicted Prices from Deep Learning by Chihcheng Hsu, Lichen Tai :: SSRN. However, this seems to have no effect so far. What is the best way to make the loss function reward the same direction?
Never mind, I found a workaround by modifying the targets based on the if statement before connecting them to the graph.
Hi @ptrblck
I am trying to use PyTorch to estimate count regression model parameters. I have created a small GitHub repository documenting the project.
I am struggling to update the parameters of certain classes of regression models (Poisson Inverse Gaussian, Sichel and Delaporte). The loss functions (log probability density functions) in either NumPy or PyTorch seem to be specified correctly (and agree with previous R, C, etc. implementations). However, when I call NLL.backward() to calculate the gradients (using PyTorch autograd), I get an error which I cannot seem to debug.
For example: the Poisson Inverse Gaussian loss (negative log likelihood) is given below:
def pig_nll(x, mu, sigma):
    ## Determine length of data vector and parameters
    ly = int(torch.max(torch.Tensor([len(x), len(mu), len(sigma)])).item())
    #x = np.repeat(a=x, repeats=ly)
    nsigma = sigma.repeat(ly)
    nmu = mu.repeat(ly)
    ## Initial vectors to store computed PIG density values
    ny = int(len(x))
    maxyp1 = x.max().item() + 1
    tofY = torch.zeros(int(maxyp1))
    sumlty = torch.zeros(ly)
    ## Big for loop to compute PIG density (or log-density)
    ## This is directly from Rigby et al: tofyPIG2.c code.
    for i in torch.arange(1, ny+1, dtype=torch.int32):
        iy = x[i.item()-1] + 1
        tofY[0] = nmu[i.item()-1] * ((1 + 2*nsigma[i.item()-1]*nmu[i.item()-1])**(-0.5))
        sumT = torch.Tensor([0])
        ## Start inner loop to compute rest of PIG density
        if (x[i.item()-1] == 0):
            sumT = torch.Tensor([0])
        else:
            for j in torch.arange(1, iy, dtype=torch.int32):
                tofY[j.item()] = ((nsigma[i.item()-1] * ((2*(j.item())-1)/nmu[i.item()-1])) + (1/tofY[j.item()-1])) * ((tofY[0])**2)
                sumT = sumT + torch.log(tofY[j.item()-1])
        sumlty[i.item()-1] = sumT
    ## Add the kernel of the PIG density back to other constant component
    logfy = -torch.lgamma(x+1) + (1 - torch.sqrt(1 + 2*sigma*mu))/sigma + sumlty
    ## Return neg log lik to user
    nll = -torch.sum(logfy)
    return nll
My PyTorch training loop looks as follows:
## Instantiate data tensor, and variable for (binomial) model parameters
x = torch.autograd.Variable(torch.from_numpy(dat.fish.to_numpy())).type(torch.FloatTensor)
l_mu = torch.autograd.Variable(torch.rand(1), requires_grad=True)
l_sigma = torch.autograd.Variable(torch.rand(1), requires_grad=True)
# torch.autograd.set_detect_anomaly(True)
## Learning rate
learning_rate_mu = 2e-5
learning_rate_sigma = 2e-5
## Training loop
for t in range(25000):
    ## Backprop on negative log likelihood loss
    NLLpig = pig_nll(x=x, mu=l_mu, sigma=l_sigma)
    NLLpig.backward()
    ## Logging to console
    if t % 1000 == 0:
        print("Iteration = ", t,
              "loglik =", NLLpig.data.numpy(),
              "lmu =", l_mu.data.numpy(),
              "lsigma =", l_sigma.data.numpy(),
              "dL/dlmu = ", l_mu.grad.data.numpy(),
              "dL/dlsigma = ", l_sigma.grad.data.numpy())
    ## SGD update of parms
    l_mu.data -= learning_rate_mu * l_mu.grad.data
    l_sigma.data -= learning_rate_sigma * l_sigma.grad.data
    ## Zero the gradients
    l_mu.grad.data.zero_()
    l_sigma.grad.data.zero_()
And the error I get is below:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-14-15d640d81abc> in <module>
9 ## Backprop on negative log likelihood loss
10 NLLpig = pig_nll(x=x, mu=l_mu, sigma=l_sigma)
---> 11 NLLpig.backward()
12 ## Logging to console
13 if t % 1000 == 0:
~\anaconda3\envs\pytorch_env\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
253 create_graph=create_graph,
254 inputs=inputs)
--> 255 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
256
257 def register_hook(self, hook):
~\anaconda3\envs\pytorch_env\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
145 retain_graph = create_graph
146
--> 147 Variable._execution_engine.run_backward(
148 tensors, grad_tensors_, retain_graph, create_graph, inputs,
149 allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor []], which is output 0 of SelectBackward, is at version 2992; expected version 2991 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Here is the error when torch.autograd.set_detect_anomaly(True) is enabled:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-14-c1fae9bda3ae> in <module>
9 ## Backprop on negative log likelihood loss
10 NLLpig = pig_nll(x=x, mu=l_mu, sigma=l_sigma)
---> 11 NLLpig.backward()
12 ## Logging to console
13 if t % 1000 == 0:
~\anaconda3\envs\pytorch_env\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
253 create_graph=create_graph,
254 inputs=inputs)
--> 255 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
256
257 def register_hook(self, hook):
~\anaconda3\envs\pytorch_env\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
145 retain_graph = create_graph
146
--> 147 Variable._execution_engine.run_backward(
148 tensors, grad_tensors_, retain_graph, create_graph, inputs,
149 allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor []], which is output 0 of SelectBackward, is at version 2992; expected version 2991 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Do you have any suggestions on why the PIG negative log likelihood loss is causing issues? The DEL/Sichel log likelihoods throw similar errors, so I think it relates to the way I have ported the NLL from R/C to PyTorch in general. Thanks in advance for your help! This thread is an amazing resource!
I would start by removing all usages of the .data attribute (its usage is deprecated and could yield unwanted side effects). If necessary, wrap the code in a with torch.no_grad() block.
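A minimal sketch of the manual SGD update without .data. The quadratic toy_loss is a hypothetical stand-in for pig_nll, chosen only so the example runs on its own, and the learning rate is picked for this toy problem, not for the PIG model.

```python
import torch

def toy_loss(x, mu, sigma):
    # Hypothetical stand-in for pig_nll; minimized at mu = mean(x), sigma = 1
    return ((x - mu) ** 2).mean() + ((sigma - 1.0) ** 2).sum()

x = torch.tensor([0.0, 1.0, 2.0, 3.0])
l_mu = torch.rand(1, requires_grad=True)
l_sigma = torch.rand(1, requires_grad=True)
learning_rate = 0.1

for t in range(200):
    loss = toy_loss(x, l_mu, l_sigma)
    loss.backward()
    # Update the leaf tensors in place without recording these ops in the graph
    with torch.no_grad():
        l_mu -= learning_rate * l_mu.grad
        l_sigma -= learning_rate * l_sigma.grad
    # Reset the accumulated gradients for the next iteration
    l_mu.grad.zero_()
    l_sigma.grad.zero_()

print(l_mu.item(), l_sigma.item())  # close to 1.5 and 1.0
```

Using torch.optim.SGD([l_mu, l_sigma], lr=...) would do the same bookkeeping, including the no_grad update.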
To isolate the offending in-place operation, you could replace all in-place ops (e.g. tofY[0] = nmu[i.item()-1] ...) with their out-of-place equivalents. E.g. instead of creating tofY as a zero tensor and assigning values to it, you could append the intermediate values to a list and use e.g. torch.stack on it afterwards to create the tensor.
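A toy sketch of the list + torch.stack pattern. The recurrence is a hypothetical stand-in for the tofY loop (each step reads the previous value, like tofY[j-1]), not the PIG density itself. The preallocated version writes into a tensor whose earlier slices were saved for backward, which should trigger the same kind of version-counter error as in the question.

```python
import torch

def recurrence_inplace(a, n):
    # Pattern from the question: write each step into a preallocated tensor
    out = torch.zeros(n)
    out[0] = a
    for j in range(1, n):
        out[j] = out[j - 1] * a + 1.0  # in-place write; out[j - 1] was saved for backward
    return out.sum()

def recurrence_stacked(a, n):
    # Out-of-place alternative: keep each step as its own tensor in a list
    values = [a]
    for _ in range(1, n):
        values.append(values[-1] * a + 1.0)
    return torch.stack(values).sum()

a = torch.tensor(0.5, requires_grad=True)
loss = recurrence_stacked(a, 4)
loss.backward()
print(loss.item(), a.grad.item())  # 5.1875 6.25

b = torch.tensor(0.5, requires_grad=True)
try:
    recurrence_inplace(b, 4).backward()
except RuntimeError as err:
    print("in-place version fails:", err)
```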
Hi all,
Can I just add the calculation of the loss inside the forward() method itself and call backward directly on the output of model(x)?
My loss function requires a lot of internal variables.
Thanks
Yes, that should be possible. However, I would also recommend checking which utilities you are planning to use and how they could interact with this “non-pure” forward method. E.g. I don’t know what impact this approach would have on quantization, scripting, checkpointing etc.
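A minimal sketch of this pattern, with a hypothetical ModelWithLoss that returns the loss when targets are passed and the plain output otherwise; MSE is only a placeholder for the actual loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModelWithLoss(nn.Module):
    # Hypothetical model that returns the loss when targets are passed
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x, target=None):
        out = self.linear(x)
        if target is None:
            return out  # plain inference path
        return F.mse_loss(out, target)  # scalar loss computed inside forward

model = ModelWithLoss(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 4)
y = torch.randn(8, 2)

loss = model(x, y)  # the model output is already the loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Keeping a target-free path means the same module can still be used for plain inference.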
You should be careful when writing your loss inside the model itself, especially when it has internal variables. If you define the loss inside your model, then model.parameters(), as passed to the optimizer, may include some of the loss variables.
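A small illustration of this pitfall, with hypothetical names. Once a loss module holding an nn.Parameter is assigned as an attribute of the model, its parameter shows up in model.parameters() and would be updated by an optimizer built from it. Registering pure state with register_buffer keeps it out of the parameter list (buffers are still moved by .to() and saved in the state_dict).

```python
import torch
import torch.nn as nn

class LossWithState(nn.Module):
    # Hypothetical loss with internal variables
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1))             # registered as a parameter
        self.register_buffer("running_avg", torch.zeros(1))  # state, not a parameter

    def forward(self, out, target):
        return (self.scale * (out - target) ** 2).mean()

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1)
        self.criterion = LossWithState()  # submodule: its parameters join the model's

    def forward(self, x):
        return self.linear(x)

model = Model()
print([name for name, _ in model.named_parameters()])
# ['linear.weight', 'linear.bias', 'criterion.scale']
print([name for name, _ in model.named_buffers()])
# ['criterion.running_avg']
```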
I am also facing an error while trying to use the multi-class Dice score as a custom loss function.
Here is the loss function:
def dice(output, target):
    dice_tmp = 0
    for index in range(3):
        dice_tmp += (2 * (output[:,index,:,:] * target[:,index,:,:]).sum()) / ((output[:,index,:,:] + target[:,index,:,:]).sum() + 1e-8)
    dice = torch.mean(dice_tmp) # taking the average
    return dice
Here is the training script:
loop = tqdm(loader)
for data, targets in loop:
    data = data.to(device=DEVICE)
    targets = targets.float().to(device=DEVICE)
    #dice = 0
    # forward
    with torch.cuda.amp.autocast():
        predictions = (torch.sigmoid(model(data)) > 0.5).float()
        loss = dice(predictions, targets)
        print(type(loss))
    # backward
    optimizer.zero_grad()
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    # update tqdm loop
    loop.set_postfix(loss=loss.item())
And here is the error I am facing:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
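No reply is recorded here, but one likely cause is visible in the script: the comparison `> 0.5` returns a tensor without a grad_fn, so nothing computed from predictions is attached to the graph. A minimal sketch, assuming a soft Dice on the sigmoid probabilities is acceptable; soft_dice_loss and the random tensors are hypothetical, and it returns 1 - Dice because the optimizer minimizes the loss.

```python
import torch

def soft_dice_loss(probs, target, eps=1e-8):
    # probs, target: [N, C, H, W]; probs are sigmoid outputs, not thresholded
    dims = (0, 2, 3)
    intersection = (probs * target).sum(dims)   # per class
    denom = probs.sum(dims) + target.sum(dims)  # per class
    dice_per_class = 2 * intersection / (denom + eps)
    return 1 - dice_per_class.mean()            # minimize 1 - mean Dice

torch.manual_seed(0)
logits = torch.randn(2, 3, 4, 4, requires_grad=True)  # stand-in for model(data)
target = (torch.rand(2, 3, 4, 4) > 0.5).float()

hard = (torch.sigmoid(logits) > 0.5).float()
print(hard.requires_grad)  # False: the comparison is not differentiable

loss = soft_dice_loss(torch.sigmoid(logits), target)
loss.backward()  # works: gradients flow through the sigmoid probabilities
```

The thresholded predictions can still be used for reporting the hard Dice score as a metric.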
Hi, I would like to use a custom “case-defined” loss function. Something like this:
class myLoss(nn.Module):
    def __init__(self, device="cuda"):
        super(myLoss, self).__init__()
        self.device = device

    def forward(self, input):
        if input >= 0:
            return torch.log(2*input+1)
        elif input < 0:
            return torch.log(-2*input+1)
Is it ok? @ptrblck
Furthermore, I am currently assuming that input is a scalar. How can I adapt the loss function for batched tensors? I was thinking of a for loop, but maybe there are more efficient solutions.
Thanks a lot!
Using conditions is fine unless you are tracing your model via torch.jit.trace, as it’s not able to capture data-dependent control flow.
To use a condition on a batch of samples you could use torch.where:
out = torch.where(input>=0, torch.log(2*input+1), torch.log(-2*input+1))
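A batched sketch of the loss from the question. Both branches compute log(2|x| + 1), so they can be collapsed into one expression with input.abs(); this rewrite is not from the thread. It also avoids evaluating torch.log on negative arguments, which torch.where does for the unselected branch (producing NaN there, even though those entries are discarded). The unused device argument is dropped, and the result is checked against the torch.where version.

```python
import torch
import torch.nn as nn

class MyLoss(nn.Module):
    def forward(self, input):
        # Elementwise log(2*|x| + 1): same values as both branches of the original if/elif
        return torch.log1p(2 * input.abs())

x = torch.tensor([-1.0, 0.0, 2.0], requires_grad=True)
loss_fn = MyLoss()
per_elem = loss_fn(x)  # one value per element, works for any batch shape

# Reference: the torch.where version from the answer
ref = torch.where(x >= 0, torch.log(2 * x + 1), torch.log(-2 * x + 1))

per_elem.mean().backward()  # reduce to a scalar before backward
print(x.grad)  # approximately tensor([-0.2222, 0.0000, 0.1333])
```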
Hi @ptrblck,
Thanks for all your answers on this thread. I learned a lot.
Still, I have one question. I have a stateful loss function as below:
class PositionLossNormalized(torch.nn.Module):
    def __init__(self, beta, epsilon=1e-8):
        super(PositionLossNormalized, self).__init__()
        self.beta = beta
        self.beta_t = beta
        self.running_avg = 0.0
        self.epsilon = epsilon

    def forward(self, targets, predictions):
        sqr_diff = torch.square(targets - predictions)
        Pxj = torch.mean(sqr_diff, dim=-1)
        Pxj2 = torch.square(Pxj)
        avg_Pxj = torch.mean(Pxj)
        avg_Pxj2 = torch.mean(Pxj2)
        self.running_avg = self.beta * self.running_avg + (1.0 - self.beta) * avg_Pxj2.item()
        loss_val = avg_Pxj / (np.sqrt(self.running_avg / (1.0 - self.beta_t)) + self.epsilon)
        self.beta_t = self.beta * self.beta_t
        return loss_val
Would this work, considering I have a NumPy operation (viz. np.sqrt)? My thinking is that I am just scaling the value of avg_Pxj (which is autodiff-able) by a Python scalar, and this should not break the autodiff chain. I ran the code and PyTorch proceeds without any error. Am I right in doing this? Or should I change the code to use only PyTorch functions, as below:
class PositionLossNormalized(torch.nn.Module):
    def __init__(self, beta, epsilon=1e-8):
        super(PositionLossNormalized, self).__init__()
        self.beta = torch.as_tensor(beta)
        self.beta_t = torch.as_tensor(beta)
        self.running_avg = torch.as_tensor(0.0)
        self.epsilon = torch.as_tensor(epsilon)

    def forward(self, targets, predictions):
        sqr_diff = torch.square(targets - predictions)
        Pxj = torch.mean(sqr_diff, dim=-1)
        Pxj2 = torch.square(Pxj)
        avg_Pxj = torch.mean(Pxj)
        avg_Pxj2 = torch.mean(Pxj2)
        self.running_avg = self.beta * self.running_avg + (1.0 - self.beta) * avg_Pxj2.clone().detach()
        loss_val = avg_Pxj / (torch.sqrt(self.running_avg / (1.0 - self.beta_t)) + self.epsilon)
        self.beta_t = self.beta * self.beta_t
        return loss_val
Running without errors does not necessarily mean the graph is intact. However, note that standard Python operators can be used on tensors and apply elementwise:
loss = x/(y/(1-t)+z)**0.5
I’m not sure how NumPy defines the operation underneath, but it’s probably not a good idea, as it will likely return a NumPy array, which breaks the graph.
And if you are dividing, you may want to get in the habit of replacing inf/nan values. For example:
loss[torch.isinf(loss)|torch.isnan(loss)]=1000
I agree with @J_Johnson and think it would be cleaner to either use PyTorch methods or pure Python operations. Using np.sqrt could be fine in this case, but it also doesn’t seem to be necessary, as torch.sqrt or the Python equivalent can be used.
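To illustrate, here is the first version of the loss with np.sqrt swapped for math.sqrt (the Python equivalent) and no NumPy import. It's a sketch, not the poster's final code; the check at the end only confirms that the gradient reaches predictions through avg_Pxj, while the running average acts as a constant because it was built from .item().

```python
import math
import torch
import torch.nn as nn

class PositionLossNormalized(nn.Module):
    def __init__(self, beta, epsilon=1e-8):
        super().__init__()
        self.beta = beta
        self.beta_t = beta
        self.running_avg = 0.0  # plain Python float, carries no autograd history
        self.epsilon = epsilon

    def forward(self, targets, predictions):
        Pxj = torch.mean(torch.square(targets - predictions), dim=-1)
        avg_Pxj = torch.mean(Pxj)
        avg_Pxj2 = torch.mean(torch.square(Pxj))
        # .item() detaches the statistic, so the running average stays a Python float
        self.running_avg = self.beta * self.running_avg + (1.0 - self.beta) * avg_Pxj2.item()
        # math.sqrt on a Python float: the divisor is a constant for autograd
        denom = math.sqrt(self.running_avg / (1.0 - self.beta_t)) + self.epsilon
        self.beta_t = self.beta * self.beta_t
        return avg_Pxj / denom

loss_fn = PositionLossNormalized(beta=0.9)
targets = torch.zeros(4, 3)
predictions = torch.ones(4, 3, requires_grad=True)
loss = loss_fn(targets, predictions)
loss.backward()
print(loss.item(), predictions.grad is not None)  # ~1.0 True
```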
Just had an opportunity to test this:
import torch
import numpy as np
x=torch.rand((2,3), requires_grad=True)
print(np.sqrt(x))
RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
If the tensor requires gradients, you’ll get the above error.
I want to use torch.argsort, but it breaks the gradients. Can you please tell me how to resolve this?
The output of my CNN is the log probability of indices of the original array. Hence, I want to use the 10 best indices to get the best elements of the original array. I hope I was able to convey the idea.
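No answer is recorded here, but a sketch may help show where gradients can and cannot flow. Indices returned by torch.argsort or torch.topk are integer tensors, so the selection step itself (which indices win) has no gradient. Gradients still reach the selected values: both the top scores returned by topk and the elements gathered from the original array. The tensors below are hypothetical placeholders; whether this is enough depends on what the loss needs to train.

```python
import torch

torch.manual_seed(0)
log_probs = torch.randn(20, requires_grad=True)  # hypothetical network output (score per index)
original = torch.randn(20, requires_grad=True)   # hypothetical "original array"

# topk returns the 10 largest scores and their integer indices
top_scores, top_idx = torch.topk(log_probs, k=10)
# Indexing with the integer indices keeps the graph back to 'original'
best_elements = original[top_idx]

loss = best_elements.sum() + top_scores.sum()
loss.backward()
print(top_idx.requires_grad)  # False: the indices carry no gradient
```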
My custom loss function includes torch.abs(). Do I need to inherit from nn.Module?
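The thread doesn't record an answer here. As far as autograd is concerned, subclassing nn.Module is not required: a plain Python function composed of differentiable torch ops (torch.abs included) is tracked the same way. nn.Module becomes useful mainly when the loss holds parameters, buffers or configuration. A minimal sketch; l1_loss is a hypothetical name:

```python
import torch

def l1_loss(output, target):
    # A plain function built from differentiable torch ops; no nn.Module required
    return torch.abs(output - target).mean()

output = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
target = torch.tensor([0.0, 0.0, 0.0])
loss = l1_loss(output, target)
loss.backward()
print(loss.item(), output.grad)  # 2.0 and approximately tensor([ 0.3333, -0.3333,  0.3333])
```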