I am using a mask based on comparison:
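import torch

def combine_images(target, x, y):
    # Per element, keep whichever of x or y is closer to target.
    diff1 = torch.abs(target - x)
    diff2 = torch.abs(target - y)
    mask = (diff1 < diff2).float()  # comparison result cast to float
    target_new = x * mask + y * (1 - mask)
    return target_new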
My question: is this function differentiable, given that it includes a type conversion?
I just tried it with random tensors, and it works.
In all these tensors, only mask with
Thanks. I have tried with
x = torch.tensor([0.0, 1.0], requires_grad=True)
y = torch.tensor([0.5, 0.5], requires_grad=True)
mask = (x < y).float()
z = (x * mask + y * (1 - mask) * 10).sum()
z.backward()
In : x.grad
Out: tensor([1., 0.])
In : y.grad
Out: tensor([ 0., 10.])
It seems it is working and the result looks reasonable for this trivial case. Not sure if it applies to general cases.
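To check this beyond the trivial case, here is a minimal sketch using random tensors. It assumes a variant of combine_images that also returns the mask (a change made only for inspection). The mask itself carries no gradient, so the gradient with respect to x is the mask, the gradient with respect to y is 1 - mask, and target receives no gradient at all.

```python
import torch

def combine_images(target, x, y):
    diff1 = torch.abs(target - x)
    diff2 = torch.abs(target - y)
    # The comparison yields a bool tensor with no autograd history;
    # .float() turns it into a constant 0/1 tensor.
    mask = (diff1 < diff2).float()
    # Returning the mask as well is an addition for this check only.
    return x * mask + y * (1 - mask), mask

torch.manual_seed(0)
target = torch.randn(4, 4, requires_grad=True)
x = torch.randn(4, 4, requires_grad=True)
y = torch.randn(4, 4, requires_grad=True)

out, mask = combine_images(target, x, y)
out.sum().backward()

print(mask.requires_grad)            # the mask is treated as a constant
print(torch.equal(x.grad, mask))     # gradient flows to x where mask == 1
print(torch.equal(y.grad, 1 - mask)) # gradient flows to y where mask == 0
print(target.grad)                   # target only influences the mask
```

So the function is differentiable with respect to x and y, but the selection itself (which depends on target) is not learned through the gradient.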
Type conversions are “differentiable” as can be seen in this dummy mixed-precision example:
import torch

x = torch.randn(1, 10, dtype=torch.float16, device='cuda')
w1 = torch.randn(10, 1, requires_grad=True, dtype=torch.float16, device='cuda')
w2 = torch.randn(1, 1, requires_grad=True, dtype=torch.float32, device='cuda')
output = torch.matmul(x, w1)
output = output.float()  # float16 -> float32
output = torch.matmul(output, w2)
loss = (output - torch.randn(1, 1, dtype=torch.float32, device='cuda'))**2
loss.backward()  # w1.grad is float16, w2.grad is float32
As you can see, I’m starting with float16 input and parameters, and convert them to float32 later, which just works fine, as Autograd just transforms the gradients back to the appropriate type.
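For readers without a GPU, the same behaviour can be sketched on the CPU. This version is an assumption-laden variant of the example above: it swaps the float16 to float32 conversion for a float32 to float64 one, so that it does not depend on half-precision CPU kernels, and it drops the random target.

```python
import torch

torch.manual_seed(0)
x = torch.randn(1, 10, dtype=torch.float32)
w1 = torch.randn(10, 1, dtype=torch.float32, requires_grad=True)
w2 = torch.randn(1, 1, dtype=torch.float64, requires_grad=True)

hidden = torch.matmul(x, w1)        # float32
hidden = hidden.double()            # float32 -> float64, recorded by Autograd
output = torch.matmul(hidden, w2)   # float64
loss = (output ** 2).sum()
loss.backward()

# Each gradient comes back in the dtype of its own parameter.
print(w1.grad.dtype)
print(w2.grad.dtype)
```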
When the type conversion is to long, it seems like the operation does not have a grad_fn. So is there any way to still use Autograd when the type conversion is to long?
No, integral tensors cannot have gradients; only floating-point tensors can.
The underlying reason is that integral-valued functions are not differentiable in any mathematically meaningful way (i.e. they might be (locally) constant, but that’s it).
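A small sketch of what this looks like in practice (the variable names are only illustrative): rounding stays in the graph but has a zero gradient, converting to long leaves the graph entirely, and an integer tensor cannot be created with requires_grad=True.

```python
import torch

x = torch.tensor([0.2, 1.7], requires_grad=True)

# round() is still a floating-point op with a grad_fn, but it is
# piecewise constant, so its gradient is zero.
rounded = torch.round(x)
rounded.sum().backward()
print(x.grad)

# Converting to long produces an integer tensor without autograd history.
as_long = x.long()
print(as_long.requires_grad, as_long.grad_fn)

# Asking for gradients on an integer tensor raises a RuntimeError.
try:
    torch.tensor([1, 2], requires_grad=True)
    int_requires_grad_error = None
except RuntimeError as err:
    int_requires_grad_error = str(err)
print(int_requires_grad_error)
```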
Right, I see! Thanks for the clarification.
Slightly off-topic question then: inside a training loss, I need to access the values of a tensor [y_true] by indices. The other tensor [y_pred], which consists of the indices, is of type float and has float values. Since I need to compute the gradient, is there any way to access values of y_true without rounding y_pred (I would like to avoid this because its gradient is zero almost everywhere) and then converting it to long? Please note that it is not possible to do any interpolation in y_true in this context. A minimal example of the said loss function is as follows:
def trainloss(y_pred, y_true):
    # y_pred has shape [100, 2], y_true has shape [64, 64]
    idx = torch.round(y_pred)  # gradient is zero almost everywhere
    idx = idx.long()           # integer tensor, no longer tracked by Autograd
    loss = y_true[idx[:, 0], idx[:, 1]]
    loss = torch.max(loss)
    return loss
We would likely want to think about what the derivative should be mathematically before we look at the implementation. What should happen to y_pred relative to loss? You write that no interpolation is possible, but if y_pred had some gradient that caused the rounding to decrement idx by one from one step to the next, then you would end up with y_true[old_idx[:, 0] - 1, old_idx[:, 1]]. This would suggest that if that value is smaller than the current one, y_pred[:, 0] should have a positive gradient. (Leaving aside the batch thing…)
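The reasoning above can be made concrete for a single index pair. This is only an illustrative sketch, not something Autograd computes: it assumes a backward finite difference along the row axis as a stand-in for the derivative, and the name pseudo_grad_row is hypothetical.

```python
import torch

torch.manual_seed(0)
y_true = torch.rand(64, 64)
y_pred = torch.tensor([[10.3, 20.6]])   # one fractional (row, col) pair

idx = torch.round(y_pred).long()
i, j = idx[0, 0], idx[0, 1]
i_prev = torch.clamp(i - 1, min=0)      # row index after a decrement by one

current = y_true[i, j]
neighbour = y_true[i_prev, j]

# If the neighbour is smaller than the current value, this difference is
# positive, so a gradient descent step would push y_pred[:, 0] downwards.
pseudo_grad_row = current - neighbour
print(pseudo_grad_row)
```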
One variant that avoids having to differentiate through y_pred is the REINFORCE algorithm from RL. Essentially, when y_pred is a sample from a probability distribution, you can still say that the probability of y_pred should go up if the loss is small (“success”) or down if it is large (“failure”).
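A minimal sketch of that idea follows, under assumptions that are not in the original post: each of the 100 points is modelled by a categorical distribution over all 64 * 64 cells of y_true, parameterised by a hypothetical logits tensor, and the batch is reduced with a mean rather than the max used in trainloss. The sampled indices are integers and carry no gradient; the gradient reaches logits through the log-probability of the sample instead.

```python
import torch
from torch.distributions import Categorical

torch.manual_seed(0)
y_true = torch.rand(64, 64)

# Hypothetical parameterisation: one row of logits per predicted point.
logits = torch.zeros(100, 64 * 64, requires_grad=True)

dist = Categorical(logits=logits)
flat_idx = dist.sample()                                  # shape [100], dtype long
rows = torch.div(flat_idx, 64, rounding_mode="floor")
cols = flat_idx % 64
cost = y_true[rows, cols]                                 # shape [100], no gradient

# Score-function (REINFORCE) surrogate: minimising it lowers the
# log-probability of samples with a high cost.
surrogate = (cost.detach() * dist.log_prob(flat_idx)).mean()
surrogate.backward()

print(logits.grad.shape)
```

In practice a baseline is often subtracted from the cost to reduce the variance of this estimator; that is left out here to keep the sketch short.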
Thanks for the reply. Sorry, I should have been more precise. By saying 'it is not possible to do any interpolation in y_true', I meant that it is not possible to generate values of y_true at fractional indices by interpolation, such as y_true[3.5,4.5]. That is why the rounding is needed. So in a nutshell, I was wondering how to compute the loss with a list (y_pred) of fractional indices. But I am guessing this is getting away from the topic of discussion in this post, so I might make a separate post.