# Is the type conversion differentiable?

I am using a mask based on comparison:

```python
def combine_images(target, x, y):
    diff1 = torch.abs(target - x)
    diff2 = torch.abs(target - y)
    mask = (diff1 < diff2).float()  # 1.0 where x is closer to target
    target_new = mask * x + (1 - mask) * y
    return target_new
```

My question is: is this function differentiable, given that it includes the type conversion `.float()`? Thanks.

Hi,

I just gave it a try with random tensors, and it works.
Among all these tensors, only the mask has `requires_grad=False`.

Thanks. I have tried with

```python
x = torch.tensor([0.0, 1.0], requires_grad=True)
y = torch.tensor([2.0, 3.0], requires_grad=True)
mask = torch.tensor([1.0, 0.0])  # requires_grad=False

z = (x * mask + y * (1 - mask) * 10).sum()
z.backward()

x.grad
Out[18]: tensor([1., 0.])

y.grad
Out[19]: tensor([ 0., 10.])
```

It seems it is working and the result looks reasonable for this trivial case. Not sure if it applies to general cases.


Type conversions are "differentiable", as can be seen in this dummy mixed-precision example:

```python
x = torch.randn(1, 10, dtype=torch.float16, device='cuda')
w1 = torch.randn(10, 1, requires_grad=True, dtype=torch.float16, device='cuda')
w2 = torch.randn(1, 1, requires_grad=True, dtype=torch.float32, device='cuda')

output = torch.matmul(x, w1)
output = output.float()
output = torch.matmul(output, w2)

loss = (output - torch.randn(1, 1, dtype=torch.float32, device='cuda'))**2
loss.backward()
```

As you can see, I'm starting with `float16` input and parameters and converting to `float32` later, which works just fine, as Autograd transforms the gradients back to the appropriate dtype.
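A quick way to check this without a GPU (a minimal sketch, not the snippet above) is to cast inside the graph and inspect the gradient's dtype afterwards:

```python
import torch

# Minimal sketch: a float16 leaf cast to float32 inside the graph.
w = torch.randn(3, requires_grad=True, dtype=torch.float16)
loss = (w.float() * 2.0).sum()  # the .float() cast is recorded by autograd
loss.backward()

# The gradient comes back in the leaf's own dtype.
print(w.grad.dtype)  # torch.float16
print(w.grad)        # tensor([2., 2., 2.], dtype=torch.float16)
```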


For `long`, however, it seems the operation does not have a `grad_fn`. So is there any way to still use Autograd when the type conversion is to `long`?


No, integral tensors cannot have gradients; only floating-point tensors can.
The underlying reason is that integer-valued functions are not differentiable in any mathematically meaningful way (i.e. they might be (locally) constant, but that's it).
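This is easy to verify with a small sketch: autograd refuses `requires_grad` on integral dtypes, and casting a tracked float tensor to `long` detaches it from the graph:

```python
import torch

x = torch.arange(5)  # int64 tensor
try:
    x.requires_grad_(True)  # integral dtypes cannot require gradients
except RuntimeError as e:
    print("refused:", e)

f = torch.tensor([1.5, 2.5], requires_grad=True)
print(f.long().grad_fn)  # None -- the cast to long is not tracked
```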


Right, I see! Thanks for the clarification.

Slightly off-topic follow-up question: inside a training loss, I need to access values of a tensor (`y_true`) by indices. The other tensor (`y_pred`), which holds the indices, is of type `float` and has fractional values. Since I need to compute the gradient, is there any way to access values of `y_true` without rounding `y_pred` (which I would like to avoid, since rounding has zero gradient almost everywhere) and then converting it to `long`? Please note that it is not possible to do any interpolation in `y_true` in this context. A minimal example of the said loss function is as follows:

```python
def trainloss(y_pred, y_true):
    # y_pred is of shape [100, 2], y_true is of shape [64, 64]
    idx = torch.round(y_pred)
    idx = idx.long()
    loss = y_true[idx[:, 0], idx[:, 1]]
    loss = torch.max(loss)
    return loss
```

We would likely want to think about what the derivative should be mathematically before we look at the implementation. What should happen to `y_pred` relative to the loss? You write that no interpolation is possible, but suppose `y_pred` had some gradient that caused the rounding to decrement `idx` by one from one step to the next; then you would end up with `y_true[old_idx[:, 0] - 1, old_idx[:, 1]]`. This suggests that if that value is smaller, then `y_pred[:, 0]` should have a positive gradient. (Leaving aside the batch thing…)

One variant that avoids needing to differentiate `y_pred` is the REINFORCE algorithm from RL. Essentially, when `y_pred` is a sample from a probability distribution, you can still say the probability of `y_pred` should go up if the loss is small ("success") or down if it is large ("failure").
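As a hedged sketch of how that could look for the lookup loss above (the Gaussian distribution, the `std` value, and the index clamping are my assumptions, not part of the original code): treat `y_pred` as the mean of a distribution over coordinates, sample integer indices from it, and use the score-function (REINFORCE) estimator so gradients flow through the log-probability instead of through the non-differentiable rounding.

```python
import torch

def reinforce_loss(y_pred, y_true, std=1.0):
    """Hypothetical REINFORCE-style surrogate for looking up y_true
    at fractional coordinates y_pred ([N, 2]); y_true is [H, W]."""
    dist = torch.distributions.Normal(y_pred, std)
    sample = dist.sample()  # sampling itself carries no gradient
    idx = sample.round().long()
    idx[:, 0].clamp_(0, y_true.shape[0] - 1)  # keep indices in bounds
    idx[:, 1].clamp_(0, y_true.shape[1] - 1)
    reward = -y_true[idx[:, 0], idx[:, 1]]  # small table value = "success"
    log_prob = dist.log_prob(sample).sum(dim=1)
    # Score-function estimator: the gradient flows through log_prob only.
    return -(reward.detach() * log_prob).mean()

y_pred = (torch.rand(100, 2) * 63).requires_grad_()
y_true = torch.rand(64, 64)
loss = reinforce_loss(y_pred, y_true)
loss.backward()
print(y_pred.grad.shape)  # torch.Size([100, 2])
```

This trades the zero-gradient rounding for a noisy but unbiased gradient estimate; in practice it usually needs a baseline and many samples to keep the variance manageable.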


Thanks for the reply. Sorry, I should have been more precise. By saying "it is not possible to do any interpolation in `y_true`", I meant that it is not possible to generate values of `y_true` at fractional indices by interpolation, such as `y_true[3.5, 4.5]`. That is why the rounding is needed. So in a nutshell, I was wondering how to compute the loss with a list (`y_pred`) of fractional indices. But I am guessing this is getting away from the topic of discussion in this post, so I might make a separate post.