As you can see, I start with float16 inputs and parameters and convert them to float32 later. This works fine, because autograd casts the gradients back to the appropriate dtype.
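To make the dtype behaviour concrete, here is a minimal sketch (not the code from the post above; the tensor `x` and the squared-sum loss are made up for illustration). It assumes a reasonably recent PyTorch and runs on CPU.

```python
import torch

# Half-precision leaf tensor (created in float32, then cast, for portability).
x = torch.randn(4).half().requires_grad_()

# Upcast inside the graph; the cast is itself a differentiable op.
y = (x.float() ** 2).sum()
y.backward()

print(y.dtype)       # torch.float32
print(x.grad.dtype)  # torch.float16
```

The gradient arrives at `x` in float16 even though the loss was computed in float32, because the backward of the cast converts it back.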
No, integral tensors cannot have gradient, only floating-point tensors can.
The underlying reason is that integral-valued functions are not differentiable in any mathematically meaningful way (i.e. they might be (locally) constant, but that’s it).
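Both points can be checked quickly. The snippet below is only an illustration with made-up values: it shows that requesting gradients on an integer tensor raises an error, and that rounding a float tensor is allowed but yields a zero gradient.

```python
import torch

# Asking for gradients on an integer tensor is rejected outright.
try:
    torch.tensor([1, 2, 3], requires_grad=True)
    raised = False
except RuntimeError as err:
    raised = True
    print(err)

# Rounding is allowed on float tensors, but it is piecewise constant,
# so its derivative is zero wherever it is defined.
x = torch.tensor([0.3, 1.7, 2.2], requires_grad=True)
torch.round(x).sum().backward()
print(x.grad)  # tensor([0., 0., 0.])
```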
Slightly off-topic question then - inside a training loss, I need to access the values of a tensor [y_true] by indices. The other tensor [y_pred], which holds the indices, is of type float and contains fractional values. Since I need to compute the gradient, is there any way to access the values of y_true without rounding y_pred (which I would like to avoid, because rounding has zero gradient almost everywhere) and then converting it to long? Please note that it is not possible to do any interpolation in y_true in this context. A minimal example of the loss function is as follows:
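import torch

def trainloss(y_pred, y_true):
    # y_pred is of shape [100, 2], y_true is of shape [64, 64]
    idx = torch.round(y_pred)  # zero gradient almost everywhere
    idx = idx.long()           # integer cast: no gradient flows past this point
    loss = y_true[idx[:, 0], idx[:, 1]]
    loss = torch.max(loss)
    return loss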
We would likely want to think about what the derivative should be mathematically before we look at the implementation. How should loss respond to a change in y_pred? You write that no interpolation is possible. But suppose y_pred had some gradient that causes the rounding to decrement idx by one from one step to the next: you then end up with y_true[old_idx[:, 0] - 1, old_idx[:, 1]]. This would suggest that if that value is smaller than the current one, then y_pred[:, 0] should have a positive gradient. (Leaving aside the batch thing…)
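The reasoning above can be written out with a tiny made-up table. This is only a sketch of the sign argument, not a full backward pass: the names `here`, `above`, and `pseudo_grad_row` are hypothetical, and boundary handling (row index 0) is ignored.

```python
import torch

y_true = torch.tensor([[1.0, 1.0],
                       [5.0, 5.0],
                       [3.0, 3.0]])
y_pred = torch.tensor([[1.2, 0.4]])  # rounds to index (1, 0)

idx = y_pred.round().long()
row, col = idx[:, 0], idx[:, 1]

here = y_true[row, col]        # value at the current index
above = y_true[row - 1, col]   # value if the row index dropped by one

# Backward difference along the row axis. A positive value means that
# decreasing y_pred[:, 0] (which is what gradient descent does when the
# gradient is positive) would lower the looked-up value.
pseudo_grad_row = here - above
print(pseudo_grad_row)  # tensor([4.])
```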
One variant that avoids having to differentiate through y_pred is the REINFORCE algorithm from RL. Essentially, when y_pred is a sample from a probability distribution, you can still say that the probability of y_pred should go up if the loss is small (“success”) or down if it is large (“failure”).
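A rough sketch of what that could look like follows. Several things here are assumptions for illustration and not part of the question: the network output is treated as the mean of a Gaussian with a fixed standard deviation `std`, the per-sample looked-up values are used as the signal instead of the max over the batch, and a mean baseline is subtracted to reduce variance.

```python
import torch

torch.manual_seed(0)

y_true = torch.rand(64, 64)  # lookup table, needs no gradient
# Stands in for the network output that would otherwise be y_pred.
mean = torch.full((100, 2), 32.0, requires_grad=True)
std = 2.0  # assumed fixed exploration noise

dist = torch.distributions.Normal(mean, std)
sample = dist.sample()                     # sampling is not differentiated
idx = sample.round().clamp(0, 63).long()   # discrete indices into y_true

# Per-sample looked-up values (lower is better); no gradient path exists here.
values = y_true[idx[:, 0], idx[:, 1]]      # shape [100]

# Score-function surrogate: its gradient w.r.t. mean estimates the gradient
# of the expected looked-up value, using only the log-probability term.
log_prob = dist.log_prob(sample).sum(dim=1)  # shape [100]
baseline = values.mean()
surrogate = ((values - baseline) * log_prob).mean()
surrogate.backward()

print(mean.grad.shape)  # torch.Size([100, 2])
```

Minimizing `surrogate` with an optimizer lowers the probability of samples whose looked-up value is above the baseline and raises it for those below. The estimate is noisy, so in practice it typically needs many samples or a better baseline.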
Thanks for the reply. Sorry, I should have been more precise. By saying “it is not possible to do any interpolation in y_true”, I meant that it is not possible to generate values of y_true at fractional indices by interpolation, such as y_true[3.5, 4.5]. That is why the rounding is needed. So, in a nutshell, I was wondering how to compute the loss with a list (y_pred) of fractional indices. But I guess this is drifting away from the topic of this thread, so I might make a separate post.