They’re not documented much aside from this page. Is it recommended to use them in practice? Specifically, if I want to replicate the functionality invoked during the backward pass, I can look up the backward op associated with a given forward op and call it directly, and the result should be exactly the same. For example, the following replicates the backward pass of F.embedding (approximately):
import torch
import torch.nn.functional as F


class EmbeddingTest(torch.autograd.Function):
    # replicate the behavior of the embedding backward
    @staticmethod
    def forward(ctx, input, weight):
        ctx.save_for_backward(input, weight)
        return F.embedding(input, weight)

    @staticmethod
    def backward(ctx, grad_output):
        input, weight = ctx.saved_tensors
        grad_input = grad_weight = None
        if ctx.needs_input_grad[0]:
            # integer indices are not differentiable
            raise NotImplementedError('non-differentiable in general')
        if ctx.needs_input_grad[1]:
            # aten signature: embedding_dense_backward(grad_output, indices,
            #                 num_weights, padding_idx, scale_grad_by_freq)
            grad_weight = torch.ops.aten.embedding_dense_backward(
                grad_output, input, weight.size(0), -1, False)
        return grad_input, grad_weight
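As a quick sanity check (a minimal sketch; the shapes and values below are arbitrary and just for illustration), I compare the grad_weight produced by this custom Function against the gradient F.embedding produces through its own autograd backward:

# compare the custom Function's grad_weight with F.embedding's autograd grad
indices = torch.tensor([0, 2, 2, 1])
weight_a = torch.randn(5, 3, requires_grad=True)
weight_b = weight_a.detach().clone().requires_grad_(True)

out_a = EmbeddingTest.apply(indices, weight_a)
out_b = F.embedding(indices, weight_b)

grad_out = torch.randn_like(out_a)
out_a.backward(grad_out)
out_b.backward(grad_out)

print(torch.allclose(weight_a.grad, weight_b.grad))  # expect True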
What are the potential pitfalls of calling torch.ops.aten...?