I am writing a custom convolution function that involves squaring some inputs (x). I am wondering how the following square implementations differ:

- torch.square(x)
- torch.pow(x, 2)
- x**2
- x*x
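For context, here is a minimal sketch (assuming PyTorch is installed) that checks whether the four variants agree in both the forward values and the gradients they produce under autograd:

```python
import torch

x = torch.tensor([1.5, -2.0, 3.0], requires_grad=True)

# All four expressions should compute the same forward values.
ys = [torch.square(x), torch.pow(x, 2), x**2, x * x]
for y in ys[1:]:
    assert torch.equal(ys[0], y)

# Each variant should also give the analytic gradient d/dx x**2 = 2x.
variants = (torch.square, lambda t: torch.pow(t, 2), lambda t: t**2, lambda t: t * t)
for fn in variants:
    xi = torch.tensor([1.5, -2.0, 3.0], requires_grad=True)
    fn(xi).sum().backward()
    assert torch.equal(xi.grad, 2 * xi.detach())

print("all variants match")
```

In my quick tests they appear numerically identical, so my question is mainly about what happens under the hood (dispatched kernels, autograd graph, speed).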

I would like to know how PyTorch handles each of these operations and whether there is any difference in terms of gradient computation or efficiency.

Thank you