Hi, I just started with PyTorch, and basic arithmetic operations seem like the best place to begin.

I performed element-wise multiplication with Torch (with GPU support) and with NumPy using the functions below, and found that NumPy runs faster than Torch, which I suspect shouldn't be the case.

I want to know how to properly perform general arithmetic operations with Torch on the GPU.

**Note:** I ran these code snippets in a Google Colab notebook.

**Set the default tensor type to enable the GPU globally**

```python
torch.set_default_tensor_type(torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor)
```

**Initialize Torch tensors**

```python
x = torch.Tensor(200, 100)  # FloatTensor (uninitialized)
y = torch.Tensor(200, 100)

def mul(d, f):
    g = torch.mul(d, f).cuda()  # I explicitly called .cuda(), which is not necessary here
    return g
```
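As a side note (my own sanity check, not part of the snippet above): `torch.Tensor(200, 100)` allocates *uninitialized* storage, and `.is_cuda` tells you where a tensor actually lives:

```python
import torch

x = torch.Tensor(200, 100)  # uninitialized storage; dtype/device follow the default tensor type
print(x.dtype, x.is_cuda)   # torch.float32, and True only when the default type is a CUDA type
```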

Calling the function above with `%timeit mul(x,y)`

**Returns:**

```
The slowest run took 10.22 times longer than the fastest. This could mean that
an intermediate result is being cached.
10000 loops, best of 3: 50.1 µs per loop
```
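A measurement caveat I've since read about (this is my own sketch, not the snippet above): CUDA kernels are launched asynchronously, so a timer can stop before the kernel has actually finished. Calling `torch.cuda.synchronize()` around the timed region (skipped on CPU-only machines) gives honest wall-clock numbers:

```python
import time
import torch

x = torch.rand(200, 100)
y = torch.rand(200, 100)
if torch.cuda.is_available():
    x, y = x.cuda(), y.cuda()

def mul(d, f):
    return torch.mul(d, f)

# Warm-up launches, then time with explicit synchronization.
for _ in range(10):
    mul(x, y)
if torch.cuda.is_available():
    torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(1000):
    g = mul(x, y)
if torch.cuda.is_available():
    torch.cuda.synchronize()
per_call = (time.perf_counter() - start) / 1000
print(f"{per_call * 1e6:.1f} µs per call")
```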

Now the same trial with NumPy, using the same values as the Torch tensors:

```python
x_ = x.data.cpu().numpy()
y_ = y.data.cpu().numpy()

def mul_(d, f):
    g = d * f
    return g
```
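A quick self-contained sanity check that `mul_` really does element-wise multiplication (the array values here are my own):

```python
import numpy as np

def mul_(d, f):
    return d * f

a = np.arange(6, dtype=np.float32).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
b = np.full((2, 3), 2.0, dtype=np.float32)
out = mul_(a, b)
print(out)  # each element of `a` doubled
```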

`%timeit mul_(x_,y_)`

**Returns**

```
The slowest run took 12.10 times longer than the fastest. This could mean that
an intermediate result is being cached.
100000 loops, best of 3: 7.73 µs per loop
```

I notice a huge speed difference.
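From what I understand (a follow-up sketch of my own; the helper name is mine), per-call kernel-launch overhead dominates for tensors this small, and the GPU should only pull ahead at larger sizes. Something like this could check that:

```python
import time
import torch

def avg_time_us(x, iters=50):
    # Synchronize around the timed region so asynchronous CUDA
    # launches are fully counted (skipped on CPU-only machines).
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        x * x
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1e6

for n in (100, 1000, 2000):
    x = torch.rand(n, n)
    if torch.cuda.is_available():
        x = x.cuda()
    print(f"{n}x{n}: {avg_time_us(x):.1f} µs per multiply")
```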

I already posted the same question on Stack Overflow: https://stackoverflow.com/questions/52526082/pytorch-cuda-vs-numpy-for-arithematic-operations-fastest