Creating zeros of any dtype (performance)

Hi, I want to create zero-filled arrays without knowing the dtype beforehand. Currently I'm doing:
torch.zeros(myshape).type_as(mydata), but this internally copies the zero array to the new dtype, creating a performance bottleneck in my code, especially when copying to the GPU.

I see in the master documentation that you can do torch.zeros(myshape, dtype=mydata.dtype), which I assume avoids the copy.

Is there any way to avoid that copy with PyTorch 0.3.1?

For example, I benchmarked creating the array in NumPy with the correct dtype, and the performance difference is huge:

In [2]: import torch

In [3]: import numpy as np

In [4]: data = torch.from_numpy(np.zeros((10000, 10000), dtype=np.float32))

In [5]: data.type()
Out[5]: 'torch.FloatTensor'

In [6]: %timeit tmp = torch.zeros((10000, 10000)).type_as(data)
50.7 ms ± 3.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [7]: %timeit tmp = torch.from_numpy(np.zeros((10000, 10000), dtype=np.float32))
7.67 µs ± 29.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The obvious problem is that to use the second method I would need to write down a mapping from torch dtypes to NumPy dtypes, and then handle GPU tensors separately. And even then, for GPU tensors it would still copy the array instead of creating it directly.
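For reference, the mapping described above could be sketched like this. This is only a sketch for CPU tensors: the dict `TORCH_TO_NUMPY` and the helper `zeros_like_dtype` are my own names, and the mapping is deliberately incomplete.

```python
import numpy as np
import torch

# Hypothetical, incomplete mapping from torch CPU tensor types to NumPy dtypes.
TORCH_TO_NUMPY = {
    'torch.FloatTensor': np.float32,
    'torch.DoubleTensor': np.float64,
    'torch.IntTensor': np.int32,
    'torch.LongTensor': np.int64,
}

def zeros_like_dtype(shape, ref):
    """Create a zero tensor with the same dtype as `ref`, via NumPy (CPU only)."""
    return torch.from_numpy(np.zeros(shape, dtype=TORCH_TO_NUMPY[ref.type()]))

ref = torch.from_numpy(np.ones((2, 2), dtype=np.float32))
z = zeros_like_dtype((3, 3), ref)
```

As the post notes, this still does nothing for GPU tensors, which would need a separate code path and a host-to-device copy anyway.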

The easiest way is to upgrade to 0.4, where zeros takes a dtype argument.
In 0.3, you could do torch.FloatTensor(10, 7).zero_() or some such.
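To illustrate both suggestions side by side (a sketch; `mydata` stands in for any existing tensor, and `tensor.new(...)` is the 0.3-era way to get an uninitialized tensor of the same type, and same GPU device, as an existing one):

```python
import torch

mydata = torch.zeros(5, dtype=torch.float64)  # stand-in for an existing tensor

# 0.4+: allocate zeros directly in the right dtype -- no intermediate copy.
a = torch.zeros((10, 7), dtype=mydata.dtype)

# 0.3-era: allocate uninitialized storage of the matching type, zero it in place.
b = mydata.new(10, 7).zero_()
```

The second form avoids the copy because `zero_()` writes zeros into the freshly allocated storage instead of converting a float32 zero array after the fact.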

Best regards


I thought the next release would take months, and here it comes right after my issue :smiley: I love it and will try to migrate over.