Since the 0.4 upgrade, torch.zeros is slow

Hey PyTorch community,

The following code takes too long for my use case:

import timeit
u = timeit.Timer('torch.zeros(10,10,20)', setup='import torch')
u.timeit()

Since the upgrade to 0.4, this has an average runtime of 2.3 seconds.

Did I miss something? :slight_smile:

Best regards,
Magnus

I tried your code and you are right. It does take around 2 seconds if you measure it using the timeit.Timer function. But if you try it with just PyTorch code, as in:

import torch
a = torch.zeros(10, 10, 20)

Then it’s pretty much instantaneous. Seems like the timeit.Timer function adds some weird time overhead in this case. Maybe someone with more knowledge than me could clarify why that is the case.
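
For example, timing a single call by hand (just a rough sanity check on my side, not a proper benchmark, and the exact numbers will vary by machine):

import time
import torch

start = time.perf_counter()
a = torch.zeros(10, 10, 20)
elapsed = time.perf_counter() - start  # elapsed seconds for one call
print(elapsed)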

Best regards,
Diego


Hey @Diego,

the Timer was just there to demonstrate that it really does take this long, but I agree with you that there is a lot of overhead; I’ll edit the post. I work a lot with RNNs, and as you might know, you initialize the hidden (and cell) state with, for instance, torch.zeros tensors. After the update to 0.4 my code got extremely slow. I profiled it, and it turned out that torch.zeros was the problem…
Every time I initialized my hidden state with
torch.zeros(3, 80, 1),
it took 0.5 seconds. What a disaster.
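
For context, the kind of initialization I mean looks roughly like this (the layer sizes below are just an illustration matching the torch.zeros(3, 80, 1) shape above, not my actual model):

import torch
import torch.nn as nn

num_layers, batch_size, hidden_size = 3, 80, 1
input_size, seq_len = 10, 5  # made-up values for the example
lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
# hidden and cell state, each of shape (num_layers, batch, hidden_size)
h0 = torch.zeros(num_layers, batch_size, hidden_size)
c0 = torch.zeros(num_layers, batch_size, hidden_size)
output, (hn, cn) = lstm(torch.randn(seq_len, batch_size, input_size), (h0, c0))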

Best regards,
Magnus

That is very weird. Maybe you should post a snippet of the code (if possible) so that we can figure out where the problem is. I honestly don’t think the torch.zeros function was changed so drastically as to cause that kind of performance regression with the update. I could be wrong, though.

Yeah, somehow the setup in the timeit function does not seem to work properly for PyTorch. Maybe it’s because of the C extensions, and the import isn’t fully finished before the timing of “torch.zeros” starts, or something like that.

Using the %timeit magic command in IPython gives more reasonable results:

In [1]: import torch

In [2]: %timeit torch.zeros(10,10,20)
2.09 µs ± 31.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
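
For what it’s worth, timeit.Timer.timeit() runs the statement 1,000,000 times by default and returns the total time, so passing an explicit number and dividing by it gives a comparable per-call figure (just a sketch; the exact numbers will differ on your machine):

import timeit

number = 100000  # explicit loop count instead of the default 1,000,000
total = timeit.Timer('torch.zeros(10, 10, 20)', setup='import torch').timeit(number=number)
print(total / number * 1e6, 'microseconds per call')  # total seconds divided by loop count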