# Is there a faster way to round a tensor?

Run:
`torch.round(torch.tensor([-0.6, -0.5, -0.4, 0.4, 0.5, 0.6]))`

Got:
`tensor([-1., -0., -0., 0., 0., 1.])`.

Expected:
`tensor([-1., -1., -0., 0., 1., 1.])`.
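This behavior is expected: `torch.round` uses round-half-to-even ("banker's rounding"), so exact halves round to the nearest even integer rather than away from zero. A quick demonstration:

```python
import torch

# torch.round breaks ties to the nearest even integer (banker's rounding)
x = torch.tensor([0.5, 1.5, 2.5, -0.5, -1.5])
print(torch.round(x))  # tensor([ 0.,  2.,  2., -0., -2.])
```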

I tried:

```python
x = torch.tensor([-0.6, -0.5, -0.4, 0.4, 0.5, 0.6])
x[x > 0] = torch.floor(x[x > 0] + 0.5)
x[x < 0] = torch.ceil(x[x < 0] - 0.5)
```
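Part of the slowness is likely the boolean-mask indexing: each `x[x > 0]` materializes an intermediate tensor and launches extra kernels. As a sketch, the same round-half-away-from-zero rule can be written branch-free with `torch.where` (the helper name `round_half_away` is mine):

```python
import torch

def round_half_away(x):
    # floor(x + 0.5) for non-negative entries, ceil(x - 0.5) for negative ones,
    # selected element-wise so no data-dependent indexing is needed
    return torch.where(x >= 0, torch.floor(x + 0.5), torch.ceil(x - 0.5))

x = torch.tensor([-0.6, -0.5, -0.4, 0.4, 0.5, 0.6])
print(round_half_away(x))  # tensor([-1., -1., -0., 0., 1., 1.])
```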

But this masked version is too slow. Running the code below on a 2080 Ti:

```python
import torch
import time

x = torch.rand(3, 64, 128, 128).float() * 10 - 5
x = x.cuda()
tic = time.time()
x[x > 0] = torch.floor(x[x > 0] + 0.5)
x[x < 0] = torch.ceil(x[x < 0] - 0.5)
toc = time.time()
print(toc - tic)  # 0.008876 (if using torch.round, it is 0.0006)
```
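One caveat worth noting: CUDA kernels launch asynchronously, so timing with `time.time()` alone and no `torch.cuda.synchronize()` may measure launch overhead rather than the kernel itself. A sketch of the same kind of measurement with CUDA events:

```python
import torch

x = torch.rand(3, 64, 128, 128, device="cuda") * 10 - 5

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

torch.cuda.synchronize()  # make sure pending work is finished first
start.record()
y = torch.round(x)
end.record()
torch.cuda.synchronize()  # wait for the timed kernel to complete
print(start.elapsed_time(end))  # elapsed time in milliseconds
```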

So, is there a faster way to round a tensor my way (`0.5 -> 1` and `-0.5 -> -1`)?

OK, I finally built a C++ extension to solve this problem.
But I am still interested in a Python solution.
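For anyone who wants to stay in Python: one more branch-free formulation (my own sketch, not the C++ extension above) uses `torch.trunc`, which rounds toward zero:

```python
import torch

def round_half_away_trunc(x):
    # Shift by half a unit away from zero, then truncate toward zero:
    # two element-wise ops, no masking
    return torch.trunc(x + 0.5 * torch.sign(x))

x = torch.tensor([-0.6, -0.5, -0.4, 0.4, 0.5, 0.6])
print(round_half_away_trunc(x))  # tensor([-1., -1., -0., 0., 1., 1.])
```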

You can always add `1e-6` to your original Tensor before rounding.
That should add only a single element-wise operation over the Tensor and won't have such a bad overhead.

```python
x = torch.tensor([-0.6, -0.5, -0.4, 0.4, 0.5, 0.6])
y = torch.round(x + 1e-6)  # [-1., -0., -0., 0., 1., 1.]
y = torch.floor(x + 1e-6)  # [-1., -1., -1., 0., 0., 0.]
y = torch.ceil(x + 1e-6)   # [-0., -0., -0., 1., 1., 1.]
```

None of them give `[-1., -1., -0., 0., 1., 1.]`.
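(For what it's worth, making the epsilon sign-aware does give the expected result on these values; this is my own tweak, and it still assumes no input sits within `1e-6` of an exact half:)

```python
import torch

x = torch.tensor([-0.6, -0.5, -0.4, 0.4, 0.5, 0.6])
# Nudge each value away from zero so halves are no longer ties
y = torch.round(x + torch.sign(x) * 1e-6)
print(y)  # tensor([-1., -1., -0., 0., 1., 1.])
```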

I tried `x = torch.sign(x) * torch.floor(torch.abs(x) + 0.5)`. It is indeed a bit faster.
Thank you.

Oh right, you have negative values as well.
I'm afraid this is going to be hard to make as fast as `round` without reimplementing an efficient C kernel for it.

```
In [12]: x = torch.tensor([-0.6, -0.5, -0.4, 0.4, 0.5, 0.6])

In [13]: %timeit x.round()
56.9 µs ± 7.92 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [14]: %timeit torch.sign(x) * torch.floor(torch.abs(x) + 0.5)
349 µs ± 4.78 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [16]: %timeit (x + x.sign().float() / 2.).round()
305 µs ± 3.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

Note that these timings may not be representative for larger Tensors, though, or if you're running on the GPU.

`(x + x.sign().float() / 2.).round()` is also incorrect: for `0.4` it computes `round(0.9) = 1.`.
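A quick correctness check (my own snippet, not from the thread) makes the failure visible:

```python
import torch

x = torch.tensor([-0.6, -0.5, -0.4, 0.4, 0.5, 0.6])
expected = torch.tensor([-1., -1., -0., 0., 1., 1.])

candidates = {
    "sign * floor(abs + 0.5)": torch.sign(x) * torch.floor(torch.abs(x) + 0.5),
    "(x + sign/2).round()": (x + x.sign() / 2.).round(),
}
for name, y in candidates.items():
    print(name, torch.equal(y, expected))  # prints True, then False
```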