Hi all,

I have developed a tracking system which operates on video. I also have strong constraints on embeddability (my code needs to run on small embedded boards/CPUs), and the tracker must be real time. Thus, I spend a lot of time optimizing my code, using torch functions and doing as many operations as possible on the GPU.

But there is a point that is not clear to me: I am not sure when data is exchanged between the CPU and the GPU, and how memory is managed exactly. For instance:

```
a = torch.tensor([1, 2, 3]).float().cuda() # tensor a will be on GPU
b = a + 1 # tensor b will also be on GPU
c = len(a) # is c still a tensor on the GPU?
```
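For context, this is how I have been poking at it myself (guarded with a CPU fallback so the snippet also runs on machines without CUDA):

```python
import torch

# Use the GPU when available, otherwise fall back to CPU so the
# snippet runs anywhere (my target hardware is not always CUDA-capable).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.tensor([1, 2, 3]).float().to(device)
b = a + 1    # elementwise add; the result lands on the same device as a
c = len(a)   # len() returns a plain Python int, not a tensor

print(a.device)  # cuda:0 or cpu, depending on availability
print(b.device)  # same device as a
print(type(c))   # <class 'int'> -- lives in ordinary CPU memory
```

So `type(c)` at least tells me `len()` gives back a Python int, but I still don't know what happens under the hood to produce it.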

A few questions about these basic operations:

- On line 1, does the `.float()` call make a copy of `a` in memory?
- On line 2, does the `+` operation copy the data of `a` to the CPU, compute the result, then send the data back to the GPU? Or is torch intelligent enough to perform the addition directly on the GPU? Is it always better to use the torch `.add()` function?
- Do functions such as `len()` make a copy of the tensor on the CPU before counting?

And finally, is there a difference between these two lines:

```
a = torch.tensor(1.).cuda().add_(1.)
a = torch.tensor(1.).cuda().add_(torch.tensor(1.).cuda())
```
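For reference, this is how I compared the two variants (again with a CPU fallback; the scalar-versus-tensor argument to `add_` is exactly what I am unsure about):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Variant 1: in-place add with a Python scalar argument
a1 = torch.tensor(1.).to(device).add_(1.)

# Variant 2: in-place add with a tensor already on the same device
a2 = torch.tensor(1.).to(device).add_(torch.tensor(1.).to(device))

print(a1.item(), a2.item())    # both give 2.0
print(a1.device == a2.device)  # True -- both live on the same device
```

Both produce the same value on the same device, so my question is really about what happens to the scalar `1.` in the first variant: is it transferred to the GPU behind the scenes?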

Thank you in advance; any help will be much appreciated!