If you have an input Tensor a, you should replace torch.FloatTensor(2,2) by a.new(2,2) to create a Tensor of the same type as a.
If you want the created tensor to be zeroed out, you can do b = a.new(2,2).zero_().
This will work for a being any type (cuda included).
If your second tensor already exists, you can also look into the type_as method to change the type of a tensor to the type of another tensor.
In your case it was not working because torch.Tensor.new(X.data, torch.zeros(2, 2)), when X.data is a cuda.FloatTensor, is equivalent to torch.cuda.FloatTensor(torch.FloatTensor(2, 2).zero_()), meaning that you are trying to create a CUDA tensor from a CPU tensor, which is not allowed.
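Putting those pieces together, a minimal sketch (assuming the pre-0.4, Variable-era API discussed in this thread; the tensor a here is just a placeholder):

    import torch

    a = torch.randn(4, 4)             # works the same if a is a CUDA tensor, e.g. a = a.cuda()

    b = a.new(2, 2).zero_()           # same type (and device) as a, zeroed out
    c = torch.ones(2, 2).type_as(a)   # convert an existing tensor to a's type

    # Not allowed (per the explanation above): building a CUDA tensor directly
    # from a CPU tensor, e.g. a.new(torch.zeros(2, 2)) when a lives on the GPU.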
After trying x.new(), it turns out it’s much slower than creating a Variable on the CPU and then moving it to the GPU.
For example, I experimented with 3 ways of adding noise to input variable x:
Way 1: noise = x.data.new(np.random.rand(*x.size()))
Way 2: noise = torch.from_numpy(np.random.rand(*x.size()).astype(np.float32)).cuda()
Way 3: noise = x.data.clone().normal_()
And then x.data += noise
For batch size 128, on my model way 1 runs about 3 times slower than ways 2 and 3 (~4 s vs ~1.6 s). Way 3 is slightly faster than way 2.
Is this expected, or am I doing something wrong? Besides, is there any way to get the device a tensor currently resides on?
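For reference, here is roughly how the three ways could be timed side by side (a sketch assuming a CUDA machine and the old Variable-based API from this thread; the input shape, the 100 iterations, and the timing loop are illustrative, not the original benchmark):

    import time
    import numpy as np
    import torch
    from torch.autograd import Variable

    x = Variable(torch.randn(128, 3, 32, 32).cuda())

    def way1():
        # allocate via x.data.new; the numpy array still has to be copied host -> device
        return x.data.new(np.random.rand(*x.size()))

    def way2():
        # build the noise on the CPU with numpy, then move it to the GPU
        return torch.from_numpy(np.random.rand(*x.size()).astype(np.float32)).cuda()

    def way3():
        # clone x.data on the GPU and fill it with noise in place
        return x.data.clone().normal_()

    for f in (way1, way2, way3):
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(100):
            noise = f()
        torch.cuda.synchronize()
        print(f.__name__, time.time() - start)

    x.data += noise  # add the noise in place, as in the post above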
You should avoid the CPU -> GPU copy; that is what was slow. Also, use torch’s built-in functions when they are available, even though converting between numpy.ndarray and torch.Tensor doesn’t cost much.
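For instance, the noise can be generated directly on the GPU with torch’s in-place random fills, avoiding the numpy round trip entirely (a sketch; x is assumed to be a CUDA Variable as in the post above):

    # Allocate a tensor of the same type/device as x and fill it in place on the GPU.
    noise = x.data.new(x.size()).uniform_()    # uniform noise in [0, 1)
    # noise = x.data.new(x.size()).normal_()   # or Gaussian noise
    x.data += noise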
As far as I know, tensor.new has 3 usages:
tensor.new(size1, size2, …)
tensor.new(Tensor) or tensor.new(ndarray)
tensor.new(), which creates an empty tensor that you can then resize, e.g. a.new().resize_as_(b)
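A quick illustration of those three usages (a sketch; the shapes and the float32 dtype are arbitrary):

    import numpy as np
    import torch

    a = torch.randn(3, 3)
    b = torch.randn(5, 2)

    t1 = a.new(2, 4)                                 # usage 1: sizes (values uninitialized)
    t2 = a.new(np.zeros((2, 2), dtype=np.float32))   # usage 2: from an ndarray (or a Tensor)
    t3 = a.new().resize_as_(b)                       # usage 3: empty tensor, then resized like b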