Is it possible to make a function transparent between cpu and gpu?

I defined a function in which temporary variables are defined, for example:

def foo(input_variable):
    tmp = Variable(torch.zeros(input_variable.size()))
    return, tmp), 1)

This function runs well on CPU; however, when run on GPU it breaks, because tmp is created on the CPU by default.

Is there any elegant way to make it transparent? I.e., can tmp be created on the same device as input_variable?

 may be helpful

by the way, you can wrap your code in triple backticks (```) as below to format it


```
def foo(input_variable):
    tmp = Variable(torch.zeros(input_variable.size()))
    return, tmp), 1)
```

it will be formatted like this

def foo(input_variable):
    tmp = Variable(torch.zeros(input_variable.size()))
    return, tmp), 1)

it’s Markdown syntax

Thanks, this really helps!

How do you use ?

The following works when X is on the CPU but fails when it’s not:

>>> X = Variable(torch.zeros(2, 2))
>>>, torch.zeros(2, 2))

 0  0
 0  0
[torch.FloatTensor of size 2x2]

>>> X = Variable(torch.zeros(2, 2)).cuda()
>>>, torch.zeros(2, 2))
TypeError                                 Traceback (most recent call last)
<ipython-input-5-44093d4f4ab5> in <module>()
----> 1, torch.zeros(2, 2))

TypeError: unbound method new() must be called with FloatTensor instance as first argument (got FloatTensor instance instead)


If you have an input Tensor a, you should replace torch.FloatTensor(2, 2) with, 2) to create a Tensor of the same type as a.
If you want the created tensor to be zeroed out, you can do b =, 2).zero_().
This will work for a being any type (cuda included).

If your second tensor already exists, you can also look into the type_as method to change the type of a tensor to the type of another tensor.
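A small sketch of type_as; on CPU it only changes the dtype, but with a cuda tensor as the target it also moves the result onto that device:

```python
import torch

a = torch.zeros(2, 2).double()  # tensor with the "target" type
b = torch.ones(2, 2)            # FloatTensor we want to convert

# type_as casts b to a's type (torch.DoubleTensor here).
c = b.type_as(a)
print(c.type())  # torch.DoubleTensor
```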

In your case it was not working because, torch.zeros(2, 2)) is equivalent, for X being a cuda.FloatTensor, to torch.cuda.FloatTensor(torch.FloatTensor(2, 2).zero_()), meaning that you try to create a cuda tensor from a cpu tensor, which is not allowed.


After trying out, it turns out it’s much slower than creating a Variable on the CPU and then moving it onto the GPU.

For example, I experimented with 3 ways of adding noise to the input variable x:
Way 1:
Way 2:
noise = torch.from_numpy(np.random.rand(*x.size()).astype(np.float32)).cuda()
Way 3:
noise =
And then += noise

For batch size = 128, on my model way 1 runs 3 times slower than ways 2 & 3 (~4 s vs ~1.6 s). Way 3 is slightly faster than way 2.

Is this expected, or am I doing something wrong? Besides, is there any way to get the device a tensor currently resides on?

what you want to do is:

noise =*x.size()).normal_()

That will be the fastest.


did you mean ?

What does new() actually do? And why is way 1 so slow?



you should avoid the CPU -> GPU copy; that is what was slow. Also, use torch’s built-in functions when they’re available, even though it won’t cost much to convert between numpy.ndarray and torch.Tensor.
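A sketch of the comparison (CPU only here; on a GPU model the numpy route would additionally need a .cuda() copy, which is the slow part):

```python
import numpy as np
import torch

x = torch.ones(128, 100)

# Detour through numpy: the noise is materialized on the host first and
# would then need an explicit .cuda() transfer on a GPU model.
noise_np = torch.from_numpy(np.random.rand(*x.size()).astype(np.float32))

# torch built-in: same shape and dtype in one call, and it can sample
# directly on the GPU when asked to produce a cuda tensor.
noise_t = torch.rand(x.size())

print(noise_np.size() == noise_t.size())  # True
```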

as far as I know, has 3 usages:

  1., size2, …), which creates an uninitialized tensor of the same type with the given sizes
  2. or, which creates a tensor of the same type from an existing tensor or storage
  3., which creates an empty tensor of the same type; you can then resize it, e.g. with resize_()