If you have an input Tensor a, you should replace torch.FloatTensor(2,2) by a.new(2,2) to create a Tensor of the same type as a.
If you want the created tensor to be zeroed out, you can do b = a.new(2,2).zero_().
This will work for a being any type (cuda included).
If your second tensor already exists, you can also look into the type_as method to change the type of a tensor to the type of another tensor.
In your case it was not working because torch.Tensor.new(X.data, torch.zeros(2, 2)), when X.data is a cuda.FloatTensor, is equivalent to torch.cuda.FloatTensor(torch.FloatTensor(2, 2).zero_()), meaning that you are trying to create a CUDA tensor from a CPU tensor, which is not allowed.
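Putting those pieces together, a minimal sketch (assuming the pre-0.4, Variable-era API discussed in this thread; the tensor a here is just a placeholder):

    import torch

    a = torch.randn(4, 4)             # works the same if a is a CUDA tensor, e.g. a = a.cuda()

    b = a.new(2, 2).zero_()           # same type (and device) as a, zeroed out
    c = torch.ones(2, 2).type_as(a)   # convert an existing tensor to a's type

    # Not allowed (per the explanation above): building a CUDA tensor directly
    # from a CPU tensor, e.g. a.new(torch.zeros(2, 2)) when a lives on the GPU.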
After trying x.new(), it turns out it’s much slower than creating a Variable on the CPU and then moving it to the GPU.
For example, I experimented with 3 ways of adding noise to input variable x:
Way 1: noise = x.data.new(np.random.rand(*x.size()))
Way 2: noise = torch.from_numpy(np.random.rand(*x.size()).astype(np.float32)).cuda()
Way 3: noise = x.data.clone().normal_()
And then x.data += noise
For batch size 128, on my model way 1 runs about 3 times slower than ways 2 and 3 (~4 s vs ~1.6 s). Way 3 is slightly faster than way 2.
Is this expected, or am I doing something wrong? Besides, is there any way to get the device a tensor currently resides on?
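For reference, here is roughly how the three ways could be timed side by side (a sketch assuming a CUDA machine and the old Variable-based API from this thread; the input shape, the 100 iterations, and the timing loop are illustrative, not the original benchmark):

    import time
    import numpy as np
    import torch
    from torch.autograd import Variable

    x = Variable(torch.randn(128, 3, 32, 32).cuda())

    def way1():
        # allocate via x.data.new; the numpy array still has to be copied host -> device
        return x.data.new(np.random.rand(*x.size()))

    def way2():
        # build the noise on the CPU with numpy, then move it to the GPU
        return torch.from_numpy(np.random.rand(*x.size()).astype(np.float32)).cuda()

    def way3():
        # clone x.data on the GPU and fill it with noise in place
        return x.data.clone().normal_()

    for f in (way1, way2, way3):
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(100):
            noise = f()
        torch.cuda.synchronize()
        print(f.__name__, time.time() - start)

    x.data += noise  # add the noise in place, as in the post above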
You should avoid the CPU -> GPU copy; that is what was slow. Also, use torch’s built-in functions when they are available, even though converting between numpy.ndarray and torch.Tensor doesn’t cost much.
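For instance, the noise can be generated directly on the GPU with torch’s in-place random fills, avoiding the numpy round trip entirely (a sketch; x is assumed to be a CUDA Variable as in the post above):

    # Allocate a tensor of the same type/device as x and fill it in place on the GPU.
    noise = x.data.new(x.size()).uniform_()    # uniform noise in [0, 1)
    # noise = x.data.new(x.size()).normal_()   # or Gaussian noise
    x.data += noise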
As far as I know, tensor.new has 3 usages:
tensor.new(size1, size2, …)
tensor.new(Tensor) or tensor.new(ndarray)
tensor.new(), which creates an empty tensor that you can then resize, e.g. a.new().resize_as_(b)
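A quick illustration of those three usages (a sketch; the shapes and the float32 dtype are arbitrary):

    import numpy as np
    import torch

    a = torch.randn(3, 3)
    b = torch.randn(5, 2)

    t1 = a.new(2, 4)                                 # usage 1: sizes (values uninitialized)
    t2 = a.new(np.zeros((2, 2), dtype=np.float32))   # usage 2: from an ndarray (or a Tensor)
    t3 = a.new().resize_as_(b)                       # usage 3: empty tensor, then resized like b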