Efficient way to accumulate items into a 1D tensor and not in a list

Hi!
Most of the time, I do:

my_list = []  # will create the list on the CPU
for x in some_tensor:  # elements of some_tensor are on the GPU
    v = process(x)  # returns a tensor(val, device='cuda:0')
    my_list.append(v)

# convert my_list back into a tensor ---> and move the list back to the GPU
my_list = torch.from_numpy(my_list)
my_list = my_list.to(device)

Is it possible to create an empty tensor, e.g. my_list = torch.cuda.FloatTensor(...), and use it directly as an accumulator? If yes, how do I initialize its size when I don’t know the size in advance?
Using a list, then converting the list back to a tensor and moving that tensor back to the GPU adds overhead (I guess). I assume creating a NumPy/Python list in the forward function happens on the CPU.
How can I accumulate tensors and do the work while staying on the GPU?
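
Something like this is what I have in mind for the case where the size is known up front (just a rough sketch; n, some_tensor and process are placeholders):

    # rough sketch, assuming the number of results n were known in advance
    acc = torch.empty(n, device='cuda')  # preallocated 1D accumulator on the GPU
    for i, x in enumerate(some_tensor):
        acc[i] = process(x)              # write each result in place, no CPU round trip
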
Thank you.

I’m sorry but I don’t understand your problem.

You iterate over a tensor with a for loop? PyTorch does let you iterate over the first (batch) dimension when you do that, but it’s rarely useful… and iterating element by element isn’t how you normally work with NumPy arrays either.

Anyway, you cannot create new tensors in the forward pass without breaking backpropagation.

Thank you.

I was trying to generalize my question. I’m not creating a new tensor in the forward, and I don’t intend to iterate through my batch.
My problem is: how do I accumulate tensors into a new tensor? I don’t want to use mlist = [] and .append, which I assume run on the CPU. That basically means moving my data from the GPU to the CPU, accumulating, and moving it back. Is there a way to accumulate tensors directly into another tensor?

This usually happens in a loop.

There isn’t, I guess. If you think about it, a list is agnostic to backpropagation: it does not appear in the computation graph. You could, for example, stack tensors of the same shape along a new dimension with torch.stack (or concatenate them with torch.cat); however, this creates a joint node in the graph at backpropagation time, which may be undesirable for you.
I cannot say with 100% certainty that it’s not possible, but given how PyTorch works, following the flow of data in order to backpropagate, it would always create this joint unless the developers added some kind of “GPU list”, which I am not aware of at the time of replying.
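
For example, something along these lines (just a minimal sketch; process and xs stand in for whatever produces each tensor):

    # minimal sketch: collect same-shape GPU tensors, then stack them on a new dim 0
    outs = [process(x) for x in xs]     # Python list of cuda tensors
    stacked = torch.stack(outs, dim=0)  # one cuda tensor; the stack op appears in the autograd graph
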

Anyway, I find this topic very interesting. Based on the code I’ve seen, everyone uses Python lists.

Thanks, this is exactly what I’m worried about. Is there an alternative to Python lists when dealing with tensors running on the GPU?
An example:

    def forward(self, xx, neg):
        # xx shape [View, B, C, H, W]
        # neg shape [B, C, H, W]

        xx = xx.transpose(0, 1)
        combined_views = []
        for v in xx:
            v = self.base(v)
            v = self.avgpool(v)
            v = v.view(v.size(0), -1)
            combined_views.append(v)
        p_view = combined_views[0]
        for i in range(1, len(combined_views)):
            p_view = torch.max(p_view, combined_views[i])
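
I suppose the final reduction could also be written as a single stack plus a max over the new dimension, something like this (just a sketch):

            # sketch: same result as the second loop, written as one stack + max
            p_view = torch.stack(combined_views, dim=0).max(dim=0)[0]
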

Where is this combined_views located?

I cannot confirm the issue of moving data back to the CPU and then to the GPU.

However, since you are using CUDA tensors, they are allocated on the GPU:
import torch

a = torch.rand(100).cuda()

l = [a, a, a]

l[0].device
Out[4]: device(type='cuda', index=0)

However, I don’t know whether, as you said, creating this list moves data back to the CPU and then to the GPU.

Hi,

If your original list contains Tensors, as your first post indicates, you can just call torch.cat() or torch.stack() on it to concatenate the tensors into a new one. All elements of the list should be on the same device, and the concatenated version will be on the same device as the Tensors in the list.
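
For instance (a minimal sketch with made-up shapes):

    import torch

    # minimal sketch: the list holds cuda tensors, so the stacked/concatenated result is on cuda too
    feats = [torch.randn(4, 8, device='cuda') for _ in range(3)]
    stacked = torch.stack(feats, dim=0)   # new leading dim: shape [3, 4, 8]
    catted = torch.cat(feats, dim=0)      # along an existing dim: shape [12, 8]
    print(stacked.device, catted.device)  # both cuda:0
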

Unless I’m mistaken, the list itself is located on the CPU, but it is a list of references to objects whose data is on the GPU. All the calculations on the data in your code are done on the GPU.
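
A quick way to check that nothing is copied when you append (a small sketch):

    import torch

    # sketch: the list is a host-side Python object, but its entry references the same GPU storage
    a = torch.rand(100, device='cuda')
    l = [a]
    print(l[0].data_ptr() == a.data_ptr())  # True: same GPU memory, nothing was copied
    print(l[0].device)                      # cuda:0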