Can I put CUDA tensors into Python sets/lists without unloading them from the GPU?

I want to check whether I can use Python lists/dicts to hold references to CUDA tensors that are already on the GPU, without transferring them back and forth between CPU and GPU.

My situation is like the following:

  1. I am training an LSTM model with time series data.
  2. The entire data set came from about 100 days worth of data.
  3. I put each day’s data into a separate CUDA tensor on the GPU, then stored them all in a Python dict like this:

```python
data_set = {}
for date in dates:
    data_set[date] = torch.from_numpy(data[date]).type(torch.cuda.FloatTensor)
```
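For reference, a minimal self-contained sketch of the same construction (using made-up per-day numpy arrays and the newer `.to(device)` API, with a CPU fallback so it runs anywhere):

```python
import numpy as np
import torch

# Hypothetical stand-in for the per-day numpy data
data = {d: np.random.rand(24, 8).astype(np.float32) for d in ("d1", "d2")}
dates = list(data)

# Use the GPU when available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# One tensor per day, all moved to the chosen device once, up front
data_set = {date: torch.from_numpy(data[date]).to(device) for date in dates}
```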

I assumed that since the entire data set is smaller than GPU memory, everything stays in GPU memory and my Python dict (data_set) holds only references to CUDA tensors that are already on the GPU.
That is, I assumed that during training I can feed each day’s data into my net like
output = my_net(data_set[date]) while the tensors remain on the GPU.
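A quick way to convince yourself of this: a dict lookup returns a reference to the very same tensor object, with the same underlying storage, so no copy or transfer can be involved. The sketch below uses a CPU tensor so it runs without a GPU, but the container semantics are identical for CUDA tensors:

```python
import torch

t = torch.zeros(4)
data_set = {"2020-01-01": t}

looked_up = data_set["2020-01-01"]
assert looked_up is t                         # same Python object, not a copy
assert looked_up.data_ptr() == t.data_ptr()   # same underlying storage
```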

Would this work? Or does the dict data_set create a hard copy of the CUDA tensors on the CPU and reload them onto the GPU each time data_set[date] is accessed?

(One reason I am asking: I recently moved from a Titan X to a V100 to speed up training, but to my surprise training now takes significantly longer. So I suspect there may be a CPU/GPU data-transfer bottleneck in the way I set things up, and that it dominates the total training time...)
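One caveat when hunting for such a bottleneck: CUDA kernels launch asynchronously, so naive wall-clock timing can attribute GPU time to the wrong line. A small hedged helper (the `timed` wrapper is my own illustration, not part of any API) that synchronizes before and after the measured call gives honest numbers, and degrades gracefully on CPU-only machines:

```python
import time
import torch

def timed(fn):
    """Run fn() and return (result, elapsed_seconds), synchronizing the GPU
    before and after so asynchronous CUDA work is fully accounted for."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    out = fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return out, time.perf_counter() - start
```

Timing a single forward pass this way, versus timing the dict lookup alone, should show whether any transfer cost is hiding in `data_set[date]`.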

A million thanks in advance,