Use two separate dataloaders or use slicing?

I have two tensors, each of shape (16384, 3, 224, 224), and I need to multiply them together. They are obviously too big to fit in GPU RAM. How should I go about this: divide them into smaller batches using slicing [i:i+batch_size, :, :, :], or use two separate dataloaders? I am confused about how to use two different dataloaders together.
What would be the best way to do this?

I suggest defining your own dataset so that you can use a single loader while taking batches from both tensors. For example, you can use something like this:

import torch
size = 16
a = torch.randn(size,3,224,224)
b = torch.randn(size,3,224,224)

class Dataset(torch.utils.data.Dataset):
    def __init__(self, a, b):
        assert len(a)==len(b)
        self.a = a
        self.b = b
    def __len__(self):
        return len(self.a)
    def __getitem__(self, i):
        return self.a[i], self.b[i]

dataset = Dataset(a, b)
loader = torch.utils.data.DataLoader(dataset, batch_size=4)

for a, b in loader:
    print(a.shape, b.shape)

There might be better solutions though.

@albanD any suggestions?

Well if the data don’t fit on GPU, you’ll most likely spend more time sending the data to the GPU than doing actual compute. So I guess you can simply do this on the CPU.
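
For reference, here is a minimal sketch of the CPU-only route. It assumes float32 tensors (at the sizes in the question, each one would take roughly 9.9 GB) and uses a small stand-in size `n` so it runs anywhere. The in-place variant is just one way to avoid allocating a third full-size tensor, not something the thread prescribes.

```python
import torch

# Small stand-in size; the tensors in the question have 16384 rows.
n = 8
a = torch.randn(n, 3, 224, 224)
b = torch.randn(n, 3, 224, 224)

# Elementwise product computed entirely on the CPU: no device transfer at all.
# This allocates a new tensor the same size as `a`.
product = a * b

# If memory is tight, multiply in place instead; this overwrites `a`.
a.mul_(b)
```

The in-place call only makes sense if the original values of `a` are no longer needed afterwards.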

If you do need the GPU, the proposal above looks OK. Maybe use IterableDataset if you have performance issues. (But again, if you don’t do anything else with the data but a simple multiplication, it is expected that you will spend all your time sending stuff to the GPU.)
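
A rough sketch of what the IterableDataset variant could look like, in case it helps. `PairedChunks` and `chunk_size` are made-up names for illustration, and the code falls back to the CPU when no GPU is available. Each item is already a whole slice of both tensors, so automatic batching is turned off with `batch_size=None`.

```python
import torch
from torch.utils.data import DataLoader, IterableDataset


class PairedChunks(IterableDataset):
    """Hypothetical helper: yields matching slices of two equally long tensors."""

    def __init__(self, a, b, chunk_size):
        assert len(a) == len(b)
        self.a, self.b, self.chunk_size = a, b, chunk_size

    def __iter__(self):
        for start in range(0, len(self.a), self.chunk_size):
            end = start + self.chunk_size
            yield self.a[start:end], self.b[start:end]


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
a = torch.randn(16, 3, 224, 224)
b = torch.randn(16, 3, 224, 224)

# batch_size=None disables automatic batching: the dataset yields ready-made chunks.
loader = DataLoader(PairedChunks(a, b, chunk_size=4), batch_size=None)

result = torch.empty_like(a)  # the full result stays in CPU memory
start = 0
for chunk_a, chunk_b in loader:
    # Only one chunk of each tensor lives on the device at a time.
    out = chunk_a.to(device) * chunk_b.to(device)
    result[start:start + len(out)] = out.cpu()
    start += len(out)
```

This is the same idea as slicing by hand with [i:i+batch_size], just wrapped in a loader; for a single multiplication the host-to-device copies will still dominate the runtime.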

Thank you so much! I was looking for exactly this answer. I’m new to deep learning; can you tell me more about when to use the GPU and when to work on the CPU alone, with examples?

In general, you should use the GPU if you can perform most ops on the GPU (or at least keep transfers low), and only if those ops run on large Tensors.
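
As an illustration of that rule of thumb, here is a small sketch of the favorable pattern: one transfer in, several large ops that stay on the device, and one tiny transfer out. The sizes and the choice of ops are arbitrary assumptions, and the code falls back to the CPU when no GPU is present.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(1024, 1024)
w = torch.randn(1024, 1024)

# One transfer in...
x_dev, w_dev = x.to(device), w.to(device)

# ...several large ops that never leave the device...
h = torch.relu(x_dev @ w_dev)
h = torch.relu(h @ w_dev)
score = h.mean()

# ...and a single scalar copied back to the host.
score_cpu = score.item()
```

The multiplication in the original question is the opposite case: a single cheap op per element, with both full inputs and the full output crossing the bus.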