How to distinguish datasets spawned by the dataloader module?

chener · March 18, 2024, 2:08am

If the Dataset and DataLoader modules are used as described in the documentation, the dataset copies spawned for the workers are all identical. In my case, however, I use a model inside the dataset's __getitem__ function to preprocess the data. If I don't specify the CUDA index, all of the preprocessing models end up on the same GPU. How can I get something like an index in each dataset's __init__ function, so that I can assign a different GPU to the model in each dataset copy?

ptrblck · March 18, 2024, 3:24pm

You could get the worker id and use it to move the model to the corresponding device.

chener · March 19, 2024, 1:29am

Hi, how can I get the worker id? Are there any links you would suggest?

ptrblck · March 19, 2024, 10:57pm

This code should work inside the Dataset.__getitem__:

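    def __getitem__(self, index):
        # Returns None in the main process (num_workers=0),
        # otherwise an object describing the current worker.
        worker_info = torch.utils.data.get_worker_info()
        if worker_info is not None:
            worker_id = worker_info.id  # integer in [0, num_workers - 1]
            print('worker_id {} calling with index {}'.format(worker_id, index))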
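
Building on the snippet above, here is a minimal sketch of how the worker id could be mapped to a GPU. It is an illustration under assumptions, not a tested recipe for the original setup: the names `device_for_worker` and `PreprocessedDataset` are hypothetical, the `torch.nn.Linear` layer is only a placeholder for the real preprocessing model, and the round-robin mapping is one possible choice. The model is created lazily on first access rather than in `__init__`, because `__init__` runs in the main process, where `get_worker_info()` returns None.

```python
import torch
from torch.utils.data import Dataset, get_worker_info


def device_for_worker(worker_id, num_gpus):
    """Hypothetical helper: map a worker id to a device string, round-robin."""
    if num_gpus == 0:
        return "cpu"  # no GPU visible, fall back to the CPU
    return f"cuda:{worker_id % num_gpus}"


class PreprocessedDataset(Dataset):  # hypothetical name
    def __init__(self, data):
        self.data = data
        self.model = None   # created lazily, once per worker process
        self.device = None

    def _ensure_model(self):
        if self.model is not None:
            return
        info = get_worker_info()  # None when called in the main process
        worker_id = info.id if info is not None else 0
        self.device = torch.device(
            device_for_worker(worker_id, torch.cuda.device_count())
        )
        # Placeholder for the real preprocessing model (an assumption).
        self.model = torch.nn.Linear(4, 4).to(self.device).eval()

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        self._ensure_model()
        with torch.no_grad():
            out = self.model(self.data[index].to(self.device))
        return out.cpu()  # return CPU tensors so batches can be collated
```

With `num_workers=2` and two visible GPUs, worker 0 would use `cuda:0` and worker 1 would use `cuda:1`. Note that using CUDA inside worker processes generally requires a non-fork start method, for example by passing `multiprocessing_context="spawn"` to the DataLoader.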