Passing a dictionary to getitem

Hi,
How do I make my __getitem__ method accept a dictionary, e.g.:

from torch.utils.data import Dataset

class DatasetCustomGetitem(Dataset):
    ...

    def __getitem__(self, data: dict):
        ...

The Dataset is wrapped in a torch DataLoader.
I need this only for inference.

Thanks

I need more information on your use case. How is this being integrated into your training run?

I edited my question

It would help if I could see how you propose to use it in your training loop. The Dataset class is meant to provide everything you need for one training instance. Without more information, I don't see why you would want to pass something else to it (it would have to be something that differs for each item; otherwise you would make it a member of your Dataset and pass it to the constructor). I expect you do have a good reason, but without knowing it I can't work out the best solution.

Generally, you won't easily be able to pass a custom object to Dataset.__getitem__, because your own code doesn't call it: the DataLoader calls it internally with the indices produced by its sampler.
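To make the point about the DataLoader machinery concrete, here is a rough, single-process approximation of what a map-style DataLoader does with the keys its sampler yields (by default range(len(dataset)), shuffled if shuffle=True). This is a simplified illustration, not PyTorch's actual implementation; simplified_fetch and SquaresDataset are hypothetical names. It shows that the argument __getitem__ receives is whatever the sampler produces.

```python
from typing import Any, Callable, Iterable, List


def simplified_fetch(
    dataset: Any,
    sampler: Iterable[Any],
    batch_size: int,
    collate_fn: Callable[[List[Any]], Any],
) -> List[Any]:
    """Rough single-process approximation of how a map-style DataLoader
    turns sampler keys into batches (not PyTorch's actual code)."""
    batches: List[Any] = []
    batch_keys: List[Any] = []
    for key in sampler:  # the sampler decides which keys reach __getitem__
        batch_keys.append(key)
        if len(batch_keys) == batch_size:
            # the loader calls dataset[key]; user code never does
            batches.append(collate_fn([dataset[k] for k in batch_keys]))
            batch_keys = []
    if batch_keys:  # final partial batch, as with drop_last=False
        batches.append(collate_fn([dataset[k] for k in batch_keys]))
    return batches


class SquaresDataset:
    """Toy map-style dataset: item i is i * i."""

    def __getitem__(self, idx: int) -> int:
        return idx * idx

    def __len__(self) -> int:
        return 5


dataset = SquaresDataset()
# range(len(dataset)) plays the role of the default sequential sampler
batches = simplified_fetch(dataset, range(len(dataset)), batch_size=2, collate_fn=list)
print(batches)  # [[0, 1], [4, 9], [16]]
```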

I need to implement it as part of a complex pipeline I have no control over…
During training, the Dataset receives a list of folders at initialisation. __getitem__ works “as usual”: it receives an index, selects the relevant folders, reads the files (images, CSVs…), does some data processing and outputs the result.
At inference time I cannot do any reading; instead, I want to get the data directly, in the form of a dictionary that always has the same keys and whose values are numpy arrays.
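One way to look at this constraint: both modes share the processing step and differ only in where the dict comes from. A minimal sketch, assuming the processing can be factored out into a function that takes the sample dict, and assuming the pipeline allows calling it outside the DataLoader at inference time. process, FolderDataset, load_fn, prepare_from_dict and fake_load are hypothetical names, and the key "image" stands in for whatever fixed keys the real dicts have.

```python
from typing import Callable, Dict, List

import numpy as np

Sample = Dict[str, np.ndarray]


def process(sample: Sample) -> Sample:
    """Hypothetical shared processing step: here it only rescales "image"."""
    out = dict(sample)
    out["image"] = sample["image"].astype(np.float32) / 255.0
    return out


class FolderDataset:
    """Training path: index -> read files -> shared processing."""

    def __init__(self, folders: List[str], load_fn: Callable[[str], Sample]):
        self.folders = folders
        self.load_fn = load_fn  # hypothetical reader for the images/CSVs

    def __getitem__(self, idx: int) -> Sample:
        return process(self.load_fn(self.folders[idx]))

    def __len__(self) -> int:
        return len(self.folders)


def prepare_from_dict(sample: Sample) -> Sample:
    """Inference path: the dict is already in memory, so there is no
    reading and no __getitem__; the same processing is applied directly."""
    return process(sample)


def fake_load(folder: str) -> Sample:
    # Stand-in for real file reading, so the sketch runs on its own.
    return {"image": np.full((2, 2), 255, dtype=np.uint8)}


train_item = FolderDataset(["run_a", "run_b"], fake_load)[0]
infer_item = prepare_from_dict({"image": np.zeros((2, 2), dtype=np.uint8)})
```

Keeping the processing in a plain function means the training and inference paths cannot drift apart, while only the training path depends on file reading.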

I’m not particularly sure I can help, but it is possible this will help you:

from torch.utils.data import Dataset
class DatasetCustomGetitem(Dataset):
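    def __init__(self, data_dict):
        self.data_dict = data_dict

    def __getitem__(self, idx):
        # Assumes data_dict is keyed by 0 .. len(data_dict) - 1, the indices
        # that the DataLoader's default sampler passes in.
        return self.data_dict[idx]

    def __len__(self):
        return len(self.data_dict)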

Thanks for the quick response. I thought of this solution, but this will force me to store a lot of data in memory. I wonder if there’s a more elegant solution, e.g. writing a custom data fetcher.
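The "custom data fetcher" idea maps fairly naturally onto an iterable-style dataset, which produces samples on demand instead of being indexed, so only the current sample needs to be in memory. This is a minimal sketch of an alternative that was not proposed in the thread, assuming the dicts can be produced by a generator; StreamingDicts and incoming_samples are hypothetical names. A torch.utils.data.IterableDataset subclass implements the same __iter__ method; the class here is plain Python so that it runs without torch.

```python
from typing import Callable, Dict, Iterator

import numpy as np

Sample = Dict[str, np.ndarray]


class StreamingDicts:
    """Yields sample dicts one at a time instead of holding them all."""

    def __init__(self, make_iter: Callable[[], Iterator[Sample]]):
        # A factory rather than an iterator, so each pass (epoch) restarts.
        self.make_iter = make_iter

    def __iter__(self) -> Iterator[Sample]:
        yield from self.make_iter()


def incoming_samples() -> Iterator[Sample]:
    """Hypothetical source of dicts, e.g. handed over by the pipeline."""
    for i in range(3):
        yield {"image": np.full((2, 2), i, dtype=np.float32)}


stream = StreamingDicts(incoming_samples)
means = [float(s["image"].mean()) for s in stream]
print(means)  # [0.0, 1.0, 2.0]
```

If this becomes an IterableDataset used with num_workers > 0, each worker gets its own copy of the iterator, so samples would be duplicated unless the work is split per worker (for example with torch.utils.data.get_worker_info()).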

Then just store all the elements on disk and have __getitem__ load the n-th one.
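The on-disk suggestion could look roughly like this: each sample dict is written once to its own .npz file, and __getitem__ reads back only the n-th file, so memory holds one sample at a time. A sketch assuming .npz via np.savez is an acceptable storage format; save_samples and NpzDataset are hypothetical names.

```python
import os
import tempfile
from typing import Dict, List

import numpy as np

Sample = Dict[str, np.ndarray]


def save_samples(samples: List[Sample], out_dir: str) -> List[str]:
    """Write each sample dict to its own .npz file; return the paths."""
    paths = []
    for i, sample in enumerate(samples):
        path = os.path.join(out_dir, f"sample_{i:06d}.npz")
        np.savez(path, **sample)  # one stored array per dict key
        paths.append(path)
    return paths


class NpzDataset:
    """__getitem__ loads only the n-th sample from disk."""

    def __init__(self, paths: List[str]):
        self.paths = paths

    def __getitem__(self, idx: int) -> Sample:
        with np.load(self.paths[idx]) as npz:
            # Indexing an NpzFile reads that array fully into memory.
            return {key: npz[key] for key in npz.files}

    def __len__(self) -> int:
        return len(self.paths)


with tempfile.TemporaryDirectory() as tmp:
    paths = save_samples([{"image": np.arange(4.0)}, {"image": np.ones(4)}], tmp)
    ds = NpzDataset(paths)
    first, second = ds[0], ds[1]
    print(len(ds), float(second["image"].sum()))  # 2 4.0
```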

The Dataset class was designed to solve these problems, I think.

I’m not allowed to do any reading.
With your first suggestion, could deleting the key from self.data_dict cause an error?
For example:

    def __getitem__(self, idx):
        data = self.data_dict[idx]
        # do stuff on the data
        del self.data_dict[idx]
        return data
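The question can be checked with a small sketch that uses dict.pop, which returns the item and removes the key in one step. ConsumeOnceDataset is a hypothetical name, and it assumes integer keys 0 .. n-1 as in the earlier suggestion.

```python
from typing import Any, Dict


class ConsumeOnceDataset:
    """Hands out each stored item once and then drops the reference."""

    def __init__(self, data_dict: Dict[int, Any]):
        self.data_dict = data_dict
        self.length = len(data_dict)  # fixed up front; the dict shrinks later

    def __getitem__(self, idx: int) -> Any:
        # pop() returns the item and deletes the key;
        # a second request for the same idx raises KeyError.
        return self.data_dict.pop(idx)

    def __len__(self) -> int:
        return self.length


ds = ConsumeOnceDataset({0: "a", 1: "b"})
print(ds[0], len(ds))  # a 2
try:
    ds[0]
except KeyError:
    print("index 0 was already consumed")
```

Besides the KeyError on a repeated index (for example in a second epoch), two caveats apply. If __len__ is computed from the dict, it shrinks while the DataLoader iterates, which is why the sketch records the length up front. With num_workers > 0, each worker process works on its own copy of the dataset, so deleting inside a worker does not free memory held by the main process.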