Store original data (e.g., text, image) along with tensor data in a PyTorch DataLoader

Currently, I am using TensorDataset followed by DataLoader to load my dataset like below:

from torch.utils.data import TensorDataset, DataLoader

tensor_loader = TensorDataset(x_input_ids, x_seg_ids, x_atten_masks, y)
data_loader = DataLoader(tensor_loader, shuffle=True, batch_size=batch_size)

I now want to also store original (text) data along with the tensor data in the data_loader like below:

tensor_loader = TensorDataset(x_input_ids,x_seg_ids,x_atten_masks,y, x_input_strs)

Here x_input_strs is the text data corresponding to x_input_ids. This fails because TensorDataset accepts only tensors. I also tried something like this:

tensor_loader = Dataset(x_input_ids,x_seg_ids,x_atten_masks,y, x_input_strs)

But it gives the following error:

TypeError: object.__new__() takes exactly one argument (the type to instantiate)

Any suggestions are appreciated.

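TensorDataset only accepts tensors as arguments (its signature is TensorDataset(*tensors)), so it rejects a list of strings. The base Dataset class is abstract and defines no constructor that takes data, which is why instantiating it with arguments raises the TypeError above; it is meant to be subclassed.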

One possible solution is to zip your inputs into per-sample tuples and store them in your own Dataset subclass. Then, in its __getitem__, you can decide whether to return the full tuple including your original data or to discard the text.
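A minimal sketch of that idea follows. The class name TextTensorDataset, the return_text flag, and the toy tensors are assumptions made up for illustration; only the variable names x_input_ids, x_seg_ids, x_atten_masks, y and x_input_strs come from the question. It relies on the DataLoader's default collate function, which stacks the tensor fields and passes the strings through as a plain sequence.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class TextTensorDataset(Dataset):
    """Hypothetical dataset that keeps the original strings next to the tensors."""

    def __init__(self, input_ids, seg_ids, atten_masks, labels, input_strs,
                 return_text=True):
        # All inputs must describe the same number of samples.
        assert (len(input_ids) == len(seg_ids) == len(atten_masks)
                == len(labels) == len(input_strs))
        # Zip the inputs into one tuple per sample.
        self.samples = list(zip(input_ids, seg_ids, atten_masks, labels, input_strs))
        self.return_text = return_text

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        sample = self.samples[idx]
        # Keep the original text (last element) or drop it.
        return sample if self.return_text else sample[:-1]


# Toy placeholder data standing in for the real inputs (4 samples, length 3).
x_input_ids = torch.arange(12).reshape(4, 3)
x_seg_ids = torch.zeros(4, 3, dtype=torch.long)
x_atten_masks = torch.ones(4, 3, dtype=torch.long)
y = torch.tensor([0, 1, 0, 1])
x_input_strs = ["first text", "second text", "third text", "fourth text"]

dataset = TextTensorDataset(x_input_ids, x_seg_ids, x_atten_masks, y, x_input_strs)
data_loader = DataLoader(dataset, shuffle=True, batch_size=2)

for input_ids, seg_ids, atten_masks, labels, texts in data_loader:
    # The tensor fields are stacked per batch; texts holds the matching strings.
    print(input_ids.shape, labels.shape, texts)
```

Because each sample is a single tuple, shuffling keeps every string aligned with its tensors. If the text is only needed for debugging, constructing the dataset with return_text=False yields the same four tensor fields as the original TensorDataset.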