What are your thoughts on providing an option to name the individual tensors in the TensorDataset, similarly to how named tensors allow dimension naming?
The goal of this post is to form an early understanding of the sentiment towards the idea.
I believe that naming tensors in TensorDataset would be a logical extension of naming tensor dimensions and would provide similar benefits to datasets as named dimensions provide to tensors.
Could you post an example of how this naming would be used?
I’m thinking of something along the lines of:
import torch
from torch.utils.data import TensorDataset

a = torch.randn(5, 10)
b = torch.randn(5, 15)
# `names` is the proposed argument; it does not exist in TensorDataset today
named_tensor_dataset = TensorDataset(a, b, names=('embeddings', 'labels'))

# printing tensor names (e.g. via a hypothetical `.names` attribute) helps me
# understand what data I'm dealing with and the order of that data
>>> named_tensor_dataset.names
('embeddings', 'labels')

for example in named_tensor_dataset:
    y = SampleNN(example.embeddings)
    err = criterion(y, example.labels)
Thanks for the example.
An easy way of getting "named samples" would be to return a dict in your Dataset and use the keys as the names.
I'm not sure if a custom class implementation would work; I would need to verify it.
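A minimal sketch of that dict-based approach (the NamedTensorDataset class name and keyword-argument constructor are just illustrative choices, not an existing API):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class NamedTensorDataset(Dataset):
    """Illustrative dict-based workaround: each sample is a dict keyed by name."""
    def __init__(self, **tensors):
        # all tensors are assumed to share the same first (sample) dimension
        sizes = {t.size(0) for t in tensors.values()}
        assert len(sizes) == 1, "all tensors must have the same number of samples"
        self.tensors = tensors

    def __len__(self):
        return next(iter(self.tensors.values())).size(0)

    def __getitem__(self, idx):
        return {name: t[idx] for name, t in self.tensors.items()}

dataset = NamedTensorDataset(embeddings=torch.randn(5, 10), labels=torch.randn(5, 15))
loader = DataLoader(dataset, batch_size=2)
batch = next(iter(loader))
# the default collate_fn keeps the dict keys and stacks each field
x, y = batch['embeddings'], batch['labels']
```

The default collate_fn already batches dicts field by field, so the names survive into the DataLoader loop.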
I was rather thinking of returning a collections.namedtuple, since it would keep the current access by index in addition to access by name.
What do you think about the value of adding the feature to PyTorch?
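For example, a Dataset returning a namedtuple would keep both access styles (the class and field names below are made up for illustration):

```python
import torch
from collections import namedtuple
from torch.utils.data import Dataset

# hypothetical sample type for illustration
Sample = namedtuple('Sample', ['embeddings', 'labels'])

class NamedTupleDataset(Dataset):
    """Illustrative Dataset whose samples are namedtuples."""
    def __init__(self, embeddings, labels):
        self.embeddings, self.labels = embeddings, labels

    def __len__(self):
        return self.embeddings.size(0)

    def __getitem__(self, idx):
        return Sample(self.embeddings[idx], self.labels[idx])

ds = NamedTupleDataset(torch.randn(5, 10), torch.randn(5, 15))
example = ds[0]
# access by name and by index both refer to the same tensor
assert example.embeddings is example[0]
assert example.labels is example[1]
```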
How would the batching work with a namedtuple?
Wouldn't the sampler be required to create the batch as a namedtuple, so that you could index it inside the DataLoader loop, e.g.:

for batch in loader:
    x, y = batch.data, batch.target
I’m not sure why the sampler would need to create a namedtuple, since obtaining the number of namedtuples in the Dataset would suffice for generating random indices over them.
For batching, the DataLoader should create a new namedtuple such that each element in the original corresponds to a batched tensor in the new one (as you have illustrated in the example).
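As far as I know, the DataLoader's default collate_fn already does exactly this for namedtuples: it detects the `_fields` attribute and returns a new namedtuple of the same type whose fields are the stacked tensors. A quick sketch to verify (class and field names are illustrative):

```python
import torch
from collections import namedtuple
from torch.utils.data import Dataset, DataLoader

# field names taken from the loop example above
Batch = namedtuple('Batch', ['data', 'target'])

class PairDataset(Dataset):
    """Illustrative Dataset whose samples are namedtuples."""
    def __init__(self, data, target):
        self.data, self.target = data, target

    def __len__(self):
        return self.data.size(0)

    def __getitem__(self, idx):
        return Batch(self.data[idx], self.target[idx])

loader = DataLoader(PairDataset(torch.randn(6, 10), torch.randn(6, 15)), batch_size=2)
batch = next(iter(loader))
# the default collate_fn preserves the namedtuple type and stacks each field
assert isinstance(batch, Batch)
x, y = batch.data, batch.target
```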