Create custom dataset from tensors

Sayyed_Ali_Mousavi · May 27, 2021, 5:00pm

Hello everyone. I am new to PyTorch. I have a program that produces tensors and their labels. I want to create a dataset (perhaps a .pt file) and use it for training. How should I do that, and how is each tensor matched to its label? Thanks a lot.

ptrblck · May 28, 2021, 8:11am

In case you have already created the data and target tensors, you could use torch.utils.data.TensorDataset to create the dataset.

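You, as the creator of the dataset, would have to make sure that each created data sample matches its target. TensorDataset pairs the tensors by index along the first dimension, so data[i] has to correspond to target[i].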
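
A minimal sketch of how the pairing works, assuming the data and targets are already tensors whose first dimension indexes the samples. The shapes, values, and variable names here are made up for illustration:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical data: 4 samples with 3 features each, and one label per sample.
data = torch.arange(12, dtype=torch.float32).reshape(4, 3)
targets = torch.tensor([0, 1, 0, 1])

dataset = TensorDataset(data, targets)

# Indexing returns the i-th entry of every tensor, so sample i is paired with targets[i].
x2, y2 = dataset[2]

# A DataLoader batches (and optionally shuffles) the pairs together,
# so each sample keeps its own label even after shuffling.
loader = DataLoader(dataset, batch_size=2, shuffle=True)
for batch_x, batch_y in loader:
    pass  # the training step would go here
```

The matching is purely positional, so the order in which samples and labels are collected has to be the same.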

Sayyed_Ali_Mousavi · May 28, 2021, 8:37am

Thank you very much. I create two empty lists and, in an iterative procedure, append each data tensor to the first list and its target to the second, e.g. features=[tensor_1, tensor_2, …] and targets=[label_of_tensor_1, label_of_tensor_2, …]. Then I call TensorDataset(features, targets). Is this the correct way? I get an error: “ValueError: only one element tensors can be converted to Python scalars”. Can you help me further?

ptrblck · May 28, 2021, 8:40am

Yes, this is generally the right approach.
Could you check whether all the data tensors have the same shape (and likewise all the target tensors), so that creating a single tensor via:

features = [tensor1, tensor2, ...]
features = torch.stack(features)

would work?
I guess the new error is raised because of unexpected shapes, but am unsure which operation raises it.
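
One way to run that check, as a rough sketch: collect the distinct shapes in the list and only stack when there is exactly one. The list contents below are placeholders, not the original poster's data:

```python
import torch

# Placeholder list of tensors; in practice this is the list built with append.
features = [torch.rand(2, 2), torch.rand(2, 2), torch.rand(2, 2)]

# torch.stack requires every tensor in the list to have the same shape.
shapes = {tuple(t.shape) for t in features}
if len(shapes) == 1:
    # Stacking adds a new leading dimension: 3 tensors of (2, 2) become (3, 2, 2).
    stacked = torch.stack(features)
else:
    raise ValueError(f"cannot stack tensors with different shapes: {shapes}")
```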

Sayyed_Ali_Mousavi · May 28, 2021, 9:04am

I’m very sorry: the error I quoted in my previous reply arose after some additional code. The actual error is: “AttributeError: 'list' object has no attribute 'size'”. Here is the simple code I wrote to test it:
my_x = [torch.rand(2,2), torch.rand(2,2)] # list of tensors
my_y = [torch.rand(1), torch.rand(1)] # list of targets
my_dataset = TensorDataset(my_x, my_y) # raises the AttributeError above

I apologize if my questions are childish. Thanks a lot.

ptrblck · May 28, 2021, 9:07am

Ah OK, that would fit my expectation of the error and this would work:

import torch
my_x = torch.stack([torch.rand(2,2), torch.rand(2,2)]) # shape (2, 2, 2)
my_y = torch.stack([torch.rand(1), torch.rand(1)]) # shape (2, 1)
my_dataset = torch.utils.data.TensorDataset(my_x, my_y)

The error is raised because TensorDataset expects tensors, while you were passing lists.
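
To make the difference concrete, here is a small sketch that reproduces the failure with lists and then builds the dataset from stacked tensors. It assumes the behavior reported in this thread, where passing lists raises an AttributeError; other PyTorch versions might fail differently:

```python
import torch
from torch.utils.data import TensorDataset

my_x = [torch.rand(2, 2), torch.rand(2, 2)]  # list of tensors
my_y = [torch.rand(1), torch.rand(1)]        # list of targets

# Passing the lists directly fails, because a Python list has no .size() method.
try:
    TensorDataset(my_x, my_y)
    error_message = None
except AttributeError as err:
    error_message = str(err)

# Stacking turns each list into a single tensor with a leading sample dimension.
x = torch.stack(my_x)  # shape (2, 2, 2)
y = torch.stack(my_y)  # shape (2, 1)
my_dataset = TensorDataset(x, y)
```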

Sayyed_Ali_Mousavi · May 28, 2021, 9:31am

Thank you very much. It works. As a final question, I would like to save my_x and my_y for later use (or perhaps save the lists of tensors and targets that I created). How can I save them and load them afterwards? (If I do not save them, I have to rebuild two large lists with append, which takes a long time.)

ptrblck · May 28, 2021, 9:34am

You can save and load tensors via torch.save(tensor, path) and torch.load(path), respectively. This would allow you to load these tensors afterwards and create the TensorDataset directly without creating the lists.
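
A short sketch of that round trip, assuming both tensors are stored together in one dictionary. The file name, the temporary directory, and the dictionary keys are arbitrary choices for this example:

```python
import os
import tempfile

import torch
from torch.utils.data import TensorDataset

my_x = torch.stack([torch.rand(2, 2), torch.rand(2, 2)])
my_y = torch.stack([torch.rand(1), torch.rand(1)])

# Hypothetical location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "my_dataset.pt")

# Save both tensors in a single file so data and targets stay together.
torch.save({"x": my_x, "y": my_y}, path)

# Later (e.g. in the training script): load the file and rebuild the dataset.
loaded = torch.load(path)
my_dataset = TensorDataset(loaded["x"], loaded["y"])
```

Saving the two tensors in one file is only a convenience; saving them to separate files with two torch.save calls works as well.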