How can I define my own dataset with the C++ API?

raymondlucky · March 8, 2019, 9:43am

In the Python version, I do something like this:

import torch

train_data = torch.utils.data.TensorDataset(X_train, y_train)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)

X_train and y_train are tensors of shape (samples, features) and (samples, targets), respectively.

When it comes to C++, I cannot find an equivalent way to define my own dataset. I have already read the official tutorial (https://pytorch.org/tutorials/advanced/cpp_frontend.html), but it says little about defining our own (X, y) training dataset because it uses the MNIST dataset that ships with the C++ frontend.

I did find the TensorDataset C++ API, but how can I use it? I tried to use it the Python way, but it did not work:

#include <torch/torch.h>

torch::Tensor X_train = torch::eye(3);
torch::Tensor y_train = torch::randn({3, 2});
// Only X_train is passed in; y_train is never attached to the dataset.
auto train_data = torch::data::datasets::TensorDataset(X_train);
auto data_loader = torch::data::make_data_loader(std::move(train_data));

raymondlucky · March 12, 2019, 9:00am

Could somebody please help?

Oli · March 12, 2019, 9:19am

Yeah, could anyone help this guy out? I tried finding resources on how this could be done, but failed to find code that explained how to do it with a custom dataset. I did, however, find this project that loads the COCO dataset, so maybe that could be of use.

raymondlucky · March 12, 2019, 9:31am

I found this too, but it's too complicated for my use case.
Still waiting for help.

raymondlucky · March 13, 2019, 10:15am

Could anyone help?
I googled this and read literally every page I could find. There is too little information about it.

ptrblck · March 13, 2019, 4:02pm

This post might be helpful.