Libtorch dataset issue

jackeown · April 26, 2020, 6:12pm

I’m trying to build a C++ libtorch model. Following examples I found online, I thought I should use a Dataset to make a data loader to load in batches of my data. However, in the examples I’ve seen online, the “get” method of a custom dataset returns Example<> which assumes that both the input and labels are tensors. What if I want them to be vector<vector<int>>? (A jagged 2d array)

Is this possible? If so, can you give a code example of doing this with libtorch?

Thanks,
Jack

Hartbook · April 26, 2020, 6:20pm

From the docs (Template Class StatefulDataset, PyTorch main documentation):

Note that when subclassing from StatefulDataset<Self, T>, the return type of get_batch(), which the subclass must override, will be optional (i.e. the type specified in the StatefulDataset specialization is automatically boxed into an optional for the dataset’s BatchType).

So you should define your dataset as

class YourDataset : public torch::data::datasets::StatefulDataset<YourDataset,std::vector<std::vector<int>>>

And in that case, get_batch will return c10::optional<std::vector<std::vector<int>>>

I took StatefulDataset as a base class because I don’t know what kind of dataset you need, but it should work the same for the other types.

jackeown · April 26, 2020, 10:53pm

Thanks for the reply!

In the examples that I found online (https://github.com/pytorch/examples/blob/master/cpp/custom-dataset/custom-dataset.cpp), a batch returned from a data loader had data and target properties. When I use the approach you mention above, the returned batch is of type std::vector<MyClassForExample>. That makes sense, I guess, since that’s what I told it to return. I’m just curious how the properties of the Example class (data and target) become available on a returned batch in the other case.

Does the map method of the dataset have anything to do with that? What exactly does map do? Specifically, what does it do with the Stack transform? Why do people need to “stack” their examples if make_data_loader takes in the batch size as well? Isn’t that the same kind of thing?

Thanks again,
Jack