I'm trying to build a C++ libtorch model. Following examples I found online, I thought I should use a `Dataset` to make a data loader that loads batches of my data. However, in the examples I've seen, the `get` method of a custom dataset returns `Example<>`, which assumes that both the inputs and labels are tensors. What if I want them to be `std::vector<std::vector<int>>` (a jagged 2D array)?

Is this possible? If so, can you give a code example of doing this with libtorch?
From the docs: https://pytorch.org/cppdocs/api/classtorch_1_1data_1_1datasets_1_1_stateful_dataset.html#exhale-class-classtorch-1-1data-1-1datasets-1-1-stateful-dataset

> Note that when subclassing from `StatefulDataset<Self, T>`, the return type of `get_batch()`, which the subclass must override, will be `optional` (i.e. the type specified in the `StatefulDataset` specialization is automatically boxed into an `optional` for the dataset's `BatchType`).
So you should define your dataset as

```cpp
class YourDataset
    : public torch::data::datasets::StatefulDataset<
          YourDataset, std::vector<std::vector<int>>>
```

and in that case, `get_batch` will return `torch::optional<std::vector<std::vector<int>>>`.
I took `StatefulDataset` as the base class because I don't know what kind of dataset you need, but the same approach works for the other dataset types.
Thanks for the reply!
In the examples I found online (https://github.com/pytorch/examples/blob/master/cpp/custom-dataset/custom-dataset.cpp), a batch returned from a data loader had `data` and `target` properties. When I use the approach you mention above, the returned batch is of type `std::vector<MyClassForExample>`. That makes sense, I guess, since that's what I told it to return; I'm just curious how the properties of the `Example` class (`data` and `target`) are made available in a returned batch otherwise.
Does the `map` method of a dataset have anything to do with that? What exactly does `map` do, specifically with the `Stack` transform? Why do people need to "stack" their examples if `make_data_loader` already takes the batch size? Isn't that the same kind of thing?