Well, actually it is super easy. Just as in this example for the MNIST dataset, you can implement a torch::data::datasets::Dataset<Self, SingleExample>. To do so, you need to override the get(size_t index) method of Dataset, as well as size(), which is pure virtual in the base class. What you need to do is get your data from somewhere and convert it into a Tensor; how you do that is up to you.
```cpp
#include <torch/torch.h>

#include <string>

// You can, for example, just read your data and directly store it as a tensor.
torch::Tensor read_data(const std::string& loc)
{
    torch::Tensor tensor = ...
    // Here you need to get your data from `loc` and convert it into a tensor.
    return tensor;
}

class MyDataset : public torch::data::Dataset<MyDataset>
{
private:
    torch::Tensor states_, labels_;

public:
    explicit MyDataset(const std::string& loc_states, const std::string& loc_labels)
        : states_(read_data(loc_states)),
          labels_(read_data(loc_labels)) {}

    torch::data::Example<> get(size_t index) override;

    // size() is pure virtual in the base class, so it must be overridden too.
    torch::optional<size_t> size() const override;
};

torch::data::Example<> MyDataset::get(size_t index)
{
    // You may, for example, also read in a .csv file that stores the locations
    // of your data and then read in the data at this step. Be creative.
    return {states_[index], labels_[index]};
}

torch::optional<size_t> MyDataset::size() const
{
    // One example per row (first dimension) of the states tensor.
    return static_cast<size_t>(states_.size(0));
}
```
Then, if you want to generate a data loader from it, just do:
```cpp
// Generate your data set. At this point you can add transforms to your data set, e.g. stack your
// batches into a single tensor.
auto data_set = MyDataset(loc_states, loc_labels).map(torch::data::transforms::Stack<>());

// Generate a data loader.
auto data_loader = torch::data::make_data_loader<torch::data::samplers::SequentialSampler>(
    std::move(data_set),
    batch_size);

// In a for loop you can now use your data. make_data_loader returns a
// std::unique_ptr, so it has to be dereferenced here.
for (auto& batch : *data_loader) {
    auto data = batch.data;
    auto labels = batch.target;
    // do your usual stuff
}
```
Hopefully this helps, although I don’t know the kind of data you are trying to read in.
Hi, I found that the example only contains data and target. How can I handle data that contains several components? For example, in a sentence similarity classification dataset every item contains two sentences and a label, so I would like to define sentence1, sentence2 and label rather than image and label.
How can I do that? Thanks!
Here is some Python code:
Thank you for your example of how to use libtorch to create your own datasets and data loaders. I've followed your example and created a read_data() function that returns a tensor from a CSV file: it first fills a vector, then flattens it, and then creates tensors in the right shape using from_blob.
Here is the output of my input and output vectors and tensors (each row is an observation):
Unfortunately, if I use the class MyDataset in the main function, I get the error:
a cast to abstract class “MyDataset” is not allowed: – pure virtual function “torch::data::datasets::BatchDataset<Self, Batch, BatchRequest>::size [with Self=MyDataset, Batch=std::vector<torch::data::Example<at::Tensor, at::Tensor>, std::allocator<torch::data::Example<at::Tensor, at::Tensor>>>, BatchRequest=c10::ArrayRef<size_t>]” has no overrider C/C++(389)
I'm using the class like this: auto data_set = MyDataset(input_loc, output_loc);
Can someone please help me out?
EDIT / SOLUTION:
OK, I solved it by also overriding the size() method like this:
Something else I had to do to resolve all compilation errors was to dereference data_loader (make_data_loader returns a std::unique_ptr) in the range-based for loop, like this:
for (auto& batch: *data_loader) { ... };
Otherwise it would not compile.
I really miss more tutorials for the C++ API in the documentation area. How can I contribute more tutorials for the C++ API so that beginners like me don't run into these issues?
These files give you a complete example:
~/pytorch/torch/csrc/api/include/torch/data/example.h
~/pytorch/torch/csrc/api/include/torch/data/datasets/mnist.h
~/pytorch/torch/csrc/api/src/data/datasets/mnist.cpp
~/pytorch/test/cpp/api/integration.cpp