Hello,
I have created a custom dataset. Within get(), I record the sample index and the filename in two std::vector members (index_ and path_).
torch::data::Example<> CustomDataset::get(size_t index) {
    torch::Tensor tData;
    torch::Tensor tLabel;
    std::string tPath;
    ....
    // set tPath to an image filename. set tLabel as well.
    auto mat = cv::imread(tPath);
    ...
    tData = torch::from_blob(mat.data, {mat.rows, mat.cols, 3}, torch::kByte);
    tData = tData.to(torch::kFloat);
    tData = tData.permute({2, 0, 1}); // Channels x Height x Width
    path_.push_back(tPath);
    index_.push_back(index);
    return {tData.clone(), tLabel.clone()};
}
Since I can't change the signature of get(), I rely on accessing the CustomDataset object inside the data loading loop to read its members. In the example below, I set the batch size to 10. At runtime, batch.data.size(0) is 10, but customDataset.path_.size() is 0.
auto customDataset = CustomDataset();
auto dataLoader = torch::data::make_data_loader<torch::data::samplers::SequentialSampler>(
    customDataset // std::move(customDataset)
        .map(torch::data::transforms::Normalize<>({0.5, 0.5, 0.5}, {0.5, 0.5, 0.5}))
        .map(torch::data::transforms::Stack<>()),
    torch::data::DataLoaderOptions()
        .batch_size(10)
        .workers(actionParameters.nThreads)
        .drop_last(true));
for (auto& batch : *dataLoader) {
    std::cout << "score::runAction(): customDataset.path_.size() = " << customDataset.path_.size() << std::endl;
    std::cout << "score::runAction(): batch.data.size(0) = " << batch.data.size(0) << std::endl;
    ...
    customDataset.path_.clear();
    customDataset.index_.clear();
}
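My impression is that the data loader makes its own copy of customDataset inside make_data_loader(), so get() fills the vectors of that copy and not those of my original object. If so, the original customDataset is no longer needed after the call, and passing it with std::move() would be more efficient.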
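To illustrate that impression, here is a minimal sketch in plain C++ with no libtorch involved. LoggingDataset, ToyLoader, SampleLog and run_demo are hypothetical stand-ins that I made up for illustration, and the assumption that the real data loader stores its own copy of the dataset is only inferred from the size of 0 I observe. The sketch shows that a plain member of the original object is never updated once the loader works on a copy, whereas bookkeeping held through a std::shared_ptr is shared by every copy. The mutex is there because get() may be called from worker threads.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical bookkeeping shared by all copies of the dataset.
struct SampleLog {
    std::mutex mutex;                 // get() may run on several worker threads
    std::vector<std::size_t> index;
    std::vector<std::string> path;
};

// Hypothetical stand-in for CustomDataset.
class LoggingDataset {
public:
    LoggingDataset() : log_(std::make_shared<SampleLog>()) {}

    int get(std::size_t index) {
        ++local_calls;                // plain member: only this copy sees it
        std::lock_guard<std::mutex> lock(log_->mutex);
        log_->index.push_back(index);
        log_->path.push_back("sample_" + std::to_string(index));
        return static_cast<int>(index);
    }

    std::shared_ptr<SampleLog> log() const { return log_; }

    std::size_t local_calls = 0;

private:
    std::shared_ptr<SampleLog> log_;  // copying the dataset copies the pointer only
};

// Hypothetical stand-in for a loader that takes the dataset by value.
class ToyLoader {
public:
    explicit ToyLoader(LoggingDataset dataset) : dataset_(std::move(dataset)) {}

    void load(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            dataset_.get(i);
        }
    }

private:
    LoggingDataset dataset_;
};

struct DemoResult {
    std::size_t shared_count;  // entries visible through the shared log
    std::size_t local_count;   // calls seen by the original object
};

DemoResult run_demo(std::size_t n) {
    LoggingDataset dataset;
    auto log = dataset.log();
    ToyLoader loader(dataset);   // the loader works on a copy
    loader.load(n);
    std::lock_guard<std::mutex> lock(log->mutex);
    return {log->index.size(), dataset.local_calls};
}
```

With this layout, run_demo(10) reports 10 entries through the shared log and 0 calls on the original object, which matches what I see with path_. Whether a shared-pointer member like this is the intended way to do it with the real data loader is exactly what I am unsure about.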
I saw this post on the Python API, which discusses the DataLoader resetting the state of the dataset. This post has a solution suggested by @ptrblck that uses loader.dataset to access the embedded dataset.
for epoch in range(2):
    for idx, data in enumerate(loader):
        print('Epoch {}, idx {}, data.shape {}'.format(epoch, idx, data.shape))
    if epoch == 0:
        loader.dataset.set_use_cache(True)
I wonder what the equivalent is in the C++ API. How can I access the dataset associated with the data loader after calling make_data_loader()?
Thank you very much.