TL;DR: What's the best way in C++ to convert the batch-size-length std::vector<Example> returned by a dataloader into a tensor whose first dimension is batch_size and whose remaining dimensions are those of Example.data?
I have a numerical dataset for sequence learning, where the input is a tensor of size N x SEQLEN x INPUT and the output is a tensor of size N x SEQLEN x OUTPUT. In Python, I use a TensorDataset, passing both 3D tensors, without issue. However, it appears that TensorDataset in C++ doesn't (yet?) support a labelled dataset (libtorch 1.5).
So I created a custom dataset class and tried to use it with a dataloader (for batching, random shuffling of examples, etc.):
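#include <torch/torch.h>

class CustomDataset : public torch::data::datasets::Dataset<CustomDataset> {
    using Example = torch::data::Example<torch::Tensor, torch::Tensor>;
    torch::Tensor X, Y;  // X: N x SEQLEN x INPUT, Y: N x SEQLEN x OUTPUT
public:
    CustomDataset(const torch::Tensor& Xs, const torch::Tensor& Ys) : X(Xs), Y(Ys) {}
    // Returns one (input, target) pair; each tensor drops the leading N dimension.
    Example get(size_t index) override {
        return {X[index], Y[index]};
    }
    // Number of examples, i.e. the size of the first dimension.
    torch::optional<size_t> size() const override {
        return X.size(0);
    }
};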
However, when I create a dataloader using this dataset, each batch I get in my training loop is a std::vector<Example>. My network's forward function expects a tensor of shape BATCH_SIZE x SEQLEN x INPUT.
What is the best way to convert this list of Examples into a tensor of the right shape? Or is there a better way to do this?