How can I split the input data into training and validation in the C++ API?

tmaric · December 10, 2021, 8:40pm

I would like to do k-fold validation, or use random sampling, to split the data into training and validation sets, but I’m having trouble finding the relevant functions in the C++ API. Can anyone help?

tom · December 11, 2021, 10:37am

The low-level sampling functions should be there, though I don’t think torch.distributions has an equivalent in C++. In terms of process, however, I’d recommend assigning each data point to a fold ahead of time.

Best regards

Thomas
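
To make the suggestion above concrete, here is a minimal sketch of assigning each data point to a fold ahead of time, using only the STL. The name `assign_folds` and its parameters are hypothetical and not part of libtorch. Shuffling the indices first means a fold is not tied to the order in which the data are stored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not a libtorch function): returns, for each of the
// n data points, the fold in [0, k) it belongs to. Fold sizes differ by at
// most one, and a fixed seed makes the assignment reproducible.
std::vector<std::size_t> assign_folds(std::size_t n, std::size_t k, unsigned seed)
{
    assert(k > 0 && "need at least one fold");

    // order[rank] is the data point placed at position `rank` after shuffling.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);

    // Deal the shuffled points round-robin into the k folds.
    std::vector<std::size_t> fold_of(n);
    for (std::size_t rank = 0; rank < n; ++rank) {
        fold_of[order[rank]] = rank % k;
    }
    return fold_of;
}
```

For round `f` of the cross-validation, the points with `fold_of[i] == f` would form the validation set and all others the training set.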

tmaric · December 11, 2021, 5:45pm

Thanks! I found tensor_split, but the problem is that my data have a geometric ordering: for example, data stored at points surrounding a sphere. Constructing a fold manually from contiguous chunks would mean I’ve selected a subset of the sphere’s surface for testing, which won’t work. Random sampling would be better. If there aren’t libtorch-native functions for that, I can use the STL and hack something together, but it would be nice to have something like a random_split(const at::Tensor &, 0.1) returning a std::vector<at::Tensor> that splits the data randomly into 90% training and 10% validation sets.
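
A rough sketch of the STL route mentioned above, under some assumptions: `random_split_indices` and `SplitIndices` are hypothetical names, not libtorch API, and the function returns shuffled row indices rather than tensors so that it compiles without libtorch. The indices are stored as `std::int64_t` on the assumption that they will later be used as tensor indices.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical result type: row indices for each subset.
struct SplitIndices {
    std::vector<std::int64_t> train;
    std::vector<std::int64_t> validation;
};

// Hypothetical helper (not a libtorch function): randomly assigns the rows
// 0..n-1 to a validation set of about validation_fraction * n rows and a
// training set holding the rest.
SplitIndices random_split_indices(std::int64_t n, double validation_fraction,
                                  unsigned seed)
{
    assert(n >= 0);
    assert(validation_fraction >= 0.0 && validation_fraction <= 1.0);

    // Shuffle all row indices with a seeded generator for reproducibility.
    std::vector<std::int64_t> idx(static_cast<std::size_t>(n));
    std::iota(idx.begin(), idx.end(), std::int64_t{0});
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);

    // Round the validation size to the nearest whole number of rows.
    const auto n_val = static_cast<std::ptrdiff_t>(
        std::llround(validation_fraction * static_cast<double>(n)));

    SplitIndices split;
    split.validation.assign(idx.begin(), idx.begin() + n_val);
    split.train.assign(idx.begin() + n_val, idx.end());
    return split;
}
```

With libtorch, each index vector could then, I believe, be converted to a tensor of type `torch::kLong` and passed to `index_select` along dimension 0 to extract the corresponding rows. `torch::randperm` may also be able to replace the STL shuffle. I have not tested either against a specific libtorch version.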