Data Subsampling for Hyperparameter Optimisation

hH1sG0n3 · December 8, 2020, 4:03pm

Fundamentally, under what circumstances is it reasonable to do hyperparameter optimisation (HPO) on only a subsample of the training set?

I am using Population Based Training (PBT) with Ray Tune to optimise hyperparameters for a sequence model. My dataset consists of 20M sequences, and I was wondering whether it would make sense to optimise over a subsample because of a restricted time budget.
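If you do tune on a subsample, one simple setup is to draw a single uniform random subset of sequence indices once and reuse it for every trial. Under that setup, all PBT members see the same data and their scores stay comparable. The sketch below uses only the Python standard library and assumes the sequences can be addressed by integer index. The function name `subsample_indices` and its `fraction` and `seed` parameters are hypothetical and not part of Ray Tune.

```python
import random


def subsample_indices(n_total: int, fraction: float, seed: int = 0) -> list[int]:
    """Return a sorted, reproducible uniform random subset of row indices.

    Hypothetical helper: `fraction` and `seed` are illustrative parameters.
    The same seed always yields the same subset, so every tuning trial can
    be given identical data.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, int(n_total * fraction))
    rng = random.Random(seed)
    # Sampling from a range object avoids building a list of all n_total indices.
    return sorted(rng.sample(range(n_total), k))


# Example: a 1% subsample of 20M sequences.
idx = subsample_indices(20_000_000, 0.01, seed=42)
print(len(idx))  # 200000
```

The sketch only shows how to fix the subset. It does not settle whether hyperparameters tuned on 1% of the data transfer to the full 20M sequences.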