How to use skorch for data that does not fit into memory?

As you may know, you can use torch datasets or even your own dataloader with skorch. This is a bit problematic in conjunction with GridSearchCV or other scikit-learn parameter searches, though, since they expect inputs that can be sliced with arrays of indices, which torch datasets generally don't support. For this reason skorch provides SliceDataset (in skorch.helper). You can use it as follows:

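from sklearn.model_selection import GridSearchCV
from skorch.helper import SliceDataset

gs = GridSearchCV(mySkorchNet, params, ...)

ds = MyCustomDataset()  # your torch Dataset returning (X, y) pairs
X_sl = SliceDataset(ds, idx=0)  # sliceable view of the inputs
y_sl = SliceDataset(ds, idx=1)  # sliceable view of the targets

gs.fit(X_sl, y_sl)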

SliceDataset emulates index operations so that GridSearchCV can compute the train/validation splits and slice the data properly without loading all of it into memory. In other words, it is a torch dataset that you can slice.
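To make the idea of emulated index operations a bit more concrete, here is a rough pure-Python sketch. It is not skorch's actual implementation: `LazyDataset` and `SliceView` are hypothetical names made up for illustration, and the real SliceDataset handles more cases (NumPy index arrays, for example). The point is only that slicing produces a new view that remembers which indices were selected, and a sample is loaded only when you ask for a single item.

```python
class LazyDataset:
    """Hypothetical dataset that builds each (x, y) pair on demand."""

    def __init__(self, n):
        self.n = n
        self.loads = 0  # counts how many samples were actually materialized

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        self.loads += 1
        return [float(i), float(i) * 2], i % 2


class SliceView:
    """Simplified stand-in for the idea behind SliceDataset (not skorch's code)."""

    def __init__(self, dataset, idx=0, indices=None):
        self.dataset = dataset
        self.idx = idx  # 0 selects the inputs, 1 selects the targets
        self.indices = list(range(len(dataset))) if indices is None else list(indices)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, key):
        if isinstance(key, int):
            # Single item: only now is the underlying sample loaded.
            return self.dataset[self.indices[key]][self.idx]
        if isinstance(key, slice):
            selected = self.indices[key]
        else:
            # Assume an iterable of integer positions, as a CV splitter would pass.
            selected = [self.indices[k] for k in key]
        # Slicing only records the selected indices; no data is touched.
        return SliceView(self.dataset, self.idx, selected)


ds = LazyDataset(1000)
X_view = SliceView(ds, idx=0)
y_view = SliceView(ds, idx=1)

train_X = X_view[:800]             # slice for a training split
valid_y = y_view[[800, 801, 999]]  # index list for a validation split
print(len(train_X), len(valid_y), ds.loads)  # 800 3 0
print(valid_y[2], ds.loads)                  # 1 1
```

Note that the load counter stays at zero while the splits are built and only goes up once an individual element is requested.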
