How to use Skorch for data that does not fit into memory?

How can I use skorch for a grid search when my data does not fit into memory? I have a DataLoader that returns mini-batches, and I want to run a grid search to find the best hyperparameters for my model.

As you may know, you can use datasets or even your own DataLoader with skorch, but this is a bit problematic in conjunction with GridSearchCV or other parameter searches, since they expect indexable inputs (torch datasets aren't). For this reason, skorch provides SliceDataset. You can use it as follows:

from sklearn.model_selection import GridSearchCV
from skorch.helper import SliceDataset

gs = GridSearchCV(mySkorchNet, params, ...)  # further GridSearchCV arguments elided

# wrap the torch dataset so that GridSearchCV can index and slice it
ds = MyCustomDataset()
X_sl = SliceDataset(ds, idx=0)  # the inputs (element 0 of each (X, y) pair)
y_sl = SliceDataset(ds, idx=1)  # the targets (element 1 of each (X, y) pair)

gs.fit(X_sl, y_sl)

What SliceDataset does is emulate indexing operations so that GridSearchCV can compute the train/validation split and slice the data properly without loading all of the data into memory. It is basically a torch dataset that you can slice.
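
To make that concrete, here is a minimal sketch (using a toy TensorDataset as a stand-in for your own dataset) of the indexing operations GridSearchCV relies on:

import torch
from torch.utils.data import TensorDataset
from skorch.helper import SliceDataset

# toy dataset: 100 samples with 10 features each and binary targets
ds = TensorDataset(torch.randn(100, 10), torch.randint(0, 2, (100,)))
X_sl = SliceDataset(ds, idx=0)

print(len(X_sl))        # 100
print(X_sl[0].shape)    # torch.Size([10]) -- a single sample
print(X_sl[[0, 5, 7]])  # indexing with a list returns a new SliceDataset of length 3

Indexing with a list or array does not load the whole dataset; it just wraps the selected indices, which is what makes this usable for data that does not fit into memory.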


Thanks, I managed to make it work using SliceDataset.

Hey, how do you perform the grid search? Please guide me.

Hi @Thabang_Lukhetho, I tried following the solution provided above but it didn't work for me. Can I ask how you managed to make it work using SliceDataset?

Can you post the error message or problem you are getting?


Hi @Thabang_Lukhetho, many thanks for your reply.

I have actually posted my question on the PyTorch Forums but haven't found a solution yet. You can see my code along with the error message here: How to use PyTorch's DataLoader together with skorch's GridSearchCV

@Muhammad_Izaz, have a look at this tutorial; I also followed it.
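
For reference, here is a minimal end-to-end sketch of the whole pattern, following the SliceDataset approach above. The module, the parameter grid, and the toy dataset are placeholders you would replace with your own; the internal skorch validation split is disabled since GridSearchCV already performs cross-validation:

import torch
from torch import nn
from torch.utils.data import TensorDataset
from sklearn.model_selection import GridSearchCV
from skorch import NeuralNetClassifier
from skorch.helper import SliceDataset

# placeholder module -- replace with your own model
class MyModule(nn.Module):
    def __init__(self, num_units=10):
        super().__init__()
        self.dense = nn.Linear(10, num_units)
        self.output = nn.Linear(num_units, 2)

    def forward(self, X):
        X = torch.relu(self.dense(X))
        return torch.softmax(self.output(X), dim=-1)

# toy dataset standing in for a dataset that loads its samples lazily
ds = TensorDataset(torch.randn(200, 10), torch.randint(0, 2, (200,)))
X_sl = SliceDataset(ds, idx=0)
y_sl = SliceDataset(ds, idx=1)

net = NeuralNetClassifier(
    MyModule,
    max_epochs=5,
    lr=0.1,
    train_split=False,  # GridSearchCV handles the train/validation split
    verbose=0,
)

params = {
    'lr': [0.01, 0.1],
    'module__num_units': [10, 20],
}

gs = GridSearchCV(net, params, cv=3, scoring='accuracy', refit=False)
gs.fit(X_sl, y_sl)
print(gs.best_score_, gs.best_params_)
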