Efficient prediction for a large dataset

I’ve trained a model on ~28,000 samples and am wondering what the recommended, efficient way to run predictions on this many samples is. If the input samples are in a tensor called x_data, I’ve found that simply running:

with torch.no_grad():
    pred = model(x_data)

results in very high memory growth and never completes. I can run the same thing in a loop over batches of 1,000 samples using tensor slicing, but that feels rather inelegant.
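For reference, the slicing workaround can be written fairly compactly with `torch.split`, which returns views of consecutive chunks along dimension 0 (the last chunk may be shorter). A minimal sketch, assuming a hypothetical stand-in model and random `x_data` in place of the real ones:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the trained model and the ~28,000-sample input tensor.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 3))
x_data = torch.randn(28_000, 16)

model.eval()  # put dropout / batch-norm layers (if any) into inference behavior
with torch.no_grad():
    # torch.split yields views of at most 1,000 rows each; the input is not copied.
    pred = torch.cat([model(chunk) for chunk in torch.split(x_data, 1000)], dim=0)

print(pred.shape)  # torch.Size([28000, 3])
```

Only one chunk's intermediate activations exist at a time, while the per-chunk predictions are collected and concatenated once at the end.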

I’d like to understand why the memory footprint grows well beyond the size the “pred” tensor would have for this many samples, and also whether there’s a standard practice here, such as using a data loader with a chosen batch size. I went through the tutorial, but it only covers the workflow up to saving the trained model.
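One way to see where the memory can go: the forward pass materializes an output tensor for every layer, and each of those scales linearly with the number of rows in the batch. With wide hidden layers, a single hidden activation can be far larger than the final prediction. The sketch below records each leaf module's output size with forward hooks; the model, its layer widths, and the helper name `activation_bytes` are hypothetical, chosen only to illustrate the ratio.

```python
import torch
import torch.nn as nn

# Hypothetical model: wide hidden layers feeding a single output value per sample.
model = nn.Sequential(
    nn.Linear(64, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1),
)
model.eval()

def activation_bytes(model, batch):
    """Return (bytes of each leaf module's output, bytes of the final prediction)."""
    sizes = []

    def hook(module, inputs, output):
        # Record only the size; keeping no reference lets the tensor be freed as usual.
        sizes.append(output.numel() * output.element_size())

    leaves = [m for m in model.modules() if len(list(m.children())) == 0]
    handles = [m.register_forward_hook(hook) for m in leaves]
    try:
        with torch.no_grad():
            out = model(batch)
    finally:
        for h in handles:
            h.remove()
    return sizes, out.numel() * out.element_size()

for n in (100, 1000):
    sizes, pred_bytes = activation_bytes(model, torch.randn(n, 64))
    print(f"batch={n}: largest activation {max(sizes):,} B, prediction {pred_bytes:,} B")
```

In this toy model each hidden activation is 1,024 float32 values per sample versus one for the prediction, so at 28,000 rows a single hidden activation would be about 28,000 × 1,024 × 4 B ≈ 115 MB while the prediction is about 112 KB. Under `no_grad` most intermediates are released once the next layer has consumed them, so the peak is driven by the largest few activations rather than their sum; a real architecture may have a very different ratio.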

Adding something for feedback: the data loader approach below does work, albeit slowly at a batch size of 100. I think choosing the batch size well depends on understanding what drives memory usage when predicting on a batch (e.g., why memory blows up when the prediction for the entire dataset is attempted in one step). I’ve seen some references to the computational graph being retained for each sample, and so forth.
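On the computational-graph point: inside `torch.no_grad()` autograd does not record operations, so outputs carry no `grad_fn` and nothing is kept around for a backward pass. A quick check, assuming a hypothetical `nn.Linear` stand-in for the model; `torch.inference_mode()` is a stricter context manager that PyTorch also provides for pure inference.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # hypothetical stand-in; its parameters require grad by default
x = torch.randn(5, 8)

out_tracked = model(x)          # autograd records the graph needed for backward
with torch.no_grad():
    out_no_grad = model(x)      # no graph is recorded
with torch.inference_mode():
    out_inference = model(x)    # no graph, and some extra autograd bookkeeping is skipped

print(out_tracked.grad_fn is not None)  # True
print(out_no_grad.grad_fn is None)      # True
print(out_inference.grad_fn is None)    # True
```

So with the prediction wrapped in `no_grad`, as in both snippets here, retained graphs should not be what consumes the memory.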

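import torch
from torch.utils.data import DataLoader, TensorDataset

spec_ds = TensorDataset(x_data)
pred_bs = 100
spec_dl = DataLoader(spec_ds, batch_size=pred_bs)

model.eval()  # switch dropout / batch-norm layers (if any) to inference behavior
with torch.no_grad():
    pred = torch.tensor([])
    for (X,) in spec_dl:  # each batch is a 1-element sequence: the dataset wraps a single tensor
        pred = torch.cat((pred, model(X)), dim=0)  # append this batch's predictions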
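Growing `pred` with `torch.cat` on every iteration copies all predictions accumulated so far each time, so the total copying grows quadratically with the number of batches. A variant that allocates the output once and writes each batch into its slice is sketched below; `predict` is a hypothetical helper name, the model and data are stand-ins, and the per-sample output shape is inferred from the first batch.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def predict(model, x_data, batch_size=1000):
    """Run batched inference, writing each batch into a preallocated output tensor."""
    loader = DataLoader(TensorDataset(x_data), batch_size=batch_size, shuffle=False)
    model.eval()
    out = None
    start = 0
    with torch.no_grad():
        for (X,) in loader:
            y = model(X)
            if out is None:
                # Allocate the full result once, using the per-sample shape of the first batch.
                out = torch.empty((len(x_data), *y.shape[1:]), dtype=y.dtype, device=y.device)
            out[start:start + len(y)] = y
            start += len(y)
    return out

# Usage with hypothetical stand-ins for the trained model and data.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
x_data = torch.randn(2_500, 16)
pred = predict(model, x_data, batch_size=1000)
print(pred.shape)  # torch.Size([2500, 3])
```

Compared with collecting batch outputs in a list and calling `torch.cat` once at the end, this also avoids briefly holding both the list and the concatenated copy in memory.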