What is the correct way to use a Dataset (DS) and DataLoader (DL) in predict?

optimoose · November 23, 2020, 7:25am

I have some code for a predict method (shown below) that I got to work after some trial and error. Is there a way to simplify it? I run the predict method in batches because otherwise it uses too much memory. Still, I do not understand why I need the innermost loop. I tried replacing the innermost loop with batch_preds = self.model.forward(xb), but that fails. Why? Is there a better way to do this? Here is the code:

def predict(self, X):
    if self.gpuid is not None:
        device = torch.device(f"cuda:{self.gpuid}")
    else:
        device = torch.device("cuda")
    self.model.to(device)
    self.model.eval()
    with torch.no_grad():
        X = torch.tensor(X).float().to(device)
        predict_ds = TensorDataset(X)
        if self.predict_by_batch:
            predict_dl = DataLoader(predict_ds, batch_size=self.batch_size)
            preds = []
            for xb in predict_dl:
                for x in xb:
                    batch_preds = self.model.forward(x)
                    batch_preds = batch_preds.to('cpu')
                    preds.extend(list(batch_preds.numpy()))
            preds = np.asarray(preds)
        else:
            preds = self.model.forward(X).to('cpu').numpy()
    return np.squeeze(preds)

I should add that it seems to run slowly.

optimoose · November 24, 2020, 5:36am

I am really new to PyTorch and did not know where to start. For my problem, all of the data fits into memory, so simply indexing the batches directly worked. That runs very fast.
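For readers following along, the indexing approach might look roughly like the sketch below. The post does not show the actual code, so the function name predict_in_slices, the nn.Linear stand-in model, and the batch size are all assumptions made for illustration. The idea is to keep the full array on the CPU and move only one slice at a time to the device.

```python
import numpy as np
import torch
from torch import nn


def predict_in_slices(model, X, batch_size=256, device="cpu"):
    """Predict batch by batch using plain slicing instead of a DataLoader."""
    model.to(device)
    model.eval()
    X = torch.as_tensor(X, dtype=torch.float32)  # the full array stays on the CPU
    preds = []
    with torch.no_grad():
        for start in range(0, len(X), batch_size):
            # Move only the current slice to the target device.
            xb = X[start:start + batch_size].to(device)
            preds.append(model(xb).cpu())
    return torch.cat(preds).numpy().squeeze()


# Hypothetical stand-in model: 4 input features, 1 output.
model = nn.Linear(4, 1)
X = np.random.rand(10, 4)
out = predict_in_slices(model, X, batch_size=3)
print(out.shape)
```

Because only one slice lives on the device at a time, peak device memory scales with batch_size rather than with the size of X.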

ptrblck · November 25, 2020, 9:22am

The inner loop should not be necessary. What kind of error are you seeing?
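The error is not shown in this thread, so what follows is a likely explanation rather than a confirmed diagnosis. A DataLoader built on TensorDataset(X) yields each batch as a one-element sequence wrapping the tensor, so passing xb straight to the model hands it a list instead of a tensor. The inner loop in the original code unwraps that sequence by accident. A minimal sketch, using a hypothetical nn.Linear stand-in for the model, that unpacks the batch explicitly:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(4, 1)  # hypothetical stand-in for self.model
model.eval()

X = torch.rand(10, 4)
predict_dl = DataLoader(TensorDataset(X), batch_size=3)

first = next(iter(predict_dl))
# Each batch is a one-element sequence wrapping the tensor, not the tensor itself.
is_sequence = isinstance(first, (list, tuple))

preds = []
with torch.no_grad():
    for (xb,) in predict_dl:      # unpack the single tensor from each batch
        preds.append(model(xb))   # call the model itself rather than .forward()
preds = torch.cat(preds).numpy().squeeze()
```

Collecting the per-batch tensors and concatenating once at the end also avoids the per-row list(...) conversion in the original loop, which may account for some of the slowness mentioned above.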