Can I use a low-end GPU like the GTX 1030 to run prediction with a large model?

I’m training a deep learning model that needs more than 8GB of GPU RAM. Can the trained model be used for prediction on a GPU with only 2GB of RAM, like the GTX 1030, so that I can run tests while training?

You could try running inference on your second GPU and see if it has enough memory:

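import torch

device = 'cuda:1'  # assuming your GTX 1030 is cuda:1
model = model.to(device)
model.eval()  # switch dropout and batchnorm layers to inference behavior
with torch.no_grad():  # no gradients are stored, which reduces memory usage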
    for data, target in val_loader:
        data = data.to(device)
        target = target.to(device)
        output = model(data)
        ...
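
Before trying it on the card, a rough lower bound on the memory you need can be computed from the weights alone. The sketch below is only an estimate: the helper name `param_and_buffer_bytes` and the small stand-in model are made up for illustration, and activations, the CUDA context, and cuDNN workspaces come on top of this number.

```python
import torch
import torch.nn as nn


def param_and_buffer_bytes(model: nn.Module) -> int:
    """Bytes occupied by the model's parameters and buffers (weights only)."""
    tensors = list(model.parameters()) + list(model.buffers())
    return sum(t.numel() * t.element_size() for t in tensors)


# Hypothetical stand-in model; substitute your own trained model here.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))

weights_mib = param_and_buffer_bytes(model) / 1024**2
print(f"weights: {weights_mib:.1f} MiB")

# Compare against the total memory of the second GPU, if there is one.
# Assumes the small card is cuda:1, as in the snippet above.
if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    total = torch.cuda.get_device_properties(1).total_memory
    print(f"cuda:1 total memory: {total / 1024**2:.0f} MiB")
```

If the weights alone are already close to 2GB, inference on the small card is unlikely to fit. If they are well below, using a smaller batch size in `val_loader` is the easiest way to keep the activation memory down.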

If you want to run prediction while training is still in progress, you could save model checkpoints during training, load the latest one in a separate prediction script, and run the inference part there.
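
A minimal sketch of that checkpoint workflow is shown here, with both halves in one file so it can be run as is. The `build_model` function, the checkpoint path, and the random input batch are placeholders invented for the example; in practice the first half would live in your training loop and the second half in the prediction script. The code falls back to the CPU if no second GPU is found.

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_model() -> nn.Module:
    # Hypothetical architecture; both scripts must construct the same one.
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))


# Hypothetical checkpoint location shared by both scripts.
ckpt_path = os.path.join(tempfile.gettempdir(), "latest_checkpoint.pt")

# --- training script: call this every N iterations or epochs ---
train_model = build_model()
tmp_path = ckpt_path + ".tmp"
torch.save(train_model.state_dict(), tmp_path)
# Rename after writing so the prediction script never reads a partial file.
os.replace(tmp_path, ckpt_path)

# --- prediction script: runs separately on the small GPU ---
use_second_gpu = torch.cuda.is_available() and torch.cuda.device_count() > 1
device = torch.device("cuda:1" if use_second_gpu else "cpu")

pred_model = build_model()
# map_location loads the tensors directly onto the target device.
state_dict = torch.load(ckpt_path, map_location=device)
pred_model.load_state_dict(state_dict)
pred_model.to(device).eval()

with torch.no_grad():
    batch = torch.randn(8, 16, device=device)  # placeholder input batch
    output = pred_model(batch)
```

Saving only the `state_dict` keeps the checkpoint small and avoids carrying optimizer state, which the prediction script does not need.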