Using trained model for prediction with variable input batch size


So I was wondering:
I've trained my model with a batch size of, let's say, 128.
Now when I'm using my model in an application, I might not always get 128 data points each time I want to predict something.
Is it possible to let the model predict on only the number of data points I'm feeding it, even though I've trained it with larger batches?
My first thought would be to use a padding method and just pad with zero tensors.
And I'm guessing the same question applies the other way around:
Is there a built-in way to split larger inputs into batches of 128 and then handle the last batch, which probably won't contain 128 points?
In either case, is there an integrated method in PyTorch, or would I have to write it myself?

I am not sure why you would need a batch size of 128 input at prediction. Much of the time one would need to do prediction for a single sample (batch size of 1). Yes, some operations depend on the batch size, like BatchNorm, but during prediction PyTorch uses the running averages (computed during training) for those statistics, so no padding is needed. Just make sure to call model.eval() before predicting.
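To illustrate both points, here is a minimal sketch (the architecture is made up for the example): a model trained with one batch size accepts inputs of any batch size at prediction, and `torch.utils.data.DataLoader` already does the chunking of larger inputs, with the final batch simply being smaller.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# A small example model that includes BatchNorm (made-up architecture).
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.BatchNorm1d(32),
    nn.ReLU(),
    nn.Linear(32, 2),
)

model.eval()  # BatchNorm now uses its running statistics, not batch statistics

with torch.no_grad():
    single = model(torch.randn(1, 10))   # batch size 1 works
    odd = model(torch.randn(37, 10))     # so does any other batch size

# Splitting a larger input into batches of 128 is exactly what DataLoader
# does; by default (drop_last=False) the last batch is just smaller.
data = TensorDataset(torch.randn(300, 10))
loader = DataLoader(data, batch_size=128)
sizes = [batch[0].shape[0] for batch in loader]
print(sizes)  # [128, 128, 44]
```

So there is no need to pad: the batch dimension is not baked into the model's parameters, and the built-in `DataLoader` handles uneven final batches for you.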