Optimizing Multiple Models That Use the Same Data

I’m trying to optimize a deployed application.

The data I’m working with is a list of strings that is already loaded into the application as a variable before the DataLoader is created.

I’m using 9 models total. All are pretrained AlbertForSequenceClassification models.

I’ve tried a few variations of different num_workers counts and adding additional CPUs to the deployment.

Every time, num_workers=0 wins by a large margin.

The larger the data, the smaller the speedup, but on my smallest data sample inference went from 10.89 seconds to 0.27 seconds.

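To compare settings I time the full pass for each num_workers value, roughly like this (run_pass is a stand-in for building the DataLoader with that setting and running the whole inference loop):

```python
import time

def run_pass(num_workers):
    # Stand-in for constructing the DataLoader with this num_workers
    # value and iterating over every batch; replace with the real loop.
    time.sleep(0.01)

timings = {}
for nw in (0, 2, 4):
    start = time.perf_counter()
    run_pass(nw)
    timings[nw] = time.perf_counter() - start
```

This is how I got the numbers above; the real run_pass does the tokenization and forward passes.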

dataloader = DataLoader(
    data,  # the list of strings already in memory
    collate_fn=partial(prepare_sample, tokenizer=tokenizer),
)

results = []
for batch in dataloader:
    input_ids, attention_mask = batch
    input_ids = input_ids.to(device)
    attention_mask = attention_mask.to(device)

    with torch.no_grad():
        logits = model(input_ids, attention_mask=attention_mask)[0]
        _, pred = torch.max(logits, dim=1)
    results.append(pred)

prediction = torch.cat(results, dim=0).detach().cpu().numpy().tolist()

For each model I call the above function, which iterates over the data and returns the predictions, so the same data is loaded 9 times.
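In sketch form, the overall flow looks like this (run_inference and the model names are stand-ins for the function above and my 9 AlbertForSequenceClassification models):

```python
# Stand-in for the inference function above: it iterates over the
# whole dataset for one model and returns that model's predictions.
def run_inference(model, data):
    return [f"{model}:{len(s)}" for s in data]  # placeholder predictions

models = [f"model{i}" for i in range(9)]        # stand-ins for the 9 models
data = ["sample one", "sample two"]             # the in-memory string list

# The same data is tokenized and iterated once per model, 9 times total.
all_preds = [run_inference(m, data) for m in models]
```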

The majority of the execution time is spent in `logits = model(input_ids, attention_mask=attention_mask)[0]`.

I’m new to PyTorch, so there are probably some obvious changes here that I’m unaware of.