Optimizing Multiple Models That Use the Same Data

I’m trying to optimize a deployed application.

The data I’m working with is a list of strings that is already loaded into the application as a variable before the DataLoader is created.

I’m using 9 models in total, all pretrained AlbertForSequenceClassification models.
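For context, each model is set up along these lines (a sketch assuming the Hugging Face transformers library; “albert-base-v2” is a placeholder for my fine-tuned checkpoints):

import torch
from transformers import AlbertForSequenceClassification, AlbertTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")  # placeholder checkpoint
model = AlbertForSequenceClassification.from_pretrained("albert-base-v2")
model.to(device)
model.eval()  # inference only, so disable dropout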

I’ve tried a few different num_workers values and also added additional CPUs to the deployment.

Every time, num_workers=0 beats the other settings by a large margin.

The larger the data, the smaller the speed-up, but on my smallest data sample the run went from 10.89 seconds to 0.27 seconds.
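Roughly, the comparison I ran looked like the sketch below (the batch size is a placeholder, and `data`, `prepare_sample`, and `tokenizer` stand in for the real objects in the deployment):

import time
from functools import partial
from torch.utils.data import DataLoader

def time_loader(num_workers):
    loader = DataLoader(
        data,                     # the list of strings already in memory
        batch_size=32,            # placeholder batch size
        num_workers=num_workers,
        collate_fn=partial(prepare_sample, tokenizer=tokenizer),
    )
    start = time.perf_counter()
    for _ in loader:
        pass                      # iterate only, to isolate the loading cost
    return time.perf_counter() - start

for n in (0, 2, 4):
    print(f"num_workers={n}: {time_loader(n):.2f}s")

My understanding is that with num_workers > 0 the DataLoader has to spawn worker processes and ship each collated batch back through inter-process communication, so for data that already lives in memory that overhead can easily exceed the collation work itself. The inference loop itself looks like this: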


from functools import partial

import torch
from torch.utils.data import DataLoader

dataloader = DataLoader(
    data,                      # the list of strings already in memory
    batch_size=batch_size,
    num_workers=num_workers,
    collate_fn=partial(prepare_sample, tokenizer=tokenizer),
)

results = []
for batch in dataloader:
    input_ids, attention_mask = batch
    input_ids = input_ids.to(device)
    attention_mask = attention_mask.to(device)

    with torch.no_grad():
        logits = model(input_ids, attention_mask=attention_mask)[0]
        _, pred = torch.max(logits, dim=1)
        results.append(pred)

prediction = torch.cat(results, dim=0).detach().cpu().numpy().tolist()

For each model I call the above function, which iterates over the data and returns the predictions, so the same data is loaded 9 times.
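In other words, the overall flow is just the following, where `run_inference` is a placeholder name for the function above and `models` is the list of nine classifiers:

all_predictions = []
for model in models:
    # each call rebuilds the DataLoader and re-tokenizes the same strings
    all_predictions.append(run_inference(model))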

The majority of the execution time is spent on “logits = model(input_ids, attention_mask=attention_mask)[0]”.

I’m new to PyTorch, so there are probably some obvious changes here that I’m unaware of.

This seems fine to me. If your bottleneck is the model execution, then it’s unclear that dataloader improvements will make a huge difference.

My suggestion is to profile your code, which will let you know if there are any major issues.
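For example, torch.profiler can break the time down per operator. A minimal sketch against the loop from your question (reusing `dataloader`, `model`, and `device` from above):

import torch
from torch.profiler import profile, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU]) as prof:
    with torch.no_grad():
        for input_ids, attention_mask in dataloader:
            input_ids = input_ids.to(device)
            attention_mask = attention_mask.to(device)
            model(input_ids, attention_mask=attention_mask)

# Breaks execution time down operator by operator
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))

If the table confirms that the matrix multiplications inside the model dominate, then the dataloader isn’t the thing to tune.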