Optimizing Multiple Models That Use the Same Data

I’m trying to optimize a deployed application.

The data I’m working with is a list of strings that is already loaded into the application as a variable before the DataLoader is constructed.

I’m using 9 models total. All are pretrained AlbertForSequenceClassification models.

I’ve tried several different num_workers values and adding additional CPUs to the deployment.

Every time, num_workers=0 wins by a large margin.

The larger the dataset, the smaller the gap, but on my smallest data sample the run went from 10.89 seconds to 0.27 seconds after switching to num_workers=0.
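For reference, here is roughly how I’m timing the two settings. This is a self-contained toy version: the data list and collate function are stand-ins for my real ones (the real collate tokenizes the batch).

```python
import time
from torch.utils.data import DataLoader

# Toy stand-ins for my real list of strings and collate_fn
data = ["example sentence"] * 1024

def collate(batch):
    # Real code tokenizes here; this just passes strings through
    return list(batch)

def time_loader(num_workers):
    loader = DataLoader(data, batch_size=128, shuffle=False,
                        num_workers=num_workers, collate_fn=collate)
    start = time.perf_counter()
    for _ in loader:
        pass  # iterate once through all batches
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"num_workers=0: {time_loader(0):.3f}s")
    print(f"num_workers=2: {time_loader(2):.3f}s")
```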

Code:

import torch
from functools import partial
from torch.utils.data import DataLoader

dataloader = DataLoader(
    dataset=dataset,
    shuffle=False,
    sampler=None,
    batch_size=128,
    collate_fn=partial(prepare_sample, tokenizer=tokenizer),
    num_workers=0)

model.eval()  # make sure dropout etc. are disabled for inference
results = []
for batch in dataloader:
    input_ids, attention_mask = batch
    input_ids = input_ids.to(model.device)
    attention_mask = attention_mask.to(model.device)

    with torch.no_grad():
        logits = model(input_ids, attention_mask=attention_mask)[0]
        _, pred = torch.max(logits, dim=1)
        results.append(pred)

# .detach() is unnecessary inside torch.no_grad()
prediction = torch.cat(results, dim=0).cpu().numpy().tolist()

For each of the 9 models I call the above function, which iterates over the data and returns the predictions, so the same data is loaded and tokenized 9 times.
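One idea I’m considering to avoid the repeated loading: tokenize and batch the tensors once, then reuse the cached batches for all 9 models. A sketch (the helper names are mine and untested in the deployment; shapes are illustrative):

```python
import torch

def cache_batches(input_ids, attention_mask, batch_size=128):
    """Slice already-tokenized tensors into reusable batches."""
    batches = []
    for i in range(0, input_ids.size(0), batch_size):
        batches.append((input_ids[i:i + batch_size],
                        attention_mask[i:i + batch_size]))
    return batches

@torch.no_grad()
def predict(model, batches):
    """Run one model over the cached batches and return class ids."""
    preds = []
    for ids, mask in batches:
        ids = ids.to(model.device) if hasattr(model, "device") else ids
        mask = mask.to(ids.device)
        logits = model(ids, attention_mask=mask)[0]
        preds.append(logits.argmax(dim=1))
    return torch.cat(preds).cpu().tolist()
```

Then the outer loop would be `for model in models: predict(model, batches)`, so the collate/tokenization work happens once instead of 9 times.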

The majority of the execution time is spent on the forward pass: logits = model(input_ids, attention_mask=attention_mask)[0]

I’m new to PyTorch, so there are probably some obvious changes here that I’m unaware of.