Libtorch C++ batch prediction in parallel

Hi, I have converted a PyTorch model to C++ (libtorch), and forward() prediction works. However, I need to speed up inference by running forward() on several batches of inputs in parallel.

Specifically, say batch_size = 32 and the input contains num_samples = 64; then num_batches = num_samples / batch_size = 2, call them A and B. Iterating with a for-loop and calling forward(A) and then forward(B) sequentially is slow. How can we run forward(A) and forward(B) in parallel to speed this up?
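
To make the batch arithmetic concrete, here is a small sketch of how I split the samples, assuming the number of batches is num_samples divided by batch_size, rounded up so that a smaller last batch is still covered. The helper names count_batches and batch_ranges are my own, not part of libtorch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Number of batches needed to cover num_samples (the last one may be smaller).
std::size_t count_batches(std::size_t num_samples, std::size_t batch_size) {
    assert(batch_size > 0);
    return (num_samples + batch_size - 1) / batch_size;
}

// Half-open [begin, end) sample index range covered by each batch.
std::vector<std::pair<std::size_t, std::size_t>>
batch_ranges(std::size_t num_samples, std::size_t batch_size) {
    assert(batch_size > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    for (std::size_t begin = 0; begin < num_samples; begin += batch_size) {
        const std::size_t end = std::min(begin + batch_size, num_samples);
        ranges.emplace_back(begin, end);
    }
    return ranges;
}
```

With num_samples = 64 and batch_size = 32 this gives two ranges, [0, 32) for A and [32, 64) for B.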

Is the following enough?

#pragma omp parallel for
    for (int i = 0; i < num_batches; i++) {
        // run one batch per loop iteration; iterations are shared among threads
        model->forward(batch[i]);
    }
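
To show the pattern I have in mind without depending on libtorch, here is a sketch that uses std::async from the standard library instead of OpenMP. The function fake_forward is a hypothetical stand-in for model->forward (it just sums the batch), and Batch is a placeholder type. Whether the real forward() can safely be called from several threads at once is part of what I am asking.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Placeholder for one batch of inputs (assumption: not a libtorch type).
using Batch = std::vector<float>;

// Hypothetical stand-in for model->forward(batch): returns the sum of the batch.
float fake_forward(const Batch& batch) {
    return std::accumulate(batch.begin(), batch.end(), 0.0f);
}

// Launch one asynchronous task per batch, then collect results in batch order.
std::vector<float> forward_all_parallel(const std::vector<Batch>& batches) {
    std::vector<std::future<float>> pending;
    pending.reserve(batches.size());
    for (const Batch& batch : batches) {
        // std::launch::async requests a separate thread for each task.
        pending.push_back(std::async(std::launch::async,
                                     [&batch] { return fake_forward(batch); }));
    }

    std::vector<float> outputs;
    outputs.reserve(pending.size());
    for (std::future<float>& f : pending) {
        outputs.push_back(f.get());  // blocks until that batch has finished
    }
    return outputs;
}
```

Calling forward_all_parallel with batches A and B returns the two results in the same order as the input. One task per batch is fine for two batches; with many batches a fixed pool of worker threads would probably be a better fit.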

Does anyone have a minimal C++ example to do that?
