Use case: I currently have a `model_dynamic_quantization.ptl` file, which I created with PyTorch Mobile using the following steps:
```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# Load the trained weights (model is the network instance matching the checkpoint)
pth_path = '../models/net_g_latest.pth'
checkpoint = torch.load(pth_path)
model.load_state_dict(checkpoint['params'])
model.eval()

# Dynamic quantization: quantize the Linear layers to int8
model_quantized = torch.quantization.quantize_dynamic(
    model, qconfig_spec={torch.nn.Linear}, dtype=torch.qint8
)

# Convert to TorchScript via tracing, then optimize for mobile
example = torch.rand(1, 3, 128, 128)
traced_script_module = torch.jit.trace(model_quantized, example)
traced_script_module_optimized = optimize_for_mobile(traced_script_module)

# Save the model for the lite interpreter (.ptl)
traced_script_module_optimized._save_for_lite_interpreter("model_dynamic_quantization.ptl")
```
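For reference, this is how I sanity-check the save/reload round trip on the desktop before deploying. The tiny `Sequential` model here is just a stand-in for my real network, and I am assuming the (private) `_load_for_lite_interpreter` helper from `torch.jit.mobile` for loading `.ptl` files in Python:

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile
from torch.jit.mobile import _load_for_lite_interpreter

# Tiny stand-in model; my real model is larger, this only checks the round trip
net = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU())
net.eval()

# Trace, optimize, and save exactly as in the steps above
traced = torch.jit.trace(net, torch.rand(1, 3, 128, 128))
optimized = optimize_for_mobile(traced)
optimized._save_for_lite_interpreter("tiny_check.ptl")

# Reload via the lite interpreter and run one inference
lite_model = _load_for_lite_interpreter("tiny_check.ptl")
out = lite_model(torch.rand(1, 3, 128, 128))
print(out.shape)
```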
I want to run this model on my end device, which uses a GPU for compute. Is there a way to parallelize inference over multiple images on the device's GPU using the model above?
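To illustrate what I mean by "over multiple images": on the desktop I can stack N images into a single batch tensor and make one forward call instead of N separate calls. The tiny `Sequential` model here is only a placeholder for my real network:

```python
import torch

# Placeholder model standing in for the quantized network
net = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU())
net.eval()

# Stack 4 individual images [3, 128, 128] into one batch [4, 3, 128, 128]
images = [torch.rand(3, 128, 128) for _ in range(4)]
batch = torch.stack(images)

# One forward pass processes all 4 images together
with torch.no_grad():
    out = net(batch)
print(out.shape)
```

One thing I am unsure about: since I traced the model with a batch-size-1 example input, I don't know whether the traced `.ptl` will accept larger batch sizes on the device, or whether the trace has effectively fixed the batch dimension.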
Any help/suggestion is appreciated.
Regards!