Use case: I currently have a `model_dynamic_quantization.ptl` file, which I created with PyTorch Mobile using the following steps:
```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# Load the trained weights (model is the network instance matching the checkpoint)
pth_path = '../models/net_g_latest.pth'
checkpoint = torch.load(pth_path)
model.load_state_dict(checkpoint['params'])
model.eval()

# Dynamic quantization: quantize the Linear layers to int8
model_quantized = torch.quantization.quantize_dynamic(
    model, qconfig_spec={torch.nn.Linear}, dtype=torch.qint8
)

# Convert to TorchScript via tracing, then optimize for mobile
example = torch.rand(1, 3, 128, 128)
traced_script_module = torch.jit.trace(model_quantized, example)
traced_script_module_optimized = optimize_for_mobile(traced_script_module)

# Save the model for the lite interpreter (.ptl)
traced_script_module_optimized._save_for_lite_interpreter("model_dynamic_quantization.ptl")
```
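For reference, this is how I sanity-check the save/reload round trip on the desktop before deploying. The tiny `Sequential` model here is just a stand-in for my real network, and I am assuming the (private) `_load_for_lite_interpreter` helper from `torch.jit.mobile` for loading `.ptl` files in Python:

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile
from torch.jit.mobile import _load_for_lite_interpreter

# Tiny stand-in model; my real model is larger, this only checks the round trip
net = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU())
net.eval()

# Trace, optimize, and save exactly as in the steps above
traced = torch.jit.trace(net, torch.rand(1, 3, 128, 128))
optimized = optimize_for_mobile(traced)
optimized._save_for_lite_interpreter("tiny_check.ptl")

# Reload via the lite interpreter and run one inference
lite_model = _load_for_lite_interpreter("tiny_check.ptl")
out = lite_model(torch.rand(1, 3, 128, 128))
print(out.shape)
```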
I want to run this model on my end device, which uses a GPU for compute. Is there a way to parallelize inference over multiple images on the device's GPU using the model above?
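To illustrate what I mean by "over multiple images": on the desktop I can stack N images into a single batch tensor and make one forward call instead of N separate calls. The tiny `Sequential` model here is only a placeholder for my real network:

```python
import torch

# Placeholder model standing in for the quantized network
net = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU())
net.eval()

# Stack 4 individual images [3, 128, 128] into one batch [4, 3, 128, 128]
images = [torch.rand(3, 128, 128) for _ in range(4)]
batch = torch.stack(images)

# One forward pass processes all 4 images together
with torch.no_grad():
    out = net(batch)
print(out.shape)
```

One thing I am unsure about: since I traced the model with a batch-size-1 example input, I don't know whether the traced `.ptl` will accept larger batch sizes on the device, or whether the trace has effectively fixed the batch dimension.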
Any help/suggestion is appreciated.
Regards!