"Automatic parallelization of models onto multiple GPUs"

Hello,
I have a traced model that I use in C++.
The documentation clearly states: “Automatic parallelization of models onto multiple GPUs like torch.nn.parallel.DataParallel”.

However, I do not see this happening on a multi-GPU machine. How do I explicitly make the module loaded with auto module = torch::jit::load(s_model_name, torch::kCUDA); run across multiple GPUs, the way torch.nn.parallel.DataParallel would in pure PyTorch?
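
In case it helps clarify what I'm after: as far as I can tell, LibTorch's torch::nn::parallel::data_parallel helper only accepts torch::nn modules, not a torch::jit::script::Module, so the only workaround I can think of is replicating the traced model by hand: load one copy per GPU, scatter the batch, run each shard, and gather the outputs. A rough sketch of that idea (forward_data_parallel is just a name I made up; I'm assuming a single tensor input batched along dim 0 and at least one visible GPU):

```cpp
// Rough sketch of manual data parallelism for a traced model in LibTorch.
// Assumptions: the model takes one tensor input batched along dim 0, at
// least one GPU is visible, and s_model_name is the path from my snippet.
// Error handling omitted.
#include <torch/script.h>
#include <torch/cuda.h>

#include <string>
#include <vector>

torch::Tensor forward_data_parallel(const std::string& s_model_name,
                                    const torch::Tensor& input) {
  const int n_gpus = static_cast<int>(torch::cuda::device_count());

  // Replicate: load one copy of the traced module per GPU.
  std::vector<torch::jit::script::Module> replicas;
  for (int i = 0; i < n_gpus; ++i) {
    replicas.push_back(
        torch::jit::load(s_model_name, torch::Device(torch::kCUDA, i)));
  }

  // Scatter: split the batch along dim 0, one shard per device.
  const auto shards = input.chunk(n_gpus, /*dim=*/0);

  // Forward each shard on its device. CUDA launches are asynchronous, so
  // the devices can overlap even when driven from a single host thread.
  std::vector<torch::Tensor> outputs;
  for (size_t i = 0; i < shards.size(); ++i) {
    auto shard =
        shards[i].to(torch::Device(torch::kCUDA, static_cast<int>(i)));
    auto out = replicas[i].forward({shard}).toTensor();
    // Gather: bring every shard's output back to GPU 0.
    outputs.push_back(out.to(torch::Device(torch::kCUDA, 0)));
  }
  return torch::cat(outputs, /*dim=*/0);
}
```

Is something like this really necessary, or is there a built-in way to get the DataParallel-like behavior the documentation mentions?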

Thanks.

Does anyone know the answer to this?