Efficiently calling a function many times

I have a function that tests part of my training code, and I want to run that test many times.
How can I do this efficiently, so that the runs execute in parallel and fast on the GPUs?
For instance, how would I replace this small block:

N = 100
for i in range(N):
    result = test_training()

Hi @LLlearner,

Have a look at torch.func.vmap, which vectorizes a function call over a batch dimension instead of looping in Python. The docs are here: torch.func.vmap (PyTorch 2.0 documentation)
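
As a rough sketch of how that could look: the body of test_training() is not shown in this thread, so the example below uses a hypothetical stand-in, run_trial(x), that takes one input tensor per trial and returns a scalar. The names run_trial and inputs and the toy computation are assumptions for illustration only. vmap needs at least one tensor argument to map over, so a function with no arguments would first have to be rewritten to take its per-trial data as input.

```python
import torch
from torch.func import vmap

# Hypothetical stand-in for test_training(): a single trial that takes its
# own input tensor and returns a scalar score. Not the poster's real function.
def run_trial(x: torch.Tensor) -> torch.Tensor:
    return (x ** 2).mean()

N = 100
device = "cuda" if torch.cuda.is_available() else "cpu"
inputs = torch.randn(N, 8, device=device)  # one row of data per trial

# Loop version, as in the question (collecting the results this time).
looped = torch.stack([run_trial(inputs[i]) for i in range(N)])

# Vectorized version: vmap maps run_trial over dim 0 of `inputs`,
# so all N trials run as one batched computation.
batched = vmap(run_trial)(inputs)

print(batched.shape)
print(torch.allclose(looped, batched, atol=1e-6))
```

This only works when the function is made of tensor operations that vmap can batch. Data-dependent Python control flow or calls like .item() inside the function will fail under vmap, so a test that runs a full training loop with an optimizer may need restructuring before it can be vectorized this way. If the function draws random numbers, the randomness argument of vmap (for example randomness="different") controls whether each trial gets its own samples.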