The strangest bug in model.forward() for multiple batches

Hello,

I’m running a PyTorch model with CUDA.

My code takes about 100 images as input, divides them into groups of BATCH_SIZE, and sends them to the model batch by batch.
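For context, here is a minimal sketch of the batching loop (the `BATCH_SIZE` value, the `images` list, and the `model.forward` call are placeholders standing in for my actual code):

```python
# Minimal sketch of the batching loop described above.
BATCH_SIZE = 8
images = list(range(100))  # stands in for ~100 preprocessed image tensors

# Split the images into consecutive chunks of at most BATCH_SIZE.
batches = [images[i:i + BATCH_SIZE] for i in range(0, len(images), BATCH_SIZE)]

for batch in batches:
    pass  # outputs = model.forward(batch) in the real code
```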

In the first batch: the model.forward() call is very slow (>1000 ms).

In the second batch: I get several occurrences of this error message:

[W C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\codegen\cuda\manager.cpp:336] Warning: FALLBACK path has been taken inside: torch::jit::fuser::cuda::runCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable export PYTORCH_NVFUSER_DISABLE=fallback
(function runCudaFusionGroup)

but the detections are still correct.

From the third batch onward, model.forward() runs as expected: no errors, low running time (~10 ms), and good detections.

This strange behavior occurs regardless of the batch size, the images sent to the model, or their number.

Any idea?
Thank you in advance!

Based on the warning, you are using an older PyTorch release (<=1.13) with the TorchScript workflow. Update to a recent PyTorch release and use torch.compile instead.

thank you for your answer!
I upgraded the PyTorch version, and now I don’t get the error messages. However, the running time issue remains:

first batch running time: ~10,000 ms
second batch running time: ~3,000 ms
from the third batch onward: <10 ms

As for torch.compile - is it supported in the C++ API?

thanks!

Good to hear the warning disappeared. The initial iterations can be slower because one-time work happens on the first calls (optimizing the computation graph, compiling kernels); this warm-up cost is expected.

No, torch.compile is not directly exposed in the C++ API.
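A common way to deal with this warm-up cost when benchmarking is to run a few dummy forward passes first and discard their timings. A minimal sketch of that pattern (the `fn` argument stands in for the model's forward call; with CUDA you would additionally synchronize around the timer):

```python
import time

def run_with_timing(fn, n_iters=5, warmup=2):
    """Call fn repeatedly, discarding the first `warmup` timings.

    The first iterations include one-time costs (graph optimization,
    kernel compilation), so only the later iterations reflect the
    steady-state latency. With CUDA, call torch.cuda.synchronize()
    before reading the clock so async kernels are included.
    """
    timings_ms = []
    for i in range(n_iters):
        start = time.perf_counter()
        fn()
        elapsed_ms = (time.perf_counter() - start) * 1000
        if i >= warmup:  # skip warm-up iterations
            timings_ms.append(elapsed_ms)
    return timings_ms
```

The same idea applies in C++ with LibTorch: run the module on a dummy input once or twice at startup so that the expensive first iterations don't land on real traffic.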