The strangest bug in model.forward() for multiple batches

Hello,

I’m running a PyTorch model with CUDA.

My code takes about 100 images as input, divides them into groups of BATCH_SIZE, and sends them to the model batch by batch.
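For context, here is a minimal sketch of the batching loop (the `BATCH_SIZE` value, the `images` list, and the `model.forward` call are placeholders standing in for my actual code):

```python
# Minimal sketch of the batching loop described above.
BATCH_SIZE = 8
images = list(range(100))  # stands in for ~100 preprocessed image tensors

# Split the images into consecutive chunks of at most BATCH_SIZE.
batches = [images[i:i + BATCH_SIZE] for i in range(0, len(images), BATCH_SIZE)]

for batch in batches:
    pass  # outputs = model.forward(batch) in the real code
```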

In the first batch: the model.forward() call is very slow (>1000 ms).

In the second batch: I get several occurrences of this error message:

[W C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\codegen\cuda\manager.cpp:336] Warning: FALLBACK path has been taken inside: torch::jit::fuser::cuda::runCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable export PYTORCH_NVFUSER_DISABLE=fallback
(function runCudaFusionGroup)

but the detections are still correct.

From the third batch onward, model.forward() runs as expected: no errors, low running time (~10 ms), and good detections.

This strange behavior occurs regardless of the batch size, the images sent to the model, or their number.

Any idea?
Thank you in advance!

Based on the warning, you are using an older PyTorch release (<=1.13) with the TorchScript workflow. Update to a recent PyTorch release and use torch.compile instead.

thank you for your answer!
I upgraded the PyTorch version, and now I don’t get the error messages. However, the running time issue remains:

first batch running time: ~10,000 ms
second batch running time: ~3,000 ms
from the third batch onward: <10 ms

As for torch.compile - is it supported in the C++ API?

thanks!

Good to hear the warning disappeared. The initial iterations can be slower because one-time work happens on the first calls (optimizing the computation graph, compiling kernels); this warm-up cost is expected.

No, torch.compile is not directly exposed in the C++ API.
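A common way to deal with this warm-up cost when benchmarking is to run a few dummy forward passes first and discard their timings. A minimal sketch of that pattern (the `fn` argument stands in for the model's forward call; with CUDA you would additionally synchronize around the timer):

```python
import time

def run_with_timing(fn, n_iters=5, warmup=2):
    """Call fn repeatedly, discarding the first `warmup` timings.

    The first iterations include one-time costs (graph optimization,
    kernel compilation), so only the later iterations reflect the
    steady-state latency. With CUDA, call torch.cuda.synchronize()
    before reading the clock so async kernels are included.
    """
    timings_ms = []
    for i in range(n_iters):
        start = time.perf_counter()
        fn()
        elapsed_ms = (time.perf_counter() - start) * 1000
        if i >= warmup:  # skip warm-up iterations
            timings_ms.append(elapsed_ms)
    return timings_ms
```

The same idea applies in C++ with LibTorch: run the module on a dummy input once or twice at startup so that the expensive first iterations don't land on real traffic.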