Say I’m testing some code and monitoring the allocated GPU memory with torch.cuda.max_memory_allocated().
Edit: I deleted the misleading toy example.
The situation is that I run a benchmark over several different configurations and log their execution time, GPU memory footprint, etc.
The issue is that when one of the configurations hits a CUDA OOM, the benchmark of the next configuration always reports wrong memory stats (I know the stats are wrong because the error follows the execution order when I swap the configurations). How can I avoid this?
A typical log looks like this:
...
T=150 U=20 V=5000 N=1 time=2.70 memory=243
T=150 U=20 V=5000 N=16 time=50.05 memory=3842
T=150 U=20 V=5000 N=32 time=128.20 memory=7684
T=150 U=20 V=5000 N=64 time=276.37 memory=15360
T=150 U=20 V=5000 N=128 error=CUDA out of memory. ...
T=1500 U=300 V=50 N=1 time=16.21 memory=23044 <- Error occurs, this should be much smaller
T=1500 U=300 V=50 N=16 time=78.82 memory=5763
T=1500 U=300 V=50 N=32 time=209.08 memory=11520
T=1500 U=300 V=50 N=64 time=398.86 memory=23044
The repo can be found at maxwellzh/warp-rnnt (CUDA-Warp RNN-Transducer) on GitHub; the benchmark script is pytorch_binding/benchmark.py.
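For what it's worth, the behavior described above is consistent with two things: torch.cuda.max_memory_allocated() reports a peak since the last reset (so a previous run's peak leaks into the next measurement unless you call torch.cuda.reset_peak_memory_stats()), and the caching allocator keeps blocks from the failed run unless you call torch.cuda.empty_cache(). Below is a minimal sketch of how the loop could isolate each configuration; run_config and the config values are hypothetical stand-ins for the actual benchmark workload, and the RuntimeError-message check is a hedge for torch versions that don't expose torch.cuda.OutOfMemoryError.

```python
import torch

def run_config(n):
    # Hypothetical workload standing in for one benchmark configuration.
    x = torch.randn(n, 1024, device="cuda")
    return (x @ x.T).sum().item()

def benchmark(configs):
    results = []
    for n in configs:
        # Start each configuration with a fresh peak-memory counter;
        # otherwise max_memory_allocated() still reflects earlier runs.
        torch.cuda.reset_peak_memory_stats()
        try:
            run_config(n)
            torch.cuda.synchronize()
            mem_mib = torch.cuda.max_memory_allocated() // 2**20
            results.append((n, mem_mib))
        except RuntimeError as e:  # older torch raises RuntimeError on OOM
            if "out of memory" not in str(e):
                raise
            # Release cached blocks left over from the failed run so the
            # next configuration starts from a clean allocator state.
            torch.cuda.empty_cache()
            results.append((n, "OOM"))
    return results

if torch.cuda.is_available():
    print(benchmark([256, 512, 1024]))
```

Deleting the tensors that were alive when the OOM was raised (or letting them go out of scope) before calling empty_cache() matters too, since cached-but-referenced memory cannot be freed.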