Test_adaptive_avg_pool2d_nhwc deadline in test_quantized.py

When I build pytorch (successfully) and execute the tests, things go well until I hit test_adaptive_avg_pool2d_nhwc, which consistently fails with a “DeadlineExceeded” error; it usually takes 230-260 ms, exceeding the 200 ms deadline, e.g.:

hypothesis.errors.DeadlineExceeded: Test took 256.44ms, which exceeds the deadline of 200.00ms

Roughly how long should this test take? I’m wondering whether the 200 ms is intended as a generous buffer for something that should take significantly less time, or whether it is simply not generous enough given that the test does otherwise complete. Many tests specify “@no_deadline” but this one does not; should we consider adding it?
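For reference, here is a minimal sketch of what opting out would look like, assuming “@no_deadline” is just a hypothesis.settings override with the deadline disabled (I haven’t checked how test_quantized.py actually defines it):

```python
import hypothesis
from hypothesis import given, strategies as st

# Assumed definition: an opt-out decorator that disables hypothesis's
# per-example deadline entirely for the decorated test.
no_deadline = hypothesis.settings(deadline=None)

@no_deadline
@given(side=st.integers(8, 64))  # illustrative strategy, not the real one
def test_adaptive_avg_pool2d_nhwc(side):
    ...  # the actual test body lives in test_quantized.py
```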

Note this is a ppc64le build/test.

On my laptop (4x i7-5600U CPU @ 2.60GHz according to cpuinfo), the test seems to run in ~95 ms (with a very rough measurement, running it once and then 11x to get some handle on the invocation overhead).

tv@aComp:~/python/pytorch/pytorch$ python3 test/test_quantized.py TestQuantizedOps.test_adaptive_avg_pool2d_nhwc
.
----------------------------------------------------------------------
Ran 1 test in 0.115s

OK
$ python3 test/test_quantized.py TestQuantizedOps.test_adaptive_avg_pool2d_nhwc{,,,,,,,,,,}
...........
----------------------------------------------------------------------
Ran 11 tests in 1.050s

OK
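If it helps to separate kernel cost from test/hypothesis overhead, a rough micro-benchmark of the quantized NHWC adaptive average pool in isolation could look like this (shapes, quantization parameters, and iteration count are made up for illustration, not taken from the actual hypothesis-generated cases):

```python
import timeit
import torch

# Build an NHWC (channels-last) quantized input; values are illustrative.
x = torch.randn(4, 64, 56, 56).contiguous(memory_format=torch.channels_last)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)

def run():
    # Quantized adaptive average pool to a fixed output size.
    torch.nn.quantized.functional.adaptive_avg_pool2d(qx, output_size=(7, 7))

n = 100
print("per call: %.3f ms" % (timeit.timeit(run, number=n) / n * 1e3))
```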

I don’t know what performance factor to expect between my laptop and your machine.
We would likely want to know if the test suddenly regressed 2x in performance, so turning the deadline off entirely might not be the best option; perhaps a decorator could disable it, or make it more generous, only for your specific architectures (as sketched below). (These are just some thoughts, I don’t know about any “official policy” or anything.)
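A hypothetical sketch of such an architecture-conditional decorator (this is not existing pytorch test infrastructure, just an illustration of the idea):

```python
import platform
import hypothesis

# Hypothetical helper: relax the hypothesis deadline only on architectures
# known to run this kernel more slowly, so a 2x regression on x86 would
# still be caught by the default deadline.
def generous_deadline_on(archs, deadline_ms=1000):
    if platform.machine() in archs:
        return hypothesis.settings(deadline=deadline_ms)
    return hypothesis.settings()  # keep the profile default elsewhere

# Usage (above the @given-decorated test):
#   @generous_deadline_on({"ppc64le"})
```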

Best regards

Thomas

Thomas, thanks for this comparison point. If you’re getting ~95 ms on a laptop, that tells me the 200 ms deadline is not unreasonable, and running above it does flag a possible performance concern. In this case, the execution environment is a conda env inside an Ubuntu docker container on a shared server at the OSUOSL lab, so it’s not an environment optimized for performance.
In fact, this is literally the ppc64le CI that the pytorch github home page README.md links to directly, labelled 3.6 and “Linux (ppc64le) GPU”, which points to the CI at: https://powerci.osuosl.org/job/pytorch-linux-cuda92-cudnn7-py3-mpi-build-test-gpu/ (despite the misleading URL name, it runs on cuda10; we’ll fix that URL once it is running cleanly).

Anyhow, this test case appears to be new as of pytorch 1.3, and I’m going to try to get a build running and tested on our own local systems to see how the execution of this test compares. (Of course, we’d then like to get the public CI showing a successful test completion as well.)