I am experiencing a strange issue related to input size. My network runs fine with a 3D input of size 255x255x8, but as soon as the first two dimensions reach 256 I get an error (cuDNN error: CUDNN_STATUS_EXECUTION_FAILED), even after halving the third dimension. At first I thought this was a memory issue, but 255x255x8 is a much larger input than 256x256x4. In fact, I can increase the size to 255x255x10 and it still runs.
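To make the size comparison concrete, here is a quick sketch that counts the elements per sample for the three shapes mentioned above. It ignores batch size, channels, and dtype, none of which are stated here, so it only shows the relative sizes and not actual memory use.

```python
# Element counts per sample for the three input shapes mentioned above.
# Batch size, channel count and dtype are left out on purpose.
shapes = {
    "255x255x8": (255, 255, 8),
    "256x256x4": (256, 256, 4),
    "255x255x10": (255, 255, 10),
}


def numel(shape):
    """Return the number of elements in a tensor of the given shape."""
    n = 1
    for d in shape:
        n *= d
    return n


counts = {name: numel(shape) for name, shape in shapes.items()}
for name, n in counts.items():
    print(f"{name}: {n} elements")
# 255x255x8: 520200 elements
# 256x256x4: 262144 elements
# 255x255x10: 650250 elements
```

The failing 256x256x4 input has roughly half as many elements as the working 255x255x8 one, which is why a plain out-of-memory explanation seems unlikely.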
Is anyone aware of any specific reason for this oddly specific threshold?
I downsized the network to the minimum required to reproduce the error. This code works fine on CPU with the same input shapes, but fails on GPU if two dimensions are >= 256 and the third is >= 4. Even 256x256x100 works on CPU without any issues. I should probably file a bug report on GitHub.
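For anyone who wants to map out the threshold, a shape sweep along the following lines might help. This is only a sketch: `sweep`, `matches_observed_rule` and `make_torch_runner` are hypothetical helper names, and the single `Conv3d` layer is a placeholder assumption, not my actual network. The rule in `matches_observed_rule` simply encodes the pattern described above.

```python
import itertools


def matches_observed_rule(shape):
    """Pattern seen on GPU: two dimensions >= 256 and the remaining one >= 4."""
    d = sorted(shape, reverse=True)
    return d[0] >= 256 and d[1] >= 256 and d[2] >= 4


def sweep(run, sizes_xy=(255, 256), sizes_z=(3, 4, 8)):
    """Call run(shape) for each (x, y, z) combination and record the outcome.

    `run` is any callable that raises RuntimeError on failure,
    for example a single forward pass.
    """
    results = {}
    for shape in itertools.product(sizes_xy, sizes_xy, sizes_z):
        try:
            run(shape)
            results[shape] = "ok"
        except RuntimeError as err:
            results[shape] = f"failed: {err}"
    return results


def make_torch_runner(device):
    """Build a run(shape) callable around a placeholder 3D convolution."""
    import torch  # imported lazily so the helpers above work without PyTorch

    # Assumption: one Conv3d layer stands in for the real network.
    net = torch.nn.Conv3d(1, 1, kernel_size=3, padding=1).to(device)

    def run(shape):
        x = torch.randn(1, 1, *shape, device=device)
        with torch.no_grad():
            net(x)
        if device == "cuda":
            # CUDA kernels run asynchronously; synchronize so that an
            # error is raised here and attributed to this shape.
            torch.cuda.synchronize()

    return run
```

Calling `sweep(make_torch_runner("cuda"))` and comparing the failing shapes against `matches_observed_rule` would show whether the pattern holds. Since a CUDA error can leave the process in a bad state, it may be safer to test each failing shape in a fresh process. Setting `torch.backends.cudnn.enabled = False` before the sweep should also show whether the failure is specific to cuDNN.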