Does a dilated conv net require more computation?

So I have two networks, one with a dilated conv2d layer and one without.
When I run the non-dilated network, I only see 16% Volatile GPU-Util, and its progress is decent (i.e., fast).
However, when I simply enable dilation (dilation=True), it uses 100% Volatile GPU-Util and the progress is much slower.

I'm just curious whether this behavior is expected, because it is such a huge difference for a single flag change.
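A minimal sketch to reproduce the comparison, assuming a standard `nn.Conv2d` setup (note that in PyTorch `dilation` is an integer or tuple, not a boolean, so `dilation=True` is interpreted as `dilation=1`; `dilation=2` is what actually spaces the kernel taps apart):

```python
import time

import torch
import torch.nn as nn

# Falls back to CPU if no GPU is present; the relative gap is what matters.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(8, 16, 64, 64, device=device)

def time_conv(dilation):
    # Match padding to dilation so both variants keep the same output size.
    conv = nn.Conv2d(16, 16, kernel_size=3, padding=dilation,
                     dilation=dilation).to(device)
    conv(x)  # warm-up so lazy initialization is not measured
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(10):
        out = conv(x)
    if device.type == "cuda":
        torch.cuda.synchronize()  # CUDA kernels are async; sync before timing
    return out.shape, time.perf_counter() - start

shape_plain, t_plain = time_conv(dilation=1)
shape_dilated, t_dilated = time_conv(dilation=2)
print(shape_plain, t_plain)
print(shape_dilated, t_dilated)
```

On many setups the dilated variant is noticeably slower per iteration even though the FLOP count is comparable, which matches the behavior described above.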

The time spent on the device (and thus the performance) depends not only on the theoretical number of operations, but also on the implemented algorithm. E.g., if you are using cudnn, you might see a difference between highly optimized kernels for common use cases and "uncommon" cases.
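One thing worth trying in this situation is letting cudnn benchmark the available algorithms for your exact input shapes, so it can pick the fastest kernel for the dilated case (at the cost of a slower first iteration, and it only pays off if input shapes are static):

```python
import torch

# When enabled, cudnn runs a short benchmark on the first forward pass
# and caches the fastest convolution algorithm for each input shape.
torch.backends.cudnn.benchmark = True
print(torch.backends.cudnn.benchmark)
```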

I'm not sure if you are using cudnn for your workloads, and thus cannot comment on the kernels applied internally for your convs.
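You can check whether your build ships with cudnn at all via `torch.backends.cudnn`:

```python
import torch

# is_available() reports whether this PyTorch build can use cudnn;
# version() returns the cudnn version as an int, or None if unavailable.
has_cudnn = torch.backends.cudnn.is_available()
version = torch.backends.cudnn.version()
print(has_cudnn, version)
```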