Is 3D depthwise convolution too slow?

I’ve tested normal 2D convolution and depthwise 2D convolution, and the latter is faster.
However, when I move to 3D, the depthwise 3D convolution is about 10 times slower than the normal 3D convolution. When will depthwise 3D convolution be optimized?
Or am I using a wrong configuration?
I am using PyTorch 0.4.1 and CUDA 8.0.
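
For reference, here is a minimal sketch of the kind of comparison described above. The channel count, kernel size, input shape, and the `time_forward` helper are illustrative assumptions, not the exact setup behind the 10x figure. It assumes a depthwise 3D convolution is expressed as `nn.Conv3d` with `groups` equal to the number of input channels, and it runs on the GPU when one is available.

```python
import time

import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not taken from the original measurement).
channels, kernel = 32, 3
x = torch.randn(2, channels, 8, 16, 16)  # (N, C, D, H, W)

# Normal 3D convolution: every output channel sees every input channel.
normal = nn.Conv3d(channels, channels, kernel, padding=1, bias=False)
# Depthwise 3D convolution: groups == in_channels, one filter per channel.
depthwise = nn.Conv3d(channels, channels, kernel, padding=1,
                      groups=channels, bias=False)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x, normal, depthwise = x.to(device), normal.to(device), depthwise.to(device)


def time_forward(layer, inp, iters=10):
    """Hypothetical helper: average forward time in seconds after a warm-up pass."""
    with torch.no_grad():
        layer(inp)  # warm-up so one-time setup cost is not measured
        if inp.is_cuda:
            torch.cuda.synchronize()  # CUDA kernels run asynchronously
        start = time.perf_counter()
        for _ in range(iters):
            layer(inp)
        if inp.is_cuda:
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return elapsed / iters


t_normal = time_forward(normal, x)
t_depthwise = time_forward(depthwise, x)
print("normal 3D conv:    %.6f s" % t_normal)
print("depthwise 3D conv: %.6f s" % t_depthwise)
```

The two layers produce outputs of the same shape, and the depthwise layer has `channels` times fewer weights, so any slowdown in this sketch would come from the kernel implementation rather than from the amount of arithmetic.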