Reducing GPU memory cost when building a custom convolution layer

The native implementations are referenced in this post, which might help when writing your custom implementation.
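Here is a minimal sketch, assuming PyTorch (the post does not name a framework). A custom convolution built on `unfold` (im2col) keeps the large unfolded tensor alive for the backward pass. One way to avoid that is to call the native kernels in both passes through a custom `torch.autograd.Function`, which then saves only the input and the weight. `LeanConv2dFn` is a hypothetical name. `torch.nn.grad.conv2d_input` and `torch.nn.grad.conv2d_weight` are existing PyTorch helpers. Dilation and groups are left out to keep the sketch short.

```python
import torch
import torch.nn.functional as F
from torch.nn.grad import conv2d_input, conv2d_weight


class LeanConv2dFn(torch.autograd.Function):
    """Hypothetical custom conv: native kernels in forward and backward,
    saving only input and weight (no unfolded/im2col buffer)."""

    @staticmethod
    def forward(ctx, x, weight, bias, stride, padding):
        # Keep only what backward needs; the output is not saved.
        ctx.save_for_backward(x, weight)
        ctx.stride, ctx.padding = stride, padding
        ctx.has_bias = bias is not None
        return F.conv2d(x, weight, bias, stride=stride, padding=padding)

    @staticmethod
    def backward(ctx, grad_out):
        x, weight = ctx.saved_tensors
        grad_x = grad_w = grad_b = None
        if ctx.needs_input_grad[0]:
            # Gradient w.r.t. the input via the native transposed conv.
            grad_x = conv2d_input(x.shape, weight, grad_out,
                                  stride=ctx.stride, padding=ctx.padding)
        if ctx.needs_input_grad[1]:
            # Gradient w.r.t. the weight via the native kernel.
            grad_w = conv2d_weight(x, weight.shape, grad_out,
                                   stride=ctx.stride, padding=ctx.padding)
        if ctx.has_bias and ctx.needs_input_grad[2]:
            # Bias gradient: sum over batch and spatial dimensions.
            grad_b = grad_out.sum(dim=(0, 2, 3))
        # No gradients for the non-tensor arguments stride and padding.
        return grad_x, grad_w, grad_b, None, None
```

Usage: `y = LeanConv2dFn.apply(x, weight, bias, 1, 1)`. Starting from this skeleton, you can customize the forward or backward logic while the heavy lifting stays on the native (e.g. cuDNN) kernels.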