I started developing a library that implements common DL operations in OpenCL.
It is somewhat similar to cuDNN/MIOpen, but it additionally provides a library for inference and basic training.
https://github.com/artyom-beilis/dlprimitives
The project is at a very early stage, but already:
- It outperforms existing OpenCL DL implementations, PlaidML and the Caffe OpenCL branch, by 150-200% on Nvidia and AMD platforms: https://github.com/artyom-beilis/dlprimitives/blob/master/docs/summary.md
- It is validated on AlexNet, ResNet18/50, VGG16 and MobileNet on Nvidia, AMD and Intel GPUs
I’m looking for a way to create a custom backend for PyTorch. It is clear to me that this is a lot of work, but that is the technical part. The big, critical and complex part, writing good high-performance kernels that are worth something, is already in very good shape.
I looked into the “out-of-source” backend approach, but the documentation is lacking. Is there a template or a minimal working backend that I could “rewrite” for OpenCL?
Any tutorials or pointers would be appreciated.