Out-of-tree OpenCL backend now easy to install

I started working on an out-of-tree OpenCL backend for PyTorch about half a year ago. As a backbone I use dlprimitives for computations: it provides many efficient deep-learning operations, including an efficient GEMM, convolution algorithms based on GEMM fused with image access, Winograd convolution for 3x3 kernels, and depthwise-separable convolution.

It gives good performance: around 60% of CUDA/cuDNN performance for ResNet-18 training and around 80% for inference. It was tested on various NVIDIA, AMD, and embedded Intel GPUs.

Recently, many improvements to out-of-tree backend support were added to PyTorch nightly builds, so I no longer need a custom PyTorch build: the backend now works with the stock PyTorch installed via pip.

Using a nightly PyTorch build, the installation procedure is very simple: you only need the OpenCL API installed, plus the SQLite library development files, which are optional but recommended for kernel caching.
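For reference, a build from source might look roughly like this. This is a sketch: the Debian/Ubuntu package names, the repository URL, and the CMake invocation are assumptions, so adjust them to your distribution and your PyTorch install.

```shell
# Assumptions: Debian/Ubuntu package names, pip-installed PyTorch nightly,
# out-of-tree CMake build. Adjust to your own setup.

# OpenCL API plus (optional but recommended) SQLite dev files for kernel caching
sudo apt-get install opencl-headers ocl-icd-opencl-dev libsqlite3-dev

# Build the backend against the pip-installed torch
git clone --recursive https://github.com/artyom-beilis/pytorch_dlprim.git
cd pytorch_dlprim
mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH="$(python -c 'import torch; print(torch.utils.cmake_prefix_path)')" ..
make
```

The `torch.utils.cmake_prefix_path` trick points CMake at the Torch config files that ship inside the pip package, so no separate libtorch download is needed.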

Of course, the project is still at an early stage: many operators are likely not implemented yet, and there are probably some bugs running around :-)

I also checked against torch 1.13 and it is compatible as well; the only difference is that instead of the device name ocl you need to use privateuseone.
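For illustration, addressing the device on torch 1.13 might look like this. This is a sketch: it assumes the backend extension has already been loaded, so the tensor-creation line is left commented out.

```python
import torch

# On torch 1.13 the backend keeps PyTorch's default out-of-tree device name,
# so devices are addressed as "privateuseone:N" rather than "ocl:N".
dev = torch.device("privateuseone", 0)  # same as torch.device("privateuseone:0")

# With the OpenCL backend extension loaded, tensors can be placed on it:
# x = torch.randn(4, 4, device=dev)
print(dev)
```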

In the nightly version you can call

torch.utils.rename_privateuse1_backend('ocl')

and then use 'ocl:0' as a more readable device name.