PyTorch on Embedded Hardware

Hi,

Background: My students and I are currently building a small, model-scale autonomous car.
At the moment we have hardware from Texas Instruments (TI AM572x) with 2x ARM Cortex-A15 (Linux-based), 2x DSP (C6000) and 4x EVE (vector processors), as used in the new BeagleBone AI (BeagleBoard.org - AI).

Currently we can run PyTorch and libtorch (we use the C++ backend) on the Cortex-A15 cores under Linux, but it is extremely slow; the ARM processors are not designed to execute this kind of code.
The vector processors and the DSPs on this platform were originally designed to run code like this.
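
For anyone who wants to reproduce the CPU-only situation, here is a minimal sketch of how one might check which backend a stock PyTorch build sees and time a forward pass on the CPU. The small conv net, the input size and the helper names (`describe_backend`, `time_forward`) are placeholders made up for illustration, not our real network or measurements.

```python
import time

try:
    import torch
except ImportError:  # PyTorch is not installed on this machine
    torch = None


def describe_backend():
    """Return the device a stock PyTorch build would pick: 'cuda' or 'cpu'."""
    if torch is None:
        return "unavailable"
    # Official PyTorch builds expose no OpenCL device, so boards without
    # CUDA hardware end up on the CPU path.
    return "cuda" if torch.cuda.is_available() else "cpu"


def time_forward(runs=5):
    """Average seconds per forward pass of a tiny placeholder net on the CPU."""
    if torch is None:
        return None
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
        torch.nn.ReLU(),
        torch.nn.Conv2d(8, 8, kernel_size=3, padding=1),
    ).eval()
    x = torch.randn(1, 3, 64, 64)  # hypothetical 64x64 RGB camera frame
    with torch.no_grad():
        model(x)  # warm-up pass, not timed
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    print("backend:", describe_backend())
    print("seconds per forward pass:", time_forward())
```

On a board like ours one would expect this to report `cpu`, and the timing gives a rough baseline to compare against any OpenCL-based approach.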

Problem:
Texas Instruments supports only OpenCL as middleware for computing on the vector cores and the DSPs. CUDA is not supported so far.
I know there is no plan to support OpenCL in the future, but:
Is there a working solution for running the PyTorch CUDA code on OpenCL hardware?
We found a project, but we are not sure it would work (GitHub - hughperkins/coriander: Build NVIDIA® CUDA™ code for OpenCL™ 1.2 devices).

Best,
Andy

@Werner2005
Can you create an issue in our GitHub repo? We can have some CUDA experts answer this question there.