Is there interest in an optimized version of the DLRM embedding kernel for Hopper GPUs?
I wanted to check before opening a PR. In short, it uses some Hopper-specific features to improve performance.