I have an RTX 2080 Ti and I am totally new to CUDA. Is there a benchmark to test the speedup of Tensor Cores compared to normal CUDA cores with PyTorch? And is there a good book recommendation on the topic?
If I recall correctly, on a card like the 2080 Ti, Tensor Cores are only used for half-precision (FP16) floating-point math, so regular FP32 code will not touch them.
You can look at NVIDIA's apex repo to see how to use half-precision floats in PyTorch.
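As for a benchmark, I don't know of an official one, but a rough do-it-yourself version is to time the same large matrix multiply in FP32 and in FP16 and compare. Below is a minimal sketch, not a rigorous benchmark: the helper name `time_matmul`, the matrix size 4096, and the iteration counts are my own choices. The size is a multiple of 8 because, as far as I know, FP16 matmuls need dimensions like that to be eligible for Tensor Cores.

```python
import time

import torch


def time_matmul(n, dtype, device, iters=50, warmup=10):
    """Return the average seconds per (n x n) @ (n x n) matmul.

    `time_matmul` is a hypothetical helper for this post, not a PyTorch API.
    """
    device = torch.device(device)
    a = torch.randn(n, n, device=device, dtype=dtype)
    b = torch.randn(n, n, device=device, dtype=dtype)

    # Warm-up runs so one-time setup costs are not included in the timing.
    for _ in range(warmup):
        a @ b

    # CUDA kernels launch asynchronously, so wait for the GPU before and
    # after the timed loop.
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    if device.type == "cuda":
        torch.cuda.synchronize()

    return (time.perf_counter() - start) / iters


if __name__ == "__main__":
    if torch.cuda.is_available():
        n = 4096  # assumed size; a multiple of 8
        t32 = time_matmul(n, torch.float32, "cuda")
        t16 = time_matmul(n, torch.float16, "cuda")
        print(f"FP32: {t32 * 1e3:.2f} ms per matmul")
        print(f"FP16: {t16 * 1e3:.2f} ms per matmul")
        print(f"FP16 speedup over FP32: {t32 / t16:.2f}x")
    else:
        print("No CUDA device found; this comparison needs a GPU.")
```

Keep in mind that a faster FP16 number does not by itself prove Tensor Cores were used, since FP16 also halves the memory traffic. To confirm which kernels actually ran you would need a profiler such as NVIDIA Nsight.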