Hi all,
According to Wikipedia (https://en.wikipedia.org/wiki/GeForce_10_series), fp16 should be 32 times slower than fp32 on this generation.
However, on my GPU (GTX 1080), I observe that fp16 is about 2 times faster.
The workload is mostly matrix-vector multiplication (torch.mv).
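For a sense of scale, here is a back-of-the-envelope sketch that counts the minimum data a matrix-vector product has to move versus the arithmetic it performs, for fp32 (4 bytes) and fp16 (2 bytes). It rests on an assumption that is not verified here: that torch.mv on this card is limited by memory bandwidth rather than by FP16/FP32 arithmetic throughput. Under that assumption, halving the element size would roughly halve the run time. The function names and the 8192×8192 size are hypothetical, chosen only for illustration.

```python
def mv_traffic_bytes(m: int, n: int, elem_bytes: int) -> int:
    """Minimum bytes moved by y = A @ x with A of shape (m, n):
    read A (m*n elements) and x (n elements), write y (m elements)."""
    return (m * n + n + m) * elem_bytes


def mv_flops(m: int, n: int) -> int:
    """One multiply and one add per matrix element."""
    return 2 * m * n


if __name__ == "__main__":
    m = n = 8192  # hypothetical size, not taken from the post
    for name, size in (("fp32", 4), ("fp16", 2)):
        moved = mv_traffic_bytes(m, n, size)
        # Arithmetic intensity: FLOPs performed per byte moved.
        print(f"{name}: {moved / 2**20:.1f} MiB moved, "
              f"{mv_flops(m, n) / moved:.2f} FLOP/byte")
    # Same FLOP count, half the bytes: the fp32/fp16 traffic ratio is 2.
    ratio = mv_traffic_bytes(m, n, 4) / mv_traffic_bytes(m, n, 2)
    print(f"fp32/fp16 traffic ratio: {ratio:.1f}")
```

With well under one FLOP per byte in either precision, a kernel like this would spend most of its time waiting on memory, which is why the element size (rather than the FP16 arithmetic rate) is the quantity this sketch compares.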
I searched the internet and found some benchmarks on the Pascal architecture that also show a speedup for fp16, but without any further explanation.
Can someone explain that?