FP16 speed on Pascal cards (1060/1070/1080/1080Ti)

Hi all,
According to Wikipedia (https://en.wikipedia.org/wiki/GeForce_10_series), fp16 should be 32 times slower than fp32 on these cards.
However, on my GPU (a 1080) I observe that fp16 is about 2 times faster.
The workload is mostly matrix-vector multiplication (torch.mv).
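
A minimal sketch of how such a comparison could be timed, in case anyone wants to reproduce it. The helper names (time_op, compare_mv), the matrix size and the iteration counts are illustrative assumptions, not the exact script behind the numbers above. The synchronize calls matter because CUDA kernels are launched asynchronously, so the clock would otherwise stop before the GPU has finished.

```python
import time


def time_op(op, sync=None, warmup=10, iters=100):
    """Return the mean wall-clock seconds per call of op()."""
    for _ in range(warmup):
        op()                      # warm-up calls, excluded from the timing
    if sync is not None:
        sync()                    # drain queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        op()
    if sync is not None:
        sync()                    # wait until the timed work has actually finished
    return (time.perf_counter() - start) / iters


def compare_mv(n=8192):
    """Time torch.mv in fp32 and fp16 on the current CUDA device.

    The size n is an arbitrary choice for illustration.
    """
    import torch  # imported here so time_op stays usable without PyTorch

    results = {}
    for dtype in (torch.float32, torch.float16):
        mat = torch.randn(n, n, device="cuda", dtype=dtype)
        vec = torch.randn(n, device="cuda", dtype=dtype)
        results[dtype] = time_op(
            lambda: torch.mv(mat, vec),
            sync=torch.cuda.synchronize,
        )
    return results
```

Calling compare_mv() on a machine with a CUDA device returns the mean time per torch.mv call for each dtype, so the ratio of the two values gives the relative speed.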
I searched the internet and found some benchmarks on the Pascal architecture that also show a speedup for fp16, but with no further explanation.

Can someone explain that?

I checked the same on a 1080Ti.
It is also about 2 times faster.