PyTorch with ROCm - Benchmarks

Hi.

I’ve successfully built PyTorch 1.0 with ROCm following the instructions here:

I’m struck by the performance gap between NVIDIA and AMD cards, especially when you take into account these benchmarks run on CIFAR-10 with TensorFlow:
http://blog.gpueater.com/en/2018/04/23/00011_tech_cifar10_bench_on_tf13/

I’ve experimented with a Radeon 580 and a 1080 Ti. The benchmark reports about a 30% performance drop from the NVIDIA card to the AMD one, but I’m seeing more like an 85% drop! At full GPU utilization, the NVIDIA card processes about 9 to 10 times more batches per second than the AMD card. Looking at the specs of the two cards, the AMD card isn’t supposed to be that much worse, as far as I can tell.
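To put the "percentage drop" and "times more batches per second" figures on the same scale, here is a small arithmetic sketch. It assumes that "drop" means the relative decrease in batches per second going from the faster card to the slower one; the function names are made up for illustration.

```python
def drop_from_ratio(ratio):
    """Relative throughput drop when the fast card is `ratio` times faster."""
    return 1.0 - 1.0 / ratio


def ratio_from_drop(drop):
    """Speed ratio (fast / slow) implied by a relative throughput drop."""
    return 1.0 / (1.0 - drop)


# Figures quoted in the post, converted both ways.
print(f"30% drop   -> {ratio_from_drop(0.30):.2f}x")   # 1.43x
print(f"85% drop   -> {ratio_from_drop(0.85):.2f}x")   # 6.67x
print(f"9x faster  -> {drop_from_ratio(9) * 100:.1f}% drop")   # 88.9% drop
print(f"10x faster -> {drop_from_ratio(10) * 100:.1f}% drop")  # 90.0% drop
```

Under that assumption, a 9 to 10 times gap is closer to a 90% drop than to 85%, and the 30% drop from the linked benchmark would correspond to only about a 1.4 times gap.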

I’m not running it on CIFAR-10, since the benchmark is even worse there (the utilization of the AMD card can’t go above 15% on the small model from the PyTorch tutorial here: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html).

Is this expected, at least for the moment, or is your guess that I’ve missed something in the build? I do feel it could be normal, since the TensorFlow benchmarks show that the framework used matters a lot for performance, but such a difference seems odd to me even taking that into account.
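Before suspecting the build, it may be worth ruling out a measurement artifact. GPU work is queued asynchronously, so a timer that stops before the device has finished can give misleading batches-per-second numbers. Below is a minimal, framework-agnostic sketch of a throughput timer. The names `batches_per_second`, `step` and `sync` are hypothetical, and this is not necessarily how the numbers above were measured.

```python
import time


def batches_per_second(step, n_warmup=10, n_iters=100, sync=None):
    """Time `step()` (one training batch) and return batches per second.

    `sync` is an optional callable that blocks until the device is idle.
    """
    # Warm-up iterations are excluded from timing (kernel compilation, caches).
    for _ in range(n_warmup):
        step()
    if sync is not None:
        sync()  # make sure warm-up work is finished before starting the clock

    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    if sync is not None:
        sync()  # wait for queued work before stopping the clock
    elapsed = time.perf_counter() - start
    return n_iters / elapsed


# Stand-in workload so the sketch runs without a GPU.
calls = []
rate = batches_per_second(lambda: calls.append(1), n_warmup=5, n_iters=50)
print(f"{rate:.0f} batches/s over {len(calls)} calls")
```

With PyTorch, `step` would wrap one forward/backward/optimizer pass and `sync` could be `torch.cuda.synchronize`, since ROCm builds expose the GPU through the same `torch.cuda` API. Running the same timer on both cards with an identical model and batch size makes the comparison easier to trust.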