amp_C fused kernels unavailable

Getting this with PyTorch 1.8. It also says

you may get better performance by installing NVIDIA’s apex

But this confuses me because I thought apex was integrated into PyTorch nowadays?

Could I fix this problem just by upgrading PyTorch?

It would help if you could give more context. Is this from fairseq?

The mixed precision parts have been merged (though PyTorch 1.8 is so last year… :stuck_out_tongue: ), but apex also has some unrelated optimizations, including kernels for operating on multiple tensors at once (I think).
The question is always whether these optimizations are worth having in PyTorch itself (which already has >2000 functions, last someone counted, and that is quite a maintenance burden at times) or whether they only matter when you want to race implementations against each other.

You could install apex and compare performance before and after to find out. If the difference is large, there might be an argument for getting these kernels into PyTorch. If it only squeezes out the last few percent of performance, maybe not as much.
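If you want to check directly whether apex's fused kernels are visible before and after installing it, a minimal sketch; the `amp_C` extension module is what the warning refers to, and it only exists when apex was built with its CUDA extensions:

```python
# Probe for apex's fused-kernel extension (amp_C), which the
# "fused kernels unavailable" warning is about. It is provided
# by NVIDIA apex (built with CUDA extensions), not by PyTorch.
try:
    import amp_C  # noqa: F401
    has_fused = True
except ImportError:
    has_fused = False

print("apex fused kernels available:", has_fused)
```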

Best regards

Thomas


Thanks for the reply!

Yeah it’s fairseq; wav2vec pretraining. Trying to speed it up.

Stupid question: if I install apex, I presume I then have to change the code to use that instead of PyTorch's amp package?

As far as I understand, AMP (now part of PyTorch) and the fused kernels (still from apex) are largely independent, even if the module is called amp_C, so you would only need to install apex, not change the code.
In general my impression is that fairseq is relatively well maintained and its developers have good access to PyTorch know-how, so their code should do the right thing.
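For what it's worth, here is a minimal sketch of the PyTorch-native AMP loop (available since 1.6, so no apex is required for this part); the `enabled=` guards are only there so the snippet also runs on a CPU-only machine:

```python
import torch

# Native mixed precision, merged into PyTorch as torch.cuda.amp.
use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"

model = torch.nn.Linear(8, 4).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

x = torch.randn(2, 8, device=device)
with torch.cuda.amp.autocast(enabled=use_cuda):
    loss = model(x).sum()

# With enabled=False these are thin wrappers around the usual
# backward/step calls, so the same code path works everywhere.
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
```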

Best regards

Thomas
