Volta tensor core pytorch

Hi,

There was a previous topic on this (Pytorch on V100 GPU), but I am quite a layman when it comes to PyTorch and CUDA, and the discussion seemed inconclusive to me. Is there a way to guarantee that PyTorch is using the tensor cores on a Volta chip?


PyTorch uses tensor cores on Volta chips as long as your inputs are in fp16 and the dimensions of your GEMMs/convolutions satisfy the conditions for using tensor cores (basically, GEMM dimensions are multiples of 8, or, for convolutions, the batch size and the number of input and output channels are multiples of 8).


Which version of PyTorch supports this? I am using the NVIDIA Docker image pytorch 18.05-py3, which has PyTorch 0.4; does it support tensor cores, or do I have to compile the latest PyTorch?

Thanks!

@guo_tang 0.4 supports tensor cores


Tried ResNet-50 training with fp16 input on an Amazon AWS V100 instance, following the example from https://github.com/csarofeen/examples.git, with the NVIDIA Docker image pytorch 18.05-py3.

It is about 30% faster compared with float32 training. I had my expectation set at 500%. Long way to go before the software can use the hardware's full potential.

NVCaffe float16 seems faster than PyTorch, but I quickly ran into memory problems since my dataset is huge (120 GB for the time being, and still growing), and NVCaffe seems to map the whole dataset into virtual address space via LMDB.

Hi @smth, I looked at the release notes and it seems only RNN ops are supported: https://github.com/pytorch/pytorch/pull/3409. Do other ops, like convolutions, support it?

Yes, convolution ops have used tensor cores for a long time now.


Hello. I am using PyTorch 1.0. It also supports tensor cores, right? Do I have to make some change to my code to enable tensor cores?

Hello. I still have some questions. When coding with PyTorch, if we call half(), we set the module and its input to FP16. This is what I get from “Mixed precision for training PyTorch” (https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#pytorch).

My questions are:

  1. When we have trained a model (without mixed precision), can we accelerate inference by converting the model to FP16 (by calling .half())?
  2. To use tensor cores to accelerate inference, must I also have used mixed precision when training the model?
  3. Once we have the model in FP16, will PyTorch automatically use tensor cores? Are there any prerequisites, like CUDA version >= 9, cuDNN enabled, etc.?

Thanks A LOT!