Volta tensor core pytorch


There was a previous topic on this, "Pytorch on V100 GPU", but I am quite a layman when it comes to PyTorch and CUDA, and the discussion seemed inconclusive to me. Is there a way to guarantee that PyTorch is using the tensor cores on a Volta chip?


PyTorch uses tensor cores on a Volta chip as long as your inputs are in fp16 and the dimensions of your GEMMs/convolutions satisfy the conditions for using tensor cores (basically, GEMM dimensions are multiples of 8 or, for convolutions, the batch size and the numbers of input and output channels are multiples of 8).
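To make the multiple-of-8 rule of thumb concrete, here is a minimal sketch in plain Python. The helper names (`gemm_dims_ok`, `conv_dims_ok`, `round_up_to_multiple`) are hypothetical and not part of PyTorch. The sketch only checks the shape conditions described above; it cannot confirm which kernel cuBLAS or cuDNN actually picks at run time, and it assumes the inputs are already in fp16.

```python
# Hypothetical helpers illustrating the "multiple of 8" rule of thumb for fp16
# tensor core eligibility. These only check shapes; they do not query the GPU.

def is_multiple_of_8(n: int) -> bool:
    """True if n is a positive multiple of 8."""
    return n > 0 and n % 8 == 0


def gemm_dims_ok(m: int, n: int, k: int) -> bool:
    """Check an (m x k) @ (k x n) matrix multiply against the rule of thumb."""
    return all(is_multiple_of_8(d) for d in (m, n, k))


def conv_dims_ok(batch_size: int, in_channels: int, out_channels: int) -> bool:
    """Check a convolution's batch size and channel counts against the rule."""
    return all(is_multiple_of_8(d) for d in (batch_size, in_channels, out_channels))


def round_up_to_multiple(n: int, multiple: int = 8) -> int:
    """Smallest multiple of `multiple` that is >= n, e.g. for padding a layer size."""
    return ((n + multiple - 1) // multiple) * multiple


if __name__ == "__main__":
    print(gemm_dims_ok(1024, 512, 256))   # all three dimensions are multiples of 8
    print(conv_dims_ok(32, 3, 64))        # 3 input channels (RGB) breaks the rule
    print(round_up_to_multiple(30522))    # pad an awkward size up to a multiple of 8
```

One consequence of the rule as stated: the first convolution of an image model, which takes 3 input channels, would not meet it, while the later layers of a typical network usually do.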


Which version of PyTorch supports this? I am using the NVIDIA Docker image pytorch 18.05-py3, which has PyTorch 0.4. Does it support tensor cores, or do I have to compile the latest PyTorch?


@guo_tang 0.4 supports tensor cores


I tried ResNet-50 training with fp16 input on an Amazon AWS V100 instance, following the example from https://github.com/csarofeen/examples.git, using the NVIDIA Docker image pytorch 18.05-py3.

It is about 30% faster compared with float32 training. I had set my expectation at 500%. There is a long way to go before the software can use the hardware's full potential.

NVCaffe float16 seems faster than PyTorch, but I quickly ran into memory problems since my dataset is huge (120 GB for the time being, and it keeps growing), and NVCaffe seems to map the whole dataset into virtual address space via lmdb.

Hi @smth, I looked at the release notes and it seems only RNN ops are supported: https://github.com/pytorch/pytorch/pull/3409. Do other models, such as convolutional ones, support it?

Yes, convolution ops have used tensor cores for a long time now.


Hello. I am using PyTorch 1.0. It also supports tensor cores, right? Do I have to change my code to enable tensor cores?

Hello. I still have some questions. When coding with PyTorch, calling half() sets the module and input to FP16. This is what I understand from “Mix-precision for training PyTorch” (https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#pytorch).

My questions are:

  1. Once we have trained a model (without mixed precision), can we accelerate inference by converting the model to FP16 (by calling .half())?
  2. To use tensor cores to accelerate inference, must I also train the model with mixed precision?
  3. Once the model is in FP16, will PyTorch automatically use tensor cores? Are there any prerequisites, such as CUDA version >= 9, cuDNN enabled, etc.?

Thanks A LOT!