"New Tensor Cores eligible operator found" when running TensorBoard on output of PyTorch profiler

divinho · March 14, 2023, 1:21am

What does this message mean? It appears when I run tensorboard --logdir on the output of the PyTorch profiler.

In my case the full message is: New Tensor Cores eligible operator found: 'aten::thnn_conv2d_backward'!

Is this saying tensor cores aren’t being used when they could be or something?

ptrblck · March 14, 2023, 1:26am

I’m not sure if you are using DLProf or which library raises the warning, so I also don’t know how to interpret it. However, if you are using an Ampere or newer GPU, TensorCores could be used via TF32 in cuDNN, and they should be used when mixed-precision training is enabled.
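For reference, here is a minimal sketch of the two settings mentioned above: allowing TF32 in cuDNN and matmuls, and running a conv layer under autocast. The toy Conv2d layer and input shape are assumptions for illustration only. Without a GPU, it falls back to CPU autocast with bfloat16, where the TF32 flags have no effect.

```python
import torch
import torch.nn as nn

# Allow TF32 math on Ampere+ GPUs (no effect on CPU or older GPUs).
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

device = "cuda" if torch.cuda.is_available() else "cpu"
# float16 autocast on GPU; bfloat16 is the dtype CPU autocast supports.
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

# Hypothetical toy model and input, not taken from the thread.
model = nn.Conv2d(3, 16, kernel_size=3, padding=1).to(device)
x = torch.randn(8, 3, 32, 32, device=device)

with torch.autocast(device_type=device, dtype=amp_dtype):
    out = model(x)                    # conv runs in reduced precision
    loss = out.float().pow(2).mean()  # reduce in float32

loss.backward()  # weight gradients stay float32, like the parameters
print(out.dtype, model.weight.grad.dtype)
```

A real fp16 training loop on the GPU would normally add a gradient scaler; it is left out here to keep the sketch short.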

divinho · March 14, 2023, 1:32am

Yes, I’m using A100 GPUs. I’m using torch.profiler and prof.export_chrome_trace("trace.json").

I am using mixed-precision training. I also notice that many kernels are launched for a single conv layer forward call, which seems wrong to me (why can’t the conv operation be done in one kernel?). That’s why I’m wondering whether things are set up suboptimally, with the wrong kernels being used.
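A minimal sketch of profiling a single conv forward call with torch.profiler and exporting the trace, assuming a toy Conv2d layer and input that are not from the thread. The key_averages() table summarizes the recorded operators; on a GPU run it should also list the device kernels, which is a quick way to see how many kernels one conv call actually launches.

```python
import json

import torch
import torch.nn as nn
from torch.profiler import ProfilerActivity, profile

device = "cuda" if torch.cuda.is_available() else "cpu"
activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

# Hypothetical toy layer and input.
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1).to(device)
x = torch.randn(8, 3, 32, 32, device=device)

with profile(activities=activities) as prof:
    out = conv(x)
    if device == "cuda":
        torch.cuda.synchronize()  # let GPU kernels finish inside the profiled region

# Per-operator (and, on GPU, per-kernel) summary.
sort_key = "cuda_time_total" if device == "cuda" else "cpu_time_total"
print(prof.key_averages().table(sort_by=sort_key, row_limit=15))

# Same export call as in the question; the JSON opens in chrome://tracing or Perfetto.
prof.export_chrome_trace("trace.json")
```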

ptrblck · March 14, 2023, 5:33am

If you are using torch.backends.cudnn.benchmark = True, each new workload (e.g. a new input shape) will make cuDNN profile multiple kernels to pick the fastest one, and those kernels could show up in the profile.
If that’s not the case, please share a minimal and executable code snippet showing the unexpected behavior.
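One way to keep the cudnn.benchmark autotuning out of the recorded steps is a profiler schedule that skips and warms up the first iterations before recording. The sketch below assumes a toy two-layer conv model and a temporary log directory (both hypothetical). tensorboard_trace_handler writes trace files that tensorboard --logdir can display through the PyTorch Profiler TensorBoard plugin (torch-tb-profiler).

```python
import os
import tempfile

import torch
import torch.nn as nn
from torch.profiler import ProfilerActivity, profile, schedule, tensorboard_trace_handler

torch.backends.cudnn.benchmark = True  # autotune conv kernels per new input shape

device = "cuda" if torch.cuda.is_available() else "cpu"
activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

# Hypothetical toy model and input.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1),
).to(device)
x = torch.randn(8, 3, 32, 32, device=device)

# Hypothetical log location; pass this path to `tensorboard --logdir`.
logdir = tempfile.mkdtemp(prefix="profiler_logs_")

with profile(
    activities=activities,
    # Step 0: not profiled (benchmark autotuning happens on this first pass).
    # Step 1: warmup, recorded but discarded. Steps 2-3: recorded and saved.
    schedule=schedule(wait=1, warmup=1, active=2, repeat=1),
    on_trace_ready=tensorboard_trace_handler(logdir),
) as prof:
    for _ in range(5):
        loss = model(x).sum()
        loss.backward()
        prof.step()  # advance the profiler schedule once per iteration

print(os.listdir(logdir))
```

Because the first iteration is skipped, the saved trace should mostly show the kernels cuDNN settled on rather than every candidate it tried.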