What does this message mean? It appears when I run tensorboard --logdir on the output of a PyTorch profile.
In my case the full message is:
New Tensor Cores eligible operator found: 'aten::thnn_conv2d_backward'!
Is this saying tensor cores aren’t being used when they could be or something?
I’m not sure if you are using DLProf or which library raises the warning, so I don’t know how to interpret it. However, if you are using an Ampere or newer GPU, Tensor Cores can be used via TF32 in cuDNN, and they should be used when mixed-precision training is enabled.
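As a rough illustration of the two options mentioned above, here is a minimal sketch that allows TF32 and runs a convolution under autocast. The layer sizes, the input shape, and the CPU fallback with bfloat16 are assumptions made only so the snippet runs anywhere; they are not taken from the original poster's setup.

```python
import torch

# Allow TF32 math on Ampere or newer GPUs (these flags have no effect on CPU).
torch.backends.cudnn.allow_tf32 = True        # cuDNN convolutions
torch.backends.cuda.matmul.allow_tf32 = True  # cuBLAS matrix multiplications

# Assumption: fall back to CPU so the sketch runs without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
# float16 is the common autocast dtype on CUDA; bfloat16 is used for the CPU fallback.
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

# Hypothetical layer and input, chosen only for illustration.
conv = torch.nn.Conv2d(8, 16, kernel_size=3, padding=1).to(device)
x = torch.randn(4, 8, 32, 32, device=device)

# Mixed precision: the convolution runs in the lower-precision dtype.
with torch.autocast(device_type=device, dtype=amp_dtype):
    y = conv(x)

print(y.dtype, tuple(y.shape))
```

Whether a given kernel actually runs on Tensor Cores still depends on the GPU, the dtype, and the algorithm cuDNN selects, so the profiler output remains the thing to check.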
Yes, I’m using A100 GPUs. I’m using torch.profiler, and I am using mixed-precision training. I also notice many kernels being launched for a single conv layer forward call, which seems wrong to me (why can’t the conv operation be done in one kernel?). That is why I’m wondering whether things are set up suboptimally and the right kernels aren’t being used.
If you are using torch.backends.cudnn.benchmark = True, multiple kernels will be profiled for each new workload, and these could show up in the profile.
If that’s not the case, please share a minimal and executable code snippet showing the unexpected behavior.
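One way to check the cudnn.benchmark explanation is to profile a single conv forward call after a few warm-up iterations, so that any kernels launched during algorithm selection fall outside the profiled region. The following is only a sketch under assumptions: the layer, input shape, warm-up count, and CPU fallback are hypothetical and not the poster's actual code.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Set to True to let cuDNN time several candidate algorithms for each new workload.
torch.backends.cudnn.benchmark = False

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"  # assumption: CPU fallback so the sketch runs anywhere
activities = [ProfilerActivity.CPU] + ([ProfilerActivity.CUDA] if use_cuda else [])

# Hypothetical layer and input, chosen only for illustration.
conv = torch.nn.Conv2d(8, 16, kernel_size=3, padding=1).to(device)
x = torch.randn(4, 8, 32, 32, device=device)

# Warm-up: any algorithm search triggered by benchmark mode happens here,
# outside the profiled region.
with torch.no_grad():
    for _ in range(3):
        conv(x)

# Profile one forward call and collect the recorded operators and kernels.
with profile(activities=activities) as prof:
    with torch.no_grad():
        conv(x)

events = prof.key_averages()
op_names = [e.key for e in events]
print(events.table(row_limit=10))
```

If the number of kernels per forward call is much lower here than in a profile that includes the first iterations with benchmark mode on, the extra kernels were most likely the benchmarking runs.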