Hi,

I have a problem with the inference of models containing Conv3d layers on Nvidia (V100) and AMD (MI100) GPUs. The tensor cores do not seem to be utilised at all by this layer. When I run the same test with a fully connected layer instead, nvprof confirms that the tensor cores are utilised. Below is a minimal code example (speed.py):

```
import numpy as np
import torch
import torch.nn as nn


# Fully connected variant: uncomment this class (and comment out the
# Conv3d class below) to test the linear layer instead.
# class test_network(nn.Module):
#     def __init__(self, window_size=11):
#         super().__init__()
#         self.fc1 = nn.Linear(7*7*7, 128)
#         self.relu = nn.ReLU()
#
#     def forward(self, X):
#         return self.relu(self.fc1(X))


# Conv3d variant
class test_network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=(3,3,3), padding=(1,0,0))
        self.relu = nn.ReLU()

    def forward(self, X):
        return self.relu(self.conv1(X))


test_net = test_network()
number_elements = 1000000
number_loops = 50
dtype = torch.float16
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Cuda is available:", torch.cuda.is_available())
test_net = test_net.to(dtype=dtype, device=device)

with torch.no_grad():
    for i in range(number_loops):
        torch.cuda.synchronize()
        # Conv3d input; use (number_elements, 7*7*7) for the fully connected network
        test = torch.ones((number_elements,1,7,7,7), dtype=dtype, device=device)
        out = test_net(test)
        torch.cuda.synchronize()
        del test
```

To use this script, (un-)comment the model that is to be tested and adapt the input size: (number_elements,1,7,7,7) for the convolutional network or (number_elements,7 * 7 * 7) for the fully connected network.
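As an alternative to (un-)commenting, the layer and its matching input shape could be chosen by name. The following is only a sketch: `build_model_and_input` is a hypothetical helper that is not part of speed.py, and it wraps the layer in `nn.Sequential` instead of the `test_network` class. The defaults (small batch, float32, CPU) are assumptions so that it runs anywhere; for the actual measurement one would pass `dtype=torch.float16`, `device="cuda"` and the full batch size.

```python
import torch
import torch.nn as nn


def build_model_and_input(kind, batch=4, dtype=torch.float32, device="cpu"):
    """Hypothetical helper: return (model, input) for kind 'conv' or 'fc'."""
    if kind == "conv":
        # Same layer configuration as in speed.py
        layer = nn.Conv3d(in_channels=1, out_channels=8,
                          kernel_size=(3, 3, 3), padding=(1, 0, 0))
        shape = (batch, 1, 7, 7, 7)
    elif kind == "fc":
        layer = nn.Linear(7 * 7 * 7, 128)
        shape = (batch, 7 * 7 * 7)
    else:
        raise ValueError(f"unknown model kind: {kind!r}")
    model = nn.Sequential(layer, nn.ReLU()).to(dtype=dtype, device=device)
    x = torch.ones(shape, dtype=dtype, device=device)
    return model, x


if __name__ == "__main__":
    for kind in ("conv", "fc"):
        model, x = build_model_and_input(kind)
        with torch.no_grad():
            out = model(x)
        print(kind, tuple(x.shape), "->", tuple(out.shape))
```

With padding (1,0,0) and a 3x3x3 kernel, the Conv3d output for a 7x7x7 input has spatial size 7x5x5, so the two variants do not produce the same number of outputs per sample.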

I then measure the tensor core utilisation on the Nvidia machine using

`sudo -E /usr/local/cuda/bin/nvprof -m tensor_precision_fu_utilization /usr/bin/python3 speed.py`

My setup: Nvidia V100 PCIe, CUDA driver version 12.2, latest PyTorch (pip) installation.

Below you can find a screenshot of nvprof for the Conv3d model. (Unfortunately, as a new user, I cannot embed more than one image in my post).

(Conv model)

I have not run a profiler comparable to nvprof on the AMD machines; however, I see similar numbers in terms of TFLOPs there, so I assume the problem originates from the same source.
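For reference, a rough sketch of how such TFLOPs figures can be estimated from the layer shapes in speed.py. It assumes the usual convention of 2 FLOPs per multiply-accumulate and ignores bias and ReLU. The function names are illustrative, and the elapsed time has to come from your own timing; no measured value is implied here.

```python
def conv_out_size(size, kernel, padding, stride=1):
    """Output length along one dimension of a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def conv3d_flops(batch, in_ch, out_ch, in_size, kernel, padding):
    """Approximate FLOPs of one Conv3d forward pass (2 FLOPs per MAC)."""
    out_positions = 1
    kernel_volume = 1
    for s, k, p in zip(in_size, kernel, padding):
        out_positions *= conv_out_size(s, k, p)
        kernel_volume *= k
    macs = batch * out_ch * out_positions * in_ch * kernel_volume
    return 2 * macs


def linear_flops(batch, in_features, out_features):
    """Approximate FLOPs of one Linear forward pass (2 FLOPs per MAC)."""
    return 2 * batch * in_features * out_features


def tflops(flops, seconds):
    """Convert a FLOP count and a measured wall time into TFLOP/s."""
    return flops / seconds / 1e12


if __name__ == "__main__":
    n = 1_000_000  # number_elements in speed.py
    conv = conv3d_flops(n, 1, 8, (7, 7, 7), (3, 3, 3), (1, 0, 0))
    fc = linear_flops(n, 7 * 7 * 7, 128)
    print("Conv3d FLOPs per forward pass:", conv)  # 75600000000
    print("Linear FLOPs per forward pass:", fc)    # 87808000000
```

Dividing these counts by the measured time per loop iteration gives a throughput that can be compared against the peak FP16 rates of the two GPUs.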

Any help is much appreciated.

Many thanks, kind regards,

Christian