GPU Utilization

Hello everyone,

I’m training a CNN that looks like this:

```python
model = nn.Sequential(
    sl.FlattenTime(),
    nn.Conv2d(2, 8, kernel_size=3, padding=1, bias=False),
    # sl.IAFSqueeze(batch_size=batch_size, min_v_mem=-1),
    nn.MaxPool2d(2),
    # sl.SumPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1, bias=False),
    # sl.IAFSqueeze(batch_size=batch_size, min_v_mem=-1),
    nn.MaxPool2d(2),
    # sl.SumPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1, bias=False),
    # sl.IAFSqueeze(batch_size=batch_size, min_v_mem=-1),
    nn.MaxPool2d(2),
    # sl.SumPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1, bias=False),
    # sl.IAFSqueeze(batch_size=batch_size, min_v_mem=-1),
    nn.MaxPool2d(2),
    # sl.SumPool2d(2),
    nn.Conv2d(64, 10, kernel_size=2, padding=0, bias=False),
    # sl.IAFSqueeze(batch_size=batch_size, min_v_mem=-1),
    nn.Flatten(),
    sl.UnflattenTime(batch_size=batch_size),
).to(device)
```

sl is a module from sinabs, a library built on top of PyTorch that can be used together with it. It utilizes the GPU in the same way PyTorch does.
While training the model, the CUDA utilization graph in Task Manager only shows spikes of about 50% usage every few seconds.
[Screenshot: Task Manager CUDA utilization graph showing periodic spikes]
It’s worth mentioning that each usage spike coincides with an iteration step of the training loop.

Training also takes a long time.
My GPU is a GTX 1060 6 GB, and torch.cuda.is_available() returns True.

Also, nvidia-smi reports 0% utilization most of the time, again with spikes of 100% utilization every few seconds…

You could profile your code with e.g. Nsight Systems to narrow down the bottlenecks and isolate which part of the code is responsible for the low utilization.
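
As a starting point, here is a minimal sketch of how the training loop could be annotated with NVTX ranges so that Nsight Systems can attribute the time in each iteration to data loading, the forward pass, and the backward pass. The dataloader, model, criterion, optimizer, and device names are placeholders for the objects in your own script:

```python
import torch

# Run the script under Nsight Systems to capture the ranges, e.g.:
#   nsys profile -t cuda,nvtx -o report python train.py
for batch_idx, (data, target) in enumerate(dataloader):
    torch.cuda.nvtx.range_push(f"iteration {batch_idx}")

    # Host-to-device copy of the batch
    torch.cuda.nvtx.range_push("data to device")
    data, target = data.to(device), target.to(device)
    torch.cuda.nvtx.range_pop()

    # Forward pass and loss computation
    torch.cuda.nvtx.range_push("forward")
    output = model(data)
    loss = criterion(output, target)
    torch.cuda.nvtx.range_pop()

    # Backward pass and optimizer step
    torch.cuda.nvtx.range_push("backward")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_pop()  # end of iteration
```

If the timeline then shows the GPU sitting idle between kernels while the data-loading range dominates each iteration, the input pipeline rather than the model itself is the likely cause of the low utilization.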