Hello everyone, I've been trying to figure out how to collect GPU performance metrics quickly during training, but I couldn't find a good way to do it (I'm training ResNet-50 on CIFAR-10, nothing fancy).
The function I use to get the GPU utilization and memory utilization values is the following:
def collect_gpu_statistics():
    gpu_statistics = !nvidia-smi --query-gpu=utilization.gpu,utilization.memory --format=csv
    gpu_util, memory_util = [int(x) for x in gpu_statistics[1].split(' ')[::2]]
    return gpu_util, memory_util
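For reference, the second element of gpu_statistics is the CSV data row, which is what the list comprehension parses; with made-up numbers the command output looks roughly like:

    utilization.gpu [%], utilization.memory [%]
    37 %, 12 %

so split(' ')[::2] keeps ['37', '12'] and the function returns (37, 12).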
I call it like this in the training loop:
for train_step, (image, label) in enumerate(train_dataloader):
    optimizer.zero_grad()
    image = image.cuda()
    label = label.cuda()
    prediction = model(image)
    loss = criterion(prediction, label.squeeze())
    loss.backward()
    optimizer.step()
    scheduler.step()

    gpu_util, memory_util = collect_gpu_statistics()
    ### do something with these metrics

    ## since the reduction is happening in CrossEntropyLoss itself
    average_meter.update(loss.item(), 1)
However, calling it on every step slows my training down from 16 s per epoch to >40 s per epoch, which is unacceptable.
Any best practices on this topic?
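One direction might be querying NVML in-process through the pynvml bindings (the nvidia-ml-py package) instead of spawning an nvidia-smi process on every step. A minimal sketch, assuming a single GPU at index 0 and that pynvml is installed (I haven't benchmarked whether it avoids the slowdown):

    import pynvml

    pynvml.nvmlInit()                                # initialize NVML once, before the training loop
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)    # GPU 0; adjust the index for other devices

    def collect_gpu_statistics_nvml():
        # returns (gpu utilization %, memory utilization %), like the nvidia-smi version
        rates = pynvml.nvmlDeviceGetUtilizationRates(handle)
        return rates.gpu, rates.memory

    # ... training loop calls collect_gpu_statistics_nvml() ...

    pynvml.nvmlShutdown()                            # clean up after training

Another obvious mitigation would be sampling the metrics only every N steps rather than on every iteration.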
P.S.
I'm using a Jupyter notebook for training, which is why you see the "!nvidia-smi …" shell magic in the collect_gpu_statistics() function.
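Outside a notebook, I assume the same query would go through subprocess instead of the shell magic, roughly like this sketch (note it still spawns a process per call, so it doesn't remove the overhead):

    import subprocess

    def collect_gpu_statistics():
        # same query as above; noheader/nounits makes the output just e.g. "37, 12"
        out = subprocess.check_output(
            ['nvidia-smi',
             '--query-gpu=utilization.gpu,utilization.memory',
             '--format=csv,noheader,nounits'],
            text=True,
        )
        gpu_util, memory_util = [int(x) for x in out.strip().split(', ')]
        return gpu_util, memory_util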