PyTorch Profiler not profiling GPU on Windows

ChowderII · March 17, 2022, 4:52am

Hello everyone,

I’m new here; hopefully I’m posting this correctly.

I’ve recently started using PyTorch’s profiler, but as far as the profiler is concerned there is no activity on my GPU at all. I’m currently running the example from this guide, and the code runs without problems. I can see activity on my GPU, and the CUDA graph in Task Manager (specifically the CUDA graph; I did my homework) shows activity while the code runs, so the problem is clearly not PyTorch or CUDA itself.

Here is the code I use to create the model and start the profiler:

import torch
import torchvision
import torchvision.transforms as T


def main():
    transform = T.Compose(
        [T.Resize(224),
         T.ToTensor(),
         T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
    train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)

    device = torch.device("cuda:0")
    model = torchvision.models.resnet18(pretrained=True).to(device)
    criterion = torch.nn.CrossEntropyLoss().to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    model.train()

    def train(data):
        inputs, labels = data[0].to(device), data[1].to(device)
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    with torch.profiler.profile(
            schedule=torch.profiler.schedule(wait=1, warmup=1, active=3),
            on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'),
            record_shapes=True,
            profile_memory=True,
            with_stack=True
    ) as prof:
        for step, batch_data in enumerate(train_loader):
            if step >= 200:  # tried increasing this because maybe it wasn't running long enough?
                break
            train(batch_data)
            prof.step()  # Call this at the end of each step to tell the profiler where the step boundary is.


if __name__ == '__main__':
    main()

And yet when I open TensorBoard at the appropriate location I see only this:

The GPU Kernel view is also completely missing from the sidebar.

I don’t understand what I am doing wrong in this case. Is this just a Windows problem or am I using the profiler incorrectly?

Thank you for your help!
ChowderII
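On the comment about running longer: with schedule(wait=1, warmup=1, active=3) and the default repeat=0, my reading of the torch.profiler documentation is that the profiler cycles through 5-step windows (1 idle step, 1 warmup step, 3 recorded steps) and hands a finished trace to on_trace_ready at the end of each window. A longer loop should therefore produce more trace files rather than a larger one, so it is unlikely to make GPU kernels appear. The sketch below is a plain-Python re-implementation of that documented behavior for illustration only; Action and schedule_action are hypothetical names, not PyTorch APIs (the real torch.profiler.schedule(...) likewise returns a callable that maps a step number to a ProfilerAction).

```python
from enum import Enum


class Action(Enum):
    # Hypothetical stand-in mirroring the member names of torch.profiler.ProfilerAction.
    NONE = 0
    WARMUP = 1
    RECORD = 2
    RECORD_AND_SAVE = 3


def schedule_action(step, wait=1, warmup=1, active=3, repeat=0, skip_first=0):
    """Illustrative re-implementation of the documented wait/warmup/active cycle."""
    if step < skip_first:
        return Action.NONE
    step -= skip_first
    cycle = wait + warmup + active
    # With repeat > 0, profiling stops after `repeat` full cycles.
    if repeat > 0 and step // cycle >= repeat:
        return Action.NONE
    pos = step % cycle
    if pos < wait:
        return Action.NONE
    if pos < wait + warmup:
        return Action.WARMUP
    # The last active step of each cycle is where a trace gets handed off.
    return Action.RECORD if pos < cycle - 1 else Action.RECORD_AND_SAVE


if __name__ == "__main__":
    saves = [s for s in range(200) if schedule_action(s) is Action.RECORD_AND_SAVE]
    print(len(saves), saves[:3])
```

Run as a script, it prints `40 [4, 9, 14]`: for the 200-step loop above, roughly 40 separate traces would be written to ./log/resnet18, each covering only 3 recorded steps.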

Autumnii · July 13, 2022, 1:35pm

Hi ChowderII, have you found a solution? I’m on Windows too and have the same problem: just CPU and Other, no GPU.

KuSi833 · September 6, 2022, 10:32pm

Same problem here. Am also on Windows and can’t see the GPU summary.

mfatih7 · September 29, 2022, 10:29am

Hello

I am using Windows 10, and I can’t see any GPU Kernel option in the TensorBoard Views list.

Piadina · October 27, 2022, 10:12pm

Hi, I am experiencing the very same problem on Windows 10 and my model is definitely running on the GPU.

Any clue?

lyclyc121 · November 25, 2022, 9:17am

I have the same problem. Has anyone looked into this or solved it?

rojas561 · August 22, 2023, 5:38pm

Same issue here. Any idea on this, @ptrblck? Your profile seems to pop up in all the relevant forums I’ve seen over the years, so here’s hoping you know what’s going on.

ptrblck · August 22, 2023, 5:51pm

Sorry, but I’m neither familiar with Windows nor with the native PyTorch profiler and am using Nsight Systems to profile models.
You could try to use it too: Getting Started with Nsight Systems | NVIDIA Developer

rojas561 · August 22, 2023, 6:18pm

I’ll check it out, thank you!

J.S_Ye · August 29, 2023, 1:00pm

Has anyone solved this problem or worked out any clues? I have encountered the same issue.

haitao_jaing · April 29, 2024, 2:55am

Maybe I’m late, but has anyone solved this problem?
My environment:
CUDA 11.8
Windows 10
i5 12500K
RTX 4070 Ti

Ankur_Singh · August 15, 2024, 1:24am

Was anyone able to resolve this? I’m facing the same issue.

Chen_Zejun · October 1, 2024, 1:32am

Hi, all
CUPTI seems to have issues on the Windows platform, so CUPTI is disabled when building on Windows anyway; that is why no GPU op info is traced. It looks like Kineto supports Windows, though?
Thank you.
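If CUPTI is the culprit, it may help to check what a given install reports before digging into TensorBoard. As I understand the docstring of torch.profiler.supported_activities(), the profiler uses CUPTI to trace on-device CUDA kernels, and when CUDA is enabled but CUPTI is not available, passing ProfilerActivity.CUDA falls back to the legacy CUDA profiling code, which includes CUDA time in the profiler table output but not in the JSON trace. The sketch below is a hypothetical diagnostic (diagnose is not a PyTorch function); it assumes a recent PyTorch where supported_activities() exists, and it assumes that CUDA being absent from that set means CUPTI-based tracing is unavailable.

```python
import importlib.util


def diagnose():
    """Summarise what the local PyTorch install can profile (sketch)."""
    # Report early instead of crashing if PyTorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch
    from torch.profiler import ProfilerActivity, supported_activities

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        # Assumption: CUDA is only listed here when on-device (CUPTI) tracing is usable.
        "cuda_tracing_supported": ProfilerActivity.CUDA in supported_activities(),
    }


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```

If cuda_available is True but cuda_tracing_supported is False, that would be consistent with the CPU-only TensorBoard view described in this thread.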

vattha · February 20, 2025, 3:27am

I also have the same problem.

neoncube · April 9, 2025, 8:55am

As someone who is able to profile GPU usage on Windows, I’d be happy to help troubleshoot.

I recently posted some GPU profiling code that worked for me: Model() uses GPU but backwards() doesn't - #3 by neoncube

It looks like your code isn’t passing activities=[profiler.ProfilerActivity.CPU, profiler.ProfilerActivity.CUDA] to torch.profiler.profile(), which I thought was required.

Also, for me, passing both activities=[profiler.ProfilerActivity.CPU, profiler.ProfilerActivity.CUDA] and with_stack=True crashed the process with no error, so you might want to try removing record_shapes, profile_memory, and with_stack.

I’d also be curious to see whether calling prof.key_averages().table(sort_by='cuda_time_total', row_limit=10) or prof.key_averages().table(sort_by='cpu_time_total', row_limit=10) works, and whether this is just an issue with exporting to TensorBoard specifically.

Jadiker · May 25, 2025, 7:31pm

I’m currently using Windows 10 and torch 1.13.1+cu117 on an ROG Zephyrus G14, and can confirm that print(prof.key_averages().table(sort_by="self_cuda_time_total", row_limit=10)) shows Self CUDA time total: 749.103ms for the following script:

import torch
from torch.profiler import profile, ProfilerActivity

x = torch.randn(4000, 4000, device='cuda')
y = torch.randn(4000, 4000, device='cuda')

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]
) as prof:
    z = torch.matmul(x, y)
    torch.cuda.synchronize()  # ensure kernel finishes

prof.export_chrome_trace("cuda_matmul_trace.json")

print(prof.key_averages().table(sort_by="self_cuda_time_total", row_limit=10))

However, when I load the output .json file in chrome://tracing, there are no GPU events. I also uploaded the JSON file itself to ChatGPT, which confirmed that the trace contains no GPU events.

Thus, this does seem to be an issue in the export, rather than the tracking.
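To check a trace without a browser or ChatGPT, a short stdlib-only script can count the events in the exported JSON by category. This is a sketch under one assumption: that Kineto-generated traces tag GPU kernels and memory copies/sets with the "cat" values "kernel", "gpu_memcpy" and "gpu_memset" (adjust GPU_CATEGORIES if your PyTorch version uses different names). count_trace_categories and gpu_event_count are hypothetical helper names.

```python
import json
import sys
from collections import Counter

# Assumption: category names Kineto uses for on-device events.
GPU_CATEGORIES = {"kernel", "gpu_memcpy", "gpu_memset"}


def count_trace_categories(path):
    """Count trace events per "cat" value in a Chrome trace JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Traces may be a bare list of events or an object with a "traceEvents" list.
    events = data["traceEvents"] if isinstance(data, dict) else data
    # Metadata events often have no "cat"; group them under "<none>".
    return Counter(e.get("cat", "<none>") for e in events if isinstance(e, dict))


def gpu_event_count(path):
    """Number of events whose category looks like on-device GPU work."""
    counts = count_trace_categories(path)
    return sum(n for cat, n in counts.items() if str(cat).lower() in GPU_CATEGORIES)


if __name__ == "__main__" and len(sys.argv) > 1:
    trace_path = sys.argv[1]
    print(count_trace_categories(trace_path).most_common())
    print("GPU events:", gpu_event_count(trace_path))
```

For example, passing cuda_matmul_trace.json from the script above as the first argument should show whether any kernel events made it into the export. A count of zero alongside a non-zero Self CUDA time in the table would match the legacy-fallback behavior described in the supported_activities() docstring.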

v01d · June 11, 2025, 4:17pm

For me, a restart helped.

ng_xing · June 22, 2025, 4:22pm

In my case, the issue was resolved after ensuring that the CUDA version of my PyTorch build matches the CUDA version of my device. You can check your PyTorch version using pip show torch and the CUDA version using nvidia-smi (which reports the highest CUDA version the installed driver supports). For example, if your PyTorch version is 2.6.0+cu128, your CUDA version should be CUDA 12.8.
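The comparison described above can be scripted. This sketch assumes the PyTorch version string carries a local tag like +cu128 (CPU-only wheels use +cpu instead) and that the nvidia-smi header contains a field of the form "CUDA Version: 12.8". torch_cuda_tag, driver_cuda_version and versions_match are hypothetical helper names, and an exact match is simply the rule suggested in this post, not a documented PyTorch requirement.

```python
import re


def torch_cuda_tag(torch_version):
    """Turn a version like '2.6.0+cu128' into '12.8'; None if there is no +cuXXX tag."""
    m = re.search(r"\+cu(\d+)$", torch_version)
    if not m:
        return None  # e.g. '+cpu' builds or versions without a local tag
    digits = m.group(1)
    # The last digit is the minor version: 'cu128' -> '12.8', 'cu117' -> '11.7'.
    return f"{digits[:-1]}.{digits[-1]}"


def driver_cuda_version(nvidia_smi_output):
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi's header text."""
    m = re.search(r"CUDA Version:\s*(\d+\.\d+)", nvidia_smi_output)
    return m.group(1) if m else None


def versions_match(torch_version, nvidia_smi_output):
    """True when the wheel's CUDA tag equals the version nvidia-smi reports."""
    tag = torch_cuda_tag(torch_version)
    return tag is not None and tag == driver_cuda_version(nvidia_smi_output)
```

On a real machine, the inputs would be torch.__version__ and the text printed by nvidia-smi, for example subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout.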