GPU usage profile for model training

Is there any API I can use to see a real-time profile when I load a model onto the GPU for training?

I generally use torch.cuda.memory_allocated to check how much memory is taken up by my tensors (models, inputs, etc.).
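A minimal sketch of that approach, assuming a CUDA device is available (the `bytes_to_mib` helper and the tensor sizes are illustrative, not from the original post):

```python
# Sketch: measuring how much GPU memory a tensor occupies by reading
# torch.cuda.memory_allocated before and after creating it.
import torch

def bytes_to_mib(n: int) -> float:
    """Convert a byte count to mebibytes."""
    return n / 1024**2

if torch.cuda.is_available():
    before = torch.cuda.memory_allocated()
    x = torch.randn(1024, 1024, device="cuda")  # 1M float32 values ≈ 4 MiB
    after = torch.cuda.memory_allocated()
    print(f"tensor occupies ~{bytes_to_mib(after - before):.1f} MiB")
```

Note that `memory_allocated` reports memory held by live tensors only; the caching allocator may reserve more from the driver than this number shows.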

Thank you for the response. Do you know of any way to get a profile for each layer, showing not just memory usage but also execution time, etc.?

I don't know of anything specific to PyTorch, but I have used the tools suggested here: https://stackoverflow.com/questions/582336/how-can-you-profile-a-script
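For example, the standard-library cProfile approach from that thread can be pointed at any function, including a training step; the `training_step` below is just a placeholder workload, not the poster's code:

```python
# Sketch: generic Python profiling with cProfile + pstats, applicable to
# any callable (e.g. one training iteration).
import cProfile
import io
import pstats

def training_step():
    # Placeholder workload standing in for a forward/backward pass.
    total = 0
    for i in range(100_000):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
training_step()
profiler.disable()

# Print the top 5 entries sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue())
```

This gives per-function (not per-layer) timings on the CPU side; GPU kernels launch asynchronously, so CPU-side timings may not reflect actual device execution time.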