Profile GPU usage of a model during inference

Hi everyone,

I trained and serialized a segmentation model in PyTorch, and now I need to profile the maximum GPU memory occupied by the model in a C++ application. Can somebody tell me the correct way to profile GPU memory usage in this case?

Suppose that my executable looks like this:

#include <torch/script.h>
#include <torch/torch.h>

#include <iostream>
#include <string>
#include <vector>

int main() {

    // Setup
    const torch::Device device = torch::Device(torch::kCUDA, 0);
    c10::InferenceMode guard; // See: https://pytorch.org/cppdocs/notes/inference_mode.html

    // Load model
    torch::jit::script::Module model;
    std::string model_path = "path/to/model.pt";
    try {
        model = torch::jit::load(model_path, device);
        model.eval();
    }
    catch (const c10::Error& e) {
        std::cerr << "Error loading the model: " << e.what() << "\n";
        return -1;
    }

    // Build a tensor to feed the model with.
    int batch_dimension = 1;
    auto tensor = torch::zeros({ batch_dimension, 3, 512, 512 }).to(device);
    std::vector<torch::jit::IValue> inputs;
    inputs.emplace_back(tensor);

    // Forward pass
    auto output = model.forward(inputs).toTensor();
    torch::cuda::synchronize();

    return 0;
}

Seems like nvprof might be useful in your case.
Assuming ./a.out is your executable, you can profile it with nvprof ./a.out.

More about nvprof here: Profiler :: CUDA Toolkit Documentation
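
If you also want to read the peak usage from inside the process, here is a minimal sketch, assuming libtorch's c10::cuda::CUDACachingAllocator::getDeviceStats API (header and field names can differ slightly between libtorch versions, so treat it as a starting point rather than a drop-in helper); print_peak_gpu_memory is just a hypothetical name for illustration:

#include <torch/script.h>
#include <c10/cuda/CUDACachingAllocator.h>

#include <iostream>

// Hypothetical helper: print the peak GPU memory tracked by PyTorch's
// caching allocator for a given device. Call it after the forward pass
// and torch::cuda::synchronize().
void print_peak_gpu_memory(int device_index) {
    const auto stats =
        c10::cuda::CUDACachingAllocator::getDeviceStats(device_index);
    // Index 0 is the aggregate statistic across allocation pools.
    const double peak_mib =
        static_cast<double>(stats.allocated_bytes[0].peak) / (1024.0 * 1024.0);
    std::cout << "Peak allocated GPU memory: " << peak_mib << " MiB\n";
}

Calling print_peak_gpu_memory(0) right after the forward pass and the synchronize should give you the peak of what went through PyTorch's caching allocator; memory allocated outside of it (e.g. the CUDA context itself) is not counted, which is where a tool like nvprof can complement the numbers.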