Hi everyone,
I trained and serialized a segmentation model in PyTorch and now I need to profile the peak GPU memory occupied by the model during inference in a C++ (libtorch) application. Can somebody tell me the correct way to profile GPU memory usage in this case?
Suppose that my executable looks like this:
#include <torch/script.h>
#include <torch/torch.h>

#include <iostream>
#include <string>
#include <vector>

int main() {
    // Setup: run on the first CUDA device, with autograd fully disabled.
    const torch::Device device(torch::kCUDA, 0);
    c10::InferenceMode guard; // See: https://pytorch.org/cppdocs/notes/inference_mode.html

    // Load the serialized TorchScript model directly onto the GPU.
    torch::jit::script::Module model;
    const std::string model_path = "path/to/model.pt";
    try {
        model = torch::jit::load(model_path, device);
        model.eval();
    }
    catch (const c10::Error& e) {
        std::cerr << "error loading the model\n";
        return -1;
    }

    // Build a dummy input tensor to feed the model with.
    const int batch_dimension = 1;
    auto tensor = torch::zeros({batch_dimension, 3, 512, 512}).to(device);
    std::vector<torch::jit::IValue> inputs;
    inputs.emplace_back(tensor);

    // Forward pass, then wait for all queued CUDA work to finish.
    auto output = model.forward(inputs).toTensor();
    torch::cuda::synchronize();

    return 0;
}
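What I have been considering so far is reading the caching allocator statistics right after the forward pass, which I believe is the C++ counterpart of torch.cuda.max_memory_allocated() / max_memory_reserved() in Python. Below is a minimal sketch of what I have in mind, assuming c10/cuda/CUDACachingAllocator.h and getDeviceStats() are the right header and function to call (the fragment would go right after torch::cuda::synchronize() above, and index 0 of each stat array should be the aggregate over all memory pools):

#include <c10/cuda/CUDACachingAllocator.h>

    // Query the caching allocator for device 0 after the forward pass.
    const auto stats = c10::cuda::CUDACachingAllocator::getDeviceStats(/*device=*/0);
    const double mib = 1024.0 * 1024.0;
    // Peak bytes handed out to tensors vs. peak bytes reserved from the driver.
    std::cout << "Peak allocated: " << stats.allocated_bytes[0].peak / mib << " MiB\n"
              << "Peak reserved:  " << stats.reserved_bytes[0].peak / mib << " MiB\n";

I am not sure whether the peak counters need to be reset first (there seems to be a resetPeakStats() in the same namespace), or whether this is even the recommended way to do it from C++.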
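Alternatively, if what actually matters is the total device memory the process grabs (including what the caching allocator reserves but is not currently handing out to tensors), I guess I could sample the free memory reported by the CUDA runtime before loading the model and again after the forward pass. A sketch of that idea, assuming the executable is linked against the CUDA runtime so cudaMemGetInfo is available:

#include <cuda_runtime_api.h>

    // Report how much memory is currently in use on device 0.
    size_t free_bytes = 0, total_bytes = 0;
    cudaSetDevice(0);
    cudaMemGetInfo(&free_bytes, &total_bytes);
    std::cout << "Used on device 0: "
              << (total_bytes - free_bytes) / (1024.0 * 1024.0) << " MiB\n";

The difference between the two samples would give a coarse upper bound, but it also counts the CUDA context itself, so I am not sure it measures what I am after. Any advice on which approach is correct (or if there is a better one) would be appreciated.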