Reduce GPU memory blocked by the CUDA context

Thanks for the reply! So do I understand correctly that this memory will always be occupied one way or another if I want to use CUDA functionality, no matter whether it's via PyTorch or whether I, e.g., export the models to ONNX and run them from C++?
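
For reference, here is a minimal sketch of how one can observe this overhead from within PyTorch (assuming a reasonably recent PyTorch build; the exact size of the context varies with the GPU, driver, and CUDA version, and `mem_get_info` reports device-wide usage, so other processes on the same GPU are included too):

```python
import torch

# The CUDA context is created lazily on the first CUDA call; it alone
# reserves device memory (often several hundred MB), before any tensor exists.
torch.cuda.init()

# PyTorch's caching allocator has handed out nothing yet...
print(torch.cuda.memory_allocated())  # -> 0 bytes

# ...yet the device already reports memory in use (the context, plus
# anything other processes on this GPU may hold).
free, total = torch.cuda.mem_get_info()
print(f"device memory in use: {(total - free) / 2**20:.0f} MiB")
```

Running `nvidia-smi` alongside this shows the same picture: the Python process holds memory even though no tensors have been allocated. That reserved chunk is the context itself, which any CUDA-based runtime (PyTorch, ONNX Runtime with the CUDA execution provider, plain C++ against the driver API) needs to create before it can launch kernels.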