I'm using torch.jit.trace to export a vov99 model to a TorchScript module on an A100 with 80 GB of GPU memory.
When I run inference with the original torch.nn.Module model, it works well and only consumes around 2 GB of GPU memory. However, when I trace the model with the same example input tensor, it runs out of the full 80 GB of GPU memory.
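To pin down those numbers, allocator statistics can be printed around each step, roughly like this (a minimal sketch; `report_mem` is a hypothetical helper, not part of the code below):

```python
import torch

def report_mem(tag: str) -> None:
    # Current and peak allocator stats in MiB; max_memory_allocated tracks
    # the high-water mark since the last reset_peak_memory_stats() call.
    alloc = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    peak = torch.cuda.max_memory_allocated() / 2**20
    print(f"{tag}: allocated={alloc:.0f} MiB, reserved={reserved:.0f} MiB, peak={peak:.0f} MiB")
```

Calling something like `report_mem('after eager forward')` and `report_mem('before trace')` would show whether memory already starts climbing during the eager run or only once tracing begins.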
Here is my code:
```python
import logging
import time

import torch

backbone = Net(cfgs)
checkpoint = load_checkpoint(
    backbone,
    args.weights,
    map_location='cuda',
    strict=False,
    logger=logging.Logger(__name__, logging.ERROR)
)
backbone.cuda()
backbone.eval()

# nn.Module inference takes around 2 GB of GPU memory
example_forward_input = torch.rand(1, 3, 640, 1600).cuda()
original_feats = backbone(example_forward_input)
for r in original_feats:
    print(r.shape)
time.sleep(5)

# Trace the forward method and construct a `ScriptModule`.
# Here an OOM error is raised:
# RuntimeError: CUDA out of memory. Tried to allocate 142.00 MiB
# (GPU 0; 79.33 GiB total capacity; 78.28 GiB already allocated;
# 7.81 MiB free; 78.81 GiB reserved in total by PyTorch)
# If reserved memory is >> allocated memory try setting max_split_size_mb
# to avoid fragmentation. See documentation for Memory Management and
# PYTORCH_CUDA_ALLOC_CONF
backbone_module = torch.jit.trace(backbone, example_forward_input)
backbone_module.save('outputs/traced_backbone.pt')
```
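One variant worth trying (a sketch; the run above did not use it) is executing the trace with gradients disabled, in case autograd state recorded while torch.jit.trace runs the model is what fills the GPU:

```python
# Sketch: run the trace under no_grad so no autograd buffers are kept
# alive while torch.jit.trace executes the model.
with torch.no_grad():
    backbone_module = torch.jit.trace(backbone, example_forward_input)
backbone_module.save('outputs/traced_backbone.pt')
```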