I’m using torch.jit.trace to export a vov99 model to a TorchScript module on an A100 with 80 GB of GPU memory.
When I run inference with the original torch.nn.Module, it works fine and only consumes around 2 GB of GPU memory. However, when I trace the model with the same example input tensor, it runs out of the entire 80 GB of GPU memory.
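In case it helps, a quick way to confirm how much memory PyTorch itself has allocated at each step is torch.cuda.memory_allocated() / torch.cuda.memory_reserved(); below is a minimal sketch (the report helper and its tags are just for illustration, not part of my actual script):

import torch

def report(tag):
    # Print allocated vs. reserved CUDA memory in GiB for the current device
    alloc = torch.cuda.memory_allocated() / 1024 ** 3
    reserved = torch.cuda.memory_reserved() / 1024 ** 3
    print(f'{tag}: allocated={alloc:.2f} GiB, reserved={reserved:.2f} GiB')

# e.g. call report('after nn.Module forward') right after the plain forward pass,
# and report('before torch.jit.trace') just before tracing, to see where usage jumps.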
Here is my code:
import logging
import time

import torch

# Net, load_checkpoint, cfgs and args come from the surrounding project code
backbone = Net(cfgs)
checkpoint = load_checkpoint(
    backbone, args.weights, map_location='cuda', strict=False,
    logger=logging.Logger(__name__, logging.ERROR)
)
backbone.cuda()
backbone.eval()
# nn.Module inference takes around 2 GB of GPU memory
example_forward_input = torch.rand(1, 3, 640, 1600).cuda()
original_feats = backbone(example_forward_input)
for r in original_feats:
    print(r.shape)
time.sleep(5)
# Trace the forward pass and construct a `ScriptModule`.
# Here an OOM error is raised:
# RuntimeError: CUDA out of memory. Tried to allocate 142.00 MiB
# (GPU 0; 79.33 GiB total capacity; 78.28 GiB already allocated; 7.81 MiB free;
# 78.81 GiB reserved in total by PyTorch) If reserved memory is >> allocated
# memory try setting max_split_size_mb to avoid fragmentation.
# See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
backbone_module = torch.jit.trace(backbone, example_forward_input)
backbone_module.save('outputs/traced_backbone.pt')