I am trying the VToonify model, but during inference the memory keeps increasing after every couple of images. To lower the consumption I resize every input image to a fixed size of 320x320 so that all tensors have the same shape, but the memory usage still keeps growing. I also tried dynamic quantization, but the results are very bad. Here is my code for quantization:
import torch
import torch.nn as nn
import huggingface_hub
from model.vtoonify import VToonify

MODEL_REPO = 'PKUWilliamYang/VToonify'

# Build the generator and load the pretrained cartoon checkpoint from the Hub
vtoonify = VToonify(backbone='dualstylegan')
vtoonify.load_state_dict(
    torch.load(
        huggingface_hub.hf_hub_download(MODEL_REPO, 'models/vtoonify_d_cartoon/vtoonify_s026_d0.5.pt'),
        map_location=lambda storage, loc: storage)["g_ema"],
    strict=False)
vtoonify.eval()

# Dynamically quantize the Linear layers to int8
quantized_model = torch.quantization.quantize_dynamic(vtoonify, {nn.Linear}, dtype=torch.qint8)
quantized_model.qconfig = torch.ao.quantization.get_default_qconfig('x86')

# Freeze the generator and the residual branch
# (requires_grad is the helper from the VToonify training code)
requires_grad(quantized_model.generator, False)
requires_grad(quantized_model.res, False)

# Save the quantized weights in the same "g_ema" layout as the original checkpoint
torch.save(
    {
        #"g": g_module.state_dict(),
        #"d": d_module.state_dict(),
        "g_ema": quantized_model.state_dict(),
    },
    './converted_dynamic_vtoonify_s026_d0.5.pt'
)
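For completeness, this is roughly how I load the quantized weights back for inference. It is only a sketch: re-applying quantize_dynamic to a freshly built model before load_state_dict is my assumption of how the packed int8 state dict has to be restored; the paths and names match the snippet above.

# Rebuild the model and its dynamically quantized Linear layers,
# then load the saved int8 "g_ema" weights into that structure
vtoonify_q = VToonify(backbone='dualstylegan')
vtoonify_q.eval()
vtoonify_q = torch.quantization.quantize_dynamic(vtoonify_q, {nn.Linear}, dtype=torch.qint8)
vtoonify_q.load_state_dict(
    torch.load('./converted_dynamic_vtoonify_s026_d0.5.pt', map_location='cpu')["g_ema"],
    strict=False)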
The original model produces the expected result, but after quantization the output is badly degraded.
I also tried mixed precision and wrapping the inference in with torch.no_grad(), but had no luck.
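For context, this is roughly what my inference loop looks like. It is a simplified sketch: load_image_as_tensor, save_result, image_paths and the forward-call arguments are placeholders for my actual pipeline.

import torch.nn.functional as F

device = 'cuda'
vtoonify.to(device)
with torch.no_grad():
    for path in image_paths:                        # placeholder list of input images
        x = load_image_as_tensor(path).to(device)   # hypothetical helper, returns a 1x3xHxW float tensor
        # force every input to the same 320x320 size so tensor shapes never change
        x = F.interpolate(x, size=(320, 320), mode='bilinear', align_corners=False)
        with torch.autocast(device_type='cuda', dtype=torch.float16):   # mixed precision
            y = vtoonify(x, s_w, d_s=0.5)            # s_w: style code from the pSp encoder (setup omitted)
        save_result(y.cpu())                         # hypothetical save helper
        del x, y                                     # drop references between images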
I started with a 4 GB GPU, then moved to 8 GB, and now 12 GB, and it still manages to eat all of the memory.
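This is roughly how I watch the usage climb between images (just the built-in CUDA counters, printed every few iterations):

# Report current GPU memory usage of the PyTorch allocator
print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MiB | "
      f"reserved: {torch.cuda.memory_reserved() / 1024**2:.1f} MiB")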