Is it possible to use shared GPU memory while loading a Stable Diffusion model?

I have a graphics card with 8 GB of memory. When I run Stable Diffusion (using torch.float16) with a large model, my dedicated memory fills up and I get an error like:
CUDA out of memory. Tried to allocate 2.32 GiB (GPU 0; 8.00 GiB total capacity; 5.66 GiB already allocated; 0 bytes free; 6.41 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
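The error also suggests setting max_split_size_mb through PYTORCH_CUDA_ALLOC_CONF. As far as I understand, that would look roughly like the snippet below (the 128 MB value is just an example), although it only helps with fragmentation, not with the total amount of memory:

# Must be set before the first CUDA allocation (or exported in the shell
# before launching Python); the value below is only an example.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the environment variable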

Searching the internet, I found the suggestion to set the data type to torch.float16, but that does not always help.

However, the shared GPU memory is not being used. Is there any way to use shared GPU memory to get around the out-of-memory error?

If you have set pin_memory = True in your DataLoader, try setting it to False and see if that helps.
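For context, pin_memory is an argument of torch.utils.data.DataLoader; roughly like this (the dataset below is just a placeholder):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 3, 64, 64))  # placeholder data
loader = DataLoader(dataset, batch_size=8, pin_memory=False)  # page-locked host memory disabled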

I’m actually using the diffusers library, where I didn’t find any pin_memory variable.

pin_memory pins (page-locks) host memory so that transfers to the GPU are faster; it is a DataLoader argument rather than something a model library usually exposes. If there is a way to reduce memory usage in the diffusers library, you might want to check that instead. Here is something I found on the project’s GitHub page; I hope this is the library you are using: https://github.com/huggingface/diffusers

“If you are limited by GPU memory, you might want to consider chunking the attention computation in addition to using fp16. The following snippet should result in less than 4GB VRAM.”

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Chunk the attention computation to reduce peak VRAM usage
pipe.enable_attention_slicing()

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
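If fp16 plus attention slicing is still not enough, recent diffusers versions can also offload model components to CPU RAM, which is the closest supported equivalent to using memory beyond the dedicated VRAM. A rough sketch (it requires the accelerate package, and the exact method available depends on your diffusers version):

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.enable_attention_slicing()
# Offload submodules to CPU RAM and move them to the GPU only when needed;
# do not call pipe.to("cuda") when using this.
pipe.enable_sequential_cpu_offload()

image = pipe("a photo of an astronaut riding a horse on mars").images[0]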