I don’t know much about PyTorch; I’m only using it for Stable Diffusion, but I’m facing an annoying issue: I am plagued by OOM errors despite having more than enough VRAM for what I’m doing.
OutOfMemoryError: CUDA out of memory. Tried to allocate 2.29 GiB (GPU 0; 24.00 GiB total capacity; 10.43 GiB already allocated; 12.05 GiB free; 10.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
As you can see, PyTorch tries to allocate much less memory than what is free. Sometimes it even fails to allocate smaller chunks (~1 GiB) when more than 18 GiB are free.
PyTorch seems incapable of reaching 50% VRAM utilization; it always crashes before that. This can’t be normal, since others don’t seem to be getting this issue.
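For reference, my understanding of the PYTORCH_CUDA_ALLOC_CONF hint from the error message is that it has to be set before PyTorch first touches CUDA; the 512 value below is an arbitrary example I have not tuned:

```python
import os

# max_split_size_mb caps the block size the caching allocator will split,
# which the OOM message suggests as a mitigation for fragmentation.
# This must be set before torch initializes its CUDA allocator
# (i.e. before the first CUDA tensor is created).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
```

The same variable can also be exported in the shell before launching the app instead of setting it in Python.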
The app’s support couldn’t help, so I am turning here instead.
It happens around the scaled_dot_product_attention call here:
def sdp_attnblock_forward(self, x):
    h_ = x
    h_ = self.norm(h_)
    q = self.q(h_)
    k = self.k(h_)
    v = self.v(h_)
    b, c, h, w = q.shape  # pylint: disable=unused-variable
    q, k, v = (rearrange(t, 'b c h w -> b (h w) c') for t in (q, k, v))
    dtype = q.dtype
    q, k, v = q.float(), k.float(), v.float()
    q = q.contiguous()
    k = k.contiguous()
    v = v.contiguous()
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)
    out = out.to(dtype)
    out = rearrange(out, 'b (h w) c -> b c h w', h=h)
    out = self.proj_out(out)
    return x + out
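One thing I notice in that function: q, k and v are upcast to float32 (so if the model runs in fp16, each .float() call makes a full copy), and the attention output is another float32 tensor, so this step transiently needs several times the tensors’ nominal size. A rough back-of-the-envelope estimate, with a helper and example shape that are mine and purely illustrative:

```python
def attn_fp32_temporaries_gib(b, c, h, w):
    """Estimate, in GiB, the transient float32 tensors created inside
    sdp_attnblock_forward: the q/k/v copies made by .float() (assuming
    fp16 inputs, so the upcast really copies) plus the float32 attention
    output before it is cast back with .to(dtype).
    Hypothetical helper for estimation only, not part of the app."""
    n = b * c * h * w            # elements per tensor after rearrange
    fp32_copies = 3 * n * 4      # q.float(), k.float(), v.float()
    fp32_out = n * 4             # scaled_dot_product_attention output
    return (fp32_copies + fp32_out) / 2**30

# An arbitrary large activation shape, b=1, c=512, h=w=512:
# 4 float32 tensors of 134M elements each -> 2.0 GiB of temporaries,
# in the same ballpark as the 2.29 GiB allocation that fails for me.
print(attn_fp32_temporaries_gib(1, 512, 512, 512))  # -> 2.0
```

This doesn’t count whatever scaled_dot_product_attention allocates internally, which depends on the backend it picks.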
This is the version I am using, on Python 3.10.11:
10:18:37-117553 INFO Torch 2.0.1+cu118
10:18:37-129554 INFO Torch backend: nVidia CUDA 11.8 cuDNN 8700
nvidia-smi also agrees with the numbers given and shows plenty of free memory.
I don’t know what info you would need to help elucidate the issue; just ask and I will do my best to provide it.