RuntimeError: HIP out of memory


I’m using PyTorch with an AMD card and ROCm. I can train my model, but when I try to run detection (inference) with it, I get an out-of-memory error:

RuntimeError: HIP out of memory. Tried to allocate 138.00 MiB (GPU 0; 7.98 GiB total capacity; 1.55 GiB already allocated; 6.41 GiB free; 1.57 GiB reserved in total by PyTorch)

However, the message itself reports 6.41 GiB free while the request is only 138 MiB, so there seems to be plenty of memory available. Why does the allocation fail?
Is this a ROCm bug?
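The numbers in the error message support that suspicion. Below is a small Python sketch that parses the quoted message and compares the requested size with the reported free memory. The regular expressions are assumptions about the message format, based only on the single line quoted above. The sketch only restates what the allocator reported. It does not query the GPU.

```python
import re

# The error message quoted in the question, verbatim.
MSG = ("RuntimeError: HIP out of memory. Tried to allocate 138.00 MiB "
       "(GPU 0; 7.98 GiB total capacity; 1.55 GiB already allocated; "
       "6.41 GiB free; 1.57 GiB reserved in total by PyTorch)")

# Conversion factors to MiB (binary units, as the message uses them).
_UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_oom(msg: str) -> dict:
    """Extract the sizes from an OOM message like the one above, in MiB.

    The patterns are assumptions based on this one message's wording.
    """
    fields = {
        "requested": r"Tried to allocate ([\d.]+) (\w+)",
        "total": r"([\d.]+) (\w+) total capacity",
        "allocated": r"([\d.]+) (\w+) already allocated",
        "free": r"([\d.]+) (\w+) free",
        "reserved": r"([\d.]+) (\w+) reserved in total",
    }
    out = {}
    for name, pattern in fields.items():
        m = re.search(pattern, msg)
        if m is None:
            raise ValueError(f"could not find '{name}' in message")
        value, unit = m.groups()
        out[name] = float(value) * _UNIT_TO_MIB[unit]
    return out


info = parse_oom(MSG)
print(f"requested {info['requested']:.0f} MiB, free {info['free']:.0f} MiB")
print("request fits in reported free memory:", info["requested"] <= info["free"])
```

Run on the quoted message, this reports a 138 MiB request against roughly 6564 MiB free. The failure therefore does not come from the amounts PyTorch itself was tracking.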


I rebooted the system and now it works. My guess is that some core component had crashed and left the system in an unclean state.