Thanks for taking a look at this post!
I have an image-denoising network (a CNN) that takes an image of size
1*H*W as input and produces a denoised (smoothed) image of the same size
1*H*W. I have four NVIDIA RTX 3090 GPUs and 100 large 8K images of size
1*7680*4320, and I want to run inference (testing) on them. However, the memory of a single GPU does not seem to be enough for such a high resolution. My requirements, attempts, and questions are listed below:
Q1. I want to test all 100 of the 8K images, i.e., every image should be passed through my network to obtain its denoised (smoothed) version.
Q2. Testing on a GPU seems to require a great deal of memory (perhaps about 40~80 GB). One 3090 has only about 24 GB of memory, so I cannot run the test on a single 3090.
Q3. I have tried testing on the CPU. Although this avoids the memory problem, it is far too slow (about 10 minutes per image), so I hope to run the test on GPUs instead.
Q4. The images should not be split into smaller patches or blocks; in other words, I want to pass each complete (whole) 8K image through my network.
Q5. I have enabled all four 3090 GPUs and tried
torch.nn.DataParallel. However, it seems that only the first GPU is actually used. Is there a way to make full use of all four 3090s to support testing the large 8K images?
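For reference, what I tried looks roughly like the sketch below. The network here is a tiny stand-in for my real denoising CNN (the real one is much deeper), and the sketch falls back to CPU when no GPU is available so it still runs:

```python
import torch
import torch.nn as nn

# Tiny placeholder for my real denoising CNN (assumption: conv-only, 1-channel in/out).
net = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1),
)

# A small tensor stands in for a 1*1*4320*7680 image here.
x = torch.randn(1, 1, 256, 256)

if torch.cuda.is_available():
    # Wrap the model across all visible GPUs, as in my actual test.
    net = nn.DataParallel(net, device_ids=list(range(torch.cuda.device_count()))).cuda()
    x = x.cuda()

net.eval()
with torch.no_grad():
    y = net(x)  # batch size is 1, as in my single-image test
print(tuple(y.shape))
```

In my runs, only cuda:0 showed meaningful memory usage even with all four GPUs visible.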
I have also tried
torch.no_grad() and similar memory-saving tricks, but the memory is still not enough.
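This is the kind of memory saving I mean (a minimal sketch with a one-layer placeholder net; disabling autograd drops the bookkeeping needed for backward, but the forward activations themselves still have to fit):

```python
import torch
import torch.nn as nn

# One-layer placeholder for the real denoising CNN.
net = nn.Conv2d(1, 1, 3, padding=1).eval()
x = torch.randn(1, 1, 128, 128)

# No gradients are recorded inside this context, which saves the memory
# autograd would otherwise keep for the backward pass.
with torch.no_grad():
    y = net(x)

print(y.requires_grad)  # False
```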
I do not know whether there is some mechanism to run a single image across the four GPUs, or to perform the GPU computation with help from local host (CPU) memory.
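What I imagine is something like manually placing different parts of the network on different GPUs, so one image's activations are spread across devices. A rough sketch of the idea with a toy two-layer net (the layer split and device names are hypothetical, and it falls back to CPU so the sketch still runs):

```python
import torch
import torch.nn as nn

class SplitNet(nn.Module):
    """Toy example: front half on one device, back half on another."""
    def __init__(self, d0, d1):
        super().__init__()
        self.d0, self.d1 = d0, d1
        self.front = nn.Conv2d(1, 8, 3, padding=1).to(d0)
        self.back = nn.Conv2d(8, 1, 3, padding=1).to(d1)

    def forward(self, x):
        h = self.front(x.to(self.d0))   # compute on the first device
        return self.back(h.to(self.d1)) # move activations, finish on the second

# Use two GPUs when available; otherwise both "devices" are the CPU.
if torch.cuda.device_count() >= 2:
    d0, d1 = "cuda:0", "cuda:1"
else:
    d0 = d1 = "cpu"

net = SplitNet(d0, d1).eval()
with torch.no_grad():
    y = net(torch.randn(1, 1, 64, 64))
print(tuple(y.shape))
```

Is something along these lines feasible for a real network at 8K resolution, or is there a better-supported mechanism?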
My mind may be in a muddle now, so the above may not be well organized.
Any suggestion is welcome.
I am still searching, thinking, and trying …
Thanks a lot for reading such a long post!