I’m trying to use Conv2d as a blur filter for a data matrix of size NxN.
After that, I want to apply the same method to a 3D data matrix of size NxNxN.

To start with Conv2d:
I initialize a 200x200 matrix with uniform random values in [0, 1), so initially min≈0, max≈1, and mean≈0.5.
I expected that after many iterations (100 or more), the min, max, and mean of the matrix would all converge to ~0.5.
Instead, I get something else: the values can even drop below 0 or rise above 1.
What am I doing wrong with the filter setup?

from timeit import default_timer as timer
import torch
N = 100
matSize = 200
mat = torch.rand((matSize, matSize))
c = torch.nn.Conv2d(in_channels=1, out_channels=1, padding=1, kernel_size=3, groups=1)
c.weight = torch.nn.Parameter(torch.ones_like(c.weight)/9)
mat = torch.unsqueeze(mat, axis=0)
print("Initial result: min={:.3f} //max={:.3f} //mean={:.3f}".format(mat.min(), mat.max(), mat.mean()))
start = timer()
for i in range(N):
    mat = c(mat)
print("Final result: min={:.3f} //max={:.3f} //mean={:.3f}".format(mat.min(), mat.max(), mat.mean()))
print("Time for {} iterations: {:.5f} sec".format(N, timer() - start))
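For comparison, here is a minimal sketch of a variant that should stay within the input range. It rests on two assumptions that are not confirmed above: that the drift outside [0, 1] comes from the bias term that `Conv2d` creates (and randomly initializes) by default, and that the zero padding pulls border values toward 0. The sketch therefore passes `bias=False` and `padding_mode="replicate"`; the names `blur`, `mat_size`, and `n_iter` are illustrative only.

```python
import torch

torch.manual_seed(0)  # fixed seed so repeated runs are comparable

mat_size = 200
n_iter = 100

# Shape (1, 1, H, W): batch and channel dimensions made explicit.
mat = torch.rand((1, 1, mat_size, mat_size))
print("Initial: min={:.3f} max={:.3f} mean={:.3f}".format(
    mat.min().item(), mat.max().item(), mat.mean().item()))

# Assumption: bias=False drops the randomly initialized additive term,
# and replicate padding avoids averaging in zeros at the borders.
blur = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1,
                       padding_mode="replicate", bias=False)

with torch.no_grad():  # no gradients are needed for plain filtering
    blur.weight.fill_(1.0 / 9.0)  # 3x3 box (mean) kernel
    for _ in range(n_iter):
        mat = blur(mat)

print("Final:   min={:.3f} max={:.3f} mean={:.3f}".format(
    mat.min().item(), mat.max().item(), mat.mean().item()))
```

With these settings every output value is a weighted average of input values, so the result cannot leave the range of the input apart from floating-point rounding. With the default zero padding the values would still stay in [0, 1], but the mean would shrink because the borders average in zeros.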

Now I have a memory problem.
I converted the code to use Conv3d and run the convolution many times (500 iterations here). After a number of iterations, I get this error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 6.00 GiB total capacity; 5.04 GiB already allocated; 0 bytes free; 5.28 GiB reserved in total by PyTorch)

My first question: why am I getting this error? I overwrite the tensor mat in each iteration rather than allocating a new one.

My second question: how can I free the memory? I tried calling torch.cuda.empty_cache() every 100 iterations, but the memory still fills up.

from timeit import default_timer as timer
import torch
N = 500
matSize = 200
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)
## Test with PyTorch
print("Testing 3D with Torch ")
kernel_size = 3
mat = torch.rand((matSize, matSize, matSize))
c = torch.nn.Conv3d(in_channels=1, out_channels=1, padding=1, kernel_size=kernel_size, groups=1, bias=False)
c.weight = torch.nn.Parameter(torch.ones_like(c.weight) / (kernel_size ** 3))
mat = mat.to(device)
c = c.to(device)
mat = torch.unsqueeze(mat, axis=0)
print("initial result: min={:.3f} //max={:.3f} //mean={:.3f}".format(mat.min(), mat.max(), mat.mean()))
start = timer()
for i in range(N):
    mat = c(mat)
    if i % 100 == 0:
        print("Empty cache")
        torch.cuda.empty_cache()
print("Final result: min={:.3f} //max={:.3f} //mean={:.3f}".format(mat.min(), mat.max(), mat.mean()))
print("Time for {} iterations: {:.5f} sec".format(N, timer() - start))
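One likely explanation, offered as an assumption rather than a confirmed diagnosis: because the convolution weight is a `Parameter` with `requires_grad=True`, every call to `c(mat)` is recorded by autograd. The new `mat` keeps a reference to the previous one through its `grad_fn`, so all intermediate tensors stay alive, and `torch.cuda.empty_cache()` cannot release memory that is still referenced. The sketch below runs the same loop under `torch.no_grad()`. The sizes are reduced so it finishes quickly on any device, and the names `blur`, `mat_size`, and `n_iter` are illustrative only.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

mat_size = 32    # reduced from 200 so the sketch runs quickly
n_iter = 20      # reduced from 500
kernel_size = 3

# Shape (1, 1, D, H, W): batch and channel dimensions made explicit.
mat = torch.rand((1, 1, mat_size, mat_size, mat_size), device=device)
blur = torch.nn.Conv3d(1, 1, kernel_size=kernel_size, padding=1,
                       bias=False).to(device)

with torch.no_grad():  # autograd records nothing inside this block
    blur.weight.fill_(1.0 / kernel_size ** 3)  # 3x3x3 box (mean) kernel
    for _ in range(n_iter):
        # The previous tensor is no longer referenced after this line,
        # so its memory can be reused by the allocator.
        mat = blur(mat)

# No graph is attached to the result.
print(mat.requires_grad, mat.grad_fn)
```

Setting `mat_size = 200` and `n_iter = 500` restores the original problem size. If the assumption holds, memory use should then stay roughly constant across iterations, without any call to `torch.cuda.empty_cache()`.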