transforms.GaussianBlur behaving weirdly

I have a batch of masks m with float values that are either 1.0 or 0.0, which I want to blur to get smoother edges. I use transforms.GaussianBlur with kernel size 3 and sigma 0.5, but it behaves weirdly and outputs values > 1. (This should not happen, as the max value in the mask is 1.0 and the kernel weights obviously sum to 1.) I looked into the function and realized that the “mistake” occurs in nn.functional.conv2d; prior to the convolution everything looks right. I tried to implement the kernel myself and interestingly got a different result (which is also wrong).

How to approach this? Any help is highly appreciated.

Here is the code to replicate. The first print returns tensor(1.0005, device='cuda:0'), the second one returns tensor(0.9997, device='cuda:0'). I can provide the tensor m.pt if needed; please let me know how to share it in that case. The weird behavior also occurs for sigma=1.9844247102737427 or sigma=0.5011510252952576. Note that I only used the first mask in the batch, but the behavior also occurs when using the whole batch.

    import torch
    import torchvision.transforms as transforms

    # masks are binarized to float values of exactly 0.0 / 1.0
    m = torch.load("m.pt")
    m = (m > 0) * 1.0

    gb = transforms.GaussianBlur(3, sigma=0.5)
    mask = gb(m.unsqueeze(1)[0])
    print(mask.max())

    # own kernel (same construction as torchvision: normalized 1D Gaussian, outer product)
    x = torch.linspace(-1, 1, steps=3)
    pdf = torch.exp(-0.5 * (x / 0.5).pow(2))
    kernel1d = pdf / pdf.sum()

    kernel2d = torch.mm(kernel1d[:, None], kernel1d[None, :])
    kernel2d = kernel2d.view(1, 1, 3, 3).repeat(1, 1, 1, 1).to("cuda")

    padding = [3 // 2, 3 // 2, 3 // 2, 3 // 2]

    img = m[0].view(1, 1, 128, 128)

    img = torch.nn.functional.pad(img, padding, mode="reflect")
    img = torch.nn.functional.conv2d(img, kernel2d)

    print(img.max())
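
As a sanity check (just a sketch, reusing m, kernel2d, and padding from above), the same convolution in float64 on the CPU should stay within [0, 1] up to rounding:

    # sketch: float64 CPU reference for the same convolution
    img_ref = m[0].view(1, 1, 128, 128).double().cpu()
    kernel_ref = kernel2d.double().cpu()
    img_ref = torch.nn.functional.pad(img_ref, padding, mode="reflect")
    img_ref = torch.nn.functional.conv2d(img_ref, kernel_ref)
    print(img_ref.min(), img_ref.max())  # expected to stay within [0, 1] up to float64 rounding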

You might be running into these small errors due to the limited floating point precision, but I would have expected smaller errors.
If you are using an Ampere GPU, could you disable TF32 via torch.backends.cudnn.allow_tf32 = False to avoid the internal rounding (at the cost of some performance) and see if this reduces the error?
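
For reference, this is all I mean (a minimal sketch; the cudnn flag is the one that affects convolutions, the matmul flag is only listed for completeness):

    import torch

    # disable TF32 for cuDNN convolutions on Ampere GPUs (slower, but avoids the internal rounding)
    torch.backends.cudnn.allow_tf32 = False
    # corresponding flag for matmuls; not needed for conv2d, shown only for completeness
    torch.backends.cuda.matmul.allow_tf32 = False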

Hey, thanks for your reply! I am indeed running on an Ampere GPU. I set torch.backends.cudnn.allow_tf32 = False and the errors reduced a bit; interestingly, the output of transforms.GaussianBlur is still off, while the output of my manual implementation is correct. I guess I will just go with the code under # own kernel. Thanks!

I experimented some more and now get negative values after the convolution; I will return with more info.

I cannot seem to solve the problem on my own. The kernel has only positive values and so does m, but the output of the convolution contains (slightly) negative values.
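
Since the deviations are tiny in both directions, one workaround I am considering (just a sketch) is to clamp the blurred masks back into the valid range:

    # sketch: the values only leave [0, 1] slightly, so clamp them back after the convolution
    img = img.clamp(0.0, 1.0)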

Are you getting the negative values using the GaussianBlur transformation or your custom method?
In the former case, which setup are you using? Could you post the output of python -m torch.utils.collect_env?

They occur with the custom method. Here is the requested info:

Collecting environment information…
PyTorch version: 1.10.2
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.9

Python version: 3.7.0 (default, Oct 9 2018, 10:31:47) [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-5.13.0-30-generic-x86_64-with-debian-bullseye-sid
Is CUDA available: True
CUDA runtime version: 11.6.112
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 510.47.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.1
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.2
[pip3] pytorch-lightning==1.5.10
[pip3] torch==1.10.2
[pip3] torch-tb-profiler==0.3.1
[pip3] torchaudio==0.10.2
[pip3] torchmetrics==0.7.2
[pip3] torchvision==0.11.3
[conda] _pytorch_select 0.1 cpu_0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py37h7f8727e_0
[conda] mkl_fft 1.3.1 py37hd3c417c_0
[conda] mkl_random 1.2.2 py37h51133e4_0
[conda] numpy 1.21.2 py37h20f2e39_0
[conda] numpy-base 1.21.2 py37h79a1101_0
[conda] pytorch 1.10.2 py3.7_cuda11.3_cudnn8.2.0_0 pytorch
[conda] pytorch-lightning 1.5.10 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch-tb-profiler 0.3.1 pypi_0 pypi
[conda] torchaudio 0.10.2 py37_cu113 pytorch
[conda] torchmetrics 0.7.2 pypi_0 pypi
[conda] torchvision 0.11.3 py37_cu113 pytorch

I cannot reproduce the issue using your code from above:

    import torch

    x = torch.linspace(-1, 1, steps=3)
    pdf = torch.exp(-0.5 * (x / 0.5).pow(2))
    kernel1d = pdf / pdf.sum()

    kernel2d = torch.mm(kernel1d[:, None], kernel1d[None, :])
    kernel2d = kernel2d.view(1, 1, 3, 3).repeat(1, 1, 1, 1).to("cuda")

    padding = [3 // 2, 3 // 2, 3 // 2, 3 // 2]

    for _ in range(100):
        img = torch.rand((1, 1, 128, 128)).float().cuda()

        img = torch.nn.functional.pad(img, padding, mode="reflect")
        img = torch.nn.functional.conv2d(img, kernel2d)

        print(img.min(), img.max())

With this code it also does not reproduce for me. Can I share a file with you? It's a batch of 32 masks with dims 128x128. With those I get the error if I perform the convolution on the entire batch like this:

    img = m.view(-1, 1, 128, 128)
    img = torch.nn.functional.pad(img, padding, mode="reflect")
    img = torch.nn.functional.conv2d(img, kernel2d)

Here, the max value is 1.0 and the min value is -2.9e-7. If I do it like this:

    for mask in m:
        img = mask.view(1, 1, 128, 128)
        img = torch.nn.functional.pad(img, padding, mode="reflect")
        img = torch.nn.functional.conv2d(img, kernel2d)

The min is 0.0 and the max is 0.9997.
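
To quantify how far apart the two code paths are, one could compare them elementwise (a sketch, reusing m, padding, and kernel2d from above):

    # sketch: compare the batched convolution against the per-mask loop elementwise
    batched = torch.nn.functional.conv2d(
        torch.nn.functional.pad(m.view(-1, 1, 128, 128), padding, mode="reflect"),
        kernel2d)
    looped = torch.cat([
        torch.nn.functional.conv2d(
            torch.nn.functional.pad(mask.view(1, 1, 128, 128), padding, mode="reflect"),
            kernel2d)
        for mask in m])
    print((batched - looped).abs().max())  # size of the discrepancy between the two paths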