How to prevent inf while working with exponential

Bled_Clement · March 28, 2020, 4:48pm

Hi there,

I’m trying to create a function in a network with trainable parameters. In my function I have an exponential that for large tensor values goes to infinity. What would the best way to avoid this be?

The function is as follows:

step1 = Pss-(k*Pvv)
step2 = step1*s
step3 = torch.exp(step2)
step4 = torch.log10(1+step3)
step5 = step4/s

#or equivalently
# train_curve = torch.log(1+torch.exp((Pss-k*Pvv)*s))/s

If it makes it easier to understand, the basic function is log10(1+e^(x-const)*10)/10. The exponential inside the log gets too big and goes to inf.

I think I might have to normalize my tensor x, and this would mean normalizing the constants and the rest of the function also. Would someone have any thoughts on the best way to go about this?

Thanks so much.

KFrank · March 28, 2020, 6:27pm

Hi Clement!

Please adapt the log-sum-exp trick to your specific use case.

(Also, of course, although related, log10() is not the same as
log() (“natural log,” log base-e").

Best.

K. Frank

Bled_Clement · March 28, 2020, 7:27pm

I tried this but couldn’t get it to work for my application. I’m going to give it another go, and possibly modify the source code to use base 10. Should it be as simple as replacing torch.log with torch. logsumexp()?

KFrank · March 28, 2020, 8:24pm

Hello Clement!

Please feel free to post a short, complete, runnable script
that has the original code you posted (that fails when the
tensor values become to large), together with your improved
attempt (using logsumexp() or whatever). If you show us
what fails or what the discrepancy is, someone here will
probably be able to help you out.

In isolation, such a direct replacement won’t work.

First, note that log10 (x) = log (x) / log (10).

Second, you wish to calculate
torch.log (1 + torch.exp (step2)). Because
torch.exp (0) = 1, this can be written as
torch.log (torch.exp (0) + torch.exp (step2)),
for which you can use torch.logsumexp().

Since you are working with tensors, I imagine that you would
add a new dimension of length 2 to your tensor. Along this
dimension, the first element would be that of your original
tensor, and the second element would be 0. You would then
apply torch.logsumexp() along the new dimension, e.g.,
torch.logsumexp (my_new_tensor, dim = 0), where
the extra new dimension added to create my_new_tensor is,
in this example, dim = 0.

Good luck!

K. Frank

Bled_Clement · March 28, 2020, 10:00pm

Hi K. Frank,

Thank you very much for your response. I’ve actually learned a bit from some of your other responses on the forum in previous work. I’ll give some more context to what I’m doing and more code. You can save time by skipping to the end to see what I’ve attempted. I think it could be too much to share everything needed to run a test of this module as an image and other modules are involved.

I’m training an autoencoder to denoise an image by

Using the autoencoder/UNET to output a blurry mask/residual image.
Subtracting this mask from the original image to get a noisy texture map.
Filtering this noisy texture map in the frequency domain with a Wiener filter.
Adding the output of (3) back onto the output of the autoencoder to restore detail to the final image. Basically, a two-part denoiser where the network implicitly learns to creates a mask over some parts of the image and lets the wiener filter take over.

I’m trying to make the Wiener filter learn a mapping for attenuating certain frequency magnitudes, instead of just making it a function of noise variance Pvv to signal power spectrum Pss.

Here is what I had before, where H is the filter vector to be applied to each value in the frequency block tensor:

            freq_block = torch.rfft(win_data_block, win_data_block.ndim, onesided=False)
            Pss = torch.abs(freq_block)**2
            Pss = torch.sum(Pss, 2)
            Pss = Pss.double()
            H = torch.max((Pss-Pvv), torch.zeros(Pss.size(), dtype=torch.double)) / Pss

            H = H.unsqueeze(2).repeat(1, 1, 2)

            filt_freq_block = H*freq_block

This is comparing the power spectrum magnitude of the signal to the noise power and penalises the signal for a high noise. Pss represents a 16x16 window (lets say between 0 and 30) and Pvv is the constant noise power (lets say 1.5). I want to make a new frequency coring function based on a function with trainable parameters Here is the filter I am working with, with the problem we are discussing. Most of the code is dealing with indexing and figuring out boundaries so I’ll put the part I’m struggling with in bold:



import numpy as np
import torch

def add_noise(img, std):
    noise = torch.randn(img.size()) * (std)
    noisy_image = img + noise
    return noisy_image


def wiener_3d(I, noise_std, block_size,k,s):

    width = I.shape[1]
    height = I.shape[0]
    IR = torch.zeros(height, width, dtype=torch.float64)
    # if(len(list(I.shape)) >= 3):
    #    frames = I.shape[2]
    # else:
    #   bt = 1
    bt = 1
    bx = block_size
    by = block_size

    hbx = bx/2
    hby = by/2
    hbt = bt/2

    sx = (width + hbx - 1)/hbx
    sy = (height + hby - 1)/hby

    win = torch.ones(by, bx, bt)
    win1x = torch.cos((torch.arange(-hbx + .5, hbx - .5 + 1)/bx) * np.pi)
    win1y = torch.cos((torch.arange(-hby + .5, hby - .5 + 1)/by) * np.pi)
    win1t = torch.cos((torch.arange(-hbt + .5, hbt - .5 + 1)/bt) * np.pi)

    for x in range(bx):
        for y in range(by):
            for t in range(bt):
                win[y, x, t] = win1y[y]*win1x[x]*win1t[t]

    if(bt == 1):
        win = torch.squeeze(win)

    Pvv = torch.mean(torch.pow(win, 2))*torch.numel(win)*(noise_std**2)
    Pvv = Pvv.double()
    bx0 = torch.range(0, bx-1)
    by0 = torch.range(0, by-1)

    for x in range(0, int((hbx*sx)), int(hbx)):
        for y in range(0, int((hby*sy)), int(hby)):
            # print(x,y)
            tx = np.arange(x-hbx+1, x+hbx+1)
            validx = np.arange(np.maximum(-tx[0], 0), bx - np.maximum((tx[-1]-width+1), 0))
            cx = np.minimum(np.maximum(tx, 0), width-1)
            validx = validx.astype(int)
            rcx = torch.as_tensor(tx[validx], dtype=torch.long)
            bcx = torch.as_tensor(bx0[validx], dtype=torch.long)

            ty = np.arange(y-hby+1, y+hby+1)
            validy = np.arange(np.maximum(-ty[0], 0), by - np.maximum((ty[-1]-width+1), 0))
            cy = np.minimum(np.maximum(ty, 0), width-1)
            validy = validy.astype(int)
            rcy = torch.as_tensor(ty[validy], dtype=torch.long)
            bcy = torch.as_tensor(by0[validy], dtype=torch.long)

            cy = torch.as_tensor(cy, dtype=torch.long)
            cx = torch.as_tensor(cx, dtype=torch.long)
            data_block = torch.index_select(I, 0, cy)
            data_block = torch.index_select(data_block, 1, cx)

            mean_block = torch.mean(data_block)
            win_data_block = (data_block - mean_block)*win

            freq_block = torch.rfft(win_data_block, win_data_block.ndim, onesided=False)
            Pss = torch.abs(freq_block)**2
            Pss = torch.sum(Pss, 2)
            Pss = Pss.double()

            step1 = Pss-(k*Pvv)
            step2 = step1*s
            step3 = torch.exp(step2)
            zeros = torch.zeros_like(step3)
            step4 = torch.logsumexp([step3, zeros], 0)
            step5 = step4/s

            # train_curve = torch.log(1+torch.exp((Pss-k*Pvv)*s))/s
            H = torch.max(step5, torch.zeros(Pss.size(), dtype=torch.double))  
            #H = H / normalised_Pss
            # H[H != H] = 0
            H = H.unsqueeze(2).repeat(1, 1, 2)

            filt_freq_block = H*freq_block
            filt_data_block = torch.irfft(filt_freq_block, win_data_block.ndim, onesided=False)
            filt_data_block = (filt_data_block + mean_block*win) * win
            # hbt = torch.round(hbt)

            filt_data_block = torch.index_select(filt_data_block, 0, bcy)
            filt_data_block = torch.index_select(filt_data_block, 1, bcx)
            IR[rcy[0]:rcy[-1] + 1, rcx[0]:rcx[-1] + 1] = IR[rcy[0]:rcy[-1] + 1, rcx[0]:rcx[-1] + 1] + filt_data_block
    return IR

and again the part I’m struggling with


            step1 = Pss-(k*Pvv)
            step2 = step1*s
            step3 = torch.exp(step2)
            zeros = torch.zeros_like(step3)
            step4 = torch.logsumexp([step3, zeros], 0)
            step5 = step4/s

            # train_curve = torch.log(1+torch.exp((Pss-k*Pvv)*s))/s
            H = torch.max(step5, torch.zeros(Pss.size(), dtype=torch.double))  
            H = H.unsqueeze(2).repeat(1, 1, 2)

            filt_freq_block = H*freq_block

I’m not calling the function properly but I thought I’d show you what I’ve done so far. I don’t 100% understand what you mean when you say add a new dimension of length 2.

Thanks a million,
Clément

KFrank · March 28, 2020, 11:26pm

Hello Clement!

I’m suggesting that you post a test script just for your log-exp
calculation.

Something like:

import torch

sample_tensor =  something_like_torch_randn()

original_result = torch.log (1 + torch.exp (sample_tensor))

new_result = something_with_torch_logsumexp (sample_tensor)

This should be a complete, runnable script (that is simple,
totally self-contained, and doesn’t depend on any of your
other modules or outside data). If your new_result doesn’t
run, we can see the errors and make suggestions. If it does
run, but doesn’t give the right answer, we can look for the
cause of the discrepancy.

(Your original_result will be our baseline for comparison.
It should be correct for moderate tensor values. It’s okay if it
becomes inf for large tensor values because that’s the issue
we are trying to address.)

Best.

K. Frank

Bled_Clement · June 29, 2020, 12:41pm

Hi Frank, apologies I never got back to you, I was nearing a deadline at the time. Your answer did help me understand the maths, thank you!!