Network not learning a specific parameter

I’m creating an autoencoder that uses convolutions to learn some parameters and then uses native PyTorch functions to recreate the input. But when I test the network, I find that it completely fails to learn one of the parameters used to create the input images.

I was wondering if anyone could see why this would happen?

Here is what the true vs. predicted test plots look like for the 4 parameters I care about. The bottom left is the one that goes wrong (it’s called V_SCALE_LENGTH in the model below).

[Image: semantic_AE_REMOTE (true vs. predicted scatter plots for the four parameters)]

NOTE: I also see that the top-right panel shows a parameter that is not well learned, but that’s okay; this is a smaller part of a larger network that I’ve isolated into this example because of the bottom-left parameter’s problem. I also know that in the model below, the bottom-left parameter needs both of the right-hand parameters to recreate the input. But I still see this parameter fail to learn even when I tighten up the top-right parameter.

In my full-sized network I see much poorer performance than this: the parameter goes to the maximum value allowed by the hardtanh activation at the end of the linear module. At least here I’m seeing a sort-of increase from left to right in the true vs. predicted plot, which makes me think I’m missing something in the data handling that maybe someone can spot…
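
In case anyone wants to reproduce the saturation check, this is a minimal sketch (it assumes the model and a generated batch mom1s are already in scope) that reports whether the latent node feeding V_SCALE_LENGTH sits at the hardtanh bounds:

with torch.no_grad():
    latent = model.encoder_linear(model.VELOCITY_encoder(mom1s))
    node = latent[:, 3] # the node that becomes V_SCALE_LENGTH after de_regularise
    print("V_SCALE_LENGTH node: min %.3f  max %.3f  frac at bounds %.2f"
          % (node.min(), node.max(), (node.abs() > 0.999).float().mean()))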

The 2D input images are created using the following function:

import numpy as np
import torch
from math import pi

def return_cube(batch):
    """ CREATION OF CUBES WITH VARIOUS ORIENTATIONS """
    
    # Define the properties of the objects in the input cubes _________________
        
    # Z angle
    pos_ang = torch.tensor([np.random.uniform(-np.pi,np.pi,batch)]).view(-1)[:,None,None,None]
    # X angle
    inc_ang = torch.tensor([np.random.uniform(np.pi/18,np.pi/2,batch)]).view(-1)[:,None,None,None]
    # The radius at which the velocity reaches its highest value, i.e. r(max(V))
    ah = torch.tensor([np.random.uniform(1,32,batch)]).view(-1)[:,None,None,None]
    # The maximum velocity reached. i.e. max(V(r)) <-- THIS IS THE TROUBLE MAKER!
    Vh = torch.tensor([np.random.uniform(0.1,1,batch)]).view(-1)[:,None,None,None]
    
    batch_size = batch # The input batch size to the network
    
    # Create radius cubes _____________________________________________________

    l = torch.arange(0 - 63/2., (63/2.)+1)
    yyy, xxx, zzz = torch.meshgrid(l,l,l)   
    xxx, yyy, zzz = xxx.repeat(batch_size,1,1,1), yyy.repeat(batch_size,1,1,1), zzz.repeat(batch_size,1,1,1)
    
    # Rotate the cubes ________________________________________________________
    
    rr, xx, yy, RADIUS_XY, zz = rotation_all(xxx,yyy,zzz,inc_ang,pos_ang) # module-level copy of CAE.rotation_all shown below
    
    # Create BRIGHTNESS cube to make the object a thin disk rather than _______
    # a sphere ________________________________________________________________
    
    BRIGHTNESS = torch.exp(-RADIUS_XY/16) * torch.exp(-torch.abs(zz)/2) 
    
    # NOTE ____________________________________________________________________
    # 16 is arbitrarily chosen to allow the network to learn Z_ANGLE better
    # 2 is chosen to make the z brightness drop off really fast to emulate 
    # a thin disk
    # _________________________________________________________________________
    
    # Get the max radii in the X and Y cubes __________________________________
    
    xmax = xx.max(1)[0].max(1)[0].max(1)[0][:,None,None,None] 
    ymax = yy.max(1)[0].max(1)[0].max(1)[0][:,None,None,None] 
    
    # Create the Velocity cubes V(r) __________________________________________
    
    v_rr = (2/pi) * (1.2732395447351628) * torch.atan((pi*rr)/ah) # Normalised arctan rotation curve; 1.2732... = 1/arctan(1)
    
    # Change Velocity cubes to line-of-sight velocity cubes along the z axis __
    
    thetas = torch.atan2(yy/ymax, xx/xmax) - pi/2
    thetas = torch.sin(thetas)
    thetas = thetas * torch.sin(inc_ang) 
    VELOCITY = v_rr * Vh * thetas
    
    # Weight by the brightness cubes to make a thin disk not a sphere _________
    
    VELOCITY = VELOCITY * BRIGHTNESS
    
    # Flatten along the Z axis to make the 2D velocity image __________________
    
    VELOCITY = VELOCITY.sum(axis=3)
    VELOCITY = VELOCITY.unsqueeze(1)        
    
    # Normalise each image by its own maximum, then rescale by Vh ____________
    VELOCITY = VELOCITY / VELOCITY.max(1)[0].max(1)[0].max(1)[0][:,None,None,None]
    VELOCITY *= Vh
           
    return VELOCITY,pos_ang,inc_ang,ah,Vh
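
As a quick sanity check on the generator (assuming the module-level rotation_all used above is in scope), the shapes come out as:

VELOCITY, pos_ang, inc_ang, ah, Vh = return_cube(4)
print(VELOCITY.shape) # torch.Size([4, 1, 64, 64]): one 2D velocity image per sample
print(Vh.shape)       # torch.Size([4, 1, 1, 1]):  one parameter value per sample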

And the model looks as follows:

import torch
from math import pi as pi
  
#=============================================================================#
#/////////////////////////////////////////////////////////////////////////////#
#=============================================================================#

class CAE(torch.nn.Module):
    """ PYTORCH CONVOLUTIONAL AUTOENCODER WITH FUNCTIONAL DECODER """
               
    def __init__(self, nodes, xxx=None, yyy=None, zzz=None):
        super().__init__()
    
        self.xxx = xxx
        self.yyy = yyy
        self.zzz = zzz
        
        self.nodes = nodes
        self.conv1 = torch.nn.Conv2d(1,16,3,1,padding=1)
        self.conv2 = torch.nn.Conv2d(16,32,3,1,padding=1)
        self.conv3 = torch.nn.Conv2d(32,64,3,1,padding=1)
        self.conv4 = torch.nn.Conv2d(64,128,3,padding=1)
        self.pool = torch.nn.MaxPool2d(2)

        self.lc1 = torch.nn.Linear(2048,256)
        self.lc2 = torch.nn.Linear(256,self.nodes)
        self.relu = torch.nn.ReLU()
        self.hardtanh = torch.nn.Hardtanh(min_val=-1,max_val=1.)
    
    def VELOCITY_encoder(self,x):
        """ FEATURE EXTRACTION THROUGH CONVOLUTIONAL LAYERS """
        
        x = self.conv1(x)
        x = self.pool(x)
        x = self.conv2(x)
        x = self.relu(x)
        x = self.pool(x)        
        x = self.conv3(x)
        x = self.relu(x)
        x = self.pool(x)
        x = self.conv4(x)
        x = self.relu(x)
        x = self.pool(x)
        x = x.view(int(x.size()[0]),-1)
        return x
    
    def encoder_linear(self,x):
        """ LINEARLY CONNECTED LAYERS FOR REGRESSION OF REQUIRED PARAMETERS """
        
        x = self.lc1(x)
        x = self.relu(x)
        x = self.lc2(x)
        x = self.hardtanh(x)
        return x
    
    def recover_angle(self, theta_1, theta_2):
        """ RECOVERING THE ANGLE AFTER A 2D ROTATION USING EULER MATRICES """

        pos = torch.atan2(theta_1,theta_2) # theta_1 and theta_2 act as the sin and cos components of the angle
        return pos
        
    def create_velocity_values(self, radius_tensor):
        """ GET THE LINE OF SIGHT VELOCITYOCITIES FOR EACH VOXEL IN THE RADIUS CUBE """

        vel = (2/pi)*(1.273239)*(torch.atan(pi*radius_tensor)) # 1.273... = 1/arctan(1)
        return vel
    
    def de_regularise(self,tensor,minimum,maximum):
        """ RECOVER THE PARAMETERS OUT OF THE NORMALISED RANGE [-1, 1] """
        
        # Affine map from [-1, 1] to [minimum, maximum]; e.g. with (0.1, 1):
        # -1 -> 0.1, 0 -> 0.55, +1 -> 1.0
        tensor=tensor+1.
        tensor=tensor*(maximum-minimum)/2.
        tensor=tensor+minimum
        return tensor
              
    def rotation_all(self,x,y,z,X_ANGLE,Z_ANGLE):
        """ A function for rotating 3D coordinates about the z and then x axes """
        
        _theta = Z_ANGLE + pi/2 # Correct Z_ANGLE for Pythonic rotations
        
        xmax = x.max(1)[0].max(1)[0].max(1)[0][:,None,None,None]
        ymax = y.max(1)[0].max(1)[0].max(1)[0][:,None,None,None] 
        
        xx = ((x/xmax)*torch.cos(_theta) - (y/ymax)*torch.sin(_theta))*xmax # Z-Rotation of x coords
        _y = ((x/xmax)*torch.sin(_theta) + (y/ymax)*torch.cos(_theta))*ymax # Z-Rotation of y coords  
        
        _ymax = _y.max(1)[0].max(1)[0].max(1)[0][:,None,None,None]
        zmax = z.max(1)[0].max(1)[0].max(1)[0][:,None,None,None]    
        
        yy = ((_y/_ymax)*torch.cos(X_ANGLE) - (z/zmax)*torch.sin(X_ANGLE))*_ymax # X-Rotation of y coords
        zz = ((_y/_ymax)*torch.sin(X_ANGLE) + (z/zmax)*torch.cos(X_ANGLE))*zmax # X-Rotation of z coords
        
        RADIUS_ARRAY = torch.sqrt((xx**2)+(yy**2)+(zz**2)) # Get the cube of radii R(x,y,z)
        XY_RADIUS_ARRAY = torch.sqrt((xx**2)+(yy**2)) # Get the cube of xy radii only R(x,y)
                
        return RADIUS_ARRAY, xx, _y, XY_RADIUS_ARRAY, torch.abs(zz)
    
    def cube_maker(self, x, VELOCITY_INPUT, shape):
        """ GENERATE THE OUTPUT CUBE USING PYTORCH FUNCTIONS """  

        Z_ANGLE = self.recover_angle(x[:,0].clone(),x[:,1].clone())[:,None,None,None]    ### Position angle
        X_ANGLE = self.de_regularise(x[:,2].clone(),pi/18,pi/2)[:,None,None,None]   ### Inclination
        V_SCALE_LENGTH = self.de_regularise(x[:,3].clone(),1,32)[:,None,None,None]     ### DM halo scale length
        MAX_VELOCITY = self.de_regularise(x[:,4].clone(),0.1,1)[:,None,None,None]       ### Maximum velocity allowed
                                             
        # Create radius cube __________________________________________________

        rr_t_v, xx, yy, XY_RADIUS_ARRAY, zz = self.rotation_all(self.xxx,self.yyy,self.zzz,X_ANGLE,Z_ANGLE) 
                
        # Create BRIGHTNESS cube to make the object  __________________________
        # a thin disk rather than a sphere ____________________________________
        
        BRIGHTNESS = torch.exp(-XY_RADIUS_ARRAY/16) * torch.exp(-torch.abs(zz)/2) 
        # NOTE ________________________________________________________________
        # 16 is arbitrarily chosen to allow the network to learn Z_ANGLE better
        # 2 is chosen to make the z brightness drop off really fast to emulate 
        # a thin disk
        # _____________________________________________________________________
                     
        # Create VELOCITY cube ________________________________________________
        
        xmax = xx.max(1)[0].max(1)[0].max(1)[0][:,None,None,None]
        ymax = yy.max(1)[0].max(1)[0].max(1)[0][:,None,None,None] 
            
        # Convert to line-of-sight VELOCITY along axis_3 ______________________
        
        VELOCITY = self.create_velocity_values(rr_t_v/V_SCALE_LENGTH)                                     
        thetas = torch.atan2(yy/ymax, xx/xmax) - pi/2
        thetas = torch.sin(thetas)
        thetas = thetas * torch.sin(X_ANGLE) 
        VELOCITY = VELOCITY * thetas 
               
        # Create velocity map _________________________________________________
        
        VELOCITY = VELOCITY * BRIGHTNESS # Weight by the brightness to make a disk

        VELOCITY = VELOCITY.sum(axis=3)
        VELOCITY = VELOCITY.unsqueeze(1)
              
        VELOCITY = VELOCITY / VELOCITY.max(1)[0].max(1)[0].max(1)[0][:,None,None,None]
        VELOCITY = VELOCITY * MAX_VELOCITY
               
        # Mask output to match input shape ____________________________________
                
        VELOCITY[VELOCITY_INPUT==0] = 0
        
        V_MAX = VELOCITY.max(1)[0].max(1)[0].max(1)[0]
        
        return VELOCITY, V_MAX
    
    def test_encode(self,x):
        """ CREATE ENCODINGS FOR TESTING AND DEBUGGING """
        
        output = self.VELOCITY_encoder(x)
        output = self.encoder_linear(output)
        Z_ANGLE = self.recover_angle(output[:,0].clone(),output[:,1].clone())
        X_ANGLE = self.de_regularise(output[:,2].clone(),pi/18,pi/2)        
        V_SCALE_LENGTH = self.de_regularise(output[:,3].clone(),1,32)      
        MAX_VELOCITY = self.de_regularise(output[:,4].clone(),0.1,1)      
        output = torch.stack([Z_ANGLE,X_ANGLE,V_SCALE_LENGTH,MAX_VELOCITY],dim=1) # stack, so this also works for batch sizes > 1
        return output
        
    def forward(self, x):
        """ CREATE SCALE_LENGTH FORWARD PASS THROUGH THE REQUIRED MODEL FUNCTIONS """
        
        output = self.VELOCITY_encoder(x)
        output = self.encoder_linear(output)
        output = self.cube_maker(output,x,shape=64)
        return output
    
#=============================================================================#
#/////////////////////////////////////////////////////////////////////////////#
#=============================================================================#
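
As an aside, the 2048 input size of lc1 comes straight from the architecture: a 64x64 input halved by four 2x2 max-pools gives 4x4 feature maps, and conv4 outputs 128 channels, so 128 * 4 * 4 = 2048. A quick check with the class above:

import torch

dummy = torch.randn(2, 1, 64, 64)           # two dummy velocity images
print(CAE(5).VELOCITY_encoder(dummy).shape) # torch.Size([2, 2048])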

The training procedure is pretty straightforward:

import numpy as np
import os
import torch
from tqdm import tqdm

from model import CAE
# return_cube, learning_rate and plotter are the helper functions shown
# elsewhere in this post and are assumed to be in scope here

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.fastest = True
device = torch.device("cuda")
torch.cuda.empty_cache()

#_____________________________________________________________________________#
#~~~ SET UP DIRECTORIES   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
#_____________________________________________________________________________#

save_directory = 'save_folder/'
model_directory = 'save_folder/models/'

#_____________________________________________________________________________#
#_____________________________________________________________________________#
    
def training(model:torch.nn.Module,batch_size,epochs,loss_function,initial_lr,
             save_dir=None,model_dir=None,gradients=False,testing=False):
    
    #_________________________________________________________________________#
    #~~~ SET UP THE MODEL   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
    #_________________________________________________________________________#
    
    model = model.to(device).to(torch.float)
    
    for epoch in range(epochs):
        
        print("Epoch {0} of {1}" .format( (epoch+1), epochs))
        # NOTE: re-creating the optimiser each epoch resets Adam's moment estimates
        optim = torch.optim.Adam(model.parameters(),learning_rate(initial_lr,epoch))

        #_____________________________________________________________________#
        #~~~ CREATE A BATCH  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
        #_____________________________________________________________________#
        
        results = return_cube(batch_size)
        mom1s = results[0]                        
        mom1s[mom1s!=mom1s]=0 # zero out any NaNs (NaN != NaN)
        mom1s = mom1s.to(device).to(torch.float)        

        #_____________________________________________________________________#
        #~~~ TRAIN THE MODEL   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
        #_____________________________________________________________________#
        
        model.train(True) 
        
        running_loss = []
        
        for _ in tqdm(range(10)):
            
            prediction, vmax = model(mom1s)
            loss = loss_function(prediction, mom1s)
            prediction.retain_grad()
            optim.zero_grad()
            loss.backward()
            optim.step()
            running_loss.append(loss.detach().cpu())
                        
        print("\n Mean loss: %.6f" % np.mean(running_loss),
                  "\t Loss std: %.6f" % np.std(running_loss),
                  "\t Learning rate: %.6f:" % learning_rate(initial_lr,epoch))
        print('_'*73)
                 
        #_____________________________________________________________________#
        #~~~ CREATE ANY PLOTS   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
        #_____________________________________________________________________#
        
        if (save_dir is not None) and (epoch == epochs-1):
            plotter(prediction, mom1s, save_directory)
            
        #_____________________________________________________________________#
        #~~~ PRINT AVERAGE LAYER GRADIENTS   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
        #_____________________________________________________________________#
        
        if gradients is True:
            layers = []
            av_grads = []
            for n, p in model.named_parameters():
                if (p.requires_grad) and ("bias" not in n):
                    layers.append(n)
                    av_grads.append(p.grad.abs().mean())
            for i in range(len(layers)):
                print(layers[i],' grad: ',av_grads[i])
                         
        #_____________________________________________________________________#
        #~~~ SAVE THE MODEL   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
        #_____________________________________________________________________#
        
        if (model_dir is not None) and (epoch == epochs-1):
            torch.save(model.state_dict(),model_dir+'semantic_AE_.pt')  

        del mom1s
        
#_____________________________________________________________________________#
#_____________________________________________________________________________#
                
batch_size = 8

### RUN THE TRAINING PROCEDURE HERE AND PROVIDE THE NETWORK WITH THE REQUIRED
### AUXILIARY 3D X, Y AND Z TENSORS        

### Create the auxiliary arrays
l = torch.arange(0 - 63/2., (63/2.)+1)
yyy, xxx, zzz = torch.meshgrid(l,l,l)

xxx, yyy, zzz = xxx.repeat(batch_size,1,1,1), yyy.repeat(batch_size,1,1,1), zzz.repeat(batch_size,1,1,1)
xxx = xxx.to(device).to(torch.float)
yyy = yyy.to(device).to(torch.float)
zzz = zzz.to(device).to(torch.float)

#_____________________________________________________________________________#
#_____________________________________________________________________________#

model = CAE(5,xxx,yyy,zzz) # Instantiate the model with 5 learnable parameters

### Train the model
training(model,batch_size=batch_size,epochs=100,loss_function=torch.nn.MSELoss(),
         initial_lr=1e-4,model_dir=model_directory,save_dir=save_directory,gradients=False)

#_____________________________________________________________________________#
#_____________________________________________________________________________#

The scripts above use these homemade helper functions (plotting, adaptive learning rate, angle recovery):

import matplotlib.pyplot as plt

def plotter(predicted, true, out_dir):    
    """ PLOTTING THE PREDICTED AND TRUE VELOCITY FIELDS (AT END OF TRAINING) """
    
    predicted = predicted.detach().cpu().numpy()
    true = true.detach().cpu().numpy()

    for i in range(predicted.shape[0]):
        plt.figure()
        
        plt.subplot(121)
        true_i = true[i,0,:,:]
        plt.imshow(true_i,cmap='jet', vmax=true_i.max(), vmin=true_i.min())
        plt.colorbar(fraction=0.046, pad=0.04)
        plt.title('True')
        
        plt.subplot(122)
        pred_i = predicted[i,0,:,:]
        plt.imshow(pred_i,cmap='jet', vmax=true_i.max(), vmin=true_i.min())
        plt.colorbar(fraction=0.046, pad=0.04)
        plt.title('Predicted')
        
        plt.tight_layout()
        plt.savefig(out_dir+str(i)+'.png')
    
    plt.close('all')
    
    return

def recover_pos(theta1,theta2):
    
    """ RECOVERING THE ANGLE AFTER A 3D ROTATION USING EULER MATRICES """
    
    angle = np.rad2deg(np.arctan2(theta1,theta2)) + 180
    
    return angle
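
The learning_rate helper isn't reproduced here; any schedule matching the call signature learning_rate(initial_lr, epoch) will do. A stand-in along these lines (a simple step decay, not necessarily the original schedule) makes the training script runnable:

def learning_rate(initial_lr, epoch):
    """ STAND-IN SCHEDULE (NOT THE ORIGINAL): HALVE THE RATE EVERY 20 EPOCHS """
    return initial_lr * (0.5 ** (epoch // 20))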

The testing procedure is pretty simple too, using only the model’s test_encode function. I even wrote it as a loop over single samples in case the RAM on the testing machine is too small:

import numpy as np
import pandas as pd
import torch
from tqdm import tqdm

from model import CAE
# return_cube is the generator function shown at the top of this post

batch_size = 1
l = torch.arange(0 - 63/2., (63/2.)+1)
yyy, xxx, zzz = torch.meshgrid(l,l,l)

xxx, yyy, zzz = xxx.repeat(batch_size,1,1,1), yyy.repeat(batch_size,1,1,1), zzz.repeat(batch_size,1,1,1)
xxx = xxx.to(torch.float)
yyy = yyy.to(torch.float)
zzz = zzz.to(torch.float)

model = CAE(5,xxx,yyy,zzz)
model_path = 'save_folder/models/' # the directory the training script saved to
model.load_state_dict(torch.load(model_path+'semantic_AE_.pt'))
model = model.cpu()
model.train(False)
print("Model cast to CPU")

#_____________________________________________________________________________#
#_____________________________________________________________________________#

for s in tqdm(range(10)):

    batch_size = 100
    results = return_cube(batch_size)
    
    mom1s = results[0]
    mom1s[mom1s!=mom1s]=0
    mom1s = mom1s.to(torch.float)   
    
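    # results[1:] holds pos_ang, inc_ang, ah and Vh, each of shape (batch,1,1,1);
    # stacking on dim 1 and squeezing gives a (batch, 4) target array in the
    # same order as the test_encode output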
    target = torch.stack(results[1:],1)
    target = target.squeeze(-1).squeeze(-1).squeeze(-1)
    
    #_____________________________________________________________________________#
    #_____________________________________________________________________________#
    
    predictions = []
    for j in range(mom1s.shape[0]):  
                
        prediction = model.test_encode(mom1s[j].unsqueeze(0))
        prediction = prediction.detach().numpy()
        predictions.append(prediction)
    
    predictions = np.vstack(predictions)
    
    dfp = pd.DataFrame(predictions)
    dft = pd.DataFrame(np.vstack(target))
    
    df = pd.concat([dfp,dft],axis=1)
    df.columns = [0,1,2,3,4,5,6,7]
    
    df.to_pickle('pickle_files/'+str(np.random.uniform(0,1,1)[0])+'.pkl')

#_____________________________________________________________________________#
#_____________________________________________________________________________#

import glob
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
from tqdm import tqdm

#_____________________________________________________________________________#
#_____________________________________________________________________________#

path = 'pickle_files/'

#_____________________________________________________________________________#
#_____________________________________________________________________________#

names = [os.path.basename(x) for x in glob.glob(path+'*.pkl')]

#_____________________________________________________________________________#
#_____________________________________________________________________________#

fig, axs = plt.subplots(2, 2, figsize=(6,5))

for n in tqdm(names):

    df = pd.read_pickle(path+n)
    
    predictions = df[[0,1,2,3]].values.astype(float)
    target = df[[4,5,6,7]].values.astype(float)
    
    pos_t = np.rad2deg(target[:,0])
    pos_p = np.rad2deg(predictions[:,0])
        
    phi_t = np.rad2deg(target[:,1])
    phi_p = np.rad2deg(predictions[:,1])
    index = np.where(phi_t<25)[0] # pick out low-inclination (< 25 degree) samples to highlight in red
    
    axs[0, 0].scatter(pos_t,pos_p,s=1,c='k')
    axs[0, 0].scatter(pos_t[index],pos_p[index],s=1,c='r')
    axs[0, 1].scatter(phi_t,phi_p,s=1,c='k')
    axs[1, 0].scatter(target[:,2], predictions[:,2],s=1,c='k')
    axs[1, 0].scatter(target[index,2], predictions[index,2],s=1,c='r')
    axs[1, 1].scatter(target[:,3],predictions[:,3],s=1,c='k')
    
axs[0, 0].set(xlabel= r'$\theta_{pos} \, (^{\circ}) $', ylabel= r'$\theta_{pos,pred} \, (^{\circ}) $')   
axs[0, 1].set(xlabel= r'$\phi_{inc} \, (^{\circ})$', ylabel= r'$\phi_{inc,pred} \, (^{\circ})$') 
axs[1, 1].set(xlabel= r'$v_{h} \, (km\,s^{-1})$', 
   ylabel= r'$v_{h,pred} \, (km\,s^{-1})$') 
axs[1, 0].set(xlabel= r'$a_{h}$', ylabel= r'$a_{h,pred}$') 
    
plt.tight_layout()
plt.savefig('semantic_AE_results.png')

I realise this is a lot of code, but I’ve pulled this network apart for weeks and can’t find a way to help it learn this one parameter, so any help on this matter would be really, really appreciated!

I’ve checked that the gradients are not zero during training, and I’ve noticed that this parameter gets pushed higher and higher for X_ANGLE values closer to 90 degrees, which is weird.
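
If it helps anyone dig in, a probe along these lines can be dropped into the training loop to watch the de-regularised predictions drift (a sketch; it assumes model, the batch mom1s and the matching true ah from return_cube are in scope):

with torch.no_grad():
    latent = model.encoder_linear(model.VELOCITY_encoder(mom1s))
    v_scale_pred = model.de_regularise(latent[:, 3], 1, 32) # same range as cube_maker
    print("V_SCALE_LENGTH  pred: %.2f +/- %.2f   true: %.2f +/- %.2f"
          % (v_scale_pred.mean(), v_scale_pred.std(), ah.mean(), ah.std()))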