I’m creating an autoencoder that uses convolutions to learn some parameters, and then uses native PyTorch functions to re-create the input. But when I test the network I find that it fails to learn one of the parameters used to create the input images completely.
I was wondering if anyone could see why this would happen?
Here is what the testing true vs. prediction plots look like for the 4 parameters I care about. The bottom left is the one that appears to go weird (and is called V_SCALE_LENGTH
in the model below).
NOTE: I also see that the top righ panel shows a parameter that is not well learned but that’s okay as this is a smaller part of a larger network that I’ve isolated into this example because of the problem of the bottom left parameter. I also know that in the model below, the bottom left parameter needs to use both the right-hand parameters above to create recreate the input. But I still see this parameter not learning when I tighten up that parameter in the top right.
In my full-sized network I see much poorer performance than this where the parameter goes to the maximum value allowed by the hard tanh activation function at the end of the linear module. At least here I’m seeing a sort-of increase from left to right on the true vs predicted plot. Which makes me think I’m missing something in the data handling that maybe someone can see…
The 2D input images are created using the following function:
def return_cube(batch):
""" CREATION OF CUBES WITH VARIOUS ORIENTATIONS """
# Define the properties of the objects in the input cubes _________________
# Z angle
pos_ang = torch.tensor([np.random.uniform(-np.pi,np.pi,batch)]).view(-1)[:,None,None,None]
# X angle
inc_ang = torch.tensor([np.random.uniform(np.pi/18,np.pi/2,batch)]).view(-1)[:,None,None,None]
# Where the velocity reaches it's highes. i.e. r(max(V))
ah = torch.tensor([np.random.uniform(1,32,batch)]).view(-1)[:,None,None,None]
# The maximum velocity reached. i.e. max(V(r)) <-- THIS IS THE TROUBLE MAKER!
Vh = torch.tensor([np.random.uniform(0.1,1,batch)]).view(-1)[:,None,None,None]
batch_size = batch # The input batch size to the network
# Create radius cubes _____________________________________________________
l = torch.arange(0 - 63/2., (63/2.)+1)
yyy, xxx, zzz = torch.meshgrid(l,l,l)
xxx, yyy, zzz = xxx.repeat(batch_size,1,1,1), yyy.repeat(batch_size,1,1,1), zzz.repeat(batch_size,1,1,1)
# Rotate the cubes ________________________________________________________
rr, xx, yy, RADIUS_XY, zz = rotation_all(xxx,yyy,zzz,inc_ang,pos_ang)
# Create BRIGHTNESS cube to make the object a thin disk rather than _______
# a sphere ________________________________________________________________
BRIGHTNESS = torch.exp(-RADIUS_XY/16) * torch.exp(-torch.abs(zz)/2)
# NOTE ____________________________________________________________________
# 16 is arbitrarily chosen to allow the network to learn Z_ANGLE better
# 2 is chosen to make the z brightness drop off really fast to emulate
# a thin disk
# _________________________________________________________________________
# Get the max radii in the X and Y cubes __________________________________
xmax = xx.max(1)[0].max(1)[0].max(1)[0][:,None,None,None]
ymax = yy.max(1)[0].max(1)[0].max(1)[0][:,None,None,None]
# Create the Velocity cubes V(r) __________________________________________
v_rr = (2/pi) * (1.2732395447351628) * torch.atan((pi*rr)/ah) # Normalised atan function
# Change Velocity cubes to line-of-sight velocity cubes along the z axis __
thetas = torch.atan2(yy/ymax, xx/xmax) - pi/2
thetas = torch.sin(thetas)
thetas = thetas * torch.sin(inc_ang)
VELOCITY = v_rr * Vh * thetas
# Weight by the brightness cubes to make a thin disk not a sphere _________
VELOCITY = VELOCITY * BRIGHTNESS
# Flatten along the Z axis to make the 2D velocity image __________________
VELOCITY = VELOCITY.sum(axis=3)
VELOCITY = VELOCITY.unsqueeze(1)
# Normalise between 0 and 1 (across dataset not individually) _____________
VELOCITY = VELOCITY / VELOCITY.max(1)[0].max(1)[0].max(1)[0][:,None,None,None]
VELOCITY *= Vh
return VELOCITY,pos_ang,inc_ang,ah,Vh
And the model looks as follows:
import torch
from math import pi as pi
#=============================================================================#
#/////////////////////////////////////////////////////////////////////////////#
#=============================================================================#
class CAE(torch.nn.Module):
""" PYTORCH CONVOLUTIONAL AUTOENCODER WITH FUNCTIONAL DECODER """
def __init__(self, nodes, xxx=None, yyy=None, zzz=None):
super().__init__()
self.xxx = xxx
self.yyy = yyy
self.zzz = zzz
self.nodes = nodes
self.conv1 = torch.nn.Conv2d(1,16,3,1,padding=1)
self.conv2 = torch.nn.Conv2d(16,32,3,1,padding=1)
self.conv3 = torch.nn.Conv2d(32,64,3,1,padding=1)
self.conv4 = torch.nn.Conv2d(64,128,3,padding=1)
self.pool = torch.nn.MaxPool2d(2)
self.lc1 = torch.nn.Linear(2048,256)
self.lc2 = torch.nn.Linear(256,self.nodes)
self.relu = torch.nn.ReLU()
self.hardtanh = torch.nn.Hardtanh(min_val=-1,max_val=1.)
def VELOCITY_encoder(self,x):
""" FEATURE EXTRACTION THROUGH CONVOLUTIONAL LAYERS """
x = self.conv1(x)
x = self.pool(x)
x = self.conv2(x)
x = self.relu(x)
x = self.pool(x)
x = self.conv3(x)
x = self.relu(x)
x = self.pool(x)
x = self.conv4(x)
x = self.relu(x)
x = self.pool(x)
x = x.view(int(x.size()[0]),-1)
return x
def encoder_linear(self,x):
""" LINEARLY CONNECTED LAYERS FOR REGRESSION OF REQUIRED PARAMETERS """
x = self.lc1(x)
x = self.relu(x)
x = self.lc2(x)
x = self.hardtanh(x)
return x
def recover_angle(self, theta_1, theta_2):
""" RECOVERING THE ANGLE AFTER SCALE_LENGTH 2D ROTATION USING EULER MATRICES """
pos = torch.atan2(theta_1,theta_2)
return pos
def create_velocity_values(self, radius_tensor):
""" GET THE LINE OF SIGHT VELOCITYOCITIES FOR EACH VOXEL IN THE RADIUS CUBE """
vel = (2/pi)*(1.273239)*(torch.atan(pi*radius_tensor)) # 1.273... = 1/arctan(1)
return vel
def de_regularise(self,tensor,minimum,maximum):
""" RECOVER THE PARAMETERS OUT OF THE NORMALISED RANGE [-1, 1] """
tensor=tensor+1.
tensor=tensor*(maximum-minimum)/2.
tensor=tensor+minimum
return tensor
def rotation_all(self,x,y,z,X_ANGLE,Z_ANGLE):
""" SCALE_LENGTH function for rotating 3d particles about the z and then x axes """
_theta = Z_ANGLE + pi/2 # Correct Z_ANGLE for Pythonic rotations
xmax = x.max(1)[0].max(1)[0].max(1)[0][:,None,None,None]
ymax = y.max(1)[0].max(1)[0].max(1)[0][:,None,None,None]
xx = ((x/xmax)*torch.cos(_theta) - (y/ymax)*torch.sin(_theta))*xmax # Z-Rotation of x coords
_y = ((x/xmax)*torch.sin(_theta) + (y/ymax)*torch.cos(_theta))*ymax # Z-Rotation of y coords
_ymax = _y.max(1)[0].max(1)[0].max(1)[0][:,None,None,None]
zmax = z.max(1)[0].max(1)[0].max(1)[0][:,None,None,None]
yy = ((_y/_ymax)*torch.cos(X_ANGLE) - (z/zmax)*torch.sin(X_ANGLE))*_ymax # X-Rotation of y coords
zz = ((_y/_ymax)*torch.sin(X_ANGLE) + (z/zmax)*torch.cos(X_ANGLE))*zmax # X-Rotation of z coords
RADIUS_ARRAY = torch.sqrt((xx**2)+(yy**2)+(zz**2)) # Get the cube of radii R(x,y,z)
XY_RADIUS_ARRAY = torch.sqrt((xx**2)+(yy**2)) # Get the cube of xy radii only R(x,y)
return RADIUS_ARRAY, xx, _y, XY_RADIUS_ARRAY, torch.abs(zz)
def cube_maker(self, x, VELOCITY_INPUT, shape):
""" GENERATE THE OUTPUT CUBE USING PYTORCH FUNCTIONS """
Z_ANGLE = self.recover_angle(x[:,0].clone(),x[:,1].clone())[:,None,None,None] ### Poition angle
X_ANGLE = self.de_regularise(x[:,2].clone(),pi/18,pi/2)[:,None,None,None] ### Inclination
V_SCALE_LENGTH = self.de_regularise(x[:,3].clone(),1,32)[:,None,None,None] ### DM halo scale length
MAX_VELOCITY = self.de_regularise(x[:,4].clone(),0.1,1)[:,None,None,None] ### Maximum velocity allowed
# Create radius cube __________________________________________________
rr_t_v, xx, yy, XY_RADIUS_ARRAY, zz = self.rotation_all(self.xxx,self.yyy,self.zzz,X_ANGLE,Z_ANGLE)
# Create BRIGHTNESS cube to make the object __________________________
# a thin disk rather than a sphere ____________________________________
BRIGHTNESS = torch.exp(-XY_RADIUS_ARRAY/16) * torch.exp(-torch.abs(zz)/2)
# NOTE ________________________________________________________________
# 16 is arbitrarily chosen to allow the network to learn Z_ANGLE better
# 2 is chosen to make the z brightness drop off really fast to emulate
# a thin disk
# _____________________________________________________________________
# Create VELOCITY cube ________________________________________________
xmax = xx.max(1)[0].max(1)[0].max(1)[0][:,None,None,None]
ymax = yy.max(1)[0].max(1)[0].max(1)[0][:,None,None,None]
# Convert to line-of-sight VELOCITY along axis_3 ______________________
VELOCITY = self.create_velocity_values(rr_t_v/V_SCALE_LENGTH)
thetas = torch.atan2(yy/ymax, xx/xmax) - pi/2
thetas = torch.sin(thetas)
thetas = thetas * torch.sin(X_ANGLE)
VELOCITY = VELOCITY * thetas
# Create velocity map _________________________________________________
VELOCITY = VELOCITY * BRIGHTNESS # Weight by the brightness to make a disk
VELOCITY = VELOCITY.sum(axis=3)
VELOCITY = VELOCITY.unsqueeze(1)
VELOCITY = VELOCITY / VELOCITY.max(1)[0].max(1)[0].max(1)[0][:,None,None,None]
VELOCITY = VELOCITY * MAX_VELOCITY
# Mask output to match input shape ____________________________________
VELOCITY[VELOCITY_INPUT==0] = 0
V_MAX = VELOCITY.max(1)[0].max(1)[0].max(1)[0]
return VELOCITY, V_MAX
def test_encode(self,x):
""" CREATE ENCODINGS FOR TESTING AND DEBUGGING """
output = self.VELOCITY_encoder(x)
output = self.encoder_linear(output)
Z_ANGLE = self.recover_angle(output[:,0].clone(),output[:,1].clone())
X_ANGLE = self.de_regularise(output[:,2].clone(),pi/18,pi/2)
V_SCALE_LENGTH = self.de_regularise(output[:,3].clone(),1,32)
MAX_VELOCITY = self.de_regularise(output[:,4].clone(),0.1,1)
output = torch.tensor([Z_ANGLE,X_ANGLE,V_SCALE_LENGTH,MAX_VELOCITY])
return output
def forward(self, x):
""" CREATE SCALE_LENGTH FORWARD PASS THROUGH THE REQUIRED MODEL FUNCTIONS """
output = self.VELOCITY_encoder(x)
output = self.encoder_linear(output)
output = self.cube_maker(output,x,shape=64)
return output
#=============================================================================#
#/////////////////////////////////////////////////////////////////////////////#
#=============================================================================#
The training procedure is pretty straight forward:
import numpy as np
import os
import torch
from tqdm import tqdm
from model import CAE
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
torch.cuda.benchmark=True
torch.cuda.fastest =True
device = torch.device("cuda")
torch.cuda.empty_cache()
#_____________________________________________________________________________#
#~~~ SET UP DIRECTORIES ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
#_____________________________________________________________________________#
save_directory = 'save_folder/'
model_directory = 'save_folder/models/'
#_____________________________________________________________________________#
#_____________________________________________________________________________#
def training(model:torch.nn.Module,batch_size,epochs,loss_function,initial_lr,
save_dir=None,model_dir=None,gradients=False,testing=False):
#_________________________________________________________________________#
#~~~ SET UP THE MODEL ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
#_________________________________________________________________________#
model = model.to(device).to(torch.float)
for epoch in range(epochs):
print("Epoch {0} of {1}" .format( (epoch+1), epochs))
optim = torch.optim.Adam(model.parameters(),learning_rate(initial_lr,epoch))
#_____________________________________________________________________#
#~~~ CREATE A BATCH ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
#_____________________________________________________________________#
results = return_cube(batch_size)
mom1s = results[0]
mom1s[mom1s!=mom1s]=0
mom1s = mom1s.to(device).to(torch.float)
#_____________________________________________________________________#
#~~~ TRAIN THE MODEL ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
#_____________________________________________________________________#
model.train(True)
running_loss = []
for _ in tqdm(range(10)):
prediction, vmax = model(mom1s)
loss = loss_function(prediction, mom1s)
prediction.retain_grad()
optim.zero_grad(); loss.backward(); optim.step();
running_loss.append(loss.detach().cpu())
print("\n Mean loss: %.6f" % np.mean(running_loss),
"\t Loss std: %.6f" % np.std(running_loss),
"\t Learning rate: %.6f:" % learning_rate(initial_lr,epoch))
print('_'*73)
#_____________________________________________________________________#
#~~~ CREATE ANY PLOTS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
#_____________________________________________________________________#
if (save_dir is not None) and (epoch == epochs-1):
plotter(prediction, mom1s, save_directory)
#_____________________________________________________________________#
#~~~ PRINT AVERAGE LAYER GRADIENTS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
#_____________________________________________________________________#
if gradients is True:
layers = []
av_grads = []
for n, p in model.named_parameters():
if (p.requires_grad) and ("bias" not in n):
layers.append(n)
av_grads.append(p.grad.abs().mean())
for i in range(len(layers)):
print(layers[i],' grad: ',av_grads[i])
#_____________________________________________________________________#
#~~~ SAVE THE MODEL ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
#_____________________________________________________________________#
if (model_dir is not None) and (epoch == epochs-1):
torch.save(model.state_dict(),model_dir+'semantic_AE_.pt')
del mom1s
#_____________________________________________________________________________#
#_____________________________________________________________________________#
batch_size = 8
### RUN THE TRAINING PROCEDURE HERE AND PROVIDE THE NETWORK WITH THE REQUIRED
### AUXILIARY 3D X, Y AND Z TENSORS
### Create the auxiliary arrays
l = torch.arange(0 - 63/2., (63/2.)+1)
yyy, xxx, zzz = torch.meshgrid(l,l,l)
xxx, yyy, zzz = xxx.repeat(batch_size,1,1,1), yyy.repeat(batch_size,1,1,1), zzz.repeat(batch_size,1,1,1)
xxx = xxx.to(device).to(torch.float)
yyy = yyy.to(device).to(torch.float)
zzz = zzz.to(device).to(torch.float)
#_____________________________________________________________________________#
#_____________________________________________________________________________#
model = CAE(5,xxx,yyy,zzz) # Instantiate the model with 5 learnable parameters
### Train the model
training(model,batch_size=batch_size,epochs=100,loss_function=torch.nn.MSELoss(),
initial_lr=1e-4,model_dir=model_directory,save_dir=save_directory,gradients=False)
#_____________________________________________________________________________#
#_____________________________________________________________________________#
Using these homemade plotting functions / adaptive learning rates / helpful functions:
def plotter(predicted, true, out_dir):
""" PLOTTING THE PREDICTED AND TRUE VELOCITY FIELDS (AT END OF TRAINING) """
predicted = predicted.detach().cpu().numpy()
true = true.detach().cpu().numpy()
for i in range(predicted.shape[0]):
plt.figure()
plt.subplot(121)
true_i = true[i,0,:,:]
plt.imshow(true_i,cmap='jet', vmax=true_i.max(), vmin=true_i.min())
plt.colorbar(fraction=0.046, pad=0.04)
plt.title('Predicted')
plt.subplot(122)
pred_i = predicted[i,0,:,:]
plt.imshow(pred_i,cmap='jet', vmax=true_i.max(), vmin=true_i.min())
plt.colorbar(fraction=0.046, pad=0.04)
plt.colorbar('True')
plt.tight_layout()
plt.savefig(out_dir+str(i)+'.png')
plt.close('all')
return
def recover_pos(theta1,theta2):
""" RECOVERING THE ANGLE AFTER A 3D ROTATION USING EULER MATRICES """
angle = np.rad2deg(np.arctan2(theta1,theta2)) + 180
return angle
The testing procedure is pretty simple too, only using the function test_encode
in the model. I even wrote it in a light loop version in case the RAM is too small on the testing machine:
import numpy as np
import pandas as pd
import torch
from tqdm import tqdm
from model import CAE
batch_size = 1
l = torch.arange(0 - 63/2., (63/2.)+1)
yyy, xxx, zzz = torch.meshgrid(l,l,l)
xxx, yyy, zzz = xxx.repeat(batch_size,1,1,1), yyy.repeat(batch_size,1,1,1), zzz.repeat(batch_size,1,1,1)
xxx = xxx.to(torch.float)
yyy = yyy.to(torch.float)
zzz = zzz.to(torch.float)
model = CAE(5,xxx,yyy,zzz)
model.load_state_dict(torch.load(model_path+'semantic_AE_.pt'))
model = model.cpu()
model.train(False)
print("Model cast to CPU")
#_____________________________________________________________________________#
#_____________________________________________________________________________#
for s in tqdm(range(10)):
batch_size = 100
results = return_cube(batch_size)
mom1s = results[0]
mom1s[mom1s!=mom1s]=0
mom1s = mom1s.to(torch.float)
target = torch.stack(results[1:],1)
target = target.squeeze(-1).squeeze(-1).squeeze(-1)
#_____________________________________________________________________________#
#_____________________________________________________________________________#
predictions = []
for j in range(mom1s.shape[0]):
prediction = model.test_encode(mom1s[j].unsqueeze(0))
prediction = prediction.detach().numpy()
predictions.append(prediction)
predictions = np.vstack(predictions)
dfp = pd.DataFrame(predictions)
dft = pd.DataFrame(np.vstack(target))
df = pd.concat([dfp,dft],axis=1)
df.columns = [0,1,2,3,4,5,6,7]
df.to_pickle('pickle_files/'+str(np.random.uniform(0,1,1)[0])+'.pkl')
#_____________________________________________________________________________#
#_____________________________________________________________________________#
import glob
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
from tqdm import tqdm
#_____________________________________________________________________________#
#_____________________________________________________________________________#
path = 'pickle_files/'
#_____________________________________________________________________________#
#_____________________________________________________________________________#
names = [os.path.basename(x) for x in glob.glob(path+'*.pkl')]
#_____________________________________________________________________________#
#_____________________________________________________________________________#
fig, axs = plt.subplots(2, 2, figsize=(6,5))
for n in tqdm(names):
df = pd.read_pickle(path+n)
predictions = df[[0,1,2,3]].values.astype(float)
target = df[[4,5,6,7]].values.astype(float)
pos_t = np.rad2deg(target[:,0])
pos_p = np.rad2deg(predictions[:,0])
phi_t = np.rad2deg(target[:,1])
phi_p = np.rad2deg(predictions[:,1])
index = np.where(phi_t<25)[0]
axs[0, 0].scatter(pos_t,pos_p,s=1,c='k')
axs[0, 0].scatter(pos_t[index],pos_p[index],s=1,c='r')
axs[0, 1].scatter(phi_t,phi_p,s=1,c='k')
axs[1, 0].scatter(target[:,2], predictions[:,2],s=1,c='k')
axs[1, 0].scatter(target[index,2], predictions[index,2],s=1,c='r')
axs[1, 1].scatter(target[:,3],predictions[:,3],s=1,c='k')
axs[0, 0].set(xlabel= r'$\theta_{pos} \, (^{\circ}) $', ylabel= r'$\theta_{pos,pred} \, (^{\circ}) $')
axs[0, 1].set(xlabel= r'$\phi_{inc} \, (^{\circ})$', ylabel= r'$\phi_{inc,pred} \, (^{\circ})$')
axs[1, 1].set(xlabel= r'$v_{h} \, (km\,s^{-1})$',
ylabel= r'$v_{h,pred} \, (km\,s^{-1})$')
axs[1, 0].set(xlabel= r'$a_{h}$', ylabel= r'$a_{h,pred}$')
plt.tight_layout()
plt.savefig('semantic_AE_results.png')
I realise this is a lot of code but I’ve pulled this network apart for weeks and can’t seem to find a way to help it learn this one parameter, so any help on this matter would be really really appreciated!
I’ve checked that the gradients are not zero during training and seen that this parameter appears to be pushed higher and higher when using X_ANGLEs closer to 90 degrees which is weird.