Training conv2d weights to "reorder" input?

I am experimenting with convolutions on tabular data. Since it’s not clear in which order the columns should be arranged, I would like the network to try to figure that out. I am having some success with the following method (though I could be mistaken in my approach); however, it tends to “blend” columns together into a new set of features rather than “reorder” them.

Here is the basic idea…

Given 20 features and a batch size of 100 (so 100 rows with 20 columns), I am using Conv2d with kernel_size=(1, 20).

# features.size() = (100, 20)
reorder_conv = torch.nn.Conv2d( 1, 20, kernel_size=(1, 20), stride=(1, 20), padding=0, padding_mode='circular')

# This creates a 20 channel matrix. 
#    (I believe each kernel creates a different combination of the input features)
X = reorder_conv( features.view( 1, 1, 100, 20))
# X now has torch.Size([1, 20, 100, 1])

# Next I permute the 20 chan matrix back into a 100x20 single chan matrix
X = X.permute( 0, 3, 2, 1)
# X now has torch.Size([1, 1, 100, 20])

I “think” this is training to reorganize the feature columns; however, the weights do not tend to learn their way to a tensor of all 0.0s with a single 1.0, which would effectively move a column to a new position. Instead, they blend the columns into a new feature.
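For what it’s worth, an exact reordering is representable by this layer: if each kernel is one-hot, the convolution simply copies one input column per output channel. Here is a quick sanity check with a hand-built permutation (perm is just an arbitrary example order, not learned weights):

import torch

features = torch.randn( 100, 20)   # stand-in for the (100, 20) feature matrix above
perm = torch.randperm( 20)         # an arbitrary target column order, for illustration

# One-hot kernels: output channel j selects input column perm[j]
w = torch.zeros( 20, 1, 1, 20)
w[ torch.arange( 20), 0, 0, perm] = 1.0

X = torch.nn.functional.conv2d( features.view( 1, 1, 100, 20), w, stride=(1, 20))
X = X.permute( 0, 3, 2, 1).reshape( 100, 20)
print( torch.allclose( X, features[:, perm]))   # True: the columns are exactly reordered

So the target configuration exists; the question is whether training will find it.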

Is there a way to help guide the weights of the kernel toward something like [ 0, 0, 0, …, 1, 0, 0], where the 1 position is learned?

I’m not sure whether applying torch.nn.functional.softmax() or something similar directly to the weights would even work with autograd, and even then it would not really push the vector to a single 1.0 with all other values 0.0.
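One pattern that does keep autograd happy is to hold an unconstrained parameter and apply the softmax inside the forward pass, using the functional conv instead of reassigning .weight. A rough sketch of the idea (raw_w and the temperature value are my own additions, and sharpening the softmax toward one-hot by lowering the temperature is a common heuristic, not something I have verified on this problem):

raw_w = torch.nn.Parameter( torch.randn( 20, 1, 1, 20))
opt = torch.optim.SGD( [raw_w], lr=0.003)

def reorder( features, temperature=0.1):
    # Softmax over the kernel width, so each output channel holds a probability
    # distribution over the 20 input columns; a lower temperature pushes each
    # row closer to one-hot while everything stays differentiable
    soft_w = torch.nn.functional.softmax( raw_w / temperature, dim=3)
    X = torch.nn.functional.conv2d( features.view( 1, 1, 100, 20), soft_w, stride=(1, 20))
    return X.permute( 0, 3, 2, 1).reshape( features.size( 0), -1)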

Hopefully this makes some sense. I would love input from anyone with ideas. I realize I may be way out of my depth here and appreciate any light that can be shined on the problem.

For a more complete “simple” example, here I am attempting to train the column order so that the columns are arranged by the sorted order of their means over the sample rows…

features = torch.randn( 100, 20)
reorder_conv = torch.nn.Conv2d( 1, 20, kernel_size=(1, 20), stride=(1, 20), padding=0, padding_mode='circular')
values, idxs = features.mean( dim=0).sort()
target = values
loss_fn = torch.nn.MSELoss()
opt = torch.optim.SGD( params=reorder_conv.parameters(), lr=0.003)

for e in range(10000):
    opt.zero_grad()
    X = reorder_conv( features.view( 1, 1, 100, 20))
    #- Attempting to assign the softmax result directly back to .weight breaks
    #  (a module parameter can only be assigned an nn.Parameter, not a plain tensor)
    # reorder_conv.weight = torch.nn.functional.softmax( reorder_conv.weight, dim=3)
    X = X.permute( 0, 3, 2, 1)
    X = X.view( features.size( 0), -1)
    
    pred_order = X.mean( dim=0)
    loss = loss_fn( pred_order, target)
    loss.backward()
    opt.step()
    print( f"Loss: {loss.item():0.5f}", end=f"{' '*20}\r")
    
print()
print( f"Target: {target.detach().numpy()}")
print( f"Result: {pred_order.detach().numpy()}")
print( f"Orig  : {features.mean( dim=0).detach().numpy()}")

Results:

Loss: 0.00005                    
Target: [-0.06228492 -0.05032158 -0.02876155 -0.02502438 -0.00089725  0.00232369
  0.00522456  0.00910649  0.01671628  0.01779666  0.01802603  0.01869513
  0.02516624  0.04797455  0.06386539  0.07115232  0.09398358  0.13428809
  0.14287527  0.15994984]
Result: [-0.06227251 -0.04998548 -0.02197709 -0.02850414  0.00563689  0.0082012
  0.00796885  0.01247622  0.00972207  0.0249887   0.0144345   0.01844426
  0.01896985  0.03998128  0.06636048  0.05916643  0.08550397  0.12027416
  0.1472161   0.15002634]
Orig  : [ 0.15994984  0.09398358  0.00910649  0.04797455  0.06386539  0.00522456
  0.01869513  0.01671628  0.00232369  0.14287527  0.07115232 -0.05032158
 -0.02502438 -0.02876155 -0.06228492 -0.00089725  0.13428809  0.01802603
  0.01779666  0.02516624]

As you can see, it is largely getting the order of values from min to max, but the final numbers indicate that it is still blending the values rather than training the kernel to fully reorder them. Here is a snippet of the final values of the conv2d weights.

print( reorder_conv.weight)
________
Parameter containing:
tensor([[[[-0.2007, -0.0209,  0.1824,  0.1710,  0.0499, -0.2082, -0.2107,
            0.1627,  0.1834, -0.1608, -0.2226,  0.1542, -0.1373,  0.1696,
           -0.1324,  0.2185,  0.1937, -0.0396,  0.0933, -0.0566]]],
...
        [[[-0.1570,  0.2174,  0.0273, -0.0623,  0.0259, -0.1620, -0.1682,
           -0.1578, -0.0718,  0.0248, -0.0454,  0.0058,  0.1150, -0.0468,
            0.1550,  0.1193,  0.1163, -0.1642, -0.1911,  0.1388]]]],
       requires_grad=True)

You can use nn.Parameter() to make the softmax work.

reorder_conv.weight = nn.Parameter(torch.nn.functional.softmax( reorder_conv.weight, dim=3))

However, I tried using your code, and I only got random results (with and without the softmax).

Interesting. I did have to run the training loop for 10000 iterations rather than the 1000 which was originally in my example (I’ll change that).

Using the nn.Parameter() method does allow the code to run, but it seems to train all the weights toward [0.5].

For those who are interested, or who could perhaps point out some error in my approach or suggest something better, here is a version of the example which seems to be working. It does not produce exactly the same average values for each column, but it’s close, and it does arrange the columns into the correct order.

I created a forward hook for my module in which I find the argmax() of each kernel’s weights in the conv2d layer, set all of the layer’s weights to 0.0, and then set the weight at the argmax position to 1.0.

I’m also adding a mean and std error of the weight values to my final loss in an effort to help guide the optimizer.

Keep in mind, I’m mainly mashing things together to see what fits. I don’t pretend to know the reasons why it acts the way it does, so if someone could lay some knowledge on me I’d be quite grateful.

Imports:

import torch
import numpy as np

The TestSorter() Class Module:

class TestSorter(torch.nn.Module):
    def __init__(self, feature_set_size: int, n_gauge_samples: int = 20, **kwargs):
        super().__init__()
        self.feature_set_size = feature_set_size
        self.n_gauge_samples = n_gauge_samples
        self.reorder_conv = torch.nn.Conv2d( 1, self.n_gauge_samples, 
                                             kernel_size=(1, self.n_gauge_samples), 
                                             stride=(1, self.n_gauge_samples), 
                                             padding=0, 
                                             padding_mode='circular',
                                             bias=True)
        
        # NOTE: this originally called a project-specific init helper
        # (st.torchmodulelib.weights_init_xavier_uniform); a standard Xavier
        # init is substituted here so the example is self-contained
        torch.nn.init.xavier_uniform_( self.reorder_conv.weight)
        self.reorder_conv.weight = torch.nn.Parameter(  self.reorder_conv.weight)
        self.register_forward_hook( self._forward_hook)
        
    @property
    def device( self):
        return next( self.parameters()).device

    def forward(self, features):
        X = self.reorder_conv( features.view( 1, 1, 100, 20))
        X = X.permute( 0, 3, 2, 1)
        X = X.view( features.size( 0), -1)
    
        pred_order = X.mean( dim=0)
        return pred_order
    
        
    def _forward_hook( self, module, inputs, output):
        # Snap each kernel to a one-hot vector: find the argmax position,
        # zero out all of the layer's weights, then set that position to 1.0
        w_max_idx = self.reorder_conv.weight.argmax( dim=3).flatten()
        rows = torch.arange( self.reorder_conv.weight.size( 0), device=self.device)
        self.reorder_conv.weight.data = torch.zeros_like( self.reorder_conv.weight)
        self.reorder_conv.weight.data[rows, :, :, w_max_idx[ rows]] = 1.0

Generate some features and targets…

DEVICE = 'cuda:0' if torch.cuda.is_available() else 'cpu'
features = torch.randn( 100, 20).to( DEVICE)
values, idxs = features.mean( dim=0).sort()
target = values

Create the model, initialize the optimizer and declare a loss_fn:

model = TestSorter( feature_set_size=100, n_gauge_samples=20)
model.to( DEVICE)
loss_fn = torch.nn.MSELoss()
opt = torch.optim.Adam( params=model.parameters(), lr=0.005)

The Training Loop:

n_epochs = 1000
for e in range(n_epochs):
    opt.zero_grad()
     
    '''
        Work out the mean and std losses now, because the weights will be set to all 0.0s and 1.0s in the forward hook.
    '''
    w_mean = model.reorder_conv.weight.mean()
    w_std = model.reorder_conv.weight.std()

    '''
        0.05 is the mean of a 20-element vector with a single 1.0 and all the rest 0.0
        0.2185 is the std of a 20-element vector with a single 1.0 and all the rest 0.0
    '''
    mean_loss = torch.abs( 0.05 - w_mean)
    std_loss = torch.abs( 0.2185 - w_std)  
    
    '''
        Tries to predict the mean of each column while arranging the columns in ascending order
    '''
    X = model( features)
    
    loss = loss_fn( X, target)
    loss = loss + mean_loss + std_loss 
    
    loss.backward()
    
    '''
        Make sure we don't run opt.step() on the last pass through the loop, or else the weights will be adjusted again after the hook has snapped them to 0/1
    '''
    if e < n_epochs - 1:
        opt.step()

    print( f"Loss: {loss.item():0.5f} Mean: {w_mean.item():0.5f} mean_loss: {mean_loss.item():0.5f} " +
           f"Std: {w_std.item():0.5f} std_loss: {std_loss.item():0.5f} " , end=f"{' '*20}\r")
    
print("\n\n")
print( f"Target: {target.detach().cpu().numpy()}")
print( f"Result: {X.detach().cpu().numpy()}")
print( f"Orig  : {features.mean( dim=0).detach().cpu().numpy()}")
print("\n\n")
print( f"Target Sort Order: {target.sort()[1].detach().cpu().numpy()}")
print( f"Pred Sort Order  : {X.sort()[1].detach().cpu().numpy()}")
print( f"Original Order   : {features.mean( dim=0).sort()[1].detach().cpu().numpy()}")

The Results:

Loss: 0.00008 Mean: 0.05006 mean_loss: 0.00006 Std: 0.21852 std_loss: 0.00002                     


Target: [-0.21455501 -0.12505591 -0.11113831 -0.05616664 -0.04925698 -0.0406472
 -0.01422752 -0.01222966 -0.0091071   0.00256702  0.00666515  0.01152864
  0.0465787   0.05148562  0.06742238  0.075505    0.08516987  0.10046279
  0.12081704  0.13162994]
Result: [-0.21451415 -0.12503105 -0.111105   -0.05615368 -0.04922976 -0.04066283
 -0.01421581 -0.0121876  -0.00916505  0.00260628  0.00668178  0.01157945
  0.0466274   0.05152398  0.0674272   0.0755178   0.08516815  0.10047579
  0.12079199  0.13162452]
Orig  : [-0.12505591  0.08516987  0.13162994 -0.21455501  0.0465787  -0.05616664
  0.01152864  0.075505    0.10046279 -0.01222966 -0.11113831  0.00666515
  0.12081704 -0.01422752  0.06742238  0.00256702  0.05148562 -0.0406472
 -0.0091071  -0.04925698]



Target Sort Order: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
Pred Sort Order  : [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
Original Order   : [ 3  0 10  5 19 17 13  9 18 15 11  6  4 16 14  7  1  8 12  2]

You can see that it has successfully predicted the correct order of the columns, rearranging them so that the mean of each column is in ascending order. The final mean values are close, but not exact. My guess is this may have something to do with the conv2d() layer including a bias; however, when I removed the bias things didn’t work so well. My brain broke down at that point.
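If you want to poke at the bias guess, one quick check on the trained model above is to print the learned bias and subtract it from the prediction; with the kernels snapped to one-hot, each output should then be roughly the mean of whichever column that kernel selected (X here is the prediction from the final loop iteration):

print( model.reorder_conv.bias.detach().cpu().numpy())
print( (X - model.reorder_conv.bias).detach().cpu().numpy())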

A Snippet of the final conv2d.weight values [ kernel=( 1,20)]:

[[[[0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]]
 [[[0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]]
 [[[0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]]
 [[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]]
...
 [[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]]]
 [[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]]
 [[[1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]]
 [[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]]]