Terrible performance in simple linear regression from sequence

Hello! I have been doing some research on plane trajectory prediction. I want to test the performance of different network architectures with points taken every 30/60/90/120 seconds. In particular, from a sequence of the last n increments in three features (longitude, latitude, and height), I try to predict the next increment. Since in most examples the plane is following a relatively straight line, I expected the results to be good, with the exception of some complex cases where the plane is turning or doing some maneuver. However, the results are terrible overall.
Here is an example of input (sequence of 4 elements, each with 3 features):

xs_batch[0]
tensor([[-0.0156, -0.0226, -0.0750],
        [-0.0146, -0.0226, -0.0750],
        [-0.0161, -0.0226, -0.0750],
        [-0.0139, -0.0226, -0.0750]], device='cuda:0')

And an example of intended output (prediction of the next value for the 3 features):

ys_batch[0]
tensor([-0.0147, -0.0226, -0.0750], device='cuda:0')
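
For context, the (input, label) pairs come from differencing consecutive track points and windowing the increments, roughly like the simplified sketch below (the function name and variables are illustrative, not my exact preprocessing code):

import torch

def build_increment_windows(track: torch.Tensor, seq_size: int = 4):
    # track: (T, 3) tensor of (longitude, latitude, height) points
    increments = track[1:] - track[:-1]           # (T-1, 3) deltas between consecutive points
    xs, ys = [], []
    for i in range(len(increments) - seq_size):
        xs.append(increments[i:i + seq_size])     # the last seq_size increments
        ys.append(increments[i + seq_size])       # the next increment to predict
    return torch.stack(xs), torch.stack(ys)       # (N, seq_size, 3), (N, 3)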

I first tested a simple network that should, at the very least, learn that the predicted increment is very similar to the last one in the sequence:

# Function for creating generic blocks of linear layers with activations and optional dropout
def create_dense_block(input_size, output_size, hidden_sizes:list, dropout_rate=0.1):
    layers = []
    input_sizes = [input_size,] + hidden_sizes
    output_sizes = hidden_sizes + [output_size,]

    for iz, oz in zip(input_sizes, output_sizes):
        if dropout_rate > 0:
            layers.append(nn.Dropout(dropout_rate))
        layers.append(nn.Linear(iz, oz))
        layers.append(nn.Tanh())
    return nn.Sequential(*layers)

# The feedforward architecture
class DenseNetwork(nn.Module): 
    def __init__(self, number_features, sequence_size):
        super(DenseNetwork, self).__init__()
        # block of linear layers that ends with an output of size 100
        self.fc1 = create_dense_block(number_features*sequence_size, 100, [500,500,400,200], dropout_rate=0)
        # Final linear layer without activation at the end to allow negative results
        self.fc2 = nn.Linear(100, number_features)

    def forward(self, x:torch.Tensor):
        shape = x.shape
        # flatten each sequence: (batch, seq_len, num_features) -> (batch, seq_len * num_features)
        x = x.reshape((shape[0], -1))
        x = self.fc1(x)
        x = self.fc2(x)
        return x
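
As a quick sanity check of the wiring (just a sketch with the sizes used here: 3 features, sequences of 4):

model = DenseNetwork(number_features=3, sequence_size=4)
dummy = torch.randn(128, 4, 3)     # same layout as xs_batch: (batch, sequence, features)
print(model(dummy).shape)          # torch.Size([128, 3]), same layout as ys_batch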

The training code is here:

    model = architectures.DenseNetwork(len(features_x), seq_size)
    loss_function = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), learning_rate)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, verbose=True, threshold=1e-2)
    model.to(device)
    model.train()

    for epoch in range(num_epochs):
        for batch in loader_training:
            xs_batch:torch.Tensor = batch["inputs"]
            ys_batch:torch.Tensor = batch["labels"]
            xs_batch = xs_batch.to(device).float()
            ys_batch = ys_batch.to(device).float()
            model.zero_grad()
            out = model(xs_batch)
            loss = loss_function(out, ys_batch)
            loss.backward()
            optimizer.step()

The training data contains more than 10,000 examples. I have tested with more and fewer layers, smaller and bigger ones, and with higher and lower learning rates, and I have confirmed that the shapes of the output, and at every single step of the network, are correct:

xs_batch.shape
torch.Size([128, 4, 3])

ys_batch.shape
torch.Size([128, 3])

out.shape
torch.Size([128, 3])
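
Since the network should at the very least match the trivial "copy the last increment" predictor, I can also compare against that baseline directly (rough sketch, reusing the same loader and device as above):

with torch.no_grad():
    total_se, count = 0.0, 0
    for batch in loader_training:
        xs = batch["inputs"].to(device).float()   # (B, 4, 3)
        ys = batch["labels"].to(device).float()   # (B, 3)
        baseline = xs[:, -1, :]                   # predict "same increment as the last one"
        total_se += nn.functional.mse_loss(baseline, ys, reduction="sum").item()
        count += ys.numel()
    print("last-increment baseline MSE:", total_se / count)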

And yet the results are quite bad and bizarre at times, so I wonder if there is some detail related to parameters or similar that I’m missing.

Hi Daniel!

As written, create_dense_block() doesn’t return anything.

I’m surprised that your code runs rather than throwing an error.

You assign self.fc1 to the “result” of a no-return function, so
self.fc1 should have None assigned to it. So I would expect
x = self.fc1(x) to throw an error because you can’t call None
as if it were a function.

Best.

K. Frank

It seems that I accidentally deleted it while formatting the code. There is a “return nn.Sequential(*layers)” at the end. Editing the original message to include it. Thanks for the notice.

I haven’t noticed any problems. Try the SGD optimizer with a big initial lr and an ExponentialLR scheduler (there is no scheduler.step() in your code, by the way). Or try optimizing without mini-batches (Rprop or LBFGS optimizers).
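
Something along these lines (rough sketch; the lr and gamma values are just placeholders you would have to tune):

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(num_epochs):
    for batch in loader_training:
        xs_batch = batch["inputs"].to(device).float()
        ys_batch = batch["labels"].to(device).float()
        optimizer.zero_grad()
        loss = loss_function(model(xs_batch), ys_batch)
        loss.backward()
        optimizer.step()
    scheduler.step()   # without this call the learning rate never changes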

No luck, the result is pretty much the same.

Also, I will add that all the features being used, as well as the predicted values, are normalised.

Well, I’d first plot the training loss to make sure it is decreasing epoch to epoch, and maybe identify the failure type from it (learning too slowly or too quickly, a bad lr schedule, tiny gradients [not evident per se, needs deeper inspection], or code problems).
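
For example, something like this (sketch, assuming matplotlib is available):

import matplotlib.pyplot as plt

epoch_losses = []
for epoch in range(num_epochs):
    running = 0.0
    for batch in loader_training:
        xs_batch = batch["inputs"].to(device).float()
        ys_batch = batch["labels"].to(device).float()
        model.zero_grad()
        loss = loss_function(model(xs_batch), ys_batch)
        loss.backward()
        optimizer.step()
        running += loss.item()
    epoch_losses.append(running / len(loader_training))

plt.plot(epoch_losses)
plt.xlabel("epoch")
plt.ylabel("mean training MSE")
plt.show()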

Found the issue. Some samples had really anomalous values that had not been removed; they caused a huge loss and skewed the training.
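
For anyone hitting the same thing: a simple filter on the (normalised) increments would have caught them, e.g. something like this sketch (the tensor names and the threshold are illustrative, not my actual code):

# inputs: (N, 4, 3) windows, labels: (N, 3) targets -- hypothetical names for the full dataset tensors
threshold = 5.0   # arbitrary cutoff in units of the normalised features
mask = (inputs.abs() < threshold).flatten(1).all(dim=1) & (labels.abs() < threshold).all(dim=1)
inputs, labels = inputs[mask], labels[mask]   # drop the windows containing anomalous increments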

Thanks for all the replies!