Deep learning model architecture for smoothing data

First of all, this forum may be intended more for purely technical questions; if so, my question probably does not fit, so let me know. I am mostly asking whether anyone has experience with this, as I am a beginner.

What I am trying to do

I want to create a model that smooths my predictions. My predictions have shape [num samples, 4, 7], where 4 is the sequence length and 7 is the number of classes. The class values at each time step sum to 100.

However, my predictions often fluctuate, predicting for example a value of 50 for class 5 at time step 1 and 89 at time step 2. In reality, a class rarely fluctuates that extremely, so I want to smooth my predictions.

I have training data with the same shape [num samples, 4, 7]. I want to create a model that learns the behavior of the classes from this data and then applies that to my predictions, hopefully smoothing my results.

I understand that I could simply average the results to smooth them, but I am curious whether I can use a deep learning model that learns the underlying probabilities and thereby not only smooths but also, indirectly, corrects the predictions.
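For reference, the plain averaging I have in mind would be something like this (a minimal sketch, assuming a simple causal window of two time steps; the window choice is just an example):

import torch

def moving_average_smooth(preds):
    # preds: [num_samples, 4, 7], each time step sums to 100
    smoothed = preds.clone()
    # average each time step with the previous one
    smoothed[:, 1:, :] = 0.5 * (preds[:, 1:, :] + preds[:, :-1, :])
    # each smoothed step is the mean of two vectors that sum to 100,
    # so it still sums to 100
    return smoothed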

What I have tried

However, I am struggling to understand how to design such an architecture. I have tried working with learnable matrices as well as with an LSTM:

import torch
import torch.nn as nn

class SmoothModel(nn.Module):
    def __init__(self, input_size, output_size):
        super(SmoothModel, self).__init__()
        self.input_size = input_size
        self.output_size = output_size

        # Initialize the cooccurrence matrix as a learnable parameter
        self.cooccurrence = nn.Parameter(torch.randn(input_size, output_size))
        
        # Initialize the transition matrix as a learnable parameter
        self.transition = nn.Parameter(torch.randn(input_size, output_size))

        # Softmax layer
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):      
        sequences_updated = []

        # Update sequence based on the transition and cooccurrence matrices
        for i in range(x.shape[0]):

            # Cooccurrence multiplication
            seq_list = []
            for j in range(x.shape[1]):
                predicted_cooc = x[i, j, :].unsqueeze(0) # shape [1, 7]
                updated_cooc = torch.matmul(predicted_cooc, self.cooccurrence)
                seq_list.append(updated_cooc)

            # Recombine into a sequence of 4, with the cooccurrence applied
            seq = torch.cat(seq_list, dim=0) # create shape [4, 7]

            # Transition multiplication
            updated_seq = torch.matmul(seq, self.transition) # shape [4, 7]

            # Append the updated sequence
            sequences_updated.append(updated_seq.unsqueeze(0)) # append shape [1, 4, 7]

        # Create tensor with all updated sequences    
        updated_tensor = torch.cat(sequences_updated, dim=0) # dim = 0 is the number of samples
        
        # Output should sum to 100
        updated_tensor = self.softmax(updated_tensor) * 100
        
        return updated_tensor

My idea behind this model was that it would update my predictions based on learned cooccurrence and transition probabilities.
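For context, the training setup I have in mind for this is roughly the following (a sketch; noisy_preds and targets are hypothetical placeholders for my predictions and the ground-truth distributions, both of shape [num samples, 4, 7] and summing to 100, and Adam with an MSE loss is just one possible choice):

# noisy_preds and targets are hypothetical tensors of shape [num_samples, 4, 7]
model = SmoothModel(input_size=7, output_size=7)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

for epoch in range(100):
    optimizer.zero_grad()
    smoothed = model(noisy_preds)        # [num_samples, 4, 7]
    loss = criterion(smoothed, targets)  # pull the smoothed output towards the training data
    loss.backward()
    optimizer.step()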

Another model I tried, this time with an LSTM:

class SmoothModel(nn.Module):
    def __init__(self, input_size, output_size, hidden_size = 64):
        super(SmoothModel, self).__init__()
        self.input_size = input_size
        self.output_size = output_size

        # Initialize the cooccurrence as a learnable parameter
        self.cooccurrence = nn.Linear(input_size, output_size)
        
        # Initialize the transition probability as a learnable parameter
        self.transition = nn.LSTM(input_size, hidden_size, batch_first=True) # input is [batch, seq, features]
        self.transition_probability = nn.Linear(hidden_size, output_size)

        # Softmax layer
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):      
        sequences_updated = []

        # Update sequence based on the transition and cooccurrence layers
        for i in range(x.shape[0]):

            # Cooccurrence multiplication
            seq_list = []
            for j in range(x.shape[1]):
                predicted_cooc = x[i, j, :].unsqueeze(0) # shape [1, 7]
                updated_cooc = self.cooccurrence(predicted_cooc)
                seq_list.append(updated_cooc)

            # Recombine into a sequence of 4, with the cooccurrence applied
            seq = torch.cat(seq_list, dim=0) # create shape [4, 7]

            # Transition probability: run the LSTM over the 4 time steps
            output, _ = self.transition(seq.unsqueeze(0)) # shape [1, 4, hidden_size]
            updated_seq = self.transition_probability(output) # shape [1, 4, 7]

            # Append the updated sequence
            sequences_updated.append(updated_seq) # append shape [1, 4, 7]

        # Create tensor with all updated sequences    
        updated_tensor = torch.cat(sequences_updated, dim=0) # dim = 0 is the number of samples
        
        # Output should sum to 100
        updated_tensor = self.softmax(updated_tensor) * 100
        
        return updated_tensor

I have furthermore tried some variants of this, for example only updating one time step at a time, in the spirit of Markov chain theory (a rough sketch of what I mean is below). But so far these models do not improve the results.
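Roughly, the per-time-step / Markov-chain variant I mean looks like this (a sketch; the learnable transition matrix and the mixing weight are just one way I imagine setting it up):

class MarkovSmoothModel(nn.Module):
    def __init__(self, num_classes=7):
        super(MarkovSmoothModel, self).__init__()
        # Learnable transition matrix; a row-wise softmax turns each row into a distribution
        self.transition = nn.Parameter(torch.randn(num_classes, num_classes))
        # Learnable weight for how strongly to trust the propagated previous step
        self.mix = nn.Parameter(torch.tensor(0.0))

    def forward(self, x):
        # x: [num_samples, 4, 7], each time step sums to 100
        probs = x / 100.0
        transition = torch.softmax(self.transition, dim=-1)
        mix = torch.sigmoid(self.mix)
        outputs = [probs[:, 0, :]]
        for t in range(1, x.shape[1]):
            # expected distribution at step t, propagated from the smoothed previous step
            propagated = outputs[-1] @ transition
            outputs.append(mix * propagated + (1 - mix) * probs[:, t, :])
        # stack back to [num_samples, 4, 7] and rescale so each step sums to 100 again
        return torch.stack(outputs, dim=1) * 100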

Question

Does anyone have experience with this, or know what theory/architecture I could use? Or should I look at it in a totally different way?

I am happy to provide further (data) information if necessary!