First of all, this forum may be intended for more technical questions; if so, my question probably does not fit, so please let me know. I am mainly asking whether someone has experience with this, as I am a beginner.

#### What I am trying to do

I want to create a model that smooths my predictions. My predictions have shape [num_samples, 4, 7], where 4 is the sequence length and 7 is the number of classes. The class values sum to 100 at each time step.

However, my predictions often fluctuate, for example predicting a value of 50 for class 5 at time step 1 and 89 at time step 2. In reality, a class rarely makes such extreme jumps, so I want to smooth my predictions.

I have training data with the same shape [num_samples, 4, 7]. I want to create a model that learns the behavior of the classes from this data and then applies that to my predictions, hopefully smoothing my results.

I understand that I could simply average the results to smooth them, but I am curious whether a deep learning model could learn the underlying probabilities and thereby not only smooth but also indirectly correct the predictions.
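For reference, the averaging baseline I have in mind would look something like this (a minimal sketch; the odd window size of 3 and the renormalization back to 100 are my own choices):

```
import torch
import torch.nn.functional as F

def moving_average_smooth(preds: torch.Tensor, window: int = 3) -> torch.Tensor:
    """Smooth predictions of shape [num_samples, seq_len, num_classes]
    with a moving average over the time axis, then renormalize to 100."""
    # avg_pool1d expects [batch, channels, length], so move classes to the channel axis
    x = preds.permute(0, 2, 1)  # [num_samples, num_classes, seq_len]
    pad = window // 2  # assumes an odd window so the output length is unchanged
    x = F.avg_pool1d(x, kernel_size=window, stride=1, padding=pad,
                     count_include_pad=False)
    x = x.permute(0, 2, 1)  # back to [num_samples, seq_len, num_classes]
    # Renormalize so each time step sums to 100 again
    return 100 * x / x.sum(dim=-1, keepdim=True)
```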

#### What I have tried

However, I am struggling to understand how to create such an architecture. I have tried working with learnable matrices as well as with an LSTM:

```
import torch
import torch.nn as nn

class SmoothModel(nn.Module):
    def __init__(self, input_size, output_size):
        super(SmoothModel, self).__init__()
        self.input_size = input_size
        self.output_size = output_size
        # Initialize the cooccurrence matrix as a learnable parameter
        self.cooccurrence = nn.Parameter(torch.randn(input_size, output_size))
        # Initialize the transition matrix as a learnable parameter
        self.transition = nn.Parameter(torch.randn(input_size, output_size))
        # Softmax layer
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        sequences_updated = []
        # Update each sequence based on the cooccurrence and transition matrices
        for i in range(x.shape[0]):
            # Cooccurrence multiplication, one time step at a time
            seq_list = []
            for j in range(x.shape[1]):
                predicted_cooc = x[i, j, :].unsqueeze(0)  # shape [1, 7]
                updated_cooc = torch.matmul(predicted_cooc, self.cooccurrence)
                seq_list.append(updated_cooc)
            # Reassemble the sequence of 4 time steps with updated cooccurrence
            seq = torch.cat(seq_list, dim=0)  # shape [4, 7]
            # Transition multiplication
            updated_seq = torch.matmul(seq, self.transition)  # shape [4, 7]
            # Append the updated sequence
            sequences_updated.append(updated_seq.unsqueeze(0))  # shape [1, 4, 7]
        # Stack all updated sequences; dim 0 is the number of samples
        updated_tensor = torch.cat(sequences_updated, dim=0)
        # Output should sum to 100 at each time step
        updated_tensor = self.softmax(updated_tensor) * 100
        return updated_tensor
```

My idea behind this model was that it would update my predictions based on learned cooccurrence and transition probabilities.
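For context, a training loop for such a model could look roughly like this (a minimal sketch with made-up tensors; MSE against ground-truth distributions and the Adam settings are just one possible choice):

```
import torch
import torch.nn as nn

# Hypothetical data: noisy predictions as inputs, ground-truth distributions as targets
train_preds = torch.rand(100, 4, 7)
train_targets = torch.softmax(torch.rand(100, 4, 7), dim=-1) * 100  # sums to 100

model = SmoothModel(input_size=7, output_size=7)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

for epoch in range(50):
    optimizer.zero_grad()
    smoothed = model(train_preds)              # [100, 4, 7], rows sum to 100
    loss = criterion(smoothed, train_targets)  # penalize deviation from ground truth
    loss.backward()
    optimizer.step()
```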

Here is another model I tried, this time with an LSTM:

```
import torch
import torch.nn as nn

class SmoothModel(nn.Module):
    def __init__(self, input_size, output_size, hidden_size=64):
        super(SmoothModel, self).__init__()
        self.input_size = input_size
        self.output_size = output_size
        # Learnable cooccurrence mapping
        self.cooccurrence = nn.Linear(input_size, output_size)
        # LSTM that models transitions between time steps
        # (batch_first=True so the input is [batch, seq_len, features])
        self.transition = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.transition_probability = nn.Linear(hidden_size, output_size)
        # Softmax layer
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        sequences_updated = []
        # Update each sequence based on the cooccurrence layer and the LSTM
        for i in range(x.shape[0]):
            # Cooccurrence multiplication, one time step at a time
            seq_list = []
            for j in range(x.shape[1]):
                predicted_cooc = x[i, j, :].unsqueeze(0)  # shape [1, 7]
                updated_cooc = self.cooccurrence(predicted_cooc)
                seq_list.append(updated_cooc)
            # Reassemble the sequence of 4 time steps with updated cooccurrence
            seq = torch.cat(seq_list, dim=0)  # shape [4, 7]
            # Run the full sequence through the LSTM and keep every step's output
            output, _ = self.transition(seq.unsqueeze(0))      # [1, 4, hidden_size]
            updated_seq = self.transition_probability(output)  # [1, 4, 7]
            # Append the updated sequence
            sequences_updated.append(updated_seq)
        # Stack all updated sequences; dim 0 is the number of samples
        updated_tensor = torch.cat(sequences_updated, dim=0)
        # Output should sum to 100 at each time step
        updated_tensor = self.softmax(updated_tensor) * 100
        return updated_tensor
```

I furthermore tried some variants of this, for example updating one time step at a time, and ideas along the lines of Markov chain theory, as sketched below. But so far none of these models improve my results.
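For reference, such a Markov-style variant could look roughly like this (a sketch; blending each raw step with the propagated previous step via a learnable `alpha` is my own construction):

```
import torch
import torch.nn as nn

class MarkovSmooth(nn.Module):
    """Smooth each time step by blending it with the previous step
    propagated through a learned transition matrix."""
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.transition = nn.Parameter(torch.eye(num_classes))  # start near identity
        self.alpha = nn.Parameter(torch.tensor(0.5))            # raw blend factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [num_samples, seq_len, num_classes], each step sums to 100
        steps = [x[:, 0, :]]
        for t in range(1, x.shape[1]):
            propagated = steps[-1] @ self.transition  # previous step, one transition
            a = torch.sigmoid(self.alpha)             # keep the blend in (0, 1)
            steps.append(a * x[:, t, :] + (1 - a) * propagated)
        out = torch.stack(steps, dim=1)  # [num_samples, seq_len, num_classes]
        return 100 * torch.softmax(out, dim=-1)
```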

#### Question

Does anyone have experience with this, or know what theory/architecture I could use? Or should I look at it in a totally different way?

I am happy to provide further information (or data) if necessary!