I am trying to use the Adam optimizer, but instead of the loss decreasing, it increases up to some value and then stays there.

The value of the objective function is computed using a NN (the `self.model` variable); this is how it is calculated:

```
# time grid over which the objective is evaluated
self.time_steps = torch.arange(self.settings.t_u + self.settings.t_p, self.settings.t_d, self.settings.delta_t)
# pad the optimization variables with two leading zeros
d_t = F.pad(var_list[0], (2, 0), mode='constant', value=0.0)
d_s = F.pad(var_list[1], (2, 0), mode='constant', value=0.0)
# build one network input row per time step
times = [generate_network_input_row(d_t, d_s, 2, t) for t in self.time_steps]
scores = self.model(torch.stack(times))
# penalize only the scores that exceed max_state
return (F.relu(scores - self.settings.max_state) ** 2).mean()
```
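
For context, the last line is a one-sided quadratic penalty: scores at or below the threshold contribute nothing, and only the excess above it is squared and averaged. A minimal standalone sketch with made-up numbers (the threshold and scores here are purely illustrative, not from my actual settings):

```python
import torch
import torch.nn.functional as F

# hypothetical threshold and scores, just to show the shape of the penalty
max_state = 5.0
scores = torch.tensor([3.0, 6.0, 8.0])

# relu zeroes out anything at or below max_state, so only the
# excess (1.0 and 3.0 here) is squared before averaging
penalty = (F.relu(scores - max_state) ** 2).mean()
# penalty == (0 + 1 + 9) / 3
```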

The `generate_network_input_row` function is defined as follows:

```
def generate_network_input_row(times, sizes, last_considered, current_time):
    # indices of the last `last_considered` entries at or before current_time
    indexes = (times <= current_time).nonzero().flatten()[-last_considered:]
    # concatenate the time deltas with the corresponding sizes
    return torch.cat((current_time - times[indexes], sizes[indexes]))
```
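
To illustrate what one input row looks like (with made-up tensors, and repeating the definition so the snippet runs on its own): the function selects the last `last_considered` entries whose time is at or before `current_time`, and returns their time deltas followed by the matching sizes:

```python
import torch

def generate_network_input_row(times, sizes, last_considered, current_time):
    indexes = (times <= current_time).nonzero().flatten()[-last_considered:]
    return torch.cat((current_time - times[indexes], sizes[indexes]))

# illustrative event times and sizes
times = torch.tensor([0.0, 1.0, 3.0, 5.0])
sizes = torch.tensor([10.0, 20.0, 30.0, 40.0])

# the two most recent entries at or before t=4.0 are at t=1.0 and t=3.0,
# so the row is their deltas (3.0, 1.0) followed by their sizes (20.0, 30.0)
row = generate_network_input_row(times, sizes, 2, torch.tensor(4.0))
# row == tensor([3., 1., 20., 30.])
```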

Even though gradients are calculated, the optimization does not seem to move in the right direction (the loss mostly increases). Could you please point me in the right direction: what am I doing wrong?