I am new to Python and machine learning in general. This question came up while training a Gaussian Process, but I guess it also applies to neural networks.

```
import torch
import gpytorch

# Put the model and likelihood into training mode
model.train()
likelihood.train()

# Use the Adam optimizer (includes GaussianLikelihood parameters)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# "Loss" for GPs: the negative marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

for i in range(training_iterations):
    optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))
    optimizer.step()
```
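To make the question concrete, here is a sketch of what I mean by inspecting or setting the "starting point" of the parameters before the loop runs. I am using a plain `torch.nn.Linear` as a stand-in model for illustration (in my case it is a GPyTorch `ExactGP`, but any `torch.nn.Module` exposes its parameters the same way):

```python
import torch

# Stand-in model, assumed for illustration only.
model = torch.nn.Linear(2, 1)

# The initial condition of the optimization is simply the values the
# parameter tensors hold before the first optimizer.step() call.
for name, param in model.named_parameters():
    print(name, param.data)

# Setting an explicit initial condition before the training loop:
with torch.no_grad():
    model.weight.fill_(0.5)
    model.bias.fill_(0.0)
```

Is this the right way to think about (and control) the initial condition, or does the optimizer itself hold extra starting state?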

In a typical training process, we are trying to minimize the loss function (which in this case is the negative marginal log likelihood). My question is: what is the initial condition for that minimization? How can I pass it into, or read it out of, the training loop? Apologies in advance if the question is dumb or not specific enough; I am just trying to make sense of the training stage from a mathematical perspective.
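My current mental model, reduced to a toy problem: minimizing f(w) = (w - 3)² with Adam, where the initial condition is just the value w holds before the loop starts (here w₀ = 0; the function and starting value are made up for illustration):

```python
import torch

# Initial condition: w0 = 0. The minimizer of (w - 3)^2 is w = 3.
w = torch.nn.Parameter(torch.tensor(0.0))
optimizer = torch.optim.Adam([w], lr=0.1)

for _ in range(500):
    optimizer.zero_grad()
    loss = (w - 3.0) ** 2
    loss.backward()
    optimizer.step()

print(w.item())  # should approach 3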