Evaluation metrics

Hi! This is a simple question, but for some reason I’m having a hard time finding the answer. It says here (https://pytorch.org/ignite/metrics.html) that metrics don’t store the entire output history of the model in memory.

However, my evaluation loop predicts y values (a regression MLP), and the MSE is computed against the ground-truth y values per batch. Therefore, the “mse” returned by the evaluation metric gets overridden every batch. I want the “mse” of the entire validation set, which does not seem compatible with the current setup.

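import numpy as np
import torch
import torch.nn as nn
from ignite.engine import Engine
from ignite.metrics import Loss
from ignite.utils import setup_logger
from omegaconf import DictConfig
from scipy.signal import hilbert

def create_evaluation_loop(
    model: nn.Module, cfg: DictConfig, name: str, device="cpu",
) -> Engine: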

    # Loss
    if cfg.sum:
        loss_fn = torch.nn.MSELoss(reduction="sum")
    else:
        loss_fn = torch.nn.MSELoss()

    def _inference(engine, batch):
        model.eval()
        with torch.no_grad():
            x, y = batch
            x, y = x.to(device), y.to(device)
            y_pred = model(x)

            if cfg.house:
                factor = 6.0
            else:
                factor = 1777.0

            # pdb.set_trace()

            y = y * factor
            y_pred = y_pred * factor

            if cfg.envelope:
                y_hat_env = torch.tensor(np.abs(hilbert(y.cpu().detach().numpy())))
                y_pred_hat_env = torch.tensor(
                    np.abs(hilbert(y_pred.cpu().detach().numpy()))
                )
                mse_val = loss_fn(y_hat_env, y_pred_hat_env).item()
            else:
                mse_val = loss_fn(y, y_pred).item()

        # Anything you want to log must be returned in this dictionary
        # pdb.set_trace()
        infer_dict = {
            "loss": mse_val,
            "y_pred": y_pred,
            "y": y,
            "ypred_first": [y[0], y_pred[0]],  # * mean + stdv,  # * mean + stdv,
        }

        return infer_dict

    engine = Engine(_inference)

    engine.logger = setup_logger(name=name)

    metrics = {
        "mse": Loss(
            loss_fn,
            output_transform=lambda infer_dict: (infer_dict["y_pred"], infer_dict["y"]),
        ),
    }

    # Use a separate loop variable so the `name` argument is not shadowed
    for metric_name, metric in metrics.items():
        metric.attach(engine, metric_name)

    return engine

Also, what is the difference between the trainer which returns an “output” and the evaluator which returns the infer_dict? The difference seems to come when you define

metrics = {
    "mse": Loss(
        loss_fn,
        output_transform=lambda infer_dict: (infer_dict["y_pred"], infer_dict["y"]),
    ),
}

and then “attach” each metric to the engine under a name, but I’m not really clear on what that does. I know for a fact that with the trainer, the output dictionary is purely online in that it stores no history. So I’d assume the “attach” stores the history, but then the link above specifically says that it doesn’t?

Thanks for your time!

@pytorchnewbie sorry I saw the question but forgot to answer.

If I understand the problem correctly, you would like to compute the MSE metric (https://pytorch.org/ignite/metrics.html#ignite.metrics.MeanSquaredError).

To do that, create a validation engine and attach the MSE metric to it:

from ignite.engine import create_supervised_evaluator
from ignite.metrics import MeanSquaredError

val_metrics = {
    "mse": MeanSquaredError(),
}
evaluator = create_supervised_evaluator(model, metrics=val_metrics)

res = evaluator.run(val_loader)
print(res.metrics["mse"])

Also, what is the difference between the trainer which returns an “output” and the evaluator which returns the infer_dict?

The output of an engine is not restricted at all, but it should be coherent with how we would like to use it afterwards inside attached handlers/metrics. See concepts for more details.
If we would like to log the batch loss during training, then logically we need to return its value. In addition, if we would like to do something with the predictions, compute another error metric, etc., we can also put everything into the output and work with it inside an attached handler.
The same holds for the validation engine. If we use it to compute metrics, we need to output predictions and targets. If we set it up for inference only, that is, to compute predictions and then write them out inside a save_handler, we output only the predictions.

Hope it is clearer now.

So I’d assume the “attach” stores the history, but then the link above specifically says that it doesn’t?

It depends on the metric. Some metrics can be computed in an online manner, storing only certain internal variables. For example, MSE = 1/N sum(sample_err^2). It can be computed by accumulating all sample_err^2 values and finally dividing by the number of samples seen. So, in practice, we do not store lists [y_pred1, y_pred2, ...] and [y1, y2, ...] to compute sample_err, etc.
But there are metrics, e.g. a median “something” error, that cannot be computed like that; for those we need to store the two lists and run the computation on the stored history.
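
To make the online accumulation concrete, here is a minimal plain-Python sketch (not Ignite's actual implementation). The function name running_mse and the toy batches are made up for illustration. It keeps only two scalars and still reproduces the MSE over the whole dataset:

```python
def running_mse(batches):
    """Accumulate squared errors batch by batch, keeping only two scalars."""
    sum_sq_err = 0.0  # running sum of squared errors
    n_seen = 0        # running count of samples seen so far
    for y_pred, y in batches:
        sum_sq_err += sum((p - t) ** 2 for p, t in zip(y_pred, y))
        n_seen += len(y)
    return sum_sq_err / n_seen


# Hypothetical validation data, split into two batches of (y_pred, y).
batches = [
    ([1.0, 2.0], [1.0, 4.0]),
    ([3.0, 5.0, 7.0], [2.0, 5.0, 4.0]),
]

# Reference value: MSE computed over the concatenated data.
all_pred = [p for y_pred, _ in batches for p in y_pred]
all_true = [t for _, y in batches for t in y]
full_mse = sum((p - t) ** 2 for p, t in zip(all_pred, all_true)) / len(all_true)

online_mse = running_mse(batches)
print(online_mse, full_mse)
```

Note that this is different from averaging the per-batch MSE values, which generally gives another number when the batch sizes differ.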

You can also compute any metric on the training dataset during training (see here how to do this). But in this case you should be aware that the model is also changing. This means that the first predictions at the beginning of an epoch will be worse than the last ones, yet all of them are counted in the final overall training dataset metric. For example, accuracy on an epoch of 3 batches (bs=16) can give the following: num_correct_per_batch = [0, 4, 8]. The final accuracy is (0 + 4 + 8) / (16 + 16 + 16) = 0.25. However, on validation, where the model is fixed, it could give (8 + 8 + 8) / (16 + 16 + 16) = 0.5.

Please let me know if this answers your questions.
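
The arithmetic of that accuracy example, as a small sketch (epoch_accuracy is a hypothetical helper name, and the per-batch counts are the illustrative ones from the paragraph above):

```python
def epoch_accuracy(num_correct_per_batch, batch_size):
    """Accuracy over a whole epoch from per-batch correct counts."""
    total_correct = sum(num_correct_per_batch)
    total_seen = batch_size * len(num_correct_per_batch)
    return total_correct / total_seen


# Training: the model improves during the epoch, early batches drag the value down.
train_acc = epoch_accuracy([0, 4, 8], batch_size=16)

# Validation: the model is fixed, so every batch is scored by the same weights.
val_acc = epoch_accuracy([8, 8, 8], batch_size=16)

print(train_acc, val_acc)
```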

Hi! Thanks for your answer, but not quite…

I’m asking: the metric is calculated per batch, but I want the MSE on the entire validation dataset. How do I do this? Do I have to store all the predictions somehow? If so, how do I “store” them?

Also, you say “it depends on the metric”. This sounds like, behind the scenes, Ignite figures out whether or not to keep the history of predictions. #1) Does that mean that Ignite’s MSE will automatically calculate the MSE on the entire validation set, not just one batch at a time? #2) But how do I do it for a custom metric that Ignite doesn’t already have?

Thanks!

I’m asking: the metric is calculated per batch, but I want the MSE on the entire validation dataset. How do I do this? Do I have to store all the predictions somehow? If so, how do I “store” them?

There is EpochMetric, which computes a metric using a user-supplied function on the entire history. https://pytorch.org/ignite/metrics.html#ignite.metrics.EpochMetric
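
As a rough illustration of what “on the entire history” means, here is a plain-Python sketch (not Ignite code) that collects all predictions and targets and only then applies a function. The name median_abs_error and the toy batches are assumptions for the example. In Ignite, a function of this shape (all predictions, all targets) is what EpochMetric takes as its compute function, operating on tensors rather than lists:

```python
from statistics import median


def median_abs_error(y_preds, ys):
    """Needs every sample at once, so it cannot be accumulated as a running sum."""
    return median(abs(p - t) for p, t in zip(y_preds, ys))


# Hypothetical validation data, split into two batches of (y_pred, y).
batches = [
    ([1.0, 2.0], [1.0, 4.0]),
    ([3.0, 5.0, 7.0], [2.0, 5.0, 4.0]),
]

# Store the full history while iterating over the batches...
history_pred, history_true = [], []
for y_pred, y in batches:
    history_pred.extend(y_pred)
    history_true.extend(y)

# ...and compute once at the end of the epoch.
result = median_abs_error(history_pred, history_true)
print(result)
```

The cost of this approach is memory: the stored history grows with the size of the validation set.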

This sounds like, behind the scenes, Ignite figures out whether or not to keep the history of predictions.

Yes, each metric implementation knows how to compute itself.

#1) Does that mean that Ignite’s MSE will automatically calculate the MSE on the entire validation set, not just one batch at a time?

Yes, the MSE metric attached to an engine computes the MSE over the entire input data.

#2) But how do I do it for a custom metric that Ignite doesn’t already have?

See here: https://pytorch.org/ignite/metrics.html#how-to-create-a-custom-metric
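
The pattern described there is three methods: reset, update and compute. Below is a plain-Python sketch of that pattern for a maximum absolute error. It deliberately does not subclass ignite.metrics.Metric so that it runs without Ignite installed; the class name MaxAbsError, the toy batches and the manual loop standing in for the engine are all assumptions for illustration:

```python
class MaxAbsError:
    """Online metric: keeps only the largest absolute error seen so far."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Called at the start of every evaluation run.
        self._max_err = 0.0
        self._num_seen = 0

    def update(self, output):
        # Called once per batch with the (y_pred, y) pair.
        y_pred, y = output
        for p, t in zip(y_pred, y):
            self._max_err = max(self._max_err, abs(p - t))
        self._num_seen += len(y)

    def compute(self):
        # Called once after the last batch.
        if self._num_seen == 0:
            raise ValueError("MaxAbsError needs at least one sample before compute()")
        return self._max_err


# Hypothetical validation data, split into two batches of (y_pred, y).
batches = [
    ([1.0, 2.0], [1.0, 4.0]),
    ([3.0, 5.0, 7.0], [2.0, 5.0, 4.0]),
]

# Manual loop standing in for what an engine does with an attached metric.
metric = MaxAbsError()
metric.reset()
for output in batches:
    metric.update(output)
result = metric.compute()
print(result)
```

With Ignite, the same three methods would go into a subclass of ignite.metrics.Metric, and the instance would be attached to the evaluator with attach instead of being driven by hand.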