Reconstruction of a vector by autoencoder (effect of input size and range)

Hello,

I have two questions about autoencoder-based signal reconstruction (the signal here is a vector, and the model is a fully connected autoencoder). I would be really thankful for any help.

1. Does the reconstruction error depend on the vector size? (For example, reconstructing a dataset of 3-dimensional signals like [x1,x2,x3] versus 4-dimensional signals like [x1,x2,x3,x4].)

2. Can we say that the reconstruction error depends on the mean of the inputs? Would a set of inputs with higher values (like [10,10,10]) show a higher reconstruction error than inputs with a lower range (like [1,1,1])? If yes, is it recommended to normalize the data beforehand?

Thank you so much

1. It depends on how the loss is calculated. If you are using e.g. nn.MSELoss in the default setup (reduction='mean'), the squared error is averaged over all elements, so the loss value should not depend on the input feature dimension:
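import torch
import torch.nn as nn

# Default reduction='mean': squared errors are averaged over all elements
criterion = nn.MSELoss()

# 3 features
x = torch.randn(1, 3)
y = torch.randn(1, 3)
loss_small = criterion(x, y)

# 3000 features
x = torch.randn(1, 3000)
y = torch.randn(1, 3000)
loss_large = criterion(x, y)

# Values vary per run because the inputs are random; both estimate the same
# expected value (2.0 for two independent standard-normal tensors)
print(loss_small)
> tensor(1.5440)

print(loss_large)
> tensor(2.0731)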

However, you can of course use reduction='sum', which sums the squared errors instead of averaging them, so the loss would then grow with the number of elements.
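To make the difference between the two reductions concrete, here is a minimal sketch in plain Python rather than PyTorch, so the arithmetic is visible. The helpers `mse_mean` and `mse_sum` are hypothetical names that only mirror what the two reduction modes compute; they are not part of any library. Every element is given the same error of 0.5, so only the vector size changes.

```python
# Plain-Python sketch of the two reductions (helper names are hypothetical).
def mse_mean(recon, target):
    """Average of the squared errors, mirroring reduction='mean'."""
    return sum((r - t) ** 2 for r, t in zip(recon, target)) / len(target)


def mse_sum(recon, target):
    """Sum of the squared errors, mirroring reduction='sum'."""
    return sum((r - t) ** 2 for r, t in zip(recon, target))


# Every element is off by exactly 0.5, regardless of the vector size.
for n in (3, 3000):
    target = [1.0] * n
    recon = [1.5] * n
    print(n, mse_mean(recon, target), mse_sum(recon, target))
# 3 0.25 0.75
# 3000 0.25 750.0
```

With the same per-element error, the averaged loss stays at 0.25 for both sizes, while the summed loss scales linearly with the number of elements.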

2. Again, it depends on your use case, but the loss value will depend on the magnitude of the inputs:
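# Target of magnitude 1, reconstruction off by 10% (reuses criterion from above)
y = torch.tensor([[1.]])
rel_err = 1e-1
x = y - y * rel_err
loss_small = criterion(x, y)

# Target of magnitude 100, same 10% relative error
y = torch.tensor([[100.]])
x = y - y * rel_err
loss_large = criterion(x, y)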

print(loss_small)
> tensor(0.0100)

print(loss_large)
> tensor(100.)

As you can see, the loss is 10,000 times higher in the second use case even though the relative error is the same, since the MSE scales with the square of the magnitude. I don’t know how you are interpreting the loss, but it doesn’t necessarily mean that the first use case is “better” than the second one.
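If you want to compare reconstructions across inputs of different magnitude, one option is a scale-independent metric. The following is only a sketch in plain Python: `relative_squared_error` is a hypothetical helper, not a PyTorch function, and it assumes the target is not all zeros.

```python
def relative_squared_error(recon, target):
    """Sum of squared errors divided by the sum of squared targets.

    Hypothetical helper; assumes the target contains a non-zero value.
    """
    num = sum((r - t) ** 2 for r, t in zip(recon, target))
    den = sum(t ** 2 for t in target)
    return num / den


rel_err = 0.1
for scale in (1.0, 100.0):
    target = [scale]
    recon = [scale - rel_err * scale]  # reconstruction off by 10%
    print(scale, relative_squared_error(recon, target))
```

Both magnitudes give roughly the same value (about 0.01, up to floating-point rounding), because the error is measured relative to the size of the target.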
That being said, normalizing the inputs often helps during training, so you might want to normalize the inputs anyway, and you could even “unnormalize” the outputs if necessary.
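A minimal sketch of the normalize / unnormalize round trip, again in plain Python and for a single feature. The helper names (`fit_scaler`, `normalize`, `unnormalize`) and the sample values are assumptions for illustration; in practice you would compute the statistics per feature on the training set only.

```python
import statistics


def fit_scaler(values):
    """Return (mean, std) of the training values; std falls back to 1.0 if zero."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return mean, (std if std > 0 else 1.0)


def normalize(values, mean, std):
    """Shift and scale to zero mean and unit standard deviation."""
    return [(v - mean) / std for v in values]


def unnormalize(values, mean, std):
    """Invert normalize(), mapping values back to the original units."""
    return [v * std + mean for v in values]


# Made-up training values for one feature.
train = [90.0, 100.0, 110.0, 120.0]
mean, std = fit_scaler(train)
scaled = normalize(train, mean, std)

# Stand-in for the autoencoder output: the scaled input plus a small error.
recon_scaled = [s + 0.01 for s in scaled]
restored = unnormalize(recon_scaled, mean, std)

# The error in original units is the normalized error multiplied by std.
print([r - t for r, t in zip(restored, train)])
```

Note that a loss computed in the normalized space and a loss computed on the unnormalized outputs differ by a factor of std squared for MSE, so it is worth being explicit about which one you report.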

Thank you so much @ptrblck. You are right, I should think carefully about my loss function.
I will read your comments several times and think about this more. Thank you again.