I am trying to model a real-valued function with real-valued inputs and outputs.

Before that, however, I thought it would be a good idea to understand how to model the simplest possible case: a network that just reproduces its input. The architecture is simply an input y, some non-linearity, then back to a single output y_pred, hence the title “autoencoding a single real number”.

Here is what I came up with so far:

I could not get it to converge. My value of y is between 0 and 1, so my model happily learned that it should output 0.5 all the time (kind of a common issue). I tried switching the loss function between L1 and L2, but nothing worked.

Anything helps. I implemented a transformer network for some fancy task and had to tear it all down to realise that this was the core reason my model was not converging...

Edit: solved.

Solution (credit to David Qiu): in torch.sum((y - y_pred) ** 2), the dimensions of y and y_pred don’t match, so the subtraction silently broadcasts and the loss ends up being computed over the wrong pairs of values.
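
For anyone hitting the same thing, here is a minimal sketch of how the broadcasting can go wrong. The shapes are my assumption about the typical case (targets as a flat vector of shape (N,), predictions as a column of shape (N, 1), which is what a final nn.Linear(..., 1) layer usually returns); the batch size of 8 and the variable names are made up for illustration. It also shows why the model settles on the mean of y (about 0.5 for values spread over 0 to 1): with the broadcast loss, every prediction is compared against every target, so the gradient vanishes when all predictions equal the mean.

```python
import torch

torch.manual_seed(0)

# Assumed shapes (illustrative only): flat targets, column-shaped predictions.
y = torch.rand(8)          # shape (8,)
y_pred = torch.rand(8, 1)  # shape (8, 1)

# Broadcasting turns (8,) - (8, 1) into an (8, 8) matrix:
# every prediction is subtracted from every target.
bad_diff = y - y_pred
print(bad_diff.shape)   # torch.Size([8, 8])

# Making the shapes agree gives the intended element-wise difference.
good_diff = y - y_pred.squeeze(-1)
print(good_diff.shape)  # torch.Size([8])

# Under the broadcast loss, predicting mean(y) everywhere is a stationary
# point: the gradient with respect to each prediction is (numerically) zero.
p = torch.full((8, 1), y.mean().item(), requires_grad=True)
bad_loss = torch.sum((y - p) ** 2)
bad_loss.backward()
print(p.grad.abs().max())  # very close to zero
```

A cheap guard against this is to assert y.shape == y_pred.shape right before computing the loss, or to reshape one of them explicitly (for example with squeeze(-1) or view_as).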