I have a very simple use case: I would like to predict a floating-point number (y) given four floating-point numbers (x1, x2, x3, x4) that describe the properties of a process. I would like some pointers or insights on supervised and unsupervised approaches to this problem, and on which of the two would be the better choice.

y has no fixed range so far and can be any floating-point value.

Example:

Training data:
x1 = 3758687 (throughput)
x2 = 2.8678 (latency)
x3 = 0.098 (loss)
x4 = 0 or 1 (binary)
y = {1.28,2.28,4.019,9.8,13,2}

Now, using this, I need to create a model that can predict ‘y’.

First, especially because your xs differ in scale by several orders
of magnitude, you should most likely normalize your input data to
be of order one. This makes it easier for your training to get started,
and your model doesn’t have to “learn” to deal with these very different
scales.
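
As a rough sketch of that normalization, here is one common way to do it: standardize each input column with the mean and standard deviation computed from the training set. The sample values below are made up for illustration (only the first row comes from your example), and the `normalize` helper is a hypothetical name, not something PyTorch provides.

```python
import torch

# Hypothetical training inputs. Columns: throughput, latency, loss, binary flag.
# Only the first row is taken from the example above; the rest are invented.
x_train = torch.tensor([
    [3758687.0, 2.8678, 0.098, 0.0],
    [4120345.0, 3.1020, 0.120, 1.0],
    [2987654.0, 2.4501, 0.075, 1.0],
    [3501200.0, 2.9904, 0.101, 0.0],
])

# Per-column statistics, computed on the training data only.
mean = x_train.mean(dim=0)
std = x_train.std(dim=0).clamp_min(1e-8)  # guard against a constant column


def normalize(x):
    # Shift and scale each column so it is of order one.
    return (x - mean) / std


x_norm = normalize(x_train)
print(x_norm)
```

You would apply the same `mean` and `std` to any new inputs at prediction time, rather than recomputing them. Leaving the binary x4 as 0 / 1 instead of standardizing it is also a reasonable choice.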

Given that your input data has no temporal or image-like structure,
I would suggest that you start with a so-called multi-layer perceptron
(MLP) that consists just of fully-connected Linear layers and
activations.

Because you have four input variables, your first Linear will have
in_features = 4, and because you are predicting a single output
value, your final Linear layer will have out_features = 1. You
might start with “hidden” layers with 64 “neurons.” I like to start
with Sigmoid as my non-linear activation (although ReLU is another
popular choice that has some possible theoretical advantages and
shows experimental benefits in some applications).
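
A minimal sketch of such a network, assuming two hidden layers of 64 units (the number of hidden layers is my choice for illustration, not a requirement):

```python
import torch
from torch import nn

# Four inputs -> two hidden layers of 64 units -> one output.
model = nn.Sequential(
    nn.Linear(in_features=4, out_features=64),
    nn.Sigmoid(),
    nn.Linear(in_features=64, out_features=64),
    nn.Sigmoid(),
    nn.Linear(in_features=64, out_features=1),
)

# A batch of 8 samples with 4 (already normalized) features each.
x = torch.randn(8, 4)
pred = model(x)
print(pred.shape)  # one predicted value per sample
```

Note that there is no activation after the final Linear, so the output can be any floating-point value, which matches a y with no fixed range.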

You would then typically experiment a little bit by making the network shallower – go down to a single hidden layer; deeper – go up to
perhaps three or four hidden layers; narrower – fewer “neurons” per
layer; and wider – more “neurons” per layer. You might also try switching
all or some of your activations to ReLU.
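
To make that kind of experimentation less tedious, you could build the network from a few parameters. This is just a sketch; `build_mlp` and its argument names are hypothetical, not part of PyTorch.

```python
import torch
from torch import nn


def build_mlp(n_hidden_layers=2, width=64, activation=nn.Sigmoid):
    # Stack of (Linear, activation) pairs followed by a final Linear
    # that maps to a single output value.
    layers = []
    in_features = 4  # x1, x2, x3, x4
    for _ in range(n_hidden_layers):
        layers.append(nn.Linear(in_features, width))
        layers.append(activation())
        in_features = width
    layers.append(nn.Linear(in_features, 1))
    return nn.Sequential(*layers)


# Shallower: a single hidden layer.
shallow = build_mlp(n_hidden_layers=1, width=64)
# Deeper and narrower, with ReLU instead of Sigmoid.
deep_relu = build_mlp(n_hidden_layers=4, width=32, activation=nn.ReLU)

x = torch.randn(8, 4)
print(shallow(x).shape, deep_relu(x).shape)
```

You would then train each variant the same way and compare the loss on data held out from training.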

I assume that your predicted value, y, is a continuous scalar value
where it’s meaningful to say that a predicted value is closer to or further
away from your ground-truth target value (and that closer is better).

For this situation, MSELoss would likely be the most appropriate
loss criterion.
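
Here is a small training-loop sketch using MSELoss. The data is synthetic (a stand-in for your real, normalized samples), and the Adam optimizer, learning rate, and number of steps are my assumptions, chosen only to make the example run.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in data (an assumption, not real measurements):
# 256 samples, 4 already-normalized inputs, one scalar target each.
x = torch.randn(256, 4)
y = x @ torch.tensor([[1.5], [-2.0], [0.5], [3.0]]) + 5.0  # shape [256, 1]

model = nn.Sequential(
    nn.Linear(4, 64),
    nn.Sigmoid(),
    nn.Linear(64, 1),
)

loss_fn = nn.MSELoss()
# Optimizer and learning rate are illustrative choices.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

initial_loss = loss_fn(model(x), y).item()

for step in range(200):
    optimizer.zero_grad()
    pred = model(x)          # shape [256, 1], same as y
    loss = loss_fn(pred, y)  # mean of squared differences
    loss.backward()
    optimizer.step()

final_loss = loss_fn(model(x), y).item()
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

One thing to watch: the target should have the same shape as the model output, here [nSamples, 1]. A target of shape [nSamples] will broadcast against the output and give you a misleading loss.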

If you have good training data – that is, enough samples for which you
have the input xs as well as the ground-truth value for y – supervised
learning is clearly the preferable approach. Unsupervised learning
is a distinctly more difficult problem.