Why do all the predictions of my deep-learning model (for a classification problem) converge to a single label?

Actually, nn.CrossEntropyLoss applies the log-softmax for you. Here is the link.
So you don't need to add a softmax layer. Sorry for the confusion.
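To see why the extra softmax layer is unnecessary, here is a small sketch (with made-up logits) showing that `nn.CrossEntropyLoss` on raw logits equals log-softmax followed by `nn.NLLLoss`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 3)            # raw model outputs, no softmax applied
targets = torch.tensor([0, 2, 1, 0])  # class indices

# nn.CrossEntropyLoss = LogSoftmax + NLLLoss, so it expects raw logits
ce = nn.CrossEntropyLoss()(logits, targets)

# The same loss computed manually in two steps
manual = nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, manual))  # True
```

If you put a softmax layer before this loss, you end up applying softmax twice, which squashes the logits and can slow or stall training.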

Age and Area should be treated as continuous variables; that's why I mentioned normalization.
What you describe in your last paragraph is so-called "min-max normalization".
I was talking about standardization, which uses the mean and variance (rescaling to mean = 0 and std = 1).
Either way would be fine.
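For reference, here is a quick sketch of both options on a toy "Age" column (the values are made up for illustration):

```python
import torch

age = torch.tensor([18.0, 25.0, 40.0, 63.0, 71.0])  # toy "Age" column

# Min-max normalization: rescales values into [0, 1]
age_minmax = (age - age.min()) / (age.max() - age.min())

# Standardization: subtract the mean, divide by the std (mean=0, std=1)
age_standardized = (age - age.mean()) / age.std()

print(age_minmax.min().item(), age_minmax.max().item())  # 0.0 1.0
```

In practice, compute the statistics (min/max or mean/std) on the training set only, and reuse them to transform the validation and test sets.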

Edit) Sorry again, Area is the post code!! A post code is categorical, not continuous, so normalizing it makes no sense. In that case you may want to use an embedding with a fixed length.
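A minimal sketch of that idea with `nn.Embedding`, assuming you first map each distinct post code to an integer index (the counts below are hypothetical):

```python
import torch
import torch.nn as nn

# Hypothetical: suppose the data contains 500 distinct post codes,
# each mapped to an integer index beforehand (e.g. via a dict lookup).
num_post_codes = 500
embedding_dim = 8  # fixed-length dense vector per post code

embed = nn.Embedding(num_post_codes, embedding_dim)

batch = torch.tensor([3, 42, 499])  # integer-encoded post codes
vectors = embed(batch)              # shape: (3, 8)
print(vectors.shape)
```

The embedding weights are learned jointly with the rest of the model, so similar post codes can end up with similar vectors; you would concatenate these vectors with your other (normalized) features before the fully connected layers.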