Input size dimensions (when not using images) of an nn for classification

titoniubo · January 7, 2020, 10:58am

Hello,

My question is about the input size of a classification/regression model when we use n columns instead of images.

Let’s say I have a classification problem, for which I have 20 numerical variables (500 rows each) and 1 label with 5 possible classes. I understand the output size is 5.

However, I do not know what the input size is. Is it 1*20 = 20?

Most PyTorch tutorials I have found for multi-class classification use the MNIST dataset, so the input size is 28*28, but what if I have several variables instead of pictures?

Thanks for your help.

KFrank · January 7, 2020, 4:35pm

Hi Josep!

Yes, this is correct – with the proviso that pytorch models expect
batches of inputs.

So, if your batch size were 7, the input to your model would be a
tensor of shape (7, 20). (If you only want to feed one input sample
into your model, you still have to package it as a batch with a batch
size of 1; thus your input tensor would have shape (1, 20).)
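
As a rough illustration of the batch shapes described above, here is a minimal sketch. The random values are only stand-ins for real data, and the variable names are made up for the example.

```python
import torch

# A batch of 7 samples, each with 20 variables (random stand-in values)
batch = torch.randn(7, 20)

# A single sample with 20 variables has shape (20,) ...
single = torch.randn(20)

# ... so add a leading batch dimension to get shape (1, 20)
single_batch = single.unsqueeze(0)

print(batch.shape)         # shape of the 7-sample batch
print(single_batch.shape)  # shape of the 1-sample batch
```

Here unsqueeze(0) only adds the batch dimension; the 20 values themselves are unchanged.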

For 20 input variables (without any spatial structure as an image
would have) and 5 classes, your network could be as simple as
something like:

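import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 50),   # 20 input variables -> 50 hidden neurons
    nn.ReLU(),
    nn.Linear(50, 5),    # 50 hidden neurons -> 5 class scores
)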

(where I’ve chosen, for no particular reason, to have 50 “hidden”
neurons).
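
To check the shapes, one could push a batch through such a model. This is only a sketch: the input is random stand-in data, and taking the argmax of the outputs as the predicted class is a common convention rather than something required by the model.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 50),
    nn.ReLU(),
    nn.Linear(50, 5),
)

x = torch.randn(7, 20)            # batch of 7 samples, 20 variables each
logits = model(x)                 # one row of 5 class scores per sample
predicted = logits.argmax(dim=1)  # index of the highest score per sample

print(logits.shape)
print(predicted.shape)
```

The outputs are raw scores (logits), not probabilities. A loss such as nn.CrossEntropyLoss takes these logits directly.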

Good luck!

K. Frank

titoniubo · January 7, 2020, 4:58pm

Thanks @KFrank,

I appreciate your feedback.

Can you recommend any tutorial where PyTorch is used for logistic regression with input other than images?

All the tutorials I find use the MNIST dataset.

I am trying to use PyTorch for a normal multi-class classification problem, where I have over 20 variables and 1 target variable (5 classes).

Thanks again for your time.

KFrank · January 7, 2020, 6:44pm

Hi Josep!

I don’t know offhand of a tutorial that uses pytorch for logistic
regression but doesn’t use the MNIST dataset.

But it doesn’t matter. If you look at one of your logistic-regression
tutorials (if it’s done right) you will see that it ignores the spatial
structure of the MNIST image. You just need to use your 20 variables
in lieu of the 784 variables (28*28 pixels) in the MNIST image (and
adjust the number of input variables in your model accordingly).
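
A small sketch of that point, using random tensors as stand-ins for MNIST images and for tabular data (no real dataset is loaded here, and the names are made up for the example). Flattening a 28*28 image gives 784 plain variables, which a Linear layer treats just like 20 columns of tabular data.

```python
import torch
import torch.nn as nn

# Stand-in for a batch of 7 MNIST images: (batch, channel, height, width)
images = torch.randn(7, 1, 28, 28)

# Flattening discards the spatial structure: each image becomes 784 variables
flat = images.view(images.size(0), -1)

mnist_model = nn.Linear(28 * 28, 10)   # 784 inputs, 10 digit classes
tabular_model = nn.Linear(20, 5)       # 20 inputs, 5 classes

tabular = torch.randn(7, 20)           # stand-in for 7 rows of 20 variables

print(flat.shape)
print(mnist_model(flat).shape)
print(tabular_model(tabular).shape)
```

Only the number of input features and the number of classes change between the two models.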

Just to be clear, you will be doing a multinomial (multiclass)
logistic regression (in contrast to binary). And (in the interest of
working through a practice problem) you’ll be using a sledgehammer
(pytorch) to perform an old-school logistic regression,
rather than using the power of neural networks (using pytorch)
to build what would likely be a much more capable classifier.
(There’s nothing the matter with this – I just want to make sure you
know that doing logistic regression with pytorch is a toy problem.)
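
For concreteness, here is a minimal sketch of multinomial logistic regression in pytorch. It runs on synthetic data, so the fitted weights mean nothing. The optimizer, learning rate, and number of steps are arbitrary choices for illustration, not recommendations.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in data (an assumption): 500 rows, 20 variables, 5 classes
X = torch.randn(500, 20)
y = torch.randint(0, 5, (500,))    # class indices 0..4, dtype int64

model = nn.Linear(20, 5)           # a single linear layer is the whole model
loss_fn = nn.CrossEntropyLoss()    # combines log-softmax and negative log-likelihood
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(100):            # full-batch gradient descent
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

final_loss = loss.item()
print(final_loss)
```

Replacing the single nn.Linear with a small nn.Sequential that has a hidden layer turns the same script into a neural-network classifier.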

Best.

K. Frank

titoniubo · January 8, 2020, 8:50am

@KFrank thanks for your time and answers.

You are right. It’s the first time I have tried to use regular variables instead of images in PyTorch.

If you could advise me on how to transform my variables into model input, I would be grateful.

So far I have tried (note I have 10 variables and 3 outputs):

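import numpy as np

# Training features: float32 array with 10 columns
X_train = np.array(X_train, dtype=np.float32)
X_train.shape

X_train = X_train.reshape(-1, 10)
X_train.shape

# Test features: same layout as the training features
X_test = np.array(X_test, dtype=np.float32)
X_test.shape

X_test = X_test.reshape(-1, 10)
X_test.shape

# Training labels, reshaped to a single column
Y_train = np.array(Y_train, dtype=np.float32)
Y_train.shape

Y_train = Y_train.reshape(-1, 1)
Y_train.shape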

I’d appreciate it if you could let me know if my reasoning is correct.

Sincerely,

KFrank · January 9, 2020, 7:09pm

Hi Josep!

titoniubo:

So far I have tried (note I have 10 variables and 3 outputs):
X_train = np.array(X_train, dtype=np.float32)
X_train.shape
...
I’d appreciate it if you could let me know if my reasoning is correct.

I’m not sure what you are trying to do here.

First, you need to package your input data as a pytorch tensor.
(That’s what pytorch works with.)
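
A minimal sketch of that packaging step, assuming the data starts out as NumPy arrays. The arrays here are randomly generated stand-ins, and the tensor names are made up for the example. It also assumes the labels are class indices intended for nn.CrossEntropyLoss, which expects integer (int64) targets of shape (N,) rather than float targets of shape (N, 1).

```python
import numpy as np
import torch

rng = np.random.default_rng(0)

# Stand-in data (an assumption): 500 rows, 10 variables, 3 classes
X_train = rng.normal(size=(500, 10)).astype(np.float32)
Y_train = rng.integers(0, 3, size=500)

# Features: float32 tensor of shape (500, 10)
X_tensor = torch.from_numpy(X_train)

# Labels: int64 class indices of shape (500,)
Y_tensor = torch.from_numpy(Y_train).long()

print(X_tensor.shape, X_tensor.dtype)
print(Y_tensor.shape, Y_tensor.dtype)
```

torch.from_numpy shares memory with the NumPy array. torch.tensor(X_train) would make a copy instead.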

You need a batch size. It can be 1, if you want to work with a single
sample, but let’s say you use a batch size of 7. Then a single batch
of your input data should be a pytorch tensor of shape (7, 10). The
first row of this tensor will be one sample consisting of the values of
your 10 variables.

For logistic regression (in contrast to a more full-featured neural
network) you would train the weights of a single Linear layer so that
its outputs best match your target value.

So you should instantiate nn.Linear(10, 3) as your “model”.

See if you can write a simple script that packs some data into a
tensor of the appropriate shape and successfully passes it through
a single Linear.
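
One possible shape for such a script is sketched below. The data is synthetic, and TensorDataset and DataLoader are just one convenient way to produce batches of 7. Slicing the tensor by hand would work as well.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data (an assumption): 500 samples, 10 variables, 3 classes
X = torch.randn(500, 10)
y = torch.randint(0, 3, (500,))

# Batches of 7; 500 is not a multiple of 7, so the last batch holds 3 samples
loader = DataLoader(TensorDataset(X, y), batch_size=7)

model = nn.Linear(10, 3)

xb, yb = next(iter(loader))   # first batch of inputs and labels
out = model(xb)               # one row of 3 class scores per sample

print(xb.shape, yb.shape, out.shape)
```

If this runs without a shape error, the input tensor and the Linear layer agree on the number of variables.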

If you have issues, feel free to post the complete script, together
with any error messages or incorrect / unexpected output, and
forum participants will likely be able to help you further.

Good luck.

K. Frank

titoniubo · January 12, 2020, 4:35pm

Hello Frank,

Many thanks for your feedback.

titoniubo · January 14, 2020, 1:10pm

Hello Frank,

I hope this message finds you well.

Concerning my last enquiry, I have created a new post.

Should you have some time to have a look at it and let me know what your thoughts are, I would be very grateful.

Josep Maria