So I have this dataset (9230 * 460), where 460 is the number of features, and the target (9230 * 1) contains 10 different classes [1…10]. I tried an RFC with grid search, which gave me 75% accuracy, and now I am going to try a feed-forward neural network with PyTorch. My question is: how can one know which approach (algorithm) gives the best classification accuracy, of course without overfitting? Thank you.
P.S.: I’ll have more data soon. The data I have now is for 2016; I’ll also get 2015 and 2017. Is an RNN with LSTM the best answer there? (My features are 460 observations from crops, taken from a satellite image of parcels on different dates.)
You mean each sample has a 460-dimensional feature vector and each sample is associated with a date? Also, could you give more detail about your features, e.g. what kinds of features they are? Or are your features just hidden outputs from another program?
Well, all I can think of is that you can try an RNN (LSTM or GRU) to make use of the date information, followed by a fully connected layer, and train it with autograd and a cross-entropy loss.
If you give more detail, it will be easier to give better advice.
Hi Liujie_Zhang,
Another team handles the image processing. They have satellite images of an area throughout the year, and for each parcel (a piece of land with visible natural or man-made boundaries) they extract the mean and the std of 10 different spectral bands (green, IR, etc.). That gives 20 features per parcel on a single date, and the output is the type of crop in that parcel [1…10]. I now have the 2016 data, which covers 23 dates, i.e. 23 * 20 = 460 features. I’ll soon have the rest of the data from 2015 and 2017. My question is: what is the best approach to get the best overall accuracy?
If you want more details, just tell me. Thank you.
OK, I think I mostly understand your data layout.
I want to confirm that for each sample [x1, x2, …, x460], the first 20 features [x1, …, x20] are associated with one timestamp (say Date0), [x21, …, x40] with Date1, and so on. I also assume that Date0 < Date1 < … < Date22, i.e. Date0 is earlier than Date1.
If so, you can use an RNN to build a model that classifies the crops (10 classes).
Step 1: split each sample's features into time steps
[x1, x2, …, x460] -> [[x1, x2, …, x20], [x21, x22, …, x40], …]
So each sample can be seen as a sequence of 23 time steps, each with 20 features.
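A minimal sketch of that split in PyTorch, assuming the 460 columns really are ordered date by date (all 20 features of Date0 first, then Date1, and so on). The names to_sequences, NUM_DATES and FEATS_PER_DATE are made up for illustration, and the tensor is dummy data standing in for the real table. If the columns are instead grouped feature by feature, a plain reshape would mix dates up and you would need reshape(-1, 20, 23).transpose(1, 2) instead.

```python
import torch

NUM_DATES = 23       # acquisition dates in the 2016 data
FEATS_PER_DATE = 20  # mean and std of 10 spectral bands

def to_sequences(flat: torch.Tensor) -> torch.Tensor:
    """Reshape [num_samples, 460] into [num_samples, 23, 20].

    Assumes (hypothetically) that columns are ordered date by date:
    x1..x20 belong to Date0, x21..x40 to Date1, and so on.
    """
    return flat.reshape(-1, NUM_DATES, FEATS_PER_DATE)

# Dummy data: 2 samples whose values are simply 0, 1, 2, ... so the
# layout is easy to inspect.
flat = torch.arange(2 * 460, dtype=torch.float32).reshape(2, 460)
seqs = to_sequences(flat)

print(seqs.shape)     # shape of the sequence tensor
print(seqs[0, 1, 0])  # first feature of the second date (x21) of sample 0
```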
Step 2: build a GRU or LSTM model and feed it the data
Now you can feed batches of shape [batch_size, seq_len, feature_dim], where seq_len is 23 and feature_dim is 20, as above.
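Something like the following could serve as a starting point for the model. It is only a sketch: the class name CropGRU and the hidden size of 64 are arbitrary choices of mine, not tuned for this data. It uses nn.GRU with batch_first=True so the input shape matches [batch_size, seq_len, feature_dim], and classifies from the last hidden state.

```python
import torch
from torch import nn

class CropGRU(nn.Module):
    """GRU over the 23 dates, then a linear layer over the 10 classes.

    The name and the hidden size are illustrative assumptions.
    """

    def __init__(self, feature_dim=20, hidden_dim=64, num_classes=10):
        super().__init__()
        # batch_first=True makes the GRU accept [batch, seq_len, feature_dim]
        self.gru = nn.GRU(feature_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # h_n has shape [num_layers, batch, hidden_dim]
        _, h_n = self.gru(x)
        # Use the final hidden state of the last layer as the summary
        return self.fc(h_n[-1])  # logits: [batch, num_classes]

model = CropGRU()
batch = torch.randn(8, 23, 20)  # dummy batch: 8 parcels, 23 dates, 20 features
logits = model(batch)
print(logits.shape)
```

Swapping nn.GRU for nn.LSTM works the same way, except that the LSTM returns a tuple (h_n, c_n) as its second output, so you would unpack that and still use h_n[-1].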
Step 3: compute the cross-entropy loss and let autograd do the backward pass
Then just train and evaluate.
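A rough sketch of what training and evaluation might look like, on random dummy data in place of the real parcels. The 160/40 split, batch size, learning rate and number of epochs are placeholders I picked, not recommendations. One thing to watch: nn.CrossEntropyLoss expects class indices in 0…9, so labels given as 1…10 have to be shifted by one. Keeping a held-out validation set, as here, is also the simplest way to compare this model against the RFC and to spot overfitting.

```python
import torch
from torch import nn

torch.manual_seed(0)

class CropGRU(nn.Module):
    # Illustrative model: GRU over dates, linear layer over classes
    def __init__(self, feature_dim=20, hidden_dim=64, num_classes=10):
        super().__init__()
        self.gru = nn.GRU(feature_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        _, h_n = self.gru(x)
        return self.fc(h_n[-1])

# Dummy stand-ins for the real data: 200 parcels, 23 dates, 20 features
X = torch.randn(200, 23, 20)
y = torch.randint(1, 11, (200,))  # labels in 1..10, as in the dataset
y = y - 1                         # CrossEntropyLoss needs 0..9

# Simple hold-out split (placeholder sizes)
X_train, y_train = X[:160], y[:160]
X_val, y_val = X[160:], y[160:]

model = CropGRU()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

losses = []
for epoch in range(5):
    model.train()
    perm = torch.randperm(X_train.size(0))  # reshuffle every epoch
    total = 0.0
    for i in range(0, X_train.size(0), 32):
        idx = perm[i:i + 32]
        optimizer.zero_grad()
        loss = criterion(model(X_train[idx]), y_train[idx])
        loss.backward()   # autograd computes the gradients
        optimizer.step()
        total += loss.item() * idx.size(0)
    losses.append(total / X_train.size(0))

# Evaluate on the held-out part
model.eval()
with torch.no_grad():
    preds = model(X_val).argmax(dim=1)
    val_acc = (preds == y_val).float().mean().item()

print(losses)
print(val_acc)
```

On random data the accuracy will hover around chance; the point is only the shape of the loop. If training accuracy keeps rising while validation accuracy stalls or drops, that is the overfitting signal to look for.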
There are many RNN model examples on GitHub; pick one to follow and train your model.