Hi, I want to convert text data into a form that can be standardized. This is my desired pipeline:
1) Convert the data into a form that can be standardized.
2) Apply the standardized data to Linear Discriminant Analysis (LDA) for dimensionality reduction.
3) After LDA, apply CountVectorizer.
4) Then split and train with a traditional ML algorithm.
Is this a correct pipeline? Right now I'm stuck at the first step. How should I convert the strings to numbers so that standardization is possible?
Or maybe there is an easier way to do this with a different pipeline?
All the tutorials use the IRIS dataset, where every column is continuous numeric data, not text.
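For reference, here is a minimal scikit-learn sketch of this kind of pipeline (the tiny corpus and step names are illustrative). Note that vectorization has to come first, since standardization and LDA both need numeric input, and that LDA needs a dense matrix, so the sparse vectorizer output is densified in between:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

# toy corpus standing in for real data
texts = [
    "the movie was great and fun",
    "fantastic plot and acting",
    "i loved this film",
    "terrible movie, waste of time",
    "boring plot and bad acting",
    "i hated this film",
]
labels = [1, 1, 1, 0, 0, 0]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),   # text -> numeric feature matrix
    ("dense", FunctionTransformer(lambda X: X.toarray(),
                                  accept_sparse=True)),  # LDA needs dense input
    ("lda", LinearDiscriminantAnalysis(n_components=1)), # supervised reduction
    ("clf", LogisticRegression()),  # traditional ML classifier
])

pipe.fit(texts, labels)
preds = pipe.predict(texts)
```

With two classes, LDA can produce at most one component (`n_components <= n_classes - 1`), which is why `n_components=1` here.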
One question though: while training the ML model with K-fold cross-validation, is it okay to dump (save) the trained model at every StratifiedKFold split and use that in testing? Or how do I use a K-fold-trained model? I also want early stopping with traditional ML.
Are you limited on data? If not, having a dedicated validation set would be better than K-fold, imo.
I’m not sure what you mean by “dump” the model.
It’s good practice to make checkpoints between epochs and keep a log of any running stats. Then you can choose which set of parameters was best after the fact.
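A minimal sketch of that checkpointing pattern in PyTorch (the toy model, filename, and stats fields here are illustrative):

```python
import torch
from torch import nn

model = nn.Linear(4, 2)  # toy model standing in for yours
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def save_checkpoint(path, epoch, val_loss):
    # bundle parameters plus the running stats you'll compare later
    torch.save({
        "epoch": epoch,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
        "val_loss": val_loss,
    }, path)

save_checkpoint("ckpt_epoch_0.pt", epoch=0, val_loss=1.23)

# after training: pick the checkpoint whose logged val_loss was lowest
ckpt = torch.load("ckpt_epoch_0.pt")
model.load_state_dict(ckpt["model_state"])
```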
When you want to run validation data, just be sure to use these bits of code:

```python
with torch.no_grad():   # disable gradient tracking for validation
    model.eval()        # switch layers like dropout/batchnorm to eval mode
    ...                 # test and get loss or other stats here;
                        # good spot to make a checkpoint with the stats log
model.train()           # switch back and continue training
```
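Put together, that pattern sits inside the epoch loop like this (a sketch with a toy model and random tensors standing in for your loaders; the "keep the best parameters" save is illustrative):

```python
import torch
from torch import nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# random tensors standing in for real train/validation loaders
x_train, y_train = torch.randn(16, 4), torch.randint(0, 2, (16,))
x_val, y_val = torch.randn(8, 4), torch.randint(0, 2, (8,))

best_val = float("inf")
for epoch in range(3):
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():              # no gradients needed for validation
        val_loss = loss_fn(model(x_val), y_val).item()
    if val_loss < best_val:            # keep the best parameters seen so far
        best_val = val_loss
        torch.save(model.state_dict(), "best.pt")
```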