Hi, this might not be a PyTorch question, but it is about machine learning.
I am doing sentiment analysis with a traditional ML algorithm (either SVM or RF). Without PCA, I either overfit or get very low accuracy.
To avoid overfitting, I tried applying PCA (via TruncatedSVD) to the text data. This is the code:
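from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# df["Comments"] holds the raw text; y is the array of sentiment labels (both defined elsewhere)
vectorizer = CountVectorizer(token_pattern=r'[^\s]+', ngram_range=(1, 2))
vectors = vectorizer.fit_transform(df["Comments"])
svd = TruncatedSVD(n_components=10, random_state=42)
data = svd.fit_transform(vectors)

# skf is a StratifiedKFold; the settings on the next line are an assumption
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
train_scores, test_scores = [], []
for train_index, test_index in skf.split(data, y):
    X_train, X_test = data[train_index], data[test_index]
    y_train, y_test = y[train_index], y[test_index]
    text_classifier = SVC(C=100, decision_function_shape='ovo', gamma=0.01)
    text_classifier.fit(X_train, y_train)
    # Accuracy on the training fold
    train_yhat = text_classifier.predict(X_train)
    train_acc = accuracy_score(y_train, train_yhat)
    train_scores.append(train_acc)
    # Accuracy on the held-out fold
    test_yhat = text_classifier.predict(X_test)
    test_acc = accuracy_score(y_test, test_yhat)
    test_scores.append(test_acc)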
With this setup, the model has low accuracy. Without PCA, it overfits.
What should I do?
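
For reference, here is a minimal sketch of the same setup written as a scikit-learn Pipeline, so that the vectorizer and the SVD are refit on each training fold rather than on the whole dataset before splitting. The function name `evaluate`, the toy texts and labels, and the fold count are made up for illustration; they are not the real data or settings.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def evaluate(texts, labels, n_components=10, n_splits=5):
    """Cross-validate vectorizer + SVD + SVC, refitting every step per fold."""
    pipeline = make_pipeline(
        CountVectorizer(token_pattern=r"[^\s]+", ngram_range=(1, 2)),
        TruncatedSVD(n_components=n_components, random_state=42),
        SVC(C=100, decision_function_shape="ovo", gamma=0.01),
    )
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    scores = cross_validate(
        pipeline, texts, labels, cv=skf,
        scoring="accuracy", return_train_score=True,
    )
    # Mean accuracy over the folds, for training and held-out data
    return scores["train_score"].mean(), scores["test_score"].mean()


# Hypothetical toy data, only to show how the function is called
toy_texts = ["good movie", "great film", "loved it", "really nice",
             "bad movie", "awful film", "hated it", "really poor"]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
train_acc, test_acc = evaluate(toy_texts, toy_labels, n_components=2, n_splits=2)
```

Calling `evaluate(df["Comments"], y, n_components=k)` for a few values of `k` would show how the gap between training and test accuracy changes with the number of components.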