Say I have a feature dataset X and a label set Y, both stored in order. How can I split them into training and validation sets and make sure the data is shuffled? Thanks.
Thank you so much. I had thought that this function couldn't shuffle the data. My problem is solved.
from sklearn.model_selection import train_test_split

# Split the original training set into a training set and a validation set
x_train, x_val, y_train, y_val = train_test_split(
    X_train, Y_train, test_size=0.3, shuffle=True, random_state=0
)
print(X_train.shape, Y_train.shape)
print(x_train.shape, y_train.shape)
print(x_val.shape, y_val.shape)
print(y_train[:6])
print(y_val[:6])
(9791, 7168) (9791,)
(6853, 7168) (6853,)
(2938, 7168) (2938,)
[10 5 11 11 10 1]
[8 6 6 2 2 1]
Another nice feature is the stratify argument, which makes sure both splits have approximately the same target distribution.
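In case it helps, here is a minimal sketch of a stratified split. The toy arrays X and y are made up for illustration (they are not the data from the post above); the only assumption is that the labels are passed to stratify so that train_test_split keeps the class proportions roughly equal in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data, ordered by label: 80 samples of class 0, then 20 of class 1.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

# shuffle=True removes the original ordering; stratify=y keeps the
# class proportions approximately the same in both splits.
x_train, x_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, shuffle=True, stratify=y, random_state=0
)

# Fraction of class 1 in each split (the full set has 0.2).
train_ratio = y_train.mean()
val_ratio = y_val.mean()
print(x_train.shape, x_val.shape)
print(train_ratio, val_ratio)
```

Note that stratify needs shuffle=True (the default), and each class needs enough samples to appear in both splits.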