How can I split my data sets into training sets and validation sets?

Say I have a feature dataset X and a label set Y, both stored in order. How can I split them into training and validation sets and make sure the data is shuffled? Thanks.

If your data is stored as NumPy arrays, you could use scikit-learn's train_test_split function, which shuffles the data by default.
@kevinzakka created a Gist to split Datasets.
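
If you would rather not depend on scikit-learn, the same idea can be sketched with plain NumPy: draw one random permutation of the sample indices and use it to index both arrays, so features and labels stay aligned. This is only a minimal sketch; the helper name shuffled_split, its parameters, and the toy arrays are made up for illustration and are not taken from the Gist mentioned above.

```python
import numpy as np

def shuffled_split(X, Y, val_fraction=0.3, seed=0):
    """Shuffle X and Y with the same permutation, then split off a validation set.

    Hypothetical helper for illustration; not part of any library.
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of samples")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))            # shuffled sample indices
    n_val = int(round(len(X) * val_fraction)) # size of the validation set
    val_idx, train_idx = perm[:n_val], perm[n_val:]
    return X[train_idx], X[val_idx], Y[train_idx], Y[val_idx]

# Toy, ordered data: row i of X is [2*i, 2*i + 1] and its label is i
X = np.arange(20).reshape(10, 2)
Y = np.arange(10)

x_train, x_val, y_train, y_val = shuffled_split(X, Y)
print(x_train.shape, y_train.shape)
print(x_val.shape, y_val.shape)
```

Because the same index arrays are applied to X and Y, every row keeps its own label after shuffling.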

Thank you so much. I had thought that this function couldn't shuffle the data sets. That solved my problem.

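from sklearn.model_selection import train_test_split

# Split the original training data into a training set and a validation set.
# X_train and Y_train are the full feature and label arrays (not shown here).
x_train, x_val, y_train, y_val = train_test_split(
    X_train, Y_train, test_size=0.3, shuffle=True, random_state=0
)
print(X_train.shape, Y_train.shape)
print(x_train.shape, y_train.shape)
print(x_val.shape, y_val.shape)
print(y_train[:6])
print(y_val[:6])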

Output:

(9791, 7168) (9791,)
(6853, 7168) (6853,)
(2938, 7168) (2938,)
[10  5 11 11 10  1]
[8 6 6 2 2 1]

Another nice feature is the stratify argument: passing the labels (for example stratify=Y_train) makes sure both splits have approximately the same target distribution.
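
As a rough illustration of stratification, here is a small sketch on made-up, imbalanced toy data (80 samples of class 0 followed by 20 of class 1). The arrays and variable names are assumptions for the example only; stratify, test_size, shuffle and random_state are actual train_test_split parameters.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy, ordered, imbalanced data (illustrative only)
X = np.arange(200).reshape(100, 2)
Y = np.array([0] * 80 + [1] * 20)

# stratify=Y asks for the class proportions of Y to be preserved in both splits
x_train, x_val, y_train, y_val = train_test_split(
    X, Y, test_size=0.3, shuffle=True, random_state=0, stratify=Y
)

# Class proportions in each split; both should stay close to the 80/20 ratio
print(np.bincount(y_train) / len(y_train))
print(np.bincount(y_val) / len(y_val))
```

Without stratify, a small validation set can end up with noticeably fewer (or more) samples of a rare class purely by chance.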