I am currently working with PyTorch 0.4.1.
My dataset has 10,736 instances spread over 9 different classes.
I want to train my models on the same training set and test set drawn from this dataset.
After calling torch.manual_seed(), I split the data into train and test sets with data.random_split, using 80% for training and 20% for testing.
Here is the code:
1st.
torch.manual_seed(10)
train_dataset, test_dataset_1gram = data.random_split(full_dataset_1gram, [train_size, test_size])
train_loader = data.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=False)
test_loader_1gram = data.DataLoader(test_dataset_1gram, batch_size=BATCH_SIZE, shuffle=False)
2nd.
# splitting
torch.manual_seed(10)
train_dataset, test_dataset_2gram = data.random_split(full_dataset_2gram, [train_size, test_size])
train_loader = data.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=False)
test_loader_2gram = data.DataLoader(test_dataset_2gram, batch_size=BATCH_SIZE, shuffle=False)
… last
# splitting
torch.manual_seed(10)
train_dataset, test_dataset_addition = data.random_split(full_dataset_addition, [train_size, test_size])
train_loader = data.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=False)
test_loader_addition = data.DataLoader(test_dataset_addition, batch_size=BATCH_SIZE, shuffle=False)
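For completeness, train_size and test_size in the snippets above come from the 80/20 ratio; I left that line out, but it is roughly this (a minimal sketch with a stand-in dataset, since the real one is not shown here):

import torch
import torch.utils.data as data

# stand-in for one of my full datasets: 10,736 instances, 9 classes
full_dataset_1gram = data.TensorDataset(torch.randn(10736, 5),
                                        torch.randint(0, 9, (10736,), dtype=torch.long))

train_size = int(0.8 * len(full_dataset_1gram))   # 8,588 instances for training
test_size = len(full_dataset_1gram) - train_size  # 2,148 instances for testing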
The class distribution of each test dataset is below (the counting code I used is sketched after the lists):
1st
[(0, 302),
(1, 467),
(2, 595),
(3, 101),
(4, 6),
(5, 160),
(6, 72),
(7, 244),
(8, 201)]
2nd
[(0, 302),
(1, 467),
(2, 595),
(3, 101),
(4, 6),
(5, 160),
(6, 72),
(7, 244),
(8, 201)]
3rd
[(0, 302),
(1, 467),
(2, 595),
(3, 101),
(4, 6),
(5, 160),
(6, 72),
(7, 244),
(8, 201)]
4th
[(0, 306),
(1, 466),
(2, 596),
(3, 99),
(4, 5),
(5, 157),
(6, 75),
(7, 242),
(8, 202)]
Boom? Why is the 4th distribution suddenly different?
The class distribution of the overall dataset is:
[(0, 1532),
(1, 2470),
(2, 2937),
(3, 447),
(4, 39),
(5, 732),
(6, 387),
(7, 1179),
(8, 1013)]
I checked all of the full datasets and they all have this same overall distribution.
What is the problem?
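In case it helps with the diagnosis, this is roughly how I compared whether two splits actually picked the same rows (a sketch; it relies on the Subset objects returned by random_split keeping their .indices, which, as far as I can tell, is what the 0.4.x source does):

# compare the index permutations chosen for two of the test subsets
idx_1gram = test_dataset_1gram.indices
idx_2gram = test_dataset_2gram.indices
# indices is a LongTensor in 0.4.x, so tolist() gives plain Python ints
print(idx_1gram.tolist() == idx_2gram.tolist())  # True means the same rows were selected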