When I train the model, I split the data into train/val/test parts:
```python
from torchtext import data

train_iter, val_iter, test_iter = data.Iterator.splits(
    (train_data, val_data, test_data),
    batch_sizes=(64, 640, 640),
    device=args.device,
    repeat=args.repeat,
)
```
I also evaluate the model on the validation data while training, and the performance is extremely good (95%+). I remember to call model.eval() before evaluating and to switch back to model.train() afterwards.

Then I save the model. Later I load it back and test it on slices of the train and validation data:
```python
slice_train_examples = train_examples[:6400]
# DS is a class inherited from torchtext.data.Dataset
slice_train_data = DS(*fields, examples=slice_train_examples)

slice_val_examples = val_examples[:6400]
slice_val_data = DS(*fields, examples=slice_val_examples)
```
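The saving/loading step itself isn't shown above; as a minimal sketch of the usual PyTorch pattern (the tiny model and file name here are stand-ins, not my actual CNN):

```python
import torch
import torch.nn as nn

# A tiny stand-in model; the real model is the CNN+BatchNorm network.
model = nn.Linear(4, 2)

# Save only the parameters (state_dict), the usual PyTorch pattern.
torch.save(model.state_dict(), "model.pt")

# Rebuild the same architecture and load the saved parameters into it.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load("model.pt"))

# Switch to eval mode before testing (matters for BatchNorm/Dropout layers).
restored.eval()
```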
Then I call
```python
def compute_loss(self, data_iter):
    # do validation
    self.eval()
    corrects, avg_loss = 0, 0
    steps = 0
    for batch in data_iter:
        loss, pos_n_energy, neg_n_energy = self.compute_batch_loss(batch)
        avg_loss += loss.data
        corrects += self.corrects(pos_n_energy, neg_n_energy)
        steps += 1
    size = len(data_iter.dataset)  # total number of examples
    avg_loss = avg_loss / size
    accuracy = 100.0 * corrects / size
    self.train()
    return avg_loss, accuracy
```
Here comes the confusing part.
I notice that the train set and the valid set get treated very differently depending on the order in which I pass them to splits. If I build the iterators like this:
```python
train_iter, val_iter = data.Iterator.splits(
    (slice_train_data, slice_val_data),
    batch_sizes=(640, 640),
    shuffle=False,
    device=0,
    repeat=False,
)
```
Then the accuracy of model.compute_loss(train_iter) is 53.4375% and the accuracy of model.compute_loss(val_iter) is 92.3356%.
However, if I swap the order of the two datasets:
```python
val_iter, train_iter = data.Iterator.splits(
    (slice_val_data, slice_train_data),
    batch_sizes=(640, 640),
    shuffle=False,
    device=0,
    repeat=False,
)
```
then the accuracy of model.compute_loss(val_iter) is 49.8125%, and the accuracy of model.compute_loss(train_iter) is 83.3438%.
Why is the performance of train_iter and val_iter so different from each other?
What is the correct way to do splits while training and testing?
I have seen examples that set the batch_size of the validation set to the length of the data, e.g. batch_sizes=(xx, len(slice_val_data)). However, my validation set is too big (28,000+ examples) to fit on my GPU in one batch, so I set a smaller batch_size for validation. Does this matter?
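For accuracy at least, the batch size should not change the result as long as corrects are summed across all batches and divided once by the dataset size at the end, which is what compute_loss does. A small pure-Python sketch (the helper name and data are invented for illustration):

```python
def accuracy_in_batches(preds, labels, batch_size):
    """Sum corrects over batches, then divide once by the dataset size."""
    corrects = 0
    for start in range(0, len(labels), batch_size):
        batch_p = preds[start:start + batch_size]
        batch_l = labels[start:start + batch_size]
        corrects += sum(p == l for p, l in zip(batch_p, batch_l))
    return 100.0 * corrects / len(labels)

preds  = [0, 1, 1, 0, 1, 0, 0, 1]
labels = [0, 1, 0, 0, 1, 1, 0, 1]
# Any batch size gives the same answer because corrects are summed globally.
print(accuracy_in_batches(preds, labels, 3))   # 75.0
print(accuracy_in_batches(preds, labels, 8))   # 75.0
```

Loss is a different story: if compute_batch_loss returns a mean over the batch, then summing per-batch values and dividing by the dataset size gives a smaller last partial batch a different weight, so the reported loss can shift slightly with batch size.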
BTW, I use a CNN + BatchNorm model. Does BatchNorm matter in this case?
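To illustrate why I suspect BatchNorm: in train mode it normalizes each batch with that batch's own statistics, while in eval mode it uses the accumulated running statistics, so the same input produces different outputs in the two modes. A minimal self-contained demonstration (not my actual network):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)

# Inputs deliberately far from the initial running stats (mean 0, var 1).
x = torch.randn(8, 3) * 5 + 10

bn.train()
out_train = bn(x)   # normalizes with this batch's own mean/var, updates running stats

bn.eval()
out_eval = bn(x)    # normalizes with the running mean/var instead
```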
Thank you very much.