When I train the model, I split the data into train/val/test parts:
from torchtext import data
train_iter, val_iter, test_iter = data.Iterator.splits(
    (train_data, val_data, test_data),
    batch_sizes=(64, 640, 640),
    device=args.device, repeat=args.repeat
)
I also evaluate the model's performance on the val data while training, and the performance is extremely good (95%+). I remember to call model.eval() before evaluating and model.train() afterwards.
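Roughly, the evaluation step inside my training loop looks like this (a simplified sketch; names such as optimizer and num_epochs stand in for my actual setup):

for epoch in range(num_epochs):
    model.train()
    for batch in train_iter:
        optimizer.zero_grad()
        loss, pos_n_energy, neg_n_energy = model.compute_batch_loss(batch)
        loss.backward()
        optimizer.step()
    model.eval()   # BatchNorm/Dropout switch to inference behaviour
    val_loss, val_acc = model.compute_loss(val_iter)
    model.train()  # switch back before the next epoch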
Then I save the model using torch.save.
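Concretely, something like this (the path is just an example; whether I save the whole model or just the state_dict should not matter here):

import torch

# after training
torch.save(model.state_dict(), 'snapshot/model.pt')

# later, before testing
model.load_state_dict(torch.load('snapshot/model.pt'))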
I load the model and test it on slice_train_data and slice_val_data.
slice_train_examples = train_examples[:6400]
slice_train_data = DS(*fields, examples=slice_train_examples) # DS is a class inherited from torchtext.data.Dataset
slice_val_examples = val_examples[:6400]
slice_val_data = DS(*fields, examples=slice_val_examples)
Then I call model.compute_loss(data_iter).
def compute_loss(self, data_iter):  # do validation
    self.eval()
    corrects, avg_loss = 0, 0
    steps = 0
    for i, batch in enumerate(data_iter):
        loss, pos_n_energy, neg_n_energy = self.compute_batch_loss(batch)
        avg_loss += loss.data[0]
        corrects += self.corrects(pos_n_energy, neg_n_energy)
        steps += 1
    size = len(data_iter.dataset)  # total size
    avg_loss = avg_loss / size
    accuracy = 100.0 * corrects / size
    self.train()
    return avg_loss, accuracy
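The calls then look like this (the names on the left are just what I print out):

train_loss, train_acc = model.compute_loss(train_iter)
val_loss, val_acc = model.compute_loss(val_iter)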
Here comes the confusing part.
I notice that the train set and the valid set are treated differently by torchtext.data.Iterator.splits.
train_iter, val_iter = data.Iterator.splits(
    (slice_train_data, slice_val_data),
    batch_sizes=(640, 640),
    shuffle=False,
    device=0, repeat=False
)
Then the accuracy of model.compute_loss(train_iter) is 53.4375% and the accuracy of model.compute_loss(val_iter) is 92.3356%.
However, if I do the following:
val_iter, train_iter = data.Iterator.splits(
    (slice_val_data, slice_train_data),
    batch_sizes=(640, 640),
    shuffle=False,
    device=0, repeat=False
)
The accuracy of model.compute_loss(val_iter) is 49.8125%, and the accuracy of model.compute_loss(train_iter) is 83.3438%.
Why is the performance of train_iter and val_iter so different from each other?
What is the correct way to do splits while training and testing?
I have seen examples that set the batch_size of the validation set to the length of the data, e.g. batch_sizes=(xx, len(slice_val_data)). However, my validation set is too big (28,000+ examples) to fit into my GPU, so I set a smaller batch_size for the validation set. Does this matter?
BTW, I use a CNN + BatchNorm model. Does BatchNorm matter in this case?
Thank you very much.