I am new to Python and PyTorch, and I don't have a background in mathematics. Recently, for a task of predicting four scores for a pair of sentences through regression, I tried to implement the model in PyTorch. My problems are about testing after the training steps.
Question 1: At test time, how do I get a prediction for each test instance when testing also runs in batches and epochs? During training I used MSELoss to measure model performance, and it works well for reporting the loss for each epoch. However, I need to obtain the correlation (say Pearson, Spearman, etc.) between the predictions and the human annotators' scores. What puzzles me is that there are around 500 test instances in total, but at test time the model runs exactly 10 epochs in batches, as it did in training, so the full set of test instances is not covered until it finishes.
Question 2: Is there a quicker way to run testing in one go? That is, can we do something similar to the cross-validation used in conventional (non-deep-learning) approaches?
Attached below is part of my testing module and the main function (the commented-out lines are my failed attempts at computing the correlation):
Many thanks. Your timely help and friendly input will be much appreciated.
import torch
import torch.nn as nn
from torch.autograd import Variable

def test(model, vocab, args):
    # DataLoader here is my own class, not torch.utils.data.DataLoader
    data_loader = DataLoader(vocab, args.test_file)
    criterion = nn.MSELoss()
    loss = [0., 0., 0., 0.]
    tot_size = 0
    #test_orgs = [,,,]
    #test_preds = [,,,]
    for input, target, score in data_loader.get_batches(args.batch_size, shuffle=False):
        batch_size = len(input)
        tot_size += batch_size
        input = Variable(torch.LongTensor(input))
        target = Variable(torch.LongTensor(target))
        golden = Variable(torch.FloatTensor(score))
        input = input.cuda()
        target = target.cuda()
        golden = golden.cuda()
        preds = model(input, target, batch_size)
        # one prediction per score, rescaled by that score's maximum value
        for i, (pred, max_score) in enumerate(zip(preds, [35, 25, 25, 15])):
            loss[i] += batch_size * criterion(pred * max_score, golden[:, i]).data
            #print(pred * max_score) #, golden[:, i]
    # average the accumulated per-score loss over all test instances
    result = [l / tot_size for l in loss]
    logger.info('[Test MSE]: Usefulness: %5.3f; Terminology: %5.3f; Idiomatic Writing: %5.3f; Target Mechanics: %5.3f ' % (result[0], result[1], result[2], result[3]))
    return result #,test_orgs,
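For the correlation in Question 1, one possible approach is to keep every prediction and gold score during a single pass over the test batches and compute the correlation once at the end. The following is only a minimal sketch under stated assumptions, not a definitive implementation: it uses plain Python lists in place of tensors, the helper names (`pearson`, `average_ranks`, `spearman`, `collect_and_correlate`) are hypothetical, and the numbers are toy values rather than real results. In the real loop each batch would first have to be moved to the CPU and converted, for instance with `.data.cpu().numpy()`.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def collect_and_correlate(batches, n_scores=4):
    """Single pass over the batches, then one (pearson, spearman) pair per score.

    Assumed layout (mirrors preds[i] and golden[:, i] in test()):
    preds is a list of n_scores lists, one value per instance in the batch;
    golds is a list of rows, one row of n_scores values per instance.
    """
    all_preds = [[] for _ in range(n_scores)]
    all_golds = [[] for _ in range(n_scores)]
    for preds, golds in batches:
        for i in range(n_scores):
            all_preds[i].extend(preds[i])
            all_golds[i].extend(row[i] for row in golds)
    return [
        (pearson(all_preds[i], all_golds[i]), spearman(all_preds[i], all_golds[i]))
        for i in range(n_scores)
    ]


if __name__ == "__main__":
    # toy data: two batches, two scores per instance
    toy_batches = [
        ([[1.0, 2.0], [3.0, 2.0]], [[2.0, 1.0], [4.0, 2.0]]),
        ([[3.0], [1.0]], [[6.0, 3.0]]),
    ]
    for i, (p, s) in enumerate(collect_and_correlate(toy_batches, n_scores=2)):
        print("score %d: pearson=%.3f spearman=%.3f" % (i, p, s))
```

Because the correlation is computed over the accumulated lists, the batch size does not affect the result, and one pass over the roughly 500 test instances is enough. If SciPy is available, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` should give the same numbers.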
vocab = Vocab(args.train_file, args.src_emb, args.tgt_emb)
model = AttentionRegression(vocab, args.emb_size, args.feature_size, args.window_size, args.dropout, args.hidden_size, args.n_layers, args.attention_size)
if use_cuda:
    model.cuda()
# starting training, comment this line if you are loading a pretrained model
#train(model, vocab, args)
# or load a pretrained model, comment this if you are training a new model
model = torch.load('translation_quality3.pt', map_location=lambda storage, loc: storage)
model.eval()
# model.test()
test(model, vocab, args)
#test_orgs, test_preds = test(model,vocab, args), test(model,vocab,args) (where I attempted to try but no luck)
logger.info('---------------------start evaluating on test data-----------------------')
logger.info('[Model Hyperparameters: ] \nEmbedding Size: %5d; Feature Size: %5d; Window Size: %3d; Dropout: %5.3f; Hidden Layer Size: %5d; Layers: %3d; Attention Size: %3d'%(args.emb_size, args.featur