I have trained a simple network for sentence classification with 4 classes using
`from pytorch_pretrained_bert.modeling import BertForSequenceClassification`.
Then I evaluate it on 2 sequences (sentences), placing into the `DataLoader`:
- only the second sentence, and
- both sentences.
In the first case, the logits for the second sentence are

```
tensor([[-0.3797,  4.1902, -3.0362, -0.9368]])
```

In the second case, the logits for the same second sentence are

```
tensor([[-0.0066, -2.3150,  3.2263, -0.3096]])
```
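To show that this is not just numerical noise, applying softmax to the two outputs shows that the predicted class actually flips (quick check in plain PyTorch; the variable names are just for illustration):

```python
import torch

# logits for the same sentence from the two runs
alone = torch.tensor([[-0.3797, 4.1902, -3.0362, -0.9368]])    # sentence evaluated on its own
batched = torch.tensor([[-0.0066, -2.3150, 3.2263, -0.3096]])  # sentence evaluated with the other one

# predicted class in each case
print(torch.softmax(alone, dim=1).argmax(dim=1))    # → tensor([1])
print(torch.softmax(batched, dim=1).argmax(dim=1))  # → tensor([2])
```

So the model does not merely shift its confidence; it assigns a different class to the same sentence.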
The snippet of my code is the following:

```python
import torch

def evaluate(logger, model, device, eval_dataloader, eval_label_ids, num_labels, verbose=True):
    model.eval()  # disable dropout / switch to inference mode
    for input_ids, input_mask, segment_ids, label_ids in eval_dataloader:
        input_ids = input_ids.to(device)
        input_mask = input_mask.to(device)
        segment_ids = segment_ids.to(device)
        label_ids = label_ids.to(device)
        with torch.no_grad():
            logits = model(input_ids, segment_ids, input_mask, labels=None)
        print(logits)
```
The results are the same regardless of the device used (CPU or GPU), and `eval_batch_size = 1` in both cases.
If I place more sentences into the `DataLoader`, the results vary again.
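One check I can run is whether the batch tensors for the second sentence are byte-for-byte identical in both runs, to rule out a padding or tokenization difference. A self-contained sketch of that comparison (the loader names and the toy token IDs are placeholders, not my real data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# toy stand-ins for the real tokenized, padded inputs
input_ids = torch.tensor([[101, 2054, 102, 0], [101, 2129, 2024, 102]])
input_mask = torch.tensor([[1, 1, 1, 0], [1, 1, 1, 1]])

# loader with only the second sentence vs. loader with both sentences
loader_single = DataLoader(TensorDataset(input_ids[1:], input_mask[1:]), batch_size=1)
loader_both = DataLoader(TensorDataset(input_ids, input_mask), batch_size=1)

# compare the batch for the second sentence across the two loaders
ids_a, mask_a = next(iter(loader_single))
ids_b, mask_b = list(loader_both)[1]
print(torch.equal(ids_a, ids_b), torch.equal(mask_a, mask_b))  # → True True
```

If this check prints `True True` on the real data, the model itself is producing different logits for identical inputs.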
What could be the cause of this behaviour, and how can I fix it?