mystera
(Anthony Stevens)
December 26, 2022, 1:16am
1
When training my model, I get the following error during the evaluation step:
TypeError: vars() argument must have __dict__ attribute
I can also reproduce this error by simply running trainer.evaluate(). The model appears functional: I can save it, reload it, and run inference successfully. So I'm unsure what would cause trainer.evaluate() to throw this error.
Any recommendations?
mystera
(Anthony Stevens)
December 26, 2022, 2:08am
2
target_names = ["false", "true"]  # [0, 1]
employee_count_df['label_as_int'] = (employee_count_df['label'] == True).astype(int)
labels = employee_count_df.label_as_int.values.tolist()
labels = np.array(labels)
full_dataset = employee_count_df.text.values.tolist()

class TenKDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor([self.labels[idx]])
        return item

    def __len__(self):
        return len(self.labels)

def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    # calculate accuracy using sklearn's function
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
    }

(train_dataset, validation_dataset, train_labels, validation_labels) = train_test_split(full_dataset, labels, test_size=0.3)
train_encodings = tokenizer(train_dataset, truncation=True, padding=True, max_length=max_length)
train_dataset = TenKDataset(train_encodings, train_labels)
model_directory = "./output"
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=len(target_names))
training_args = TrainingArguments(
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=20,
    weight_decay=0.01,
    load_best_model_at_end=True,
    logging_steps=300,
    save_steps=300,
    evaluation_strategy="steps",
    output_dir=model_directory
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=validation_dataset,
    compute_metrics=compute_metrics,
)
trainer.train()
ptrblck
December 26, 2022, 5:40am
3
I assume you are using some higher-level API and the Trainer class comes from HuggingFace?
If so, I guess this issue might be related.
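One possible cause, judging only from the code in post 2: train_dataset is tokenized and wrapped in TenKDataset, but validation_dataset is still the plain list of strings returned by train_test_split when it is passed as eval_dataset. Calling vars() on a plain string raises exactly this TypeError. The sketch below reproduces that with the standard library only. The to_feature_dict helper is a hypothetical stand-in for whatever normalization the evaluation batching step performs, not the library's actual code, and the sample texts and token ids are made up.

```python
from collections.abc import Mapping


def to_feature_dict(example):
    """Hypothetical stand-in: turn one dataset example into a dict of features."""
    if isinstance(example, Mapping):
        # Already a mapping of feature name -> value (what TenKDataset returns).
        return dict(example)
    # Anything else is treated as an object with attributes; this fails for str.
    return vars(example)


# Assumption: the evaluation set is still raw text, as in the posted code.
raw_validation_texts = ["revenue grew this year", "headcount was reduced"]

try:
    to_feature_dict(raw_validation_texts[0])
    raw_error = None
except TypeError as exc:
    raw_error = str(exc)

print("raw string example ->", raw_error)

# A tokenized example (made-up ids) shaped like a TenKDataset item works.
encoded_example = {"input_ids": [101, 2054, 102], "attention_mask": [1, 1, 1], "labels": [0]}
encoded_ok = to_feature_dict(encoded_example)
print("encoded example keys ->", sorted(encoded_ok))
```

If this is indeed what happens, tokenizing validation_dataset the same way as the training texts and passing TenKDataset(validation_encodings, validation_labels) as eval_dataset should avoid the error; validation_encodings here is a name that does not appear in the original post.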