I added a WeightedRandomSampler because of my unbalanced dataset, but it only decreases my scores…
I use a pretrained BERT model for token classification.
Metrics before and after adding the sampler:
Before:
              precision    recall  f1-score   support
ANAT 0.4557 0.4186 0.4364 86
CHEM 0.7145 0.5354 0.6122 762
DEVI 0.3000 0.0789 0.1250 76
DISO 0.2898 0.4476 0.3518 248
GEOG 0.0000 0.0000 0.0000 14
LIVB 0.7382 0.7319 0.7350 235
OBJC 0.1064 0.1020 0.1042 49
PHEN 0.0000 0.0000 0.0000 29
PHYS 0.3889 0.2917 0.3333 72
PROC 0.5184 0.5570 0.5370 228
micro avg 0.5419 0.4925 0.5160 1799
macro avg 0.3512 0.3163 0.3235 1799
weighted avg 0.5577 0.4925 0.5142 1799
After:
              precision    recall  f1-score   support
ANAT 0.2105 0.5581 0.3057 86
CHEM 0.4830 0.4094 0.4432 762
DEVI 0.2353 0.0526 0.0860 76
DISO 0.1559 0.1169 0.1336 248
GEOG 0.0000 0.0000 0.0000 14
LIVB 0.5144 0.6085 0.5575 235
OBJC 0.0000 0.0000 0.0000 49
PHEN 0.0179 0.0345 0.0235 29
PHYS 0.0000 0.0000 0.0000 72
PROC 0.1374 0.5965 0.2233 228
micro avg 0.2798 0.3741 0.3202 1799
macro avg 0.1754 0.2377 0.1773 1799
weighted avg 0.3310 0.3741 0.3259 1799
I compute the class weights like this:
def get_class_weights():
    df = pd.read_csv(train_file_path, sep='\t', header=None)
    df.columns = ['Text', 'Label']
    new_df = df['Label'].value_counts().to_frame()
    new_df['label'] = new_df.index
    new_df.columns = ['count', 'label']
    # Rarer labels get a weight closer to 1.
    new_df['percentage'] = 1 - new_df['count'] / new_df['count'].sum()
    class_weights = new_df['percentage'].to_list()
    # Re-resolve the device here; the module-level variable is sometimes not visible.
    dev = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    class_weights = torch.FloatTensor(class_weights).to(dev)
    return class_weights
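For reference, this is a self-contained toy run of the same inverse-frequency computation; the label distribution below is made up for illustration:

```python
import pandas as pd
import torch

# Hypothetical unbalanced label column (6 CHEM, 3 DISO, 1 GEOG).
df = pd.DataFrame({'Label': ['CHEM'] * 6 + ['DISO'] * 3 + ['GEOG'] * 1})

counts = df['Label'].value_counts()        # CHEM: 6, DISO: 3, GEOG: 1
weights = 1 - counts / counts.sum()        # rarer labels get larger weights
class_weights = torch.tensor(weights.to_list(), dtype=torch.float)

print(class_weights)  # tensor([0.4000, 0.7000, 0.9000])
```

Note the weights come out in `value_counts()` order (most frequent first), so they must be matched back to label indices in the same order.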
If I pass all-ones weights, nothing changes…
def get_class_fixed_weights():  # manually create all-ones weights for the 21 labels
    # Re-resolve the device here; the module-level variable is sometimes not visible.
    dev = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    class_weights = torch.ones(21, device=dev)
    return class_weights
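As a sanity check on the all-ones case: per-class weights of this shape are what a weighted loss consumes, and with all weights equal to one the weighted loss reduces exactly to the unweighted one, so no change is expected. A minimal sketch (shapes and names are illustrative, not the code above):

```python
import torch
import torch.nn as nn

# Hypothetical per-class weights for 21 labels; all ones = unweighted baseline.
class_weights = torch.ones(21)

# CrossEntropyLoss takes per-class weights directly via `weight`.
weighted_loss = nn.CrossEntropyLoss(weight=class_weights)
plain_loss = nn.CrossEntropyLoss()

logits = torch.randn(8, 21)              # 8 tokens, 21 classes
targets = torch.randint(0, 21, (8,))

# With all-ones weights the two losses are identical.
assert torch.allclose(weighted_loss(logits, targets), plain_loss(logits, targets))
```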
This is how I use the sampler:
training_set = CustomDataset(tokenizer, train_sentences, train_labels, MAX_LEN)
testing_set = CustomDataset(tokenizer, test_sentences, test_labels, MAX_LEN)

sampler = WeightedRandomSampler(weights=class_weights,
                                replacement=True,
                                num_samples=len(training_set))

# 'shuffle' must be False when a sampler is supplied.
train_params = {'batch_size': TRAIN_BATCH_SIZE, 'shuffle': False, 'num_workers': 0, 'sampler': sampler}
test_params = {'batch_size': VALID_BATCH_SIZE, 'shuffle': False, 'num_workers': 0}

training_loader = DataLoader(training_set, **train_params)
testing_loader = DataLoader(testing_set, **test_params)
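For reference, `WeightedRandomSampler` expects one weight *per sample* (its `weights` argument defines a distribution over dataset indices), not one weight per class. A minimal sketch of expanding per-class weights to per-sample weights, with made-up labels for illustration:

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Hypothetical: one integer class label per training sample (6 samples, 3 classes).
sample_labels = torch.tensor([0, 0, 0, 1, 1, 2])

# Inverse-frequency per-class weights: rarer class -> higher weight.
class_counts = torch.bincount(sample_labels).float()   # tensor([3., 2., 1.])
per_class = 1.0 / class_counts

# Index the per-class weights by each sample's label to get per-sample weights.
per_sample = per_class[sample_labels]                  # length == len(dataset)

sampler = WeightedRandomSampler(weights=per_sample,
                                num_samples=len(sample_labels),
                                replacement=True)
```

For token classification each sample is a sentence carrying many token labels, so a single "class per sample" is an assumption here; one common heuristic is to weight each sentence by its rarest contained label.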