I have a binary dependent variable, and I am unclear on how to:
- get BCEWithLogitsLoss to work
- incorporate pos_weight (how exactly do I calculate the weights? Is it over the total data set?). One class has 7000 observations and the other has 2224 in the total data set. Should it just be a tensor like torch.tensor([0.3, 0.7]), putting more emphasis on the under-represented positive samples?
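For what it's worth, my understanding is that pos_weight in BCEWithLogitsLoss is not a pair of class probabilities; it is one multiplier per output, commonly set to the negative/positive count ratio. A sketch using the counts from the question (which class counts as "positive" is my assumption):

```python
import torch

# Class counts from the question: 7000 in the majority class and 2224 in
# the minority class, which I assume here is the positive class
n_neg = 7000
n_pos = 2224

# pos_weight takes one weight per output (a single binary output here),
# not a probability pair like [0.3, 0.7]; the neg/pos ratio up-weights
# the rarer positive class so both classes contribute comparably
pos_weight = torch.tensor([n_neg / n_pos])  # roughly 3.15

criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```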
For my loss function, CrossEntropyLoss is working, but I believe BCEWithLogitsLoss should be used instead since this is binary classification (I think?).
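For context on why CrossEntropyLoss works with outputs like the ones shown in the question: with two logits per sample, it expects integer class-index targets of shape (N,). A minimal sketch using a few of the question's values:

```python
import torch
import torch.nn as nn

# Two logits per sample pair naturally with CrossEntropyLoss, which
# applies log-softmax internally and expects integer class targets of
# shape (N,), not (N, 1)
logits = torch.tensor([[0.5015, -0.0165],
                       [0.5486,  0.0320],
                       [0.2796, -0.2183]])
labels = torch.tensor([0, 0, 1])  # (N, 1) labels need labels.squeeze(1)

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)
```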
My model outputs logits that look like:
tensor([[ 0.5015, -0.0165],
[ 0.5486, 0.0320],
[ 0.4227, 0.1604],
[ 0.2781, -0.0317],
[ 0.2667, 0.2109],
[ 0.1847, -0.1724],
[ 0.2727, -0.0598],
[ 0.3827, 0.1195],
[ 0.2796, -0.2183],
[ 0.6082, -0.1816],
[ 0.4710, -0.0551],
[ 0.1589, 0.0477],
My labels look like:
tensor([[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[1],
[0],
[0],
[1],
[0],
[0],
The error I get with BCEWithLogitsLoss is always: "bool value of Tensor with more than one value is ambiguous".
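A common cause of that exact error is passing the tensors to the loss constructor instead of building the loss module and then calling it: nn.BCEWithLogitsLoss(logits, labels) silently binds logits to the constructor's weight argument and labels to size_average, and a later boolean check on that tensor raises the ambiguity error. A sketch of the two-step usage (the shapes here are my assumption: one logit per sample and float targets of the same shape):

```python
import torch
import torch.nn as nn

logits = torch.randn(8, 1)                    # one logit per sample
labels = torch.randint(0, 2, (8, 1)).float()  # float targets, same shape

# Wrong: nn.BCEWithLogitsLoss(logits, labels) feeds the tensors into the
# constructor's weight/size_average parameters and later triggers
# "bool value of Tensor with more than one value is ambiguous"

# Right: construct the loss module first, then call it on (input, target)
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, labels)
```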
I believe I got the sampling weights to work appropriately:
# helper function to count target distribution inside tensor data sets
def target_count(tensor_dataset):
    count0 = 0
    count1 = 0
    for item in tensor_dataset:
        if item[1].item() == 0:
            count0 += 1
        elif item[1].item() == 1:
            count1 += 1
    return torch.tensor([count0, count1])
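As a cross-check, the loop in target_count should agree with torch.bincount over the stacked labels (assuming each dataset item stores its label at index 1, as above):

```python
import torch
from torch.utils.data import TensorDataset

# Hypothetical tiny dataset with the (features, label) layout assumed above
features = torch.randn(6, 3)
labels = torch.tensor([0, 0, 1, 0, 1, 1])
tiny_dataset = TensorDataset(features, labels)

# Gather every label and count occurrences of class 0 and class 1
all_labels = torch.stack([item[1] for item in tiny_dataset])
counts = torch.bincount(all_labels, minlength=2)
print(counts)  # tensor([3, 3])
```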
# prepare weighted sampling for imbalanced classification
def create_sampler(target_tensor, tensor_dataset):
    # target_tensor holds the per-class counts, e.g. tensor([7000, 2224])
    weight = 1. / target_tensor.float()
    # per-sample weight: the inverse frequency of that sample's class
    samples_weight = torch.tensor([weight[t[1]] for t in tensor_dataset])
    sampler = torch.utils.data.WeightedRandomSampler(weights=samples_weight,
                                                     num_samples=len(samples_weight),
                                                     replacement=True)
    return sampler
train_sampler = create_sampler(target_count(train_dataset), train_dataset)
val_sampler = create_sampler(target_count(val_dataset), val_dataset)
test_sampler = create_sampler(target_count(test_dataset), test_dataset)
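To actually use these samplers, pass them to the DataLoader in place of shuffle=True (a sampler and shuffling are mutually exclusive). A sketch with a hypothetical small dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical imbalanced dataset: 8 negatives, 2 positives
features = torch.randn(10, 4)
labels = torch.tensor([0] * 8 + [1] * 2)
dataset = TensorDataset(features, labels)

# Inverse class frequency per sample, mirroring create_sampler above
class_sample_count = torch.bincount(labels, minlength=2)
weight = 1.0 / class_sample_count.float()
samples_weight = weight[labels]
sampler = WeightedRandomSampler(weights=samples_weight,
                                num_samples=len(samples_weight),
                                replacement=True)

# the sampler replaces shuffling; do not also set shuffle=True
loader = DataLoader(dataset, batch_size=4, sampler=sampler)
total = sum(xb.shape[0] for xb, yb in loader)  # one epoch still yields 10 samples
```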