This is how I build a weighted sampler:
import pandas as pd
import torch

df = pd.read_csv(file_path)
class_sample_count_y = len(df[df["label"] == "Y"].index)
class_sample_count_n = len(df[df["label"] == "N"].index)
# Weight each example by the inverse frequency of its class.
weights = []
for x in df["label"].tolist():
    if x == "Y":
        weights += [1. / class_sample_count_y]
    else:
        weights += [1. / class_sample_count_n]
num_samples = len(weights)
weights = torch.Tensor(weights)
weights = weights.double()
train_sampler = torch.utils.data.sampler.WeightedRandomSampler(
    weights, num_samples)
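To make the intent of these weights concrete, here is a minimal standard-library sketch of the same inverse-frequency rule. It is not PyTorch's implementation: `random.choices` stands in for the sampler's draw with replacement, and the toy `labels` list (2 "Y", 8 "N") is a made-up stand-in for `df["label"]`.

```python
import random
from collections import Counter

# Hypothetical toy labels standing in for df["label"]: 2 "Y" vs 8 "N".
labels = ["Y"] * 2 + ["N"] * 8

# Inverse class frequency, the same rule as the loop above.
counts = Counter(labels)
weights = [1.0 / counts[label] for label in labels]

# Draw indices with replacement, proportional to weight (seeded for repeatability).
rng = random.Random(0)
drawn = rng.choices(range(len(labels)), weights=weights, k=10_000)

# Fraction of draws that landed on the minority class "Y".
share_y = sum(labels[i] == "Y" for i in drawn) / len(drawn)
print(f"share of Y in resampled data: {share_y:.2f}")
```

Because the weights within each class sum to 1, both classes are drawn with roughly equal probability, so the printed share should be close to 0.5 even though "Y" makes up only 20% of the data.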
How can I specify this sampler for a BucketIterator?
train_iterator = data.BucketIterator.splits(
    train_data,
    batch_size=BATCH_SIZE,
    sort_key=lambda x: len(x.raw_text),
    device=device,
)
There is no option for passing a sampler here.