I have a dataset with the following columns: book, char1, char2, span.
book, char1, and char2 are integers, whereas span is a matrix Tensor of integers.
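For concreteness, here is a minimal sketch of what such a dataset might look like. The class name ToyBookDataset, the sizes, and the random contents are all made up for illustration; I am also assuming that the span matrices can have different numbers of rows but share the same row width.

```python
import torch
from torch.utils.data import Dataset


class ToyBookDataset(Dataset):
    """Hypothetical stand-in for the real data: rows of (book, char1, char2, span)."""

    def __init__(self, num_rows=8, span_width=5, vocab_size=100, seed=0):
        gen = torch.Generator().manual_seed(seed)
        self.rows = []
        for i in range(num_rows):
            # Each span is an integer matrix with between 1 and 3 rows.
            span_len = torch.randint(1, 4, (1,), generator=gen).item()
            span = torch.randint(0, vocab_size, (span_len, span_width), generator=gen)
            self.rows.append((i % 2, i, i + 1, span))

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        return self.rows[idx]
```

Indexing this dataset returns a tuple, so the span matrix is at position 3 of each row.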
I would like to implement negative sampling so that, for each batch that I retrieve from my DataLoader that wraps the dataset, I also get a batch of negative samples.
For each individual data row retrieved (there may be multiple rows per batch, of course), I would like N negative samples retrieved as well, where a negative sample is a single row from any of the span matrices in my dataset.
Naively, this is how I would retrieve the N negative samples for a single data row (just to illustrate):
import random

def getNegativeSamples(dataset, N):
    ret = []
    for _ in range(N):
        # Choose which dataset row the sample will be pulled from
        dataset_row = dataset[random.randrange(len(dataset))]
        span = dataset_row[3]
        # Choose which row of the span matrix we will use
        span_entry = span[random.randrange(len(span))]
        ret.append(span_entry)
    # ret now holds N negative samples
    return ret
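One direction that might work is doing the sampling inside a custom collate_fn, so that each batch from the DataLoader arrives together with its negatives. The sketch below is only a rough idea, not something I know to be idiomatic. NegativeSamplingCollate is a hypothetical name, and it assumes every span matrix has the same number of columns so that all rows can be concatenated into one pool. Note that it samples uniformly over all span rows, which is not quite the same distribution as the naive loop (which picks a dataset row first), and it does not exclude rows that belong to the positive example itself.

```python
import torch
from torch.utils.data import DataLoader


class NegativeSamplingCollate:
    """Hypothetical collate_fn: batches rows and draws N negatives per row."""

    def __init__(self, dataset, num_negatives):
        # Stack every span row in the dataset into one (total_rows, width) pool.
        # Assumes all span matrices share the same number of columns.
        self.pool = torch.cat([row[3] for row in dataset], dim=0)
        self.num_negatives = num_negatives

    def __call__(self, batch):
        books = torch.tensor([row[0] for row in batch])
        char1 = torch.tensor([row[1] for row in batch])
        char2 = torch.tensor([row[2] for row in batch])
        spans = [row[3] for row in batch]  # kept as a list: row counts may differ
        # Uniformly pick num_negatives pool rows for each item in the batch.
        idx = torch.randint(0, self.pool.size(0), (len(batch), self.num_negatives))
        negatives = self.pool[idx]  # (batch_size, num_negatives, width)
        return books, char1, char2, spans, negatives


# Tiny made-up dataset: two rows with span matrices of 2 and 3 rows, width 5.
data = [
    (0, 1, 2, torch.arange(10).reshape(2, 5)),
    (1, 3, 4, torch.arange(10, 25).reshape(3, 5)),
]
loader = DataLoader(data, batch_size=2,
                    collate_fn=NegativeSamplingCollate(data, num_negatives=4))
books, char1, char2, spans, negatives = next(iter(loader))
print(negatives.shape)  # torch.Size([2, 4, 5])
```

Building the pool once up front costs memory proportional to the total number of span rows, which may or may not be acceptable for a large dataset.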
How can I implement this cleanly in PyTorch?