Implementing negative sampling in PyTorch

Evan_Weissburg · December 26, 2019, 6:59pm

I have a dataset with the following columns: book, char1, char2, span.

book, char1, and char2 are integers, whereas span is a matrix Tensor of integers.

I would like to implement negative sampling so that, for each batch that I retrieve from my DataLoader that wraps the dataset, I also get a batch of negative samples.

For each individual data row retrieved (a batch may of course contain several rows), I would also like to retrieve N negative samples, where a negative sample is a single row from any of the span matrices in my dataset.

Naively, this is how I would retrieve the N negative samples for a single data row (just to illustrate):

import random

def getNegativeSamples(dataset, N):
    ret = []
    for _ in range(N):
        # Choose which dataset row the sample will be pulled from
        dataset_row = dataset[random.randrange(len(dataset))]
        span = dataset_row[3]
        # Choose which row of the span matrix to use
        span_entry = span[random.randrange(len(span))]
        ret.append(span_entry)

    # ret now holds N negative samples
    return ret

How can I implement this cleanly in PyTorch?

ruchitpatel · May 7, 2020, 3:38am

I’m also looking for a solution to a similar problem. Has anyone found one yet?

aaronrmm · November 9, 2022, 1:35pm

I found this searching for something similar. I have since found this implementation, which I think is very similar to your requirements (where the span matrix would almost fit exactly into the “labels” argument of the constructor), or at least a good example.
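
For reference, one way the requirement in the original question could be approached is a custom collate_fn passed to the DataLoader. The following is only a minimal sketch under assumptions, and it is not the implementation linked above: SpanDataset and NegativeSamplingCollate are hypothetical names, each dataset row is assumed to be a tuple (book, char1, char2, span), and all span matrices are assumed to have the same number of columns so that the negatives can be stacked into one tensor. Like the naive function in the question, it does not check whether a negative happens to come from the same row as the positive.

```python
import random

import torch
from torch.utils.data import DataLoader, Dataset


class SpanDataset(Dataset):
    """Hypothetical stand-in: rows of (book, char1, char2, span)."""

    def __init__(self, rows):
        self.rows = rows

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        return self.rows[idx]


class NegativeSamplingCollate:
    """Collate function that draws N negative span rows per batch row."""

    def __init__(self, dataset, num_negatives, seed=None):
        self.dataset = dataset
        self.num_negatives = num_negatives
        self.rng = random.Random(seed)

    def sample_negative(self):
        # Pick a random dataset row, then a random row of its span matrix.
        span = self.dataset[self.rng.randrange(len(self.dataset))][3]
        return span[self.rng.randrange(span.shape[0])]

    def __call__(self, batch):
        books = torch.tensor([row[0] for row in batch])
        char1 = torch.tensor([row[1] for row in batch])
        char2 = torch.tensor([row[2] for row in batch])
        # Spans stay in a list because their row counts may differ.
        spans = [row[3] for row in batch]
        # Resulting shape: (batch_size, num_negatives, span_width).
        negatives = torch.stack([
            torch.stack([self.sample_negative()
                         for _ in range(self.num_negatives)])
            for _ in batch
        ])
        return books, char1, char2, spans, negatives


# Toy data: three rows whose span matrices all have 5 columns.
rows = [
    (0, 1, 2, torch.randint(0, 100, (4, 5))),
    (1, 3, 4, torch.randint(0, 100, (6, 5))),
    (2, 5, 6, torch.randint(0, 100, (3, 5))),
]
dataset = SpanDataset(rows)
collate = NegativeSamplingCollate(dataset, num_negatives=3, seed=0)
loader = DataLoader(dataset, batch_size=2, collate_fn=collate)

books, char1, char2, spans, negatives = next(iter(loader))
```

Doing the sampling in the collate function keeps the dataset itself unchanged and yields one negatives tensor per batch. If the DataLoader is used with num_workers > 0, each worker will likely hold its own copy of the sampler state, so a fixed seed may need to be varied per worker.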