How to get mini-batches in pytorch in a clean and efficient way?

Brando_Miranda · November 23, 2017, 10:39pm

I thought I had asked this on the PyTorch forum, but it seems I have not. How do we index a dataset that is stored in a Variable? I tried:

import numpy as np
import torch
from torch.autograd import Variable

def get_batch2(X,Y,M):
    '''
    get batch for pytorch model
    '''
    N = X.size()[0]
    # randint's upper bound is exclusive, so N (not N+1) keeps indices in range
    batch_indices = torch.LongTensor( np.random.randint(0,N,size=M) )
    batch_xs = torch.index_select(X,0,batch_indices)
    batch_ys = torch.index_select(Y,0,batch_indices)
    return Variable(batch_xs, requires_grad=False), Variable(batch_ys, requires_grad=False)

But I get an error saying that X and Y are Variables and that it can only index FloatTensors. Isn’t that odd? What is the right way to do this?

Weird error:

TypeError: torch.index_select received an invalid combination of arguments - got (torch.cuda.FloatTensor, int, torch.cuda.FloatTensor), but expected (torch.cuda.FloatTensor source, int dim, torch.cuda.LongTensor index)

ptrblck · November 24, 2017, 12:24pm

Make sure to call index_select with the same type of arguments, i.e. two tensors or two Variables.
Wrap your batch_indices into a Variable or just use X[batch_indices, :].
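
A minimal sketch of the indexing approach, assuming a recent PyTorch release (0.4 or later) where plain tensors can be indexed directly and no Variable wrapper is needed. The name get_batch and the toy shapes are illustrative only, not taken from the original post:

```python
import torch

def get_batch(X, Y, M):
    """Sample M rows (with replacement) from X and Y along dimension 0."""
    N = X.size(0)
    # torch.randint's upper bound is exclusive, so indices fall in [0, N-1].
    # It returns an int64 (Long) tensor by default, which is what indexing needs.
    # Creating the indices on X's device avoids CPU/GPU mismatches.
    batch_indices = torch.randint(0, N, (M,), device=X.device)
    return X[batch_indices], Y[batch_indices]

# Toy data (assumed shapes): 10 samples, 3 features, 1 target each.
X = torch.randn(10, 3)
Y = torch.randn(10, 1)
batch_xs, batch_ys = get_batch(X, Y, M=4)
print(batch_xs.shape, batch_ys.shape)
```

Because the same index tensor is applied to both X and Y, each sampled input stays paired with its target.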

Brando_Miranda · November 29, 2017, 8:36pm

What types are X and batch_indices in your example X[batch_indices, :]?

Brando_Miranda · November 29, 2017, 8:36pm

maybe useful:

ptrblck · November 29, 2017, 10:39pm

X can be a Tensor or a Variable; batch_indices is a LongTensor:

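import torch
from torch.autograd import Variable

X = torch.randn(10)
batch_indices = torch.LongTensor([0, 1, 2])

X[batch_indices]            # indexing a Tensor
Variable(X)[batch_indices]  # indexing a Variable works the same way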
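
For the question in the thread title, one commonly used alternative to manual indexing is torch.utils.data. The following is only a sketch under assumed toy shapes and batch size, and it is not something suggested in the posts above; it shuffles the data once per pass and yields each sample exactly once, rather than sampling with replacement:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data (assumed shapes): 10 samples, 3 features, 1 target each.
X = torch.randn(10, 3)
Y = torch.randn(10, 1)

# TensorDataset pairs X[i] with Y[i]; DataLoader shuffles and groups them into batches.
loader = DataLoader(TensorDataset(X, Y), batch_size=4, shuffle=True)

batch_sizes = []
for batch_xs, batch_ys in loader:
    # The last batch is smaller when the dataset size is not a multiple of batch_size.
    batch_sizes.append(batch_xs.size(0))

print(batch_sizes)
```

If every batch must have the same size, DataLoader accepts drop_last=True to discard the final partial batch.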