Apply an nn.Module to data in batches

For example, I have data with features in an n x m1 x m2 tensor, and a model that I'm using to transform the input features by passing them through it:



import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        ...

    def forward(self, x):
        ...
        return x

model = Model()

However, due to the size of the data, I get a CUDA out of memory error if I just do

new_features = model(features)

So I want to process the data in batches using a DataLoader, but what should I do after making the batch predictions to get the new_features as an n x m1 x m2 tensor?

The Data loading tutorial might be a good starting point to familiarize yourself with Dataset and DataLoader.
If you have already loaded all samples, you could pass them to a TensorDataset (or write your own custom Dataset) and wrap it in a DataLoader, then run the model batch by batch and stitch the outputs back together, as sketched below.
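
A minimal sketch of that workflow, assuming the features tensor fits in CPU memory and only the forward passes need to run on the GPU (batch_size and device are placeholders you'd adjust to your setup):

import torch
from torch.utils.data import TensorDataset, DataLoader

device = torch.device('cuda')
model.to(device)
model.eval()

# Wrap the existing features tensor; TensorDataset yields one sample per index
dataset = TensorDataset(features)
loader = DataLoader(dataset, batch_size=64, shuffle=False)  # keep shuffle=False to preserve order

outputs = []
with torch.no_grad():  # no gradients needed for pure inference
    for (batch,) in loader:
        out = model(batch.to(device))
        outputs.append(out.cpu())  # move results back to CPU to free GPU memory

# Concatenate along the batch dimension to recover an n x m1 x m2 tensor
new_features = torch.cat(outputs, dim=0)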

From the documentation, to create a TensorDataset I just need to pass the x and y values into it? Does it require the y values to be a one-hot encoded tensor (so n x n_classes), or can they just be an n x 1 tensor?

The shape of the target depends on your use case.
TensorDataset will take whatever tensors you pass to it.

However, if you are dealing with a multi-class classification task (and use e.g. nn.CrossEntropyLoss), your targets should be LongTensors containing the class indices.
For a model output of [batch_size, nb_classes], your targets would be [batch_size] with values in [0, nb_classes-1].
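
For example (the sizes here are just illustrative), a plain class-index target works directly with nn.CrossEntropyLoss, no one-hot encoding needed:

import torch
import torch.nn as nn

batch_size, nb_classes = 8, 5
output = torch.randn(batch_size, nb_classes)          # model output: [batch_size, nb_classes]
target = torch.randint(0, nb_classes, (batch_size,))  # LongTensor of class indices: [batch_size]

criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)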