How to process large batches of data

I have a neural network consisting of 3 convolutional layers and 2 fully connected layers. The input data has size 5194 * 4 * 256 * 256. I know this is a huge amount of data, and a GPU will definitely not have enough memory to hold it all at once. So I defined the following function to process the data sequentially, in parts:

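def net_forward(in_Var, net, part=10):
    # Split the input along the batch dimension into `part` slices
    in_Var_ls = []
    size = in_Var.size(0) // part
    for i in range(part - 1):
        in_Var_ls.append(in_Var[i * size:(i + 1) * size])
    in_Var_ls.append(in_Var[(part - 1) * size:])  # last slice takes the remainder
    # Run the network on each slice and concatenate the outputs
    res1, res2 = net(in_Var_ls[0])
    for i in range(1, part):
        out1, out2 = net(in_Var_ls[i])
        res1 = torch.cat([res1, out1], dim=0)
        res2 = torch.cat([res2, out2], dim=0)
    return res1, res2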

However, I still get an out-of-memory error. What should I do?
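
For a sense of scale, the input size quoted above can be converted to bytes. This is a rough calculation that assumes 32-bit floats (the post does not state the dtype):

```python
# Input shape from the post: samples x channels x height x width.
shape = (5194, 4, 256, 256)
bytes_per_element = 4  # assumption: float32

num_elements = 1
for dim in shape:
    num_elements *= dim

bytes_per_sample = num_elements // shape[0] * bytes_per_element
total_mib = num_elements * bytes_per_element / 2**20
total_gib = total_mib / 1024

print(f"{num_elements:,} elements")
print(f"{bytes_per_sample / 2**20:.0f} MiB per sample")
print(f"{total_gib:.2f} GiB for the whole input")
```

Under that assumption the input alone is roughly 5 GiB, before counting any layer activations. Splitting the forward pass into parts probably does not lower peak memory when gradients are needed, because autograd keeps each part's intermediate activations alive until backward() runs on the concatenated result.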

Why aren’t you using the built-in data loader? That’s the very reason PyTorch has a DataLoader, which automatically manages batching for you.
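
A minimal sketch of what the data loader does, using a small random tensor as a hypothetical stand-in for the real data (the shapes and batch size here are assumptions chosen for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the real data: 50 samples of shape 4 x 16 x 16.
data = torch.randn(50, 4, 16, 16)
targets = torch.randint(0, 2, (50,))

# The loader yields (inputs, targets) mini-batches; the last one may be smaller.
loader = DataLoader(TensorDataset(data, targets), batch_size=16, shuffle=False)

batch_shapes = [tuple(x.shape) for x, _ in loader]
print(batch_shapes)
```

Only one mini-batch needs to go through the network at a time, which is what keeps memory use bounded.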

The reason is that I tried the data loader, but it does not work for my specific case. My training cannot use stochastic gradient descent; I must use batch gradient descent, which means that in every iteration I have to process the whole dataset to get the output values, and then backpropagate through those outputs with an extra gradient. I tried dividing the data into batches, but that didn’t work for my case.

I get an error like “torch: not enough memory: you tried to allocate 2GB.” I have 64 GB of memory, so why can’t it allocate 2 GB? It can’t be a GPU limit, since I’m training entirely on the CPU and not using GPU memory.

@Peter_Ham Because PyTorch accumulates gradients across backward() calls, you can do batch gradient descent by calling optimizer.step() only once, at the end of each pass over your dataset.

For example:

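for epoch in range(epochs):
    optimizer.zero_grad()  # clear gradients left over from the previous epoch
    for data, target in dataset:
        out = model(data)
        err = loss_fn(out, target)
        err.backward()  # gradients accumulate in .grad across batches
    optimizer.step()  # a single update per pass over the dataset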
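
The accumulation behaviour can be checked on a toy problem. The sketch below is not the original poster's network: the data, the nn.Linear model, and the summed MSE loss are all hypothetical stand-ins. It compares the gradient from one full-batch pass with the gradient accumulated over chunks of 10 samples:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy data and model standing in for the real dataset and network.
inputs = torch.randn(100, 8)
targets = torch.randn(100, 1)
model = nn.Linear(8, 1)
loss_fn = nn.MSELoss(reduction="sum")  # summed loss, so chunk gradients add up

# Reference: gradient from a single pass over the whole dataset.
model.zero_grad()
loss_fn(model(inputs), targets).backward()
full_grad = model.weight.grad.clone()

# Accumulated: one backward() per chunk; .grad sums across the calls.
model.zero_grad()
for x, y in zip(inputs.split(10), targets.split(10)):
    loss_fn(model(x), y).backward()  # this chunk's graph is freed afterwards
accumulated_grad = model.weight.grad.clone()

grads_match = torch.allclose(full_grad, accumulated_grad, rtol=1e-4, atol=1e-4)
print(grads_match)
```

Calling backward() per chunk is what saves memory: each chunk's activations can be released before the next chunk is processed. With a mean-reduced loss, each chunk's loss would need to be scaled by its share of the dataset for the two gradients to agree.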

Thanks! This solves my problem!

Hi @smth

As you suggested, I modified the transfer learning code, but I am getting an error. Can you help me out with this?

import copy
import time

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
from torchvision import models

# dset_loaders, dset_sizes and use_gpu are defined as in the transfer learning tutorial.

def train_model(model, criterion, optimizer, lr_scheduler, num_epochs=25):
    since = time.time()

    best_model = model
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)
        optimizer = lr_scheduler(optimizer, epoch)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train(True)  # Set model to training mode
                optimizer.zero_grad()  # clear gradients once per epoch
            else:
                model.train(False)  # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for data in dset_loaders[phase]:
                # get the inputs
                inputs, labels = data

                # wrap them in Variable
                if use_gpu:
                    inputs, labels = Variable(inputs.cuda()), \
                        Variable(labels.cuda())
                else:
                    inputs, labels = Variable(inputs), Variable(labels)

                # forward
                outputs = model(inputs)
                _, preds = torch.max(outputs.data, 1)
                loss = criterion(outputs, labels)

                # backward only in the training phase; gradients accumulate
                if phase == 'train':
                    loss.backward()

                # statistics
                running_loss += loss.data[0]
                running_corrects += torch.sum(preds == labels.data)

            # a single optimizer step after the whole training set has been seen
            if phase == 'train':
                optimizer.step()

            epoch_loss = running_loss / dset_sizes[phase]
            epoch_acc = running_corrects / dset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model = copy.deepcopy(model)

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))
    return best_model

def exp_lr_scheduler(optimizer, epoch, init_lr=0.001, lr_decay_epoch=7):
    # Decay the learning rate by a factor of 0.1 every lr_decay_epoch epochs
    lr = init_lr * (0.1**(epoch // lr_decay_epoch))

    if epoch % lr_decay_epoch == 0:
        print('LR is set to {}'.format(lr))

    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    return optimizer

model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 10)

if use_gpu:
    model_ft = model_ft.cuda()

criterion = nn.CrossEntropyLoss()
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler, num_epochs=25)

RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /py/conda-bld/pytorch_1493676237139/work/torch/lib/THNN/generic/ClassNLLCriterion.c:57
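
This assertion is the class-index check in the loss: every target must lie in [0, n_classes). With the final layer set to nn.Linear(num_ftrs, 10), that means labels 0 through 9, so a dataset with more than 10 classes, or labels that start at 1, would trigger it. Below is a minimal sketch for finding offending labels, assuming the labels can be collected into a tensor; find_bad_labels and example_labels are hypothetical names, not part of PyTorch:

```python
import torch

def find_bad_labels(labels, n_classes):
    """Return the sorted unique label values outside [0, n_classes)."""
    labels = torch.as_tensor(labels)
    mask = (labels < 0) | (labels >= n_classes)
    return sorted(labels[mask].unique().tolist())

# Hypothetical labels: 10 and 12 are invalid for a 10-class output layer.
example_labels = torch.tensor([0, 3, 9, 10, 12, 3])
bad = find_bad_labels(example_labels, n_classes=10)
print(bad)
```

If this reports anything for the real labels, either the output layer size should match the actual number of classes or the labels should be remapped to start at 0.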