How to run large images on gpu

I have 12000 1024×512 images as a dataset. Initially I resized them to 224×112, but the results were terrible. Since these are scattering images (physics images), I think the resolution may carry some information, so I decided to use larger images to run the prediction model (predicting some parameters). But even with only a 10-layer network, it runs out of memory. If someone can help me solve this, I’d be very thankful.

I planned to load a batch onto the GPU and train on it, then load another batch and train. But I don’t know how to implement it…


First up I would recommend using square images if possible. For example 224 x 224.
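If you go that route, the resize can live in the dataset transforms; a minimal sketch with torchvision (the Normalize values here are just the common ImageNet statistics, adjust or drop them for your data, and use single-channel values if your images are grayscale):

import torchvision.transforms as transforms

#resize the 1024x512 scattering images to a square 224x224 input
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])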

On how to train on your gpu with a specific batch size:
When defining a dataloader you can specify a batch size like so:

import torch

#train_set is your Dataset object, workers the number of loader processes
batch_size = 96
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, 
                                           shuffle=True, num_workers=workers)

In the training loop you can then get your individual batches by doing something like:

device = torch.device('cuda:0')
for epoch in range(n_epochs):
    """training loop for all epochs"""
    for i, data in enumerate(train_loader, 0):
        """training loop for one batch"""
        #get images and labels
        inputs, labels = data
        #move data to gpu
        inputs, labels = inputs.to(device), labels.to(device)
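The forward and backward pass then run on those GPU tensors. Roughly, the rest of the inner loop would look like this, where model, criterion and optimizer are placeholders for whatever you are using:

        #forward pass on the current batch
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        #backward pass and parameter update
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()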

Thank you so much, I did so. But I still can’t finish one epoch. I’m wondering how ImageNet runs on a GPU, because 124 GB of data can be run on one GPU.

If you get a CUDA out of memory error when running on GPU you can try using a smaller batch size.
The 96 I used was just there as an example.

ImageNet is oftentimes run on multiple GPUs, but you can still run it on only one GPU; you just need a small enough batch size.
Because only one batch at a time is put onto the GPU instead of all images, the total number of images in the dataset does not matter.
Just use any form of dataloader, like the one I showed you above, with a small enough batch size and it should work.

The thing is, this dataset is a custom dataset. I wrote a dataloader for it (not the PyTorch DataLoader), but I think the problem may be that I load the whole dataset in the init function.

It doesn’t really matter that you use your own dataloader. You just have to make sure that you only move one batch at a time onto the GPU, like I did above with inputs.to(device).

As long as inputs and labels are always just one batch, it shouldn’t matter which kind of dataloader they come from.
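Loading everything in the init fills up your RAM rather than the GPU, but if you want to avoid that as well, here is a minimal sketch of a lazy dataset that only opens one image per __getitem__ call (the class name, arguments and the grayscale conversion are just assumptions for illustration):

import torch
from PIL import Image

class ScatteringDataset(torch.utils.data.Dataset):
    def __init__(self, image_paths, targets, transform=None):
        #store only file paths and targets, not the images themselves
        self.image_paths = image_paths
        self.targets = targets
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        #the image is only read from disk when the dataloader asks for it
        image = Image.open(self.image_paths[idx]).convert('L')
        if self.transform is not None:
            image = self.transform(image)
        return image, self.targets[idx]

An instance of this can then be passed to the DataLoader from my first reply.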

Also! Check whether you are maybe putting other tensors onto the GPU and keeping them there.
If you for example do total_loss += loss instead of total_loss += loss.item() for logging your loss, your GPU memory will fill up.
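The reason is that the loss tensor still references the whole computation graph of its batch, while .item() just gives you a Python float:

#bad: keeps every batch's computation graph (and its GPU memory) alive
total_loss += loss

#good: stores only a plain Python float, the graph can be freed
total_loss += loss.item()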

Following is my code; it’s the same as you said, but it still has the out of memory issue:

> for batch_idx, (inputs, targets) in enumerate(trainloader):
>     cur_iter = (epoch - 1) * len(trainloader) + batch_idx
>     gpu_tracker.track()
>     if use_cuda:
>         inputs, targets = inputs.cuda(non_blocking=True), targets.cuda(non_blocking=True)  # GPU settings
>     optimizer.zero_grad()
>     inputs, targets = Variable(inputs, requires_grad=True), Variable(targets)

Depending on what PyTorch version you are on, Variable() is actually long deprecated, and I am not sure you need non_blocking to be True for your application, but all in all your code should work.
You might wanna check if you have other tensors on the GPU. If you do anything with the tensor that is your model’s output, like logging, you might wanna check if it gets freed correctly, like I mentioned above.
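On any recent PyTorch version that part could just be written without Variable, something like:

if use_cuda:
    inputs = inputs.cuda(non_blocking=True)
    targets = targets.cuda(non_blocking=True)
optimizer.zero_grad()
#no Variable() wrapper needed; autograd tracks the model parameters,
#and requires_grad=True on the inputs is usually unnecessary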

Thank you so much for your patient explanation. I also did as you told me, but I still have the memory problem. I strongly suspect that PyTorch loads the whole dataset onto the GPU, which is 12000 896 * 448 images (48 GB). But ImageNet has 124 GB and can run on 18 GB GPUs. I have 2 nodes with 48 GB each. I can’t even run it with batch size 1.
And how can I check if there are other tensors on the GPU? Thank you.

else:  # logging for classification
    #_, predicted = torch.max(out.data, 1) for classification
    total += targets.size(0)
    #correct += predicted.eq(targets.data).cpu().sum() for classification
    loss_total += loss

    log_dict = {"iter": cur_iter, "loss": loss.item(), "epoch": epoch}
    train_log.write("{}\n".format(json.dumps(log_dict)))
    train_log.flush()

That’s what I meant.
Try changing:

loss_total += loss

to:

loss_total += loss.item()
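To check what is still sitting on the GPU, you can print the allocated memory after each step or walk the garbage collector’s objects; a rough sketch (your gpu_tracker may already do something similar):

import gc
import torch

#memory currently allocated by tensors on the default GPU
print(torch.cuda.memory_allocated() / 1024**2, "MiB allocated")

#list all CUDA tensors that are still alive
for obj in gc.get_objects():
    try:
        if torch.is_tensor(obj) and obj.is_cuda:
            print(type(obj), obj.size())
    except Exception:
        pass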


Hi, thank you so much, I fixed it. There really were other tensors on the GPU.