PyTorch multiple GPUs: AttributeError: 'list' object has no attribute 'to'

I have implemented the DataParallel technique to utilize multiple GPUs on a single machine, but I am getting an error in the fit function. My code is based on this docTR training script:

https://github.com/mindee/doctr/blob/main/references/recognition/train_pytorch.py

    from fastprogress.fastprogress import master_bar, progress_bar

In the fit_one_epoch function:

    for images, targets in progress_bar(train_loader, parent=mb):

        images = images.to(device)
        targets = targets.to(device)

In the main function:

    model = model.to(device)
    if device == 'cuda':
        model = nn.DataParallel(model)
        # model = model.to(device)
        cudnn.benchmark = True

Traceback

    Traceback (most recent call last):
      File "/home2/coremax/Documents/doctr/references/recognition/DP_KR.py", line 481, in <module>
        main(args)
      File "/home2/coremax/Documents/doctr/references/recognition/DP_KR.py", line 390, in main
        fit_one_epoch(model, train_loader, batch_transforms, optimizer, scheduler, mb, amp=args.amp)
      File "/home2/coremax/Documents/doctr/references/recognition/DP_KR.py", line 122, in fit_one_epoch
        targets = targets.to(device)
    AttributeError: 'list' object has no attribute 'to'
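
The failing call is easy to reproduce in isolation; .to() is a method on tensors and modules, not on Python lists:

    import torch

    torch.tensor([1, 2]).to("cpu")  # fine: tensors implement .to
    ["hello", "world"].to("cpu")    # AttributeError: 'list' object has no attribute 'to'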

Based on the error message, it seems your targets are passed as a list from the DataLoader. I don't understand how nn.DataParallel is related to this, as the data loading logic shouldn't change. In any case, could you describe how you are loading the data and targets in your Dataset.__getitem__?
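
For context, the usual way string targets end up as a list is the collate function: image tensors of equal shape can be stacked into one batch tensor, but label strings cannot, so they stay a plain list. A minimal sketch (the names are illustrative, not docTR's actual code):

    import torch
    from torch.utils.data import Dataset

    class ToyRecognitionDataset(Dataset):
        # Hypothetical dataset returning an (image tensor, label string) pair.
        def __init__(self, images, labels):
            self.images, self.labels = images, labels

        def __len__(self):
            return len(self.images)

        def __getitem__(self, idx):
            return self.images[idx], self.labels[idx]  # (Tensor, str)

        @staticmethod
        def collate_fn(samples):
            images, targets = zip(*samples)
            # Tensors stack into a single batch; strings remain a Python list,
            # which is exactly the object that has no .to() method.
            return torch.stack(images, dim=0), list(targets)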

You are right! This is the docTR library, and they are using different logic for a single GPU. Due to the huge amount of training data, I have to utilize multiple GPUs. The targets variable is the problem for me:

    train_set = RecognitionDataset(
        parts[0].joinpath("images"),
        parts[0].joinpath("labels.json"),
        img_transforms=Compose(
            [
                T.Resize((args.input_size, 4 * args.input_size), preserve_aspect_ratio=True),
                # Augmentations
                T.RandomApply(T.ColorInversion(), 0.1),
                ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.02),
            ]
        ),
    )


    train_loader = DataLoader(
        train_set,
        batch_size=args.batch_size,
        drop_last=True,
        num_workers=args.workers,
        sampler=RandomSampler(train_set),
        # sampler=train_data_sampler,
        pin_memory=True,
        collate_fn=train_set.collate_fn,
    )
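
If collate_fn behaves like the sketch above (stacked image tensors, labels kept as a list of strings), the list can never be moved to a device; only the images need .to(device). A hedged fix for the loop in fit_one_epoch, assuming the model follows docTR's convention of consuming the raw target strings and returning a dict with a "loss" entry:

    for images, targets in progress_bar(train_loader, parent=mb):
        images = images.to(device)  # move only the image tensor
        # targets is a list of label strings and stays on the host;
        # the recognition model uses it directly to build the loss.
        train_loss = model(images, targets)["loss"]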

Could you point me to the code which shows a different usage for multi-GPU use cases in the Dataset, please? I still don't understand how this could be the case, since the Dataset is not aware of whether you are using nn.DataParallel or not.

I did not find multi-GPU usage in the Dataset. Do you mean the DataParallel code? I am implementing DataParallel in my code following this simple example:

https://github.com/chi0tzp/pytorch-dataparallel-example/blob/master/main.py

    # Load doctr model
    model = recognition.__dict__[args.arch](pretrained=args.pretrained, vocab=vocab)

    # Resume weights
    if isinstance(args.resume, str):
        print(f"Resuming {args.resume}")
        checkpoint = torch.load(args.resume, map_location="cpu")
        model.load_state_dict(checkpoint)

    model = model.to(device)

    if device == 'cuda':
        model = nn.DataParallel(model)
        # model = model.to(device)
        cudnn.benchmark = True

    # Metrics
    val_metric = TextMatch()

    if args.test_only:
        print("Running evaluation")
        val_loss, exact_match, partial_match = evaluate(model, val_loader, batch_transforms, val_metric, amp=args.amp)
        print(f"Validation loss: {val_loss:.6} (Exact: {exact_match:.2%} | Partial: {partial_match:.2%})")
        return

    st = time.time()
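
For reference, a slightly more defensive way to wire up nn.DataParallel (a minimal sketch; it assumes device is a torch.device rather than a plain string, and uses a stand-in model):

    import torch
    import torch.nn as nn
    import torch.backends.cudnn as cudnn

    model = nn.Linear(4, 2)  # stand-in for the recognition model
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)

    if device.type == "cuda" and torch.cuda.device_count() > 1:
        # DataParallel replicates the module and splits each input batch
        # along dim 0 across all visible GPUs on every forward pass.
        model = nn.DataParallel(model)
        cudnn.benchmark = True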

No, I did mean the Dataset, since you previously mentioned: "this is the docTR library, and they are using different logic for a single GPU."

Could you point me to the code in docTR which apparently uses different logic for single- vs. multi-GPU usage, as I still doubt this is the case.

They are utilizing a single GPU, and I just would like to add DataParallel to utilize multiple GPUs. The relevant part is lines 240 to 252 here:

https://github.com/mindee/doctr/blob/main/references/recognition/train_pytorch.py

docTR code:

    # GPU
    if isinstance(args.device, int):
        if not torch.cuda.is_available():
            raise AssertionError("PyTorch cannot access your GPU. Please investigate!")
        if args.device >= torch.cuda.device_count():
            raise ValueError("Invalid device index")
    # Silent default switch to GPU if available
    elif torch.cuda.is_available():
        args.device = 0
    else:
        logging.warning("No accessible GPU, target device set to CPU.")
    if torch.cuda.is_available():
        torch.cuda.set_device(args.device)
        model = model.cuda()
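
One way to keep docTR's single-device selection while still using every GPU is to wrap the model right after the .cuda() call. This is a sketch, not docTR's own code; note that the string targets still never need .to(device):

    if torch.cuda.is_available():
        torch.cuda.set_device(args.device)
        model = model.cuda()
        if torch.cuda.device_count() > 1:
            # Replicate across all visible GPUs; each forward pass splits
            # the input batch along dim 0 and gathers outputs on one device.
            model = torch.nn.DataParallel(model)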