A100 training slower than V100

I have moved my model from a V100 to an A100 and, instead of an increase in speed, there has been a significant slowdown from 14.2 it/s to 10.06 it/s.
CUDA version: 11.3
PyTorch version: 1.9.0+cu111
I am specifically using the code from the NLSPN GitHub repository.
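As a first sanity check, this is a quick way to confirm from within PyTorch which build and device are actually being used (standard torch calls only, nothing specific to NLSPN):

import torch

# What the installed wheel was built with and which device it actually sees.
print(torch.__version__)                    # e.g. 1.9.0+cu111
print(torch.version.cuda)                   # CUDA runtime the wheel ships with
print(torch.backends.cudnn.version())       # bundled cuDNN version
print(torch.cuda.get_device_name(0))        # should report the A100
print(torch.cuda.get_device_capability(0))  # (8, 0) for an A100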

There is an apex dependency in the repository which I thought might be the issue, but removing it and training on a single GPU still shows the same slowdown.

The repository depends on Deformable-Convolution-V2-PyTorch, which seems to have been written ~3 years ago. Are you also seeing a slowdown without these custom layers, or did you profile the model to see which operations are the bottleneck?
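For the profiling, a minimal sketch along these lines would already help narrowing it down (net, sample, optimizer, and compute_loss are placeholders for your own objects, not NLSPN code):

import torch
from torch.profiler import profile, ProfilerActivity

# Profile a handful of training iterations to see which ops dominate on the GPU.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(10):
        optimizer.zero_grad()
        output = net(sample)               # forward pass
        loss_val = compute_loss(output)    # placeholder for your loss computation
        loss_val.backward()
        optimizer.step()

# Sort by total CUDA time to find the bottleneck kernels.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))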

Even after removing the deformable convolution layers there is a speed drop, from 16.31 it/s to 11.99 it/s.

Thanks for the update! Could you post the code to initialize the model as well as the input shapes you are using, please?

    dist.init_process_group(backend='nccl', init_method='env://',
                            world_size=args.num_gpus, rank=gpu)
    torch.cuda.set_device(gpu)

    # Prepare dataset
    data = get_data(args)

    data_train = data(args, 'train')
    data_val = data(args, 'val')

    sampler_train = DistributedSampler(
        data_train, num_replicas=args.num_gpus, rank=gpu)
    sampler_val = DistributedSampler(
        data_val, num_replicas=args.num_gpus, rank=gpu)

    batch_size = args.batch_size // args.num_gpus

    loader_train = DataLoader(
        dataset=data_train, batch_size=batch_size, shuffle=False,
        num_workers=args.num_threads, pin_memory=True, sampler=sampler_train,
        drop_last=True)
    loader_val = DataLoader(
        dataset=data_val, batch_size=1, shuffle=False,
        num_workers=args.num_threads, pin_memory=True, sampler=sampler_val,
        drop_last=False)

    # Network
    model = get_model(args)
    net = model(args)
    net.cuda(gpu)

    if gpu == 0:
        if args.pretrain is not None:
            assert os.path.exists(args.pretrain), \
                "file not found: {}".format(args.pretrain)

            checkpoint = torch.load(args.pretrain)
            net.load_state_dict(checkpoint['net'])

            print('Load network parameters from : {}'.format(args.pretrain))

    # Loss
    loss = get_loss(args)
    loss = loss(args)
    loss.cuda(gpu)

    # Optimizer
    optimizer, scheduler = utility.make_optimizer_scheduler(args, net)

    # apex: SyncBN conversion, mixed-precision initialization and DDP wrapping
    net = apex.parallel.convert_syncbn_model(net)
    net, optimizer = amp.initialize(net, optimizer, opt_level=args.opt_level,
                                    verbosity=0)
    net = DDP(net)

    for epoch in range(1, args.epochs+1):
        for batch, sample in enumerate(loader_train):
            sample = {key: val.cuda(gpu) for key, val in sample.items()
                      if val is not None}

            if epoch == 1 and args.warm_up:
                warm_up_cnt += 1

                for param_group in optimizer.param_groups:
                    lr_warm_up = param_group['initial_lr'] \
                                 * warm_up_cnt / warm_up_max_cnt
                    param_group['lr'] = lr_warm_up

            optimizer.zero_grad()

            output = net(sample)

The code is taken from src/main.py in the NLSPN repository. There are two inputs: RGB with shape torch.Size([24, 3, 224, 304]) and LiDAR with shape torch.Size([24, 1, 224, 304]). I have also tried removing the apex dependency; that makes no difference to the slowdown.
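In case it helps to reproduce the timing without the dataset, the inputs can be replaced by random tensors of the same shapes. The sample keys and the dummy loss below are only placeholders, not the real NLSPN code:

import time
import torch

# Fake inputs matching the shapes from the real loader; the key names are my
# guess at what the sample dict uses.
sample = {
    'rgb': torch.randn(24, 3, 224, 304, device='cuda'),
    'dep': torch.randn(24, 1, 224, 304, device='cuda'),
}

net.train()
torch.cuda.synchronize()
t0 = time.time()
iters = 50
for _ in range(iters):
    optimizer.zero_grad()
    output = net(sample)
    # Placeholder: reduce the output to a scalar just to get a backward pass.
    out = output['pred'] if isinstance(output, dict) else output  # 'pred' key is a guess
    out.mean().backward()
    optimizer.step()
torch.cuda.synchronize()  # make sure all kernels finished before reading the clock
print('{:.2f} it/s'.format(iters / (time.time() - t0)))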

Any idea what the issue could be?

In the topic right below this one (Can not attain better performances after changing nvidia GPU - #8 by tjk) we found out that enabling cudnn benchmark mode can lead to better performance on an A5000.
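i.e. setting this once, before the model and loaders are built:

import torch

# Let cuDNN benchmark the available conv algorithms for the fixed input shapes
# and cache the fastest one; helps when input sizes do not change between steps.
torch.backends.cudnn.benchmark = True

# On Ampere it may also be worth confirming the TF32 flags (to my knowledge both
# default to True in 1.9, so this is just a check, not a required change):
print(torch.backends.cuda.matmul.allow_tf32, torch.backends.cudnn.allow_tf32)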

Did you find out what the issue is? I also encountered this problem.

No, @my3bikaht, it did not work. @zimian_wei, I did not find any solution.

I found that moving the data to an SSD helps.
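A rough way to check whether the loader rather than the GPU is the bottleneck (loader_train, net and gpu as in the snippet above; the real loss/backward/step would go where the placeholder comment is):

import time
import torch

# Split the time per iteration into "waiting for the DataLoader" vs. "GPU step".
# If data_time dominates, disk speed / num_workers is the bottleneck, not the A100.
data_time, step_time, n_iters = 0.0, 0.0, 50
t_end = time.time()
for i, sample in enumerate(loader_train):
    data_time += time.time() - t_end

    t_step = time.time()
    sample = {k: v.cuda(gpu) for k, v in sample.items() if v is not None}
    output = net(sample)
    # Placeholder: loss, backward and optimizer.step() as in the real loop.
    torch.cuda.synchronize()          # wait for the GPU before reading the clock
    step_time += time.time() - t_step

    t_end = time.time()
    if i + 1 == n_iters:
        break

print('data: {:.3f}s/it   step: {:.3f}s/it'.format(data_time / n_iters, step_time / n_iters))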