Converting code written for multi-GPU training to single-GPU training

I recently came across some code that was written to train on multiple GPUs. However, since I don't have access to multiple GPUs (on GCP or AWS), my only option is to train on Colab.
As such, is it possible to convert code that was written for multi-GPU training into single-GPU code?

I know that the standard change for multi-GPU training is to switch the sampler from a random or sequential sampler to a DistributedSampler. But after looking at the code for distributed training, it seems there are other nuances involved (see the sketch below).
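For example, here is my rough understanding of the sampler change. This is only a sketch, not the repo's actual data-loading code; train_dataset and args.batch_size are placeholder names.

import torch

# Multi-GPU: each process gets a different shard of the data.
train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset)
train_loader = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=args.batch_size,
                                           shuffle=False,  # the sampler handles shuffling
                                           sampler=train_sampler)

# Single GPU: drop the sampler and let the DataLoader shuffle directly.
train_loader = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=args.batch_size,
                                           shuffle=True)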
For reference, here is the link to the repo that uses distributed training, as well as the code section that I believe might need to be changed.

Repository that uses multi-GPU / distributed training

https://github.com/DerrickWang005/CRIS.pytorch

Code block that I think needs to be changed so that I can train on a single GPU

def main_worker(gpu, args):
    args.output_dir = os.path.join(args.output_folder, args.exp_name)

    # local rank & global rank
    args.gpu = gpu
    args.rank = args.rank * args.ngpus_per_node + gpu
    torch.cuda.set_device(args.gpu)

    # logger
    setup_logger(args.output_dir,
                 distributed_rank=args.gpu,
                 filename="train.log",
                 mode="a")

    # dist init
    dist.init_process_group(backend=args.dist_backend,
                            init_method=args.dist_url,
                            world_size=args.world_size,
                            rank=args.rank)

    # wandb
    if args.rank == 0:
        wandb.init(job_type="training",
                   mode="online",
                   config=args,
                   project="CRIS",
                   name=args.exp_name,
                   tags=[args.dataset, args.clip_pretrain])
    dist.barrier()

    # build model
    model, param_list = build_segmenter(args)
    if args.sync_bn:
        model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
    logger.info(model)
    model = nn.parallel.DistributedDataParallel(model.cuda(),
                                                device_ids=[args.gpu],
                                                find_unused_parameters=True)
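For what it's worth, here is a rough sketch of how I imagine a single-GPU version of main_worker might look, with the process-group setup, SyncBatchNorm conversion, and DistributedDataParallel wrapper removed. This is only my guess based on the excerpt above (it reuses the repo's setup_logger and build_segmenter), so I'm not sure it covers everything, which is why I'm asking.

def main_worker_single(args):
    args.output_dir = os.path.join(args.output_folder, args.exp_name)

    # single process, single GPU: no local/global rank bookkeeping
    args.gpu = 0
    args.rank = 0
    torch.cuda.set_device(args.gpu)

    # logger (rank is always 0 now)
    setup_logger(args.output_dir,
                 distributed_rank=0,
                 filename="train.log",
                 mode="a")

    # no dist.init_process_group / dist.barrier needed

    # wandb (no rank check needed since there is only one process)
    wandb.init(job_type="training",
               mode="online",
               config=args,
               project="CRIS",
               name=args.exp_name,
               tags=[args.dataset, args.clip_pretrain])

    # build model: no SyncBatchNorm, no DistributedDataParallel wrapper
    model, param_list = build_segmenter(args)
    logger.info(model)
    model = model.cuda()

Is something along these lines enough, or are there other distributed-specific pieces elsewhere in the training loop (e.g. train_sampler.set_epoch, loss/metric reduction across ranks) that also need to be changed?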