How to increase dropout rate during training?

I read that co-adaptation only develops over time, so it makes little sense to apply the full dropout rate at the beginning of training.
How can I test this?

You could change the p attribute, if you’ve created an nn.Dropout module:

# Train with your initial dropout

# Change to new value and continue training
model.drop.p = 0.1

or you could alternatively use the functional API and pass p into forward:

def forward(self, x, p):
    # training=self.training lets model.eval() switch dropout off
    x = F.dropout(x, p=p, training=self.training)
    return x

Don’t forget to pass the self.training attribute of the parent model to the functional call (training=self.training). Otherwise dropout will stay active even after calling model.eval().
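A minimal sketch of the functional variant, assuming a hypothetical module `Net` with a single linear layer (neither is from the posts above). It only illustrates that passing `training=self.training` ties the dropout call to the module's mode:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):  # hypothetical example module, not from the thread
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 16)

    def forward(self, x, p):
        x = self.fc(x)
        # training=self.training makes model.eval() disable dropout
        x = F.dropout(x, p=p, training=self.training)
        return x


model = Net()
x = torch.randn(4, 16)

model.train()
out_train = model(x, p=0.5)  # dropout is active, elements are zeroed at random

model.eval()
out_eval_a = model(x, p=0.5)  # dropout is inactive in eval mode
out_eval_b = model(x, p=0.5)  # so repeated calls give the same result
```

Without the `training` argument, `F.dropout` defaults to `training=True` and would keep dropping activations in eval mode.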


I'm using nn.Dropout.
What would be the best way to change all dropout layers if I have a lot of modules nested inside other modules?
Some kind of recursion?

You could iterate all submodules, check if the current module is an nn.Dropout layer via isinstance, and set p accordingly.
The cleanest way would probably be to write a custom function which is similar to a weight_init method and call it via model.apply.
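A rough sketch of the model.apply variant. The helper name `make_dropout_setter`, the nested `nn.Sequential` model, and the linear ramp are assumptions made for illustration only:

```python
import torch.nn as nn


def make_dropout_setter(drop_rate):
    """Return a function for model.apply that sets p on every nn.Dropout."""
    def _set(module):
        if isinstance(module, nn.Dropout):
            module.p = drop_rate
    return _set


# Hypothetical nested model, only for illustration
model = nn.Sequential(
    nn.Linear(8, 8),
    nn.Dropout(p=0.0),
    nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.0)),
)

for epoch in range(3):
    # Example ramp: grow linearly with the epoch, capped at 0.5
    drop_rate = min(epoch / 100, 0.5)
    # apply() visits every submodule recursively, so nesting depth does not matter
    model.apply(make_dropout_setter(drop_rate))
    # ... train for one epoch here ...

rates = [m.p for m in model.modules() if isinstance(m, nn.Dropout)]
```

Note that nn.Dropout2d and the other dropout variants are separate classes, so they would need their own isinstance check.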

def set_dropout(model, drop_rate=0.1):
    for name, child in model.named_children():
        if isinstance(child, torch.nn.Dropout):
            child.p = drop_rate
        set_dropout(child, drop_rate=drop_rate)
set_dropout(model, drop_rate=0.2)

Like this?

I got an error with this function. Any advice?

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [64, 512, 1138]], which is output 0 of TanhBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Did you change anything besides the drop probability?
Could you post a code snippet, which yields this error?

No, I did not change anything.
Maybe it is because of NVIDIA apex.

    model = load_model(hparams)

    optimizer = Ranger(model.parameters(), lr=hparams.learning_rate)

    criterion = Tacotron2Loss()

    logger = prepare_directories_and_logger(
        output_directory, log_directory, rank)

    train_loader, valset, collate_fn = prepare_dataloaders(hparams)

    iteration = 0
    epoch_offset = 0

    if hparams.fp16_run:
        from apex import amp
        model, optimizer = amp.initialize(model, optimizer, opt_level='O2')
    # Load checkpoint if one exists
    if os.path.isfile(checkpoint_path):
        model, optimizer, iteration = load_checkpoint(
            checkpoint_path, model, optimizer)
        iteration += 1  # next iteration is iteration + 1
        epoch_offset = max(0, int(iteration / len(train_loader)))
        if hparams.fp16_run:
    elif os.path.isfile(checkpoint_path_vanilla):
        model = warm_start_model(
            checkpoint_path_vanilla, model, hparams.ignore_layers)

    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, hparams.epochs - hparams.epochs * hparams.epochs_annealing, eta_min=1e-6)

    if hparams.distributed_run:
        model = apply_gradient_allreduce(model)

    is_overflow = False
    # ================ MAIN TRAINING LOOP! ===================
    for epoch in range(epoch_offset, hparams.epochs):
        def set_dropout(model, drop_rate=0.1):
            for name, child in model.named_children():
                if isinstance(child, torch.nn.Dropout):
                    child.p = drop_rate
                set_dropout(child, drop_rate=drop_rate)
        if epoch <= 50:
            set_dropout(model, drop_rate=epoch / 100)
        print("Epoch: {}".format(epoch))
        start_epoch = time.perf_counter()
        for i, batch in enumerate(train_loader):
            start = time.perf_counter()
            x, y = model.parse_batch(batch)
            #y_pred = model(x)

            loss = criterion(model(x), y, x[-1])

            if hparams.distributed_run:
                reduced_loss = reduce_tensor(loss.data, n_gpus).item()
            else:
                reduced_loss = loss.item()
            if hparams.fp16_run:
                with amp.scale_loss(loss, optimizer) as scaled_loss:
                    scaled_loss.backward()
            else:
                loss.backward()

            if hparams.fp16_run:
                grad_norm = torch.nn.utils.clip_grad_norm_(
                    amp.master_params(optimizer), hparams.grad_clip_thresh)
                is_overflow = math.isnan(grad_norm)
            else:
                grad_norm = torch.nn.utils.clip_grad_norm_(
                    model.parameters(), hparams.grad_clip_thresh)


Might be. Could you isolate the issue and if possible post a code snippet to reproduce the issue?
I would start by disabling everything “additional”, i.e. apex, your dropout manipulations, data loading, etc.

I tracked it down to the tanh function and a residual connection.
It works fine with a different function.
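The model code is not shown in this thread, so the following is only a guess at the pattern involved. tanh saves its output for the backward pass, and an in-place residual addition (`+=`) on that output triggers the same kind of RuntimeError as above. The tensor shapes and variable names here are made up for the sketch:

```python
import torch

x = torch.randn(4, 8, requires_grad=True)
residual = torch.randn(4, 8)

# In-place residual: tanh keeps its output for the backward pass,
# and "+=" overwrites exactly that tensor.
out = torch.tanh(x)
out += residual
try:
    out.sum().backward()
    inplace_failed = False
except RuntimeError:
    inplace_failed = True  # "modified by an inplace operation"

# Out-of-place residual: a new tensor is created, the tanh output stays intact.
out = torch.tanh(x)
out = out + residual
out.sum().backward()
```

If this is the cause, replacing `+=` with an out-of-place addition (or cloning the tanh output first) should let you keep tanh.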