How to increase dropout rate during training?

I read that co-adaptation only develops over time, so it makes little sense to apply the full dropout rate at the beginning of training.
How can I test this?

You could change the p attribute, if you’ve created an nn.Dropout module:

# Train with your initial dropout
...

# Change to new value and continue training
model.drop.p = 0.1

Alternatively, you could use the functional API and pass p into forward:

def forward(self, x, p):
    ...
    x = F.dropout(x, p=p, training=self.training)

Don’t forget to pass the self.training attribute from the parent model to the functional call. Otherwise dropout will stay active even after calling model.eval().
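A minimal sketch of the functional variant, to show the effect of self.training. TinyNet and its layer sizes are made up for illustration; only F.dropout and the training flag come from the suggestion above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNet(nn.Module):  # hypothetical model, for illustration only
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)

    def forward(self, x, p):
        x = self.fc(x)
        # training=self.training lets model.eval() switch dropout off
        return F.dropout(x, p=p, training=self.training)


model = TinyNet()
x = torch.ones(4, 8)

model.train()
out_train = model(x, p=0.5)   # random elements are zeroed, the rest rescaled

model.eval()
out_eval_a = model(x, p=0.5)  # dropout is a no-op in eval mode
out_eval_b = model(x, p=0.5)  # so repeated calls give identical results
```

Since p is an ordinary argument here, the training loop can compute it from the current epoch and pass a different value on every call.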


I’m using nn.Dropout.
What would be the best way to change all dropout layers if I have a lot of modules nested inside other modules?
Some kind of recursion?

You could iterate all submodules, check if the current module is an nn.Dropout layer via isinstance, and set p accordingly.
The cleanest way would probably be to write a custom function which is similar to a weight_init method and call it via model.apply.
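A rough sketch of the model.apply approach. The function name set_dropout_p and the toy nn.Sequential model are placeholders, not part of any existing code base.

```python
from functools import partial

import torch.nn as nn


def set_dropout_p(module, p):
    # model.apply calls this on every submodule and on the model itself
    if isinstance(module, nn.Dropout):
        module.p = p


# Toy model with a nested dropout layer, for illustration only
model = nn.Sequential(
    nn.Linear(8, 8),
    nn.Dropout(0.0),
    nn.Sequential(nn.Linear(8, 8), nn.Dropout(0.0)),
)

model.apply(partial(set_dropout_p, p=0.2))

# Collect the rates to confirm that nested layers were reached as well
rates = [m.p for m in model.modules() if isinstance(m, nn.Dropout)]
```

Note that isinstance(module, nn.Dropout) does not match nn.Dropout2d or nn.Dropout3d, so those would need to be added to the check if the model uses them.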

def set_dropout(model, drop_rate=0.1):
    for name, child in model.named_children():
        if isinstance(child, torch.nn.Dropout):
            child.p = drop_rate
        set_dropout(child, drop_rate=drop_rate)
set_dropout(model, drop_rate=0.2)

Like this?

I got an error with this function, any advice?

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [64, 512, 1138]], which is output 0 of TanhBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Did you change anything besides the drop probability?
Could you post a code snippet, which yields this error?

No, I did not change anything else.
Maybe it is because of NVIDIA apex.

    model = load_model(hparams)

    optimizer = Ranger(model.parameters(), lr=hparams.learning_rate)

    criterion = Tacotron2Loss()

    logger = prepare_directories_and_logger(
        output_directory, log_directory, rank)

    train_loader, valset, collate_fn = prepare_dataloaders(hparams)

    iteration = 0
    epoch_offset = 0

    if hparams.fp16_run:
        from apex import amp
        model, optimizer = amp.initialize(model, optimizer, opt_level='O2')
    # Load checkpoint if one exists
    if os.path.isfile(checkpoint_path):
        model, optimizer, iteration = load_checkpoint(
            checkpoint_path, model, optimizer)
        iteration += 1  # next iteration is iteration + 1
        epoch_offset = max(0, int(iteration / len(train_loader)))
        if hparams.fp16_run:
            amp.load_state_dict(torch.load(
                checkpoint_path)['amp'])
    elif os.path.isfile(checkpoint_path_vanilla):
        model = warm_start_model(
            checkpoint_path_vanilla, model, hparams.ignore_layers)

    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, hparams.epochs - hparams.epochs * hparams.epochs_annealing, eta_min=1e-6)

    if hparams.distributed_run:
        model = apply_gradient_allreduce(model)

    model.train()
    is_overflow = False
    # ================ MAIN TRAINING LOOP! ===================
    for epoch in range(epoch_offset, hparams.epochs):
        def set_dropout(model, drop_rate=0.1):
            for name, child in model.named_children():
                if isinstance(child, torch.nn.Dropout):
                    child.p = drop_rate
                set_dropout(child, drop_rate=drop_rate)
        if epoch <= 50:
            set_dropout(model, drop_rate=epoch / 100)
        print("Epoch: {}".format(epoch))
        start_epoch = time.perf_counter()
        for i, batch in enumerate(train_loader):
            start = time.perf_counter()
            model.zero_grad()
            x, y = model.parse_batch(batch)
            #y_pred = model(x)

            loss = criterion(model(x), y, x[-1])

            if hparams.distributed_run:
                reduced_loss = reduce_tensor(loss.data, n_gpus).item()
            else:
                reduced_loss = loss.item()
            if hparams.fp16_run:
                with amp.scale_loss(loss, optimizer) as scaled_loss:
                    scaled_loss.backward()
            else:
                loss.backward()

            if hparams.fp16_run:
                grad_norm = torch.nn.utils.clip_grad_norm_(
                    amp.master_params(optimizer), hparams.grad_clip_thresh)
                is_overflow = math.isnan(grad_norm)
            else:
                grad_norm = torch.nn.utils.clip_grad_norm_(
                    model.parameters(), hparams.grad_clip_thresh)

            optimizer.step()

Might be. Could you isolate the issue and if possible post a code snippet to reproduce the issue?
I would start by disabling everything “additional”, i.e. apex, your dropout manipulations, data loading, etc.

I tracked it down to the tanh function and a residual connection:
https://colab.research.google.com/drive/1YjqlhWjjTQffANSGvOKT3yfFgIO1NxEp
It works fine with a different function.
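For readers hitting the same error, here is a minimal sketch of one pattern that produces it. It assumes the residual connection was added in place to the tanh output, which is a guess based on the error message and not a confirmed reading of the linked notebook.

```python
import torch

x = torch.randn(4, 3, requires_grad=True)

# Failing pattern: tanh keeps its output for the backward pass,
# and the in-place add overwrites that saved tensor.
y = torch.tanh(x)
y += x  # in-place residual connection
try:
    y.sum().backward()
    inplace_failed = False
except RuntimeError:
    inplace_failed = True

# Out-of-place variant: the sum goes into a new tensor,
# so the saved tanh output stays untouched.
x.grad = None
z = torch.tanh(x) + x
z.sum().backward()
```

Under that assumption, writing the residual as x = torch.tanh(x) + residual instead of using += (or an inplace=True option) would avoid the error without changing the result.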