I read that co-adaptation develops over time, so it doesn't make sense to apply dropout at full strength from the very beginning of training. How can I test this?
You could change the p attribute, if you've created an nn.Dropout module:
# Train with your initial dropout
...
# Change to new value and continue training
model.drop.p = 0.1
or you could alternatively use the functional API and pass p into forward:
def forward(self, x, p):
    ...
    x = F.dropout(x, p=p, training=self.training)
Don't forget to use the self.training attribute from the parent model in the functional call. Otherwise you won't disable the dropout call after calling model.eval().
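As a minimal sketch of the functional approach (the Net module and its shapes here are made up for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)

    def forward(self, x, p=0.5):
        x = self.fc(x)
        # Pass the parent module's training flag, so that model.eval()
        # disables dropout as expected
        x = F.dropout(x, p=p, training=self.training)
        return x

model = Net()
model.eval()  # dropout becomes a no-op
x = torch.ones(2, 8)
out1 = model(x, p=0.9)
out2 = model(x, p=0.9)
print(torch.equal(out1, out2))  # True: no randomness in eval mode
```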
I'm using nn.Dropout. What would be the best way to replace all dropout rates if I have a lot of modules nested inside modules? Some recursion?
You could iterate all submodules, check if the current module is an nn.Dropout layer via isinstance, and set p accordingly.
The cleanest way would probably be to write a custom function, similar to a weight_init method, and call it via model.apply.
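A sketch of the model.apply variant (the function name set_dropout_rate and the toy model below are illustrative, not from your code):

```python
import torch.nn as nn

def set_dropout_rate(m, drop_rate=0.1):
    # Called once per submodule by model.apply, including nested ones
    if isinstance(m, nn.Dropout):
        m.p = drop_rate

model = nn.Sequential(
    nn.Linear(4, 4),
    nn.Dropout(p=0.5),
    nn.Sequential(nn.Dropout(p=0.5)),  # nested module
)
model.apply(lambda m: set_dropout_rate(m, drop_rate=0.2))
print([m.p for m in model.modules() if isinstance(m, nn.Dropout)])  # [0.2, 0.2]
```

model.apply already traverses the module tree recursively, so the function only has to handle a single module at a time.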
def set_dropout(model, drop_rate=0.1):
    for name, child in model.named_children():
        if isinstance(child, torch.nn.Dropout):
            child.p = drop_rate
        set_dropout(child, drop_rate=drop_rate)

set_dropout(model, drop_rate=0.2)
Like this?
I got an error with this function, any advice?
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [64, 512, 1138]], which is output 0 of TanhBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Did you change anything besides the drop probability?
Could you post a code snippet which yields this error?
No, I did not change anything else.
Maybe it is because of NVIDIA apex:
model = load_model(hparams)
optimizer = Ranger(model.parameters(), lr=hparams.learning_rate)
criterion = Tacotron2Loss()
logger = prepare_directories_and_logger(
    output_directory, log_directory, rank)
train_loader, valset, collate_fn = prepare_dataloaders(hparams)

iteration = 0
epoch_offset = 0
if hparams.fp16_run:
    from apex import amp
    model, optimizer = amp.initialize(model, optimizer, opt_level='O2')

# Load checkpoint if one exists
if os.path.isfile(checkpoint_path):
    model, optimizer, iteration = load_checkpoint(
        checkpoint_path, model, optimizer)
    iteration += 1  # next iteration is iteration + 1
    epoch_offset = max(0, int(iteration / len(train_loader)))
    if hparams.fp16_run:
        amp.load_state_dict(torch.load(
            checkpoint_path)['amp'])
elif os.path.isfile(checkpoint_path_vanilla):
    model = warm_start_model(
        checkpoint_path_vanilla, model, hparams.ignore_layers)

scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, hparams.epochs - hparams.epochs * hparams.epochs_annealing,
    eta_min=1e-6)

if hparams.distributed_run:
    model = apply_gradient_allreduce(model)

model.train()
is_overflow = False

# ================ MAIN TRAINING LOOP! ===================
for epoch in range(epoch_offset, hparams.epochs):
    def set_dropout(model, drop_rate=0.1):
        for name, child in model.named_children():
            if isinstance(child, torch.nn.Dropout):
                child.p = drop_rate
            set_dropout(child, drop_rate=drop_rate)

    if epoch <= 50:
        set_dropout(model, drop_rate=epoch / 100)
    print("Epoch: {}".format(epoch))
    start_epoch = time.perf_counter()
    for i, batch in enumerate(train_loader):
        start = time.perf_counter()
        model.zero_grad()
        x, y = model.parse_batch(batch)
        # y_pred = model(x)
        loss = criterion(model(x), y, x[-1])
        if hparams.distributed_run:
            reduced_loss = reduce_tensor(loss.data, n_gpus).item()
        else:
            reduced_loss = loss.item()
        if hparams.fp16_run:
            with amp.scale_loss(loss, optimizer) as scaled_loss:
                scaled_loss.backward()
        else:
            loss.backward()
        if hparams.fp16_run:
            grad_norm = torch.nn.utils.clip_grad_norm_(
                amp.master_params(optimizer), hparams.grad_clip_thresh)
            is_overflow = math.isnan(grad_norm)
        else:
            grad_norm = torch.nn.utils.clip_grad_norm_(
                model.parameters(), hparams.grad_clip_thresh)
        optimizer.step()
Might be. Could you isolate the issue and, if possible, post a code snippet to reproduce it?
I would start by disabling everything "additional", i.e. apex, your dropout manipulations, data loading, etc.
I tracked it down to the tanh function and the residual connection:
https://colab.research.google.com/drive/1YjqlhWjjTQffANSGvOKT3yfFgIO1NxEp
It works fine with another function.
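For reference, this failure mode can be reproduced in isolation. The standalone sketch below (unrelated to the Tacotron model) shows how an in-place residual add on a tanh output triggers the same error, and the out-of-place fix:

```python
import torch

# TanhBackward needs the (unmodified) tanh output to compute its gradient,
# so mutating that output in-place before backward() raises the error
x = torch.randn(4, requires_grad=True)
y = torch.tanh(x)
y += x  # in-place residual add: y is now at "version 1"
try:
    y.sum().backward()
except RuntimeError as e:
    print("RuntimeError:", e)

# Out-of-place residual connection works fine
x2 = torch.randn(4, requires_grad=True)
y2 = torch.tanh(x2)
y2 = y2 + x2
y2.sum().backward()
print(x2.grad is not None)  # True
```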
I've been relying on your advice for school projects for quite some time. I just wanted to create an account to thank you.
# named_modules() already yields nested submodules, so no recursion is needed
for idx, m in enumerate(model.named_modules()):
    path = m[0]
    component = m[1]
    if isinstance(component, nn.Dropout):
        component.p = 0
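As a quick sanity check, the same loop can be exercised on a toy model (the nested Sequential below is just a stand-in for real submodules):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 4),
    nn.Dropout(p=0.3),
    nn.Sequential(nn.Dropout(p=0.7)),  # nested dropout
)

for idx, m in enumerate(model.named_modules()):
    path = m[0]
    component = m[1]
    if isinstance(component, nn.Dropout):
        component.p = 0

# Every dropout, including the nested one, is now disabled
print(all(m.p == 0 for m in model.modules() if isinstance(m, nn.Dropout)))  # True
```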