[Solved] Reverse gradients in backward pass

Hi Alain,

With the help of Marcin’s awesome solution (thanks!) I have been able to reproduce the results from Bousmalis et al. (2016). I get 72% (77% in Bousmalis) on USPS with an MNIST-trained classifier, and 86% (85% in Bousmalis) on USPS using the DANN with Marcin’s gradient reversal layer. I had to adjust Marcin’s code a bit to make it work; I am using PyTorch version 0.1.12_2, so maybe that has something to do with it. This did the trick:

from torch.autograd import Function
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(Function):
    # identity in the forward pass, reversed (and scaled) gradient in the backward pass
    def forward(self, x):
        return x.view_as(x)

    def backward(self, grad_output):
        # lambd is the global adaptation factor set in the training loop below
        return grad_output * -lambd

def grad_reverse(x):
    return GradReverse()(x)

class domain_classifier(nn.Module):
    def __init__(self):
        super(domain_classifier, self).__init__()
        self.fc1 = nn.Linear(1200, 100)
        self.fc2 = nn.Linear(100, 1)
        self.drop = nn.Dropout2d(0.25)

    def forward(self, x):
        # reverse the gradients flowing back into the feature extractor
        x = grad_reverse(x)
        x = F.leaky_relu(self.drop(self.fc1(x)))
        x = self.fc2(x)
        return F.sigmoid(x)
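
In case anyone reads this on a more recent PyTorch (0.4 or later): the old-style Function above no longer works there. A rough sketch of the same reversal layer with the static-method API, passing lambd explicitly instead of relying on a global, would look something like this (untested on my side):

import torch
from torch.autograd import Function

class GradReverse(Function):
    @staticmethod
    def forward(ctx, x, lambd):
        # stash the scaling factor for the backward pass
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # reverse and scale the gradient; None is the "gradient" for the lambd argument
        return grad_output.neg() * ctx.lambd, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)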

To train the model, the standard PyTorch training loop applies. I also implemented the learning rate and lambda schedules proposed in the paper (Ganin et al., 2016). Here’s the code:

import numpy as np
import torch
from torch.autograd import Variable

j = 0  # global step counter used for the lambda / learning rate schedules

for i in range(num_epochs):
    source_gen = batch_gen(source_batches, source_idx, Xs_train, ys_train)
    target_gen = batch_gen(target_batches, target_idx, Xt_train, None)

    # iterate over batches
    for (xs, ys) in source_gen:

        # update lambda and learning rate as suggested in the paper
        p = float(j) / num_steps
        lambd = 2. / (1. + np.exp(-10. * p)) - 1
        lr = 0.01 / (1. + 10 * p)**0.75
        # setting optimizer.lr has no effect; the learning rate lives in param_groups
        for opt in (d_optimizer, c_optimizer, f_optimizer):
            for param_group in opt.param_groups:
                param_group['lr'] = lr
        j += 1

        # skip incomplete source batches, then get the next target batch
        if len(xs) != batch_size // 2:
            continue
        xt = next(target_gen)

        # concatenate source and target batch
        x = torch.cat([xs, xt], 0)

        # domain labels for the concatenated batch: 0 for source samples, 1 for target samples
        yd = Variable(torch.cat([torch.zeros(batch_size // 2, 1),
                                 torch.ones(batch_size // 2, 1)]))

        # 1) train feature_extractor and class_classifier on source batch
        # reset gradients
        f_ext.zero_grad()
        c_clf.zero_grad()

        # calculate class_classifier predictions on batch xs
        c_out = c_clf(f_ext(xs).view(batch_size // 2, -1))

        # optimize feature_extractor and class_classifier on output
        f_c_loss = c_crit(c_out, ys.float())
        f_c_loss.backward(retain_variables=True)  # retain_variables is retain_graph in newer versions
        c_optimizer.step()
        f_optimizer.step()

        # 2) train feature_extractor and domain_classifier on full batch x
        # reset gradients
        f_ext.zero_grad()
        d_clf.zero_grad()

        # calculate domain_classifier predictions on batch x
        d_out = d_clf(f_ext(x).view(batch_size, -1))

        # optimize feature_extractor and domain_classifier with output
        f_d_loss = d_crit(d_out, yd.float())
        f_d_loss.backward(retain_variables=True)  # retain_variables is retain_graph in newer versions
        d_optimizer.step()
        f_optimizer.step()
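
One note: batch_gen isn’t shown above. Mine is nothing fancy; the sketch below is roughly what I use, but the names, shapes and the batch_size global are assumptions on my side, so adapt it to however your data is stored:

import numpy as np
import torch
from torch.autograd import Variable

def batch_gen(n_batches, idx, X, y):
    # shuffle once per epoch, then yield half-size mini-batches as Variables;
    # y is None for the unlabelled target set
    np.random.shuffle(idx)
    half = batch_size // 2
    for b in range(n_batches):
        batch_idx = idx[b * half:(b + 1) * half]
        xb = Variable(torch.from_numpy(X[batch_idx]).float())
        if y is None:
            yield xb
        else:
            yield xb, Variable(torch.from_numpy(y[batch_idx]).float())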

Thanks again, Marcin, for your solution.
And Alain, I hope this helps you build the model.

Daniel
