Boosting implementation tries to backward through the graph a second time

I have been trying to implement AdaBoost for training simple multi-layer perceptrons (10 models in the code below).

The model loop executes fine the first time, but in the second pass, after the first epoch, an error is thrown:

    RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

The problem seems to come from the line:

    w_dp = rboost_nn.do_reweighting(w_dp)

which is the method that re-weights the data points to modify the loss for the next model:

    def do_reweighting(self,dpw):

        new_dpw = torch.zeros_like(dpw,requires_grad=False)
        prev_batch_size = 0

        for idx,batch in enumerate(train_loader):
            
            base = idx*prev_batch_size
            end = base + len(batch)
            prev_batch_size = len(batch)

            data_batch = batch[:,:32]
            classes_batch = batch[:,-1]
            signed_classes = torch.sign(classes_batch-0.5)

            rboost_result = self.sign_forward(data_batch.float(),0.5)
            exponent = (-alpha*(torch.unsqueeze(signed_classes,1))) * rboost_result
            new_dpw[base:end] = torch.squeeze(torch.unsqueeze(dpw[base:end],1) * (torch.exp(exponent)),1)
            
        total_weight = new_dpw.sum()
        new_dpw /= total_weight
                        
        return new_dpw
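
As an aside, one way to guarantee a weight update like this carries no autograd graph is to compute it under `torch.no_grad()`. Below is a minimal standalone sketch of the same exponential update; the function and tensor names are illustrative, not the thread's actual `fcnn`/`sign_forward` code:

```python
import torch

def reweight(dpw, outputs, signed_classes, alpha):
    # AdaBoost-style update: w_i <- w_i * exp(-alpha * y_i * h(x_i)),
    # computed without recording a graph, so the result never holds
    # references to the model's forward pass
    with torch.no_grad():
        new_dpw = dpw * torch.exp(-alpha * signed_classes * outputs)
        return new_dpw / new_dpw.sum()  # renormalise so the weights sum to 1
```

Because the update runs inside `torch.no_grad()`, the returned weights have `requires_grad=False` even when `outputs` came straight from a model, so no later `backward()` can reach back into them.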

The training loop is here:

    loss_scaling = 100000

    # initialise uniform data point weights once, before the model loop,
    # so that the reweighted values from one model carry over to the next
    w_dp = torch.ones(len(train_loader.dataset.data),requires_grad=False) / float(len(train_loader.dataset.data))

    for modidx in range(10):

        rboost_nn = fcnn(32,1024,256,1)
        rboost_criterion = nn.BCELoss(reduction='none')
        rboost_optimiser = torch.optim.RMSprop(rboost_nn.parameters(),lr=5e-4)

        rboost_loss_list = []
        ctrl_loss_list = []

        for epoch in range(epochs):
            for idx,batch in enumerate(train_loader):

                base = idx*len(batch)
                end = base + len(batch)

                data_batch = batch[:,:32]
                classes_batch = batch[:,-1]

                rboost_result = rboost_nn.forward(data_batch.float())
                rboost_loss = rboost_criterion(rboost_result.float(),torch.unsqueeze(classes_batch,1).float())
                rboost_loss *= torch.unsqueeze(w_dp,1)[base:end]
                rboost_loss = rboost_loss.mean()
                rboost_loss *= loss_scaling
                rboost_optimiser.zero_grad()
                rboost_loss.backward()
                rboost_optimiser.step()

        eps = rboost_nn.calc_eps(train_loader,w_dp)
        alpha = rboost_nn.calc_alpha()
        w_dp = rboost_nn.do_reweighting(w_dp)
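
The per-sample weighting pattern in the loop above (`reduction='none'`, multiply by the weights, then `mean()`) can be checked in isolation; a small sketch with made-up shapes:

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss(reduction='none')  # keep one loss value per sample

preds = torch.rand(8, 1, requires_grad=True)    # stand-in for model outputs in (0, 1)
targets = torch.randint(0, 2, (8, 1)).float()
w_dp = torch.ones(8) / 8                        # uniform data point weights

per_sample = criterion(preds, targets)          # shape (8, 1), one loss per sample
loss = (per_sample * w_dp.unsqueeze(1)).mean()  # weight each sample, reduce to a scalar
loss.backward()                                 # gradients flow only through preds
```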

Much appreciated if someone can help me understand where the error is coming from and how it can be resolved :slight_smile:

I’m not seeing any backward operation in do_reweighting. Are you sure this method creates the issue?

I think it is here, since if I comment this part out, training finishes without a problem.

For the training of the first model there is no error. It only appears when we come to train the second model with the adjusted data point weights. I wondered if that is because the re-weighted loss is still attached to the previous model's graph.

Thanks

So I finally found a solution to this. Maybe there is another underlying problem, but if I detach the data point weights w_dp after do_reweighting(w_dp), I don't get the error when training the second model with the new data point weights.

    w_dp = rboost_nn.do_reweighting(w_dp)
    w_dp = w_dp.detach()  # detach() is not in-place, so assign the result back
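
For anyone hitting the same thing, the failure mode can be reproduced in a few lines. If a tensor produced by one model's forward pass (here the stand-in `dpw`) is reused in several losses, the second `backward()` tries to traverse its already-freed graph; detaching it first avoids that. All names below are illustrative:

```python
import torch

x = torch.randn(6, 3)
w_old = torch.randn(3, 1, requires_grad=True)  # stands in for the previous model

# weights derived from a forward pass: still attached to w_old's graph
dpw = torch.exp(-(x @ w_old)).squeeze(1)

w_new = torch.randn(3, 1, requires_grad=True)  # stands in for the next model

def train_two_steps(weights):
    for _ in range(2):
        loss = ((x @ w_new).squeeze(1) * weights).mean()
        loss.backward()  # second pass re-enters weights' graph if it has one

try:
    train_two_steps(dpw)
    failed = False
except RuntimeError:  # "Trying to backward through the graph a second time"
    failed = True

train_two_steps(dpw.detach())  # detached weights share no graph, so no error
```

Note that `detach()` returns a new tensor rather than modifying its input in place, so the result has to be assigned back (`w_dp = w_dp.detach()`), or the in-place variant `w_dp.detach_()` used instead.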