Updating tensors that are used in backpropagation but are not network parameters

I am currently trying to reimplement “Deep SVDD” for my own project.

During training, I use my network to compute the outputs as usual. To compute the loss, I then use the hypersphere center c and radius R as follows:

outputs = model(data)
dist = torch.sum((outputs - self.c) ** 2, dim=1)  # squared distance to the center c
scores = dist - self.R ** 2
loss = self.R ** 2 + (1 / self.nu) * torch.mean(torch.max(torch.zeros_like(scores), scores))

optimizer.zero_grad()  # clear stale gradients before backpropagating
loss.backward()
optimizer.step()

After backpropagating the loss and updating the parameters, I now want to update the radius R. In the original implementation this is done by

self.R.data = torch.tensor(get_radius(dist, self.nu), device=self.device)

where the get_radius function is defined as follows:

import numpy as np
import torch

def get_radius(dist: torch.Tensor, nu: float):
    # Solve for the radius R as the (1 - nu)-quantile of the distances.
    return np.quantile(np.sqrt(dist.clone().data.cpu().numpy()), 1 - nu)
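As an aside, the same quantile can be computed without the NumPy round-trip. A minimal sketch, assuming PyTorch 1.7+ (which introduced torch.quantile); the name get_radius_torch is hypothetical:

```python
import torch

def get_radius_torch(dist: torch.Tensor, nu: float) -> torch.Tensor:
    # Equivalent radius update using torch.quantile, staying on-device;
    # detach() drops the graph reference instead of going through .data.
    return torch.quantile(torch.sqrt(dist.detach()), 1 - nu)
```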

After updating the radius, the processing of the current training batch is done.

Now I am wondering whether this is the proper way to do this. I know that the use of .data can be problematic and can cause issues during backpropagation. I have been reading up on it, but I can't quite wrap my head around when it becomes a problem or how to fix it. Would it cause issues in the original implementation, and if so, how should I implement it instead, and why?

More generally, my question is: how do I properly update the radius in my implementation without causing issues with backpropagation, and how should tensors like the radius be updated in general?

You will only run into issues if you update a tensor in place while it is still referenced by the computation graph, i.e. between the forward pass and the backward pass. A reference to that tensor may be saved in the graph, so changing the value it stores would also change the result of the backpropagation. Since you're updating self.R after the graph has been consumed (after loss.backward()), what you're doing should never cause issues. That said, it's cleaner to assign to self.R directly rather than to self.R.data; this way you avoid making an in-place update altogether.
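To make the hazard concrete, here is a minimal demonstration of what goes wrong when you mutate via .data between forward and backward: .data bypasses autograd's version counter, so no error is raised and the gradient is silently computed from the mutated value.

```python
import torch

w = torch.tensor([2.0], requires_grad=True)
y = w ** 2          # autograd saves a reference to w to compute dy/dw = 2*w
w.data[0] = 10.0    # .data bypasses the version check: no error is raised
y.backward()
print(w.grad)       # tensor([20.]) -- silently wrong; the gradient at w=2 is 4
```

Had the mutation been done on w directly (or on w.detach()), autograd's version counter would have caught it and raised a RuntimeError instead of returning a wrong gradient.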

By the way, since your radius should not require grad, you could compute it inside a torch.no_grad() context just to be sure that no graph-building operations are executed (which would waste time), and that the resulting tensor does not require grad (which would otherwise make the backpropagation at the next iteration waste time).

So your current implementation should be fine, but a somewhat safer approach would be:

with torch.no_grad():
    self.R = torch.tensor(get_radius(dist, self.nu), device=self.device)
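Putting the pieces together, one batch could look like the following hypothetical sketch. The function name train_step is an assumption; the attributes c, R, nu, and device, and get_radius, are taken from the question above:

```python
import numpy as np
import torch

def get_radius(dist: torch.Tensor, nu: float):
    # Radius update from the question: the (1 - nu)-quantile of distances.
    return np.quantile(np.sqrt(dist.clone().data.cpu().numpy()), 1 - nu)

def train_step(self, data, model, optimizer):
    outputs = model(data)
    dist = torch.sum((outputs - self.c) ** 2, dim=1)
    scores = dist - self.R ** 2
    loss = self.R ** 2 + (1 / self.nu) * torch.mean(
        torch.max(torch.zeros_like(scores), scores))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Update R only after backward(): the graph has already been consumed,
    # and no_grad guarantees no new graph is built for this assignment.
    with torch.no_grad():
        self.R = torch.tensor(get_radius(dist, self.nu), device=self.device)
    return loss.item()
```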

Hi, thank you a lot. That really cleared up my confusion!
