Dear Opacus community,
I have implemented the DP-SGD algorithm myself, by first clipping the per-sample gradients and then adding noise to the aggregated batch gradient. I am combining it with the opacus.data_loader.DPDataLoader class to make sure that the Poisson subsampling (needed for the privacy amplification) is done correctly.
When I use the DPDataLoader in my training loop, within each epoch I see varying batch sizes being drawn (as expected), and the "epoch" continues until the sum of these batch sizes is close to the size of the training dataset.
Now my question is: if I want to calculate the privacy budget using the RDP accountant as implemented here, is the "epochs" parameter the same as the usual epochs of my training loop, which in expectation pass through as many data points as there are in the training dataset?
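For reference, this is roughly how I was planning to plug my numbers into the accountant (only a sketch, assuming the opacus.accountants.RDPAccountant API with one accounting step per optimizer step; noise_multiplier, sample_rate and num_epochs are the values from my training setup below, and target_delta is just an example value):

from opacus.accountants import RDPAccountant

accountant = RDPAccountant()

# one accounting step per optimizer step, so one "epoch" of accounting
# corresponds to 1 / sample_rate steps in expectation
steps_per_epoch = int(1 / sample_rate)  # == len(train_dataloader)
for _ in range(num_epochs * steps_per_epoch):
    accountant.step(noise_multiplier=noise_multiplier, sample_rate=sample_rate)

epsilon = accountant.get_epsilon(delta=target_delta)  # e.g. target_delta = 1e-5
print(f"(epsilon, delta) = ({epsilon:.2f}, {target_delta})")

If that is right, then "epochs" in the accountant should simply be my training-loop epochs, since steps_per_epoch = 1 / sample_rate is the expected number of batches per pass, but I would like to confirm.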
My code looks something like this:
import torch
from torch.utils.data import DataLoader
from torch.func import grad, vmap  # functorch.grad / functorch.vmap on older PyTorch versions
from tqdm import tqdm

from opacus.data_loader import DPDataLoader

# wrap a regular DataLoader so that batches are drawn with Poisson sampling
train_dataloader = DataLoader(
    torch.from_numpy(d2[:2500, :]).type(torch.LongTensor),
    batch_size=batch_size,
    shuffle=True,
    num_workers=4,
)
train_dataloader = DPDataLoader.from_data_loader(train_dataloader, distributed=False)

for epoch in range(num_epochs):
    train_acc = 0
    train_loss = 0
    num_samples = 0
    sample_rate = 1 / len(train_dataloader)
    expected_batch_size = int(len(train_dataloader.dataset) * sample_rate)
    self.model.train()
    print_interval = 100

    # functional per-sample gradients: the detached views of the parameters and
    # buffers stay in sync with the optimizer's in-place updates
    params = {k: v.detach() for k, v in self.model.named_parameters()}
    buffers = {k: v.detach() for k, v in self.model.named_buffers()}
    ft_compute_grad = grad(self.compute_loss, argnums=(0, 1), has_aux=True)
    ft_compute_sample_grad = vmap(ft_compute_grad, in_dims=(None, None, 0))

    for idx, batch in enumerate(tqdm(train_dataloader)):
        batch_size = len(batch)
        if batch_size > 40:  # skip unusually large Poisson-sampled batches
            continue

        ft_per_sample_grads, loss = ft_compute_sample_grad(params, buffers, batch)
        loss = loss.mean()
        ft_per_sample_grads = ft_per_sample_grads[0]  # gradients w.r.t. params only

        for key in ft_per_sample_grads:
            for i in range(batch_size):
                # per-sample norm clipping in-place
                utils.norm_clipper(ft_per_sample_grads[key][i], max_norm=max_norm)
            # sum over the batch and normalize by the expected (not actual) batch size
            ft_per_sample_grads[key] = ft_per_sample_grads[key].sum(0) / expected_batch_size
            rand = torch.zeros_like(ft_per_sample_grads[key])
            # the lot size enters the std of the Gaussian because the grads are already averaged
            rand.normal_(mean=0, std=noise_multiplier * max_norm / expected_batch_size)
            ft_per_sample_grads[key].add_(rand)  # add noise in-place to the averaged grads

        # copy the noisy averaged gradients back onto the model parameters
        for key, p in zip(ft_per_sample_grads, self.model.parameters()):
            p.grad = ft_per_sample_grads[key]

        num_samples += batch_size
        self.optimizer.step()
        self.optimizer.zero_grad()

        train_loss += loss.item() * batch_size
        if idx % print_interval == 0:
            print(loss.item())

    train_loss /= num_samples
    train_acc /= num_samples  # (accuracy computation omitted here)
    print(f"total size: {num_samples}")