Dear Opacus community,
I have implemented the DP-SGD algorithm myself, by first clipping the per-sample gradients and then adding noise to the aggregated batch gradient. I am combining it with the opacus.data_loader.DPDataLoader class to make sure that the Poisson subsampling (needed for the privacy amplification) is done correctly.
When I use the DPDataLoader in my training loop, within each epoch I see varying batch sizes being drawn (as expected), and the "epoch" continues until the sum of these batch sizes is close to the size of the training dataset.
Now my question is: if I want to calculate the privacy budget using the RDP accountant as implemented here, is the "epochs" parameter the same as the usual epochs of my training loop, which in expectation pass through as many data points as there are in the training dataset?
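For reference, this is roughly how I was planning to plug my numbers into the accountant (only a sketch, assuming the opacus.accountants.RDPAccountant API with one accounting step per optimizer step; noise_multiplier, sample_rate and num_epochs are the values from my training setup below, and target_delta is just an example value):

from opacus.accountants import RDPAccountant

accountant = RDPAccountant()

# one accounting step per optimizer step, so one "epoch" of accounting
# corresponds to 1 / sample_rate steps in expectation
steps_per_epoch = int(1 / sample_rate)  # == len(train_dataloader)
for _ in range(num_epochs * steps_per_epoch):
    accountant.step(noise_multiplier=noise_multiplier, sample_rate=sample_rate)

epsilon = accountant.get_epsilon(delta=target_delta)  # e.g. target_delta = 1e-5
print(f"(epsilon, delta) = ({epsilon:.2f}, {target_delta})")

If that is right, then "epochs" in the accountant should simply be my training-loop epochs, since steps_per_epoch = 1 / sample_rate is the expected number of batches per pass, but I would like to confirm.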
My code looks something like this:
import torch
from torch.utils.data import DataLoader
from torch.func import grad, vmap  # functorch.grad / functorch.vmap on older PyTorch versions
from tqdm import tqdm

from opacus.data_loader import DPDataLoader

# wrap a regular DataLoader so that batches are drawn with Poisson sampling
train_dataloader = DataLoader(
    torch.from_numpy(d2[:2500, :]).type(torch.LongTensor),
    batch_size=batch_size,
    shuffle=True,
    num_workers=4,
)
train_dataloader = DPDataLoader.from_data_loader(train_dataloader, distributed=False)

for epoch in range(num_epochs):
    train_acc = 0
    train_loss = 0
    num_samples = 0
    sample_rate = 1 / len(train_dataloader)
    expected_batch_size = int(len(train_dataloader.dataset) * sample_rate)
    self.model.train()
    print_interval = 100

    # functional per-sample gradients: the detached views of the parameters and
    # buffers stay in sync with the optimizer's in-place updates
    params = {k: v.detach() for k, v in self.model.named_parameters()}
    buffers = {k: v.detach() for k, v in self.model.named_buffers()}
    ft_compute_grad = grad(self.compute_loss, argnums=(0, 1), has_aux=True)
    ft_compute_sample_grad = vmap(ft_compute_grad, in_dims=(None, None, 0))

    for idx, batch in enumerate(tqdm(train_dataloader)):
        batch_size = len(batch)
        if batch_size > 40:  # skip unusually large Poisson-sampled batches
            continue

        ft_per_sample_grads, loss = ft_compute_sample_grad(params, buffers, batch)
        loss = loss.mean()
        ft_per_sample_grads = ft_per_sample_grads[0]  # gradients w.r.t. params only

        for key in ft_per_sample_grads:
            for i in range(batch_size):
                # per-sample norm clipping in-place
                utils.norm_clipper(ft_per_sample_grads[key][i], max_norm=max_norm)
            # sum over the batch and normalize by the expected (not actual) batch size
            ft_per_sample_grads[key] = ft_per_sample_grads[key].sum(0) / expected_batch_size
            rand = torch.zeros_like(ft_per_sample_grads[key])
            # the lot size enters the std of the Gaussian because the grads are already averaged
            rand.normal_(mean=0, std=noise_multiplier * max_norm / expected_batch_size)
            ft_per_sample_grads[key].add_(rand)  # add noise in-place to the averaged grads

        # copy the noisy averaged gradients back onto the model parameters
        for key, p in zip(ft_per_sample_grads, self.model.parameters()):
            p.grad = ft_per_sample_grads[key]

        num_samples += batch_size
        self.optimizer.step()
        self.optimizer.zero_grad()

        train_loss += loss.item() * batch_size
        if idx % print_interval == 0:
            print(loss.item())

    train_loss /= num_samples
    train_acc /= num_samples  # (accuracy computation omitted here)
    print(f"total size: {num_samples}")