Opacus: training a MONAI U-Net on an MSD dataset

Hello!

I’m new to Opacus and was wondering whether I could get some advice on how to improve performance in my first attempt to use the library. I’m training a residual U-Net implemented with MONAI on the spleen dataset of the Medical Segmentation Decathlon (http://medicaldecathlon.com/). This dataset has very few images (~30 or so for training). Following MONAI’s tutorial (tutorials/3d_segmentation/spleen_segmentation_3d_visualization_basic.ipynb at main · Project-MONAI/tutorials · GitHub), I obtained very good Dice scores after ~100 epochs, with reasonable performance already after 30 epochs or so.
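For context, my non-DP baseline is essentially the setup from that tutorial; a rough sketch (the exact channel sizes and learning rate here are placeholders and may differ slightly from what I actually run):

    import torch
    from monai.networks.nets import UNet
    from monai.losses import DiceLoss

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Residual 3D U-Net, as in the MONAI spleen tutorial
    model = UNet(
        spatial_dims=3,
        in_channels=1,
        out_channels=2,
        channels=(16, 32, 64, 128, 256),
        strides=(2, 2, 2, 2),
        num_res_units=2,
    ).to(device)

    loss_function = DiceLoss(to_onehot_y=True, softmax=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

From this baseline, I incorporated Opacus: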

            self.model, self.optimizer, self._train_loader = self.privacy_engine.make_private_with_epsilon(
                module=self.model,
                optimizer=self.optimizer,
                data_loader=self._train_loader,
                epochs=self._epochs,
                target_epsilon=self._target_epsilon,
                target_delta=self.get_target_delta(len(self._train_dataset)),
                max_grad_norm=self._max_grad_norm,
            )

I’ve varied the DP hyperparameters without any improvement:

  • Learning rate: originally 1e-4 (I’ve also tried 1e-3 and 1e-5)
  • Max grad norm: values from 2 to 5
  • Delta: I’ve set it to 1e-4 (normally it would be much higher because the dataset is small, but I’ve read it’s uncommon to use a delta > 1e-4); see the sketch after this list for how I pick it
  • Target epsilon: between 5 and 15 (I’m currently using 15)
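For reference, this is roughly what my get_target_delta helper does (a minimal sketch; the cap at 1e-4 follows the rule of thumb that delta should be well below 1/N):

    def get_target_delta(self, dataset_size: int) -> float:
        # Rule of thumb: delta should be (much) smaller than 1/N.
        # With only ~30 training volumes, 1/N is about 0.03, which seems
        # too large, so I cap delta at 1e-4.
        return min(1.0 / dataset_size, 1e-4)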

I’m also using the BatchMemoryManager on my MONAI DataLoader (which operates on 96×96×96 patches).
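A sketch of how I wrap the loader (MAX_PHYSICAL_BATCH_SIZE, the dictionary keys and the loop body are simplified placeholders for my actual code):

    from opacus.utils.batch_memory_manager import BatchMemoryManager

    MAX_PHYSICAL_BATCH_SIZE = 2  # placeholder; the real value depends on GPU memory

    with BatchMemoryManager(
        data_loader=self._train_loader,
        max_physical_batch_size=MAX_PHYSICAL_BATCH_SIZE,
        optimizer=self.optimizer,
    ) as memory_safe_loader:
        for batch in memory_safe_loader:
            inputs = batch["image"].to(device)
            labels = batch["label"].to(device)
            self.optimizer.zero_grad()
            loss = loss_function(self.model(inputs), labels)
            loss.backward()
            self.optimizer.step()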

So far, my loss does not change much after ~100 epochs: it oscillates between 0.71 and 0.68, and the Dice score stagnates as well (the validation loss is almost always the same; I would suspect a bug in the loop if the non-DP version didn’t work with the same code). The epsilon value grows from 0.94 to 8 over training.

Is there anything in terms of hyperparameter tuning I should be taking into account?

On another note, I’m also running into an error with the BatchMemoryManager loader:

TypeError: zeros() received an invalid combination of arguments - got (tuple, dtype=type), but expected one of:
 * (tuple of ints size, *, tuple of names names, torch.dtype dtype = None, torch.layout layout = None, torch.device device = None, bool pin_memory = False, bool requires_grad = False)
 * (tuple of ints size, *, Tensor out = None, torch.dtype dtype = None, torch.layout layout = None, torch.device device = None, bool pin_memory = False, bool requires_grad = False)

I think the reason is that with the DP engine the (logical) batch size varies from step to step (unlike the non-DP version), and perhaps an iteration comes along where the batch dimension is 0 for some reason. I’m not sure why this is happening. Is there any recorded incompatibility between MONAI transforms and Opacus?
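To test this hypothesis, I was going to add a guard like the following inside the training loop (just a sketch, and it assumes the empty batch actually makes it through collation rather than the error being raised inside the loader itself):

    for step, batch in enumerate(memory_safe_loader):
        inputs = batch["image"]
        # With Poisson sampling, a logical batch can occasionally be empty;
        # log the size and skip instead of feeding a 0-sized tensor to the model.
        if inputs.shape[0] == 0:
            print(f"step {step}: empty batch, skipping")
            continue
        ...  # rest of the training step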

Many thanks for any advice!

Virginia