Data format for Input to Opacus

I am trying to use Opacus to train distilgpt2 on my data with DP-SGD. My data is loaded and has the format below.
datasets.Dataset({
    features: ['input_ids', 'attention_mask', 'labels'],
    num_rows: 139
})

I am struggling to convert it into the required format for this code to work.
data_loader = DataLoader(data, batch_size=2)
privacy_engine = PrivacyEngine()
model, optimizer, train_data = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0
)

I am able to train distilgpt2 without Opacus. I am new to this and any help would be appreciated.
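
One likely cause of the format problem is that the rows are plain Python lists of different lengths, so the default DataLoader collation cannot stack them into one batch. Below is a minimal sketch of a padding step, not a confirmed fix. The name pad_collate and the sample rows are made up for illustration, pad_token_id=0 is an arbitrary assumption (GPT-2 style tokenizers have no pad token by default), and -100 is the label value that PyTorch's cross-entropy loss ignores by default.

```python
# Hypothetical helper: pad a list of tokenized rows into one rectangular batch.
def pad_collate(rows, pad_token_id=0, label_pad_id=-100):
    """Pad every field to the length of the longest input_ids in the batch."""
    max_len = max(len(row["input_ids"]) for row in rows)
    # Fill value per field: padded positions are masked out of attention
    # (attention_mask 0) and out of the loss (label -100).
    fills = {
        "input_ids": pad_token_id,
        "attention_mask": 0,
        "labels": label_pad_id,
    }
    batch = {key: [] for key in fills}
    for row in rows:
        for key, fill in fills.items():
            values = list(row[key])
            batch[key].append(values + [fill] * (max_len - len(values)))
    return batch


# Made-up rows with the same three features as the dataset above.
rows = [
    {"input_ids": [5, 6, 7], "attention_mask": [1, 1, 1], "labels": [5, 6, 7]},
    {"input_ids": [8, 9], "attention_mask": [1, 1], "labels": [8, 9]},
]
batch = pad_collate(rows)
```

To use something like this with the DataLoader, the padded lists would still need to become tensors, for example by wrapping each value in torch.tensor inside a collate_fn passed to DataLoader. If all rows already have the same length, calling data.set_format("torch") on the dataset before building the DataLoader may be enough on its own.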


I figured out the training part by writing a custom training loop instead of using Trainer. I am now having issues with generation, as I get the error below.
AttributeError: 'GradSampleModule' object has no attribute 'generate'
So, do I have to implement my own generate method?
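
Probably not. As far as I can tell, GradSampleModule keeps the original model in its _module attribute and does not forward arbitrary methods such as generate, so calling generate on the unwrapped model should work. The sketch below shows the unwrapping pattern with stand-in classes so that it runs without Opacus installed. FakeLanguageModel, FakeWrapper and unwrap are hypothetical names, and the reliance on _module is an assumption about Opacus internals that may change between versions.

```python
class FakeLanguageModel:
    """Stand-in for a model that has a generate method (e.g. distilgpt2)."""

    def generate(self, prompt):
        return prompt + " ..."


class FakeWrapper:
    """Stand-in for a wrapper that stores the model in `_module`
    and does not expose the model's own methods."""

    def __init__(self, module):
        self._module = module


def unwrap(model):
    """Return the wrapped model if there is one, else the model itself."""
    return getattr(model, "_module", model)


wrapped = FakeWrapper(FakeLanguageModel())

# wrapped.generate("Hello") would raise AttributeError here,
# which mirrors the error in the post above.
text = unwrap(wrapped).generate("Hello")
```

With the real objects this would look like unwrap(model).generate(input_ids, ...) after training, which is the same as calling model._module.generate(...) directly. The wrapped module shares its weights with the wrapper, so the trained parameters are the ones used for generation.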