I am using the above code to train a model with RLHF for answer generation, with the following PPO setup:
from trl import PPOConfig, PPOTrainer

# PPO hyperparameters
learning_rate = 1.41e-5
max_ppo_epochs = 1
mini_batch_size = 4
batch_size = 16

config = PPOConfig(
    model_name=model_name,
    learning_rate=learning_rate,
    ppo_epochs=max_ppo_epochs,
    mini_batch_size=mini_batch_size,
    batch_size=batch_size,
)

# model_name, ppo_model, ref_model, tokenizer, dataset_train and collator
# are defined in the code above
ppo_trainer = PPOTrainer(
    config=config,
    model=ppo_model,
    ref_model=ref_model,
    tokenizer=tokenizer,
    dataset=dataset_train,
    data_collator=collator,
)
When I run this code, I get the following error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-41-5fa467fb1768> in <cell line: 14>()
     12 )
     13
---> 14 ppo_trainer = PPOTrainer(config=config,
     15                          model=ppo_model,
     16                          ref_model=ref_model,

3 frames
/usr/local/lib/python3.10/dist-packages/torch/utils/data/sampler.py in __init__(self, data_source, replacement, num_samples, generator)
    141
    142         if not isinstance(self.num_samples, int) or self.num_samples <= 0:
--> 143             raise ValueError(f"num_samples should be a positive integer value, but got num_samples={self.num_samples}")
    144
    145     @property

ValueError: num_samples should be a positive integer value, but got num_samples=0
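
The check that fails in the traceback (sampler.py, lines 142-143) can be mirrored in plain Python to see which dataset sizes trigger it. This is only a minimal sketch: it assumes the sampler takes num_samples from len() of the dataset it is given, and check_num_samples, empty_dataset and small_dataset are hypothetical names used for illustration, not part of torch or trl.

```python
def check_num_samples(data_source):
    """Mirror of the validation shown in the traceback.

    Assumption: num_samples is taken from len(data_source).
    """
    num_samples = len(data_source)
    if not isinstance(num_samples, int) or num_samples <= 0:
        raise ValueError(
            "num_samples should be a positive integer value, "
            f"but got num_samples={num_samples}"
        )
    return num_samples


# Hypothetical stand-ins for dataset_train: one empty, one with 32 rows.
empty_dataset = []
small_dataset = list(range(32))

# A non-empty dataset passes the check and returns its length.
print(check_num_samples(small_dataset))

# An empty dataset raises the same message as in the traceback.
try:
    check_num_samples(empty_dataset)
except ValueError as err:
    print(err)
```

Under that assumption, printing len(dataset_train) right before the PPOTrainer call would show whether the dataset is already empty at that point, for example after an earlier filtering or tokenization step.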
How can I solve this error? Thanks in advance.