Train a model on a subset of the data for a subset of epochs

test_project · December 29, 2022, 3:17am

I’m trying to train my model on a different subset of the data every 10 epochs. I wrote this code, but unfortunately it doesn’t work. How can I change it?

class DataModule(pl.LightningDataModule):

  def __init__(self, train_dataset):

    super(DataModule, self).__init__()
    self.train_dataset = train_dataset
  def train_dataloader(self):
    return DataLoader(self.train_dataset,2, shuffle = True)

Num_data=25000
Num_epoch=50000
Num_sub_epoch=10
Num_sub_data=1000
for i in range(int(Num_epoch/Num_sub_epoch)):
  i=np.random.choice(datamodule.train_dataloader(),Num_sub_data)
  x=x[i]
  y=y[i]
  for _ in range(Num_sub_epoch):
    trainer.fit(model, datamodule)

The error is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
mtrand.pyx in numpy.random.mtrand.RandomState.choice()

TypeError: 'DataLoader' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-52-0b8908d20663> in <module>
      2 ne=10
      3 for i in range(int(epochs/ne)):
----> 4   i=np.random.choice(datamodule.train_dataloader(),10)
      5   x=x[i]
      6   y=y[i]

mtrand.pyx in numpy.random.mtrand.RandomState.choice()

ValueError: a must be 1-dimensional or an integer

If this code is wrong, how can I change the DataLoader so that it uses only 1000 samples for every 10 epochs?

nivek · January 3, 2023, 9:16pm

You can use an inner for loop that draws samples from the DataLoader only up to the desired number, with shuffling enabled so that you get different samples in each epoch.
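
A minimal sketch of that idea, using only the standard library so it runs anywhere. `ShufflingLoader` and `limited_batches` are hypothetical names made up for this illustration; `ShufflingLoader` only stands in for `DataLoader(dataset, batch_size, shuffle=True)`, which is likewise iterable and reshuffles each time you iterate over it. The constants mirror the ones in the question, and the epoch count is kept small.

```python
import itertools
import math
import random


class ShufflingLoader:
    """Hypothetical stand-in for DataLoader(dataset, batch_size, shuffle=True)."""

    def __init__(self, dataset, batch_size, seed=None):
        self.dataset = dataset
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def __iter__(self):
        # Draw a fresh permutation on every pass, as shuffle=True would.
        order = list(range(len(self.dataset)))
        self.rng.shuffle(order)
        for start in range(0, len(order), self.batch_size):
            yield [self.dataset[i] for i in order[start:start + self.batch_size]]


def limited_batches(loader, num_samples, batch_size):
    """Yield only as many batches as are needed to cover num_samples samples."""
    num_batches = math.ceil(num_samples / batch_size)
    return itertools.islice(loader, num_batches)


num_data = 25000      # Num_data in the question
num_sub_data = 1000   # Num_sub_data in the question
batch_size = 2        # batch size passed to DataLoader in the question
num_epochs = 3        # assumption: kept small for the sketch

loader = ShufflingLoader(list(range(num_data)), batch_size, seed=0)

seen_per_epoch = []
for epoch in range(num_epochs):
    seen = []
    # Inner loop: stop after num_sub_data samples instead of the full dataset.
    for batch in limited_batches(loader, num_sub_data, batch_size):
        # A real training step on `batch` would go here.
        seen.extend(batch)
    seen_per_epoch.append(seen)
```

Note that this draws a new random 1000 samples in every epoch. If the same 1000 samples should be reused for 10 epochs before switching, one option is to draw the indices once per block of 10 epochs (for example with `random.sample(range(num_data), num_sub_data)`) and wrap the dataset in `torch.utils.data.Subset` with those indices. With Lightning, the `limit_train_batches` argument of `Trainer` should also be able to cap the number of batches per epoch (500 here, since the batch size is 2), without a manual loop.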