SubsetRandomSampler for every epoch

I have a CNN that is pre-trained on ImageNet, and I want to do some fine-tuning (re-training) on it. I don't think I need to run through the entire ImageNet training set each epoch, so I want to train on 100,000 samples per epoch. If I implement the following, will a different set of 100,000 samples be drawn every epoch?

...

# Weight training
def weight_train(epoch):
    print('\nWeight Training Epoch: %d' % epoch)

    train_sampler = torch.utils.data.sampler.SubsetRandomSampler(indices[:split])
    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=args.bs, sampler=train_sampler, num_workers=4, drop_last=False, pin_memory=True)

    batch_time = AverageMeter()
    data_time = AverageMeter()
    losses = AverageMeter()
    top1 = AverageMeter()
    top5 = AverageMeter()

    # switch to train mode
    model.train()

    end = time.time()
    for batch_idx, (inputs, targets) in enumerate(train_loader):
        # measure data loading time
        data_time.update(time.time() - end)

        if use_cuda:
            inputs, targets = inputs.cuda(), targets.cuda()

        # compute output
        outputs = model(inputs)
        loss = criterion(outputs, targets)

        # measure accuracy and record loss
        prec1, prec5 = accuracy(outputs, targets, topk=(1, 5))
        losses.update(loss.item(), inputs.size(0))
        top1.update(prec1[0], inputs.size(0))
        top5.update(prec5[0], inputs.size(0))

        # compute gradient and do SGD step
        weight_optimizer.zero_grad()
        loss.backward()
        weight_optimizer.step()

        # measure elapsed time
        batch_time.update(time.time() - end)
        end = time.time()

        progress_bar(batch_idx, len(train_loader), 
                  'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                  'Data {data_time.val:.3f} ({data_time.avg:.3f})\t'
                  'Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                  'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t'
                  'Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format(
                   batch_time=batch_time,
                   data_time=data_time, loss=losses, top1=top1, top5=top5))

...

for epoch in range(0,args.ne):
    weight_train(epoch)

I'm not sure how indices is defined, but I assume all sample indices are stored there.
If so, you would have to slice them differently in each epoch. Currently you are using the same indices in every epoch:

indices[:split]

Probably something like this should work in your use case:

train_size = 1000
for epoch in range(epochs):
    # take a different, non-overlapping window of indices in each epoch
    idx = indices[epoch * train_size:(epoch + 1) * train_size]
    train_sampler = torch.utils.data.sampler.SubsetRandomSampler(idx)

I defined indices like this.

n_train = 10000 # which is smaller than total dataset size (1.28M images)
split = n_train // 2
indices = list(range(n_train))

So in that case split will be a constant value of 5000, which will always yield indices[:5000].
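To illustrate that point with a small self-contained check (not code from your script): SubsetRandomSampler re-permutes the indices it is given on every pass, but the set itself never changes, so every epoch trains on the same 5,000 samples.

import torch
from torch.utils.data import SubsetRandomSampler

indices = list(range(10000))
split = len(indices) // 2                  # 5000
sampler = SubsetRandomSampler(indices[:split])

epoch_a = list(sampler)                    # one random permutation
epoch_b = list(sampler)                    # a different permutation ...
print(set(epoch_a) == set(epoch_b))        # ... of the same 5000 indices -> True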

I got it.

Then how about using torch.randint for every epoch?

Should be alright to use. Alternatively, I would just shuffle all sample indices and use a windowed approach as shown in my example.
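To make that concrete, here is a minimal sketch of the shuffled, windowed approach. It reuses the train_dataset and args from your code above; n_train and train_size are just the values discussed in this thread, and the training loop itself is elided.

import torch
from torch.utils.data import DataLoader, SubsetRandomSampler

# Shuffle all sample indices once, then take a different,
# non-overlapping window of them in every epoch.
n_train = 10000                                # subset of the 1.28M images
train_size = 1000                              # samples used per epoch
indices = torch.randperm(n_train).tolist()     # shuffled sample indices

for epoch in range(n_train // train_size):
    idx = indices[epoch * train_size:(epoch + 1) * train_size]
    # alternative with replacement:
    # idx = torch.randint(0, n_train, (train_size,)).tolist()
    train_sampler = SubsetRandomSampler(idx)
    train_loader = DataLoader(train_dataset, batch_size=args.bs,
                              sampler=train_sampler, num_workers=4,
                              pin_memory=True)
    # ... train on train_loader for this epoch ...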

  1. How can I shuffle the indices?

  2. How about using a RandomSampler every epoch with num_samples=100000?
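Regarding point 2, something along these lines could work (just a sketch reusing train_dataset and args from the code above, not tested against this exact setup). Note that with replacement=True duplicates within an epoch are possible, and depending on your PyTorch version num_samples may require replacement=True.

import torch
from torch.utils.data import DataLoader, RandomSampler

# Draw num_samples random indices from the full dataset; a new pass over
# the DataLoader re-draws them, so each epoch sees a different random subset.
train_sampler = RandomSampler(train_dataset, replacement=True,
                              num_samples=100000)
train_loader = DataLoader(train_dataset, batch_size=args.bs,
                          sampler=train_sampler, num_workers=4,
                          pin_memory=True)

for epoch in range(args.ne):
    for inputs, targets in train_loader:   # 100,000 samples per epoch
        ...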