To me, a data-set is a task, as I understand it. I.e., we generate 20 samples from a selected distribution (from a fixed function, or from a mini-classification task where we sample, say, 5 random labels).
Let's take mini-Imagenet for example, with meta-batch=1. It has 64 classes for meta-train, and each class has a total of 600 images. A data-set in the meta-set would be a sample of 5 classes from those 64, with a sample of 20 actual images from each class's 600. During one epoch we then create ceil(64/5) iterations/meta-batches (with the last meta-batch having only 4 classes, or we skip it). That is what seems to be happening in torchmeta; at least that's what I verified is correct for regression.
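For reference, here is a minimal sketch of that mini-Imagenet setup using torchmeta's miniimagenet helper (the "data" folder and download flag are placeholders for your setup; the shape comment is what I'd expect, not verified output):
# Sketch: build a mini-Imagenet meta-dataloader with meta-batch=1.
# Assumes torchmeta's `miniimagenet` helper; "data" is a placeholder folder.
from torchmeta.datasets.helpers import miniimagenet
from torchmeta.utils.data import BatchMetaDataLoader

dataset = miniimagenet("data", ways=5, shots=5, test_shots=15,
                       meta_train=True, download=True)
dataloader = BatchMetaDataLoader(dataset, batch_size=1, num_workers=0)

batch = next(iter(dataloader))
train_inputs, train_targets = batch['train']
# Expected: [meta_batch=1, ways*shots=25, 3, 84, 84]
print(train_inputs.shape)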
I created a Sinusoid task set with 100 samples per function and 20 functions, and then after 2 batches the epoch is done (the first batch has 16 data-sets, the next one 4; see the sanity check after the code):
[epoch=0]
0%| | 0/2 [00:00<?, ?it/s]
batch_idx = 0
train_inputs.shape = torch.Size([16, 5, 1])
train_targets.shape = torch.Size([16, 5, 1])
test_inputs.shape = torch.Size([16, 15, 1])
test_targets.shape = torch.Size([16, 15, 1])
batch_idx = 1
train_inputs.shape = torch.Size([4, 5, 1])
train_targets.shape = torch.Size([4, 5, 1])
test_inputs.shape = torch.Size([4, 15, 1])
test_targets.shape = torch.Size([4, 15, 1])
[epoch=1]
50%|█████ | 1/2 [00:00<00:00, 3.48it/s]
batch_idx = 0
train_inputs.shape = torch.Size([16, 5, 1])
train_targets.shape = torch.Size([16, 5, 1])
test_inputs.shape = torch.Size([16, 15, 1])
test_targets.shape = torch.Size([16, 15, 1])
batch_idx = 1
train_inputs.shape = torch.Size([4, 5, 1])
train_targets.shape = torch.Size([4, 5, 1])
test_inputs.shape = torch.Size([4, 15, 1])
test_targets.shape = torch.Size([4, 15, 1])
Done with test!
100%|██████████| 2/2 [00:00<00:00, 3.49it/s]
code:
# Loop through meta-batches of this data set and print the tensor sizes,
# to make sure they are the sizes you expect.
from torchmeta.utils.data import BatchMetaDataLoader
from torchmeta.transforms import ClassSplitter
from torchmeta.toy import Sinusoid
from tqdm import tqdm

dataset = Sinusoid(num_samples_per_task=100, num_tasks=20)
shots, test_shots = 5, 15
# get metaset
metaset = ClassSplitter(
    dataset,
    num_train_per_class=shots,
    num_test_per_class=test_shots,
    shuffle=True)
# get meta-dataloader
batch_size = 16
num_workers = 0
meta_dataloader = BatchMetaDataLoader(metaset, batch_size=batch_size, num_workers=num_workers)
epochs = 2

print(f'batch_size = {batch_size}')
print(f'len(metaset) = {len(metaset)}')
print(f'len(meta_dataloader) = {len(meta_dataloader)}\n')
with tqdm(range(epochs)) as tepochs:
    for epoch in tepochs:
        print(f'\n[epoch={epoch}]')
        for batch_idx, batch in enumerate(meta_dataloader):
            print(f'\nbatch_idx = {batch_idx}')
            train_inputs, train_targets = batch['train']
            test_inputs, test_targets = batch['test']
            print(f'train_inputs.shape = {train_inputs.shape}')
            print(f'train_targets.shape = {train_targets.shape}')
            print(f'test_inputs.shape = {test_inputs.shape}')
            print(f'test_targets.shape = {test_targets.shape}')
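As a sanity check on the batch counts (a small arithmetic sketch, separate from the run above): with 20 tasks and a meta-batch size of 16, each epoch yields ceil(20/16) = 2 meta-batches, the first with 16 data-sets and the last with the remaining 4, matching the shapes printed above.
import math

num_tasks, batch_size = 20, 16
num_meta_batches = math.ceil(num_tasks / batch_size)  # 2 meta-batches per epoch
last_batch_size = num_tasks - (num_meta_batches - 1) * batch_size  # 4 data-sets in the last one
print(num_meta_batches, last_batch_size)  # -> 2 4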