What does the term episode mean in meta-learning?

pinocchio · July 21, 2020, 7:35pm

Does it mean a single task (dataset) sampled from the meta-set, or does it mean a batch of tasks (datasets)?

Reference:

https://stats.stackexchange.com/questions/478255/what-does-the-term-episode-mean-in-meta-learning

ayalaa2 · July 21, 2020, 8:53pm

An episode is considered to be a batch of tasks. For example, an episode may consist of 5 classes with 5 images per class, which together form our support set. We also have some number of query images, each of which we classify as one of those 5 classes. This is what’s considered an episode.

So we essentially have a mini train-set (our support set) and a mini test-set (our query set). This is where the meta-learning concept comes in.
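
As a rough illustration of the 5-class example above, here is a minimal sketch that samples one support set and one query set from a toy dataset. The function name `sample_episode`, the toy data, and the choice of 15 query images per class are assumptions made for illustration, not part of any library.

```python
import random

def sample_episode(data_by_class, n_way=5, k_shot=5, k_query=15, rng=None):
    """Sample one N-way K-shot support set plus a query set (hypothetical helper)."""
    rng = rng or random.Random()
    # Pick which classes appear in this episode.
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        # Draw disjoint support and query examples for this class.
        examples = rng.sample(data_by_class[cls], k_shot + k_query)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query

# Toy stand-in for an image dataset: 20 classes with 30 "images" (just ids) each.
toy_data = {f"class_{c}": [f"img_{c}_{i}" for i in range(30)] for c in range(20)}
support, query = sample_episode(toy_data, rng=random.Random(0))
print(len(support), len(query))  # 25 75
```

The support set plays the role of the mini train-set and the query set the role of the mini test-set.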

Depending on your set-up, your episode can be sampled from several datasets (also known as domains) or a single dataset. In my experience, it’s typical to sample an episode from a single domain. The idea is that it’s more realistic to have your model adapt to a single unseen domain rather than several unseen domains.

I hope this helps your understanding.

pinocchio · July 23, 2020, 6:21pm

Hi Alex, Thanks for taking the time to respond.

Before I respond, let me share what I understand from the meta-learning papers I’ve read (a fair number, some in detail, at this point).

In meta-learning there is a meta-set, which is the collection of data-sets to learn from (usually also called tasks). Each task/data-set in the meta-set is usually split into a train/support set and a test/query set (5+15 examples is common, afaik). Each of these tasks/data-sets is built as follows:

If it’s regression, one samples a function from a family of similar functions and creates the examples from an input range, where y=f_i(x) for task i or data-set D_i. With 20 examples, one then splits them into the support and query sets.
If it’s classification, it’s an N-way, K-shot task (+ K_eval shots). So one usually samples 5 classes from the fraction of classes used for meta-training and then samples K+K_eval examples from each class (usually 5+15=20 total per class).

Then the meta-set has a bunch of these tasks/data-sets.
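
The regression case described above could be sketched roughly as follows. The sine family, the parameter ranges, and the helper name `sample_sine_task` are assumptions chosen for illustration; only the 20-example, 5+15 split comes from the description.

```python
import math
import random

def sample_sine_task(rng, num_examples=20, shots=5):
    """Build one regression task D_i and split it into support/query sets."""
    # Hypothetical family: f_i(x) = amplitude * sin(x + phase).
    amplitude = rng.uniform(0.1, 5.0)
    phase = rng.uniform(0.0, math.pi)
    # Draw inputs from an assumed input range and label them with f_i.
    xs = [rng.uniform(-5.0, 5.0) for _ in range(num_examples)]
    examples = [(x, amplitude * math.sin(x + phase)) for x in xs]
    return examples[:shots], examples[shots:]

rng = random.Random(0)
# The meta-set is then just a collection of such tasks.
meta_set = [sample_sine_task(rng) for _ in range(100)]
support, query = meta_set[0]
print(len(meta_set), len(support), len(query))  # 100 5 15
```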

From Trist’s response it seems that an episode is 1 data-set/task. Not a (meta) batch of data-sets.

I am trying to confirm whether this is correct. He’s published a paper with Yoshua Bengio, so he’s likely a reliable source, but it’s weird to me because I thought your answer was the correct one. I’m unsure.

Quote:

tristandeleu commented 2 days ago

Usually an episode means one single dataset D_i. If you have a (meta-)batch of size 16, this means you have 16 episodes/datasets in your (meta-)batch.

ayalaa2 · July 23, 2020, 6:39pm

I would agree with the way that you described the meta-set.

When you say:

Not a (meta) batch of data-sets

Do you mean, the batch is not sampled from multiple data-sets? I think my answer and his are about the same. He suggests that an episode is sampled from one data-set, and I would agree with this. I just don’t know if this is a strict requirement of an episode. What’s the main contradictory thought that you’re noticing?

pinocchio · July 23, 2020, 8:27pm

To me, a data-set is a task, as I understand it, i.e. we generate 20 samples from a selected distribution (from a fixed function, or from a mini-classification task where we sample, say, 5 random labels).

Let’s take mini-Imagenet for example with meta-batch=1. It has 64 classes for meta-train and each class has a total of 600 images. A data-set in the meta-set would be a sample of 5 classes from those 64 classes, with a sample of 20 actual images from the 600 in each class. During one epoch we then create ceil(64/5) iterations/meta-batches (with the last meta-batch having only 4 classes, or we skip it). That is what seems to be happening in torchmeta; at least that’s correct for regression.

I created a data-set of 20 functions with 100 samples per function, and after 2 batches the epoch is done (the first batch has 16 data-sets, the next one 4):

[epoch=0]
  0%|          | 0/2 [00:00<?, ?it/s]
batch_idx = 0
train_inputs.shape = torch.Size([16, 5, 1])
train_targets.shape = torch.Size([16, 5, 1])
test_inputs.shape = torch.Size([16, 15, 1])
test_targets.shape = torch.Size([16, 15, 1])
batch_idx = 1
train_inputs.shape = torch.Size([4, 5, 1])
train_targets.shape = torch.Size([4, 5, 1])
test_inputs.shape = torch.Size([4, 15, 1])
test_targets.shape = torch.Size([4, 15, 1])
[epoch=1]
 50%|█████     | 1/2 [00:00<00:00,  3.48it/s]
batch_idx = 0
train_inputs.shape = torch.Size([16, 5, 1])
train_targets.shape = torch.Size([16, 5, 1])
test_inputs.shape = torch.Size([16, 15, 1])
test_targets.shape = torch.Size([16, 15, 1])
batch_idx = 1
train_inputs.shape = torch.Size([4, 5, 1])
train_targets.shape = torch.Size([4, 5, 1])
test_inputs.shape = torch.Size([4, 15, 1])
test_targets.shape = torch.Size([4, 15, 1])
Done with test! a
100%|██████████| 2/2 [00:00<00:00,  3.49it/s]

code:

    # Loop through the meta-batches of this data set and print their shapes to check they are the sizes you expect
    from torchmeta.utils.data import BatchMetaDataLoader
    from torchmeta.transforms import ClassSplitter
    from torchmeta.toy import Sinusoid

    from tqdm import tqdm

    dataset = Sinusoid(num_samples_per_task=100, num_tasks=20)
    shots, test_shots = 5, 15
    # get metaset
    metaset = ClassSplitter(
        dataset,
        num_train_per_class=shots,
        num_test_per_class=test_shots,
        shuffle=True)
    # get meta-dataloader
    batch_size = 16
    num_workers = 0
    meta_dataloader = BatchMetaDataLoader(metaset, batch_size=batch_size, num_workers=num_workers)
    epochs = 2

    print(f'batch_size = {batch_size}')
    print(f'len(metaset) = {len(metaset)}')
    print(f'len(meta_dataloader) = {len(meta_dataloader)}\n')
    with tqdm(range(epochs)) as tepochs:
        for epoch in tepochs:
            print(f'\n[epoch={epoch}]')
            for batch_idx, batch in enumerate(meta_dataloader):
                print(f'\nbatch_idx = {batch_idx}')
                train_inputs, train_targets = batch['train']
                test_inputs, test_targets = batch['test']
                print(f'train_inputs.shape = {train_inputs.shape}')
                print(f'train_targets.shape = {train_targets.shape}')
                print(f'test_inputs.shape = {test_inputs.shape}')
                print(f'test_targets.shape = {test_targets.shape}')
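
The batch sizes seen in the log (16, then 4) follow from plain arithmetic on 20 tasks and a meta-batch size of 16. The sketch below only reproduces that arithmetic and does not call torchmeta; it assumes the loader keeps the last, partial meta-batch, as the log suggests.

```python
import math

num_tasks, batch_size = 20, 16

# Number of meta-batches per epoch when the last partial batch is kept.
num_batches = math.ceil(num_tasks / batch_size)

# Size of each meta-batch: full batches first, then whatever is left over.
batch_sizes = [min(batch_size, num_tasks - i * batch_size) for i in range(num_batches)]

print(num_batches, batch_sizes)  # 2 [16, 4]
```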

pinocchio · July 23, 2020, 8:29pm

The contradiction I see is this:

First Trist’s definition:

episode = 1 single data set/task

Our definition of episode:

episode = 1 batch of data sets/tasks

In fact, Trist’s definition implies this equation should hold:

total_episodes = num_meta_epochs * meta_batch_size
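
To make the difference between the two definitions concrete, here is a small sketch that counts episodes under each one. It assumes one meta-update per meta-batch and reads the first factor of the equation as the number of meta-batches processed; the function name and the numbers are illustrative only.

```python
def count_episodes(num_iterations, meta_batch_size, episode_is_single_task):
    """Number of episodes seen after `num_iterations` meta-batches."""
    if episode_is_single_task:
        # episode = 1 data set/task: every task in every meta-batch counts.
        return num_iterations * meta_batch_size
    # episode = 1 batch of tasks: one episode per meta-batch.
    return num_iterations

print(count_episodes(100, 16, episode_is_single_task=True))   # 1600
print(count_episodes(100, 16, episode_is_single_task=False))  # 100
```

With a meta-batch size of 1 the two counts coincide.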

pinocchio · July 23, 2020, 8:41pm

references I have that might help:

reference: arxiv.org/pdf/1606.04080.pdf
reference: proceedings.mlr.press/v48/santoro16.pdf

This is the quote I find hard to parse:

More specifically, let us define a task T as distribution over possible label sets L. Typically we consider T to uniformly weight all data sets of up to a few unique classes (e.g., 5), with a few examples per class (e.g., up to 5). In this case, a label set L sampled from a task T, L ∼ T, will typically have 5 to 25 examples.
To form an “episode” to compute gradients and update our model, we first sample L from T (e.g., L could be the label set {cats, dogs}). We then use L to sample the support set S and a batch B (i.e., both S and B are labelled examples of cats and dogs). The Matching Net is then trained to minimise the error predicting the labels in the batch B conditioned on the support set S. This is a form of meta-learning since the training procedure explicitly learns to learn from a given support set to minimise a loss over a batch.

references for Trist’s answer:

In our setup, a task, or episode, involves the presentation of some dataset D = {d_t}_{t=1}^T = {(x_t, y_t)}_{t=1}^T. For classification, y_t is the class label for an image x_t, and for regression, y_t is the value of a hidden function for a vector with real-valued elements x_t, or simply a real-valued number x_t (here on, for consistency, x_t will be used).

pinocchio · October 12, 2020, 7:27pm

In my opinion the right definition of an episode should be a batch of tasks (usually called a meta-batch). For regression, if we have 100 functions f_i from some family (e.g. sine functions), then 1 episode with a meta-batch size of 16 should be 16 functions, each with a support set and a query set.

For classification, an episode is still a (meta) batch of tasks. In this case a task is an N-way K-shot classification task, e.g. 5-way, 5-shot would have 25 examples for the support set and, if K_eval is 15, 75 examples for the query set. With a meta-batch size of 16 we then sample 16 tasks, each with 25+75 examples, so a total of 16*100 = 1600 examples per meta-batch.
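
The counts in the classification example can be checked with a few lines of arithmetic. The variable names are just illustrative.

```python
n_way, k_shot, k_eval, meta_batch_size = 5, 5, 15, 16

support_size = n_way * k_shot               # examples in one support set
query_size = n_way * k_eval                 # examples in one query set
per_task = support_size + query_size        # examples in one task
per_meta_batch = meta_batch_size * per_task # examples in one meta-batch

print(support_size, query_size, per_task, per_meta_batch)  # 25 75 100 1600
```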

In fact, with this definition 1 episode is the same as an iteration step. When the meta-batch size is 1, a task is an episode.

I can’t imagine why we’d define an episode as a task, which is what I thought at some point. In that case we’d have two words, task and episode, for the same thing. But an episode of learning happens fully during each iteration.

Though I’d prefer not to use this word at all, since it seems redundant and RL already uses the term, which adds to the confusion in my opinion.