Thanks for your comments @anantguptadbl, but I’m afraid I’m missing your point. The doc you point to is flagged as legacy; the current doc is this: Know your dataset.
And I am “leveraging” the return value. My preprocess function is for multiple-choice examples (à la SWAG, cf. Google Colab):
from itertools import chain

def preprocessOne(example):
    # Repeat the context once per answer choice
    first_sentences = [example[Context_name] for i in range(NMultChoice)]
    # The answer choices live in columns named "1", "2", ...
    second_sentences = [f"{example[str(i+1)]}" for i in range(NMultChoice)]
    flat_first = list(chain(first_sentences))
    flat_second = list(chain(second_sentences))
    tokenized_examples = tokenizer(first_sentences, second_sentences, truncation=True)
    # Regroup the tokenized (context, choice) pairs into chunks of NMultChoice
    tokDict = {k: [v[i : i + NMultChoice] for i in range(0, len(v), NMultChoice)] for k, v in tokenized_examples.items()}
    tokDict['label'] = example[Label_name]
    tokDict['idx'] = example[Idx_name]
    tokDict['scores'] = {f'{i}': example[f'{i+6}'] for i in range(NMultChoice)}
    return tokDict
@rbelew
Okay, perfect, you are returning the value from preprocessOne.
Since there was no variable on the left-hand side of your map call, I assumed that you were not returning anything from the function. You just need to assign the result of map to a variable and use that updated dataset.
Ha, it was that simple! I just had to use ds2 = ds.map(preprocessOne) rather than calling ds.map(preprocessOne) on its own, which I had assumed worked in place. Thanks so much @anantguptadbl
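For anyone landing here later, the difference between the two calls can be sketched without installing anything. The snippet below is only a toy: ToyDataset and add_length are hypothetical names made up for illustration, not part of the datasets library. It assumes, as this thread found, that map builds and returns a new dataset and leaves the original untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDataset:
    """Hypothetical stand-in for a dataset whose map() is NOT in-place."""
    rows: tuple  # tuple of dicts, one dict per example

    def map(self, fn):
        # Build a brand-new object; each output row is the input row
        # updated with the dict that fn returns for it.
        return ToyDataset(tuple({**row, **fn(row)} for row in self.rows))

    @property
    def column_names(self):
        return sorted(self.rows[0]) if self.rows else []


def add_length(example):
    # Per-example function: returns the new column(s) to add
    return {"n_chars": len(example["text"])}


ds = ToyDataset(({"text": "hello"}, {"text": "hi"}))

ds.map(add_length)        # result is discarded, ds is unchanged
ds2 = ds.map(add_length)  # keep the returned dataset instead

print(ds.column_names)    # ['text']
print(ds2.column_names)   # ['n_chars', 'text']
```

With the real library the pattern is the same as in the post above: assign the result, e.g. ds2 = ds.map(preprocessOne), or reassign to the same name if the original is no longer needed.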