I am working on a document-based dataset where each sentence is a sample (a torchtext `Example`). I want to build an iterator where each batch corresponds to one document and contains that document's sentences as samples, so the batch size varies with the number of sentences in the document.
In the normal case, I would have used the following:
data_iter = data.Iterator(dataset, config.batch_size, repeat=False)
In my case I could use the `batch_size_fn` parameter of `Iterator`, but that function does not give me access to the previous samples already in the batch, so I cannot compare document ids and add a sentence only when its id matches. I thought about writing a wrapper iterator, but I have no idea how to get this done. Any ideas would be appreciated!
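For reference, the grouping I am after can be sketched in plain Python, independent of torchtext (this assumes the sentences already arrive in document order, and the field name `"doc"` and the `get_doc_id` accessor are placeholders for whatever the real dataset provides):

```python
from itertools import groupby

def document_batches(examples, get_doc_id):
    """Yield one batch (a list of examples) per document.

    Assumes `examples` is ordered so that all sentences of a document
    are contiguous, as in a document-based dataset stored in reading
    order. `get_doc_id` extracts the document id from an example.
    """
    for _, group in groupby(examples, key=get_doc_id):
        yield list(group)

# Hypothetical examples: dicts standing in for torchtext Examples.
sentences = [
    {"doc": "d1", "text": "first sentence"},
    {"doc": "d1", "text": "second sentence"},
    {"doc": "d2", "text": "third sentence"},
]
batches = list(document_batches(sentences, lambda ex: ex["doc"]))
print([len(b) for b in batches])  # → [2, 1]
```

The question, essentially, is how to get this per-document grouping behavior out of torchtext's `Iterator` (or a wrapper around it) rather than a hand-rolled loop.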