I’m trying to test my model with torchdata and DDP (2 GPUs, sometimes 6), but the final number of samples is always less than the actual number.
My data directory has 17 packed tar files: files 0–15 have 5120 entries each, while the last one has only 2730. If I add a sharding_filter right after the FileLister, all of the data in the last tar file seems to be ignored by the dataloader. I also tried adding the sharding_filter after the map function, but several samples are still missing.
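To make the numbers concrete, here is a plain-Python sketch of what I believe happens with my shard sizes under round-robin sharding (this is only my mental model of sharding_filter, not its actual implementation):

```python
# Plain-Python sketch (no torchdata) of round-robin sharding over my shards.
sizes = [5120] * 16 + [2730]  # my 17 tar files

# Sharding at the file level (filter right after FileLister), 2 ranks:
per_rank = [sum(sizes[r::2]) for r in range(2)]
print(per_rank)  # [43690, 40960] -> uneven
# If iteration is kept in lockstep with the shortest rank, each rank
# yields only 40960 samples: 2 * 40960 = 81920, i.e. exactly the last
# tar's 2730 samples go missing.

# Sharding at the sample level (filter after .map):
total = sum(sizes)        # 84650
print(total % 2)          # 0 with 2 plain ranks,
print(total % (2 * 4))    # but with e.g. 4 workers per rank the
                          # effective shard count is 8, and 84650 % 8 = 2
                          # trailing samples don't divide evenly
```

The file-level split leaves one rank with exactly 2730 more samples than the other, which matches the whole last tar being dropped; the sample-level split only leaves a small remainder, which matches the "several samples missing" case.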
Is there any way I can make sure I get the correct number of samples? Do I have to repack my dataset so that all tar files have the same number of entries?
Many thanks in advance!
```python
import torchdata
from torchdata.datapipes.iter import FileLister, FileOpener
from torch.utils.data import DataLoader

rank, world_size = get_dist_info()  # my own helper returning DDP rank info
rootdir = "/data/test/"

dataset = FileLister(rootdir, "*.tar")
# First attempt: shard at the file level -> the last tar is ignored
# if dist:
#     dataset = dataset.sharding_filter()
dataset = FileOpener(dataset, mode="rb")
dataset = dataset.load_from_tar(length=length)
dataset = dataset.webdataset().map(postprocess_func)
# Second attempt: shard at the sample level -> still several samples missing
if dist:
    dataset = dataset.sharding_filter()

data_loader = DataLoader(
    dataset,
    batch_size=batch_size,
    num_workers=num_workers,
    pin_memory=pin_memory,
    shuffle=shuffle,
    drop_last=False,
    **kwargs)

cnts = 0
for ind, x in enumerate(data_loader):
    cnts += len(x)
# cnts doesn't match the real sample count here
```
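For what it's worth, this is the diagnostic I've been using to count samples per rank without going through the DataLoader (a rough sketch; I'm assuming `torch.utils.data.graph_settings.apply_sharding(pipe, num_instances, instance_id)` is the right way to force the sharding_filter to apply, which may differ between versions):

```python
from torch.utils.data.graph_settings import apply_sharding

# Rough per-rank counting sketch, bypassing the DataLoader entirely.
pipe = FileLister(rootdir, "*.tar")
pipe = FileOpener(pipe, mode="rb")
pipe = pipe.load_from_tar().webdataset().sharding_filter()
apply_sharding(pipe, world_size, rank)  # assumed signature, see above
print(f"rank {rank}: {sum(1 for _ in pipe)} samples")
```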