My data is both image and the compressed(.npz) files that contain 3d points and labels for each 3d points(contains a minimum of 646464 points and each file size varies from 10MB to 45MB). num_workers=4, batch_size=32, points_sampled for every batch is 2048.
Total number of parameters: 13414465
train batch 1 load time 21.43259024620056s
train batch 2 load time 0.031423091888427734s
train batch 3 load time 0.004406452178955078
train batch 4 load time 0.004347562789916992
train batch 5 load time 18.13344931602478
train batch 6 load time 0.004399538040161133
train batch 7 load time 0.03353142738342285
train batch 8 load time 0.004467010498046875
train batch 9 load time 16.202253103256226
train batch 10 load time 0.8377358913421631
I understand that input file is too large and I load it from a network location, this could cause the slowdown.
- Why is it slow only after #num_workers iteration?
- How can we make it faster loading?