Hello,
I’m curious about how training is done on datasets with dynamic shapes. For instance, in point cloud segmentation tasks, each point cloud can have a different number of points.
I’m immediately confused about how people handle these shapes, since even stacking two tensors of unequal size raises an error. The DataLoader class similarly expects every item in a batch to have an identical shape, so its default collate function cannot even form a batch.
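For example, even a toy setup like the one below fails at the default collate step (a minimal sketch; `ToyPointClouds` is just a made-up dataset of random-sized tensors, not my real data):

```python
import torch
from torch.utils.data import Dataset, DataLoader

# Toy dataset: each "point cloud" is an (N_i, 3) tensor with a different N_i
class ToyPointClouds(Dataset):
    def __init__(self, sizes=(1000, 1200, 900, 1500)):
        self.clouds = [torch.randn(n, 3) for n in sizes]

    def __len__(self):
        return len(self.clouds)

    def __getitem__(self, idx):
        return self.clouds[idx]

loader = DataLoader(ToyPointClouds(), batch_size=2)
# RuntimeError: stack expects each tensor to be equal size
batch = next(iter(loader))
```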
From looking online, the usual suggested workaround is a custom collate_fn that pads the points, but this seems like a massive issue if your dataset has large size deviations. For instance, what if one point cloud has 1,000 points and another has 1 million? What if you have a batch size of 256 where 255 samples have ~1,000 points and the last one has 1 million? Suddenly you need to allocate memory and compute gradients over 256 million padded points, not to mention the additional extracted features.
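As I understand the suggestion, the collate_fn would look roughly like this (a minimal sketch using `pad_sequence`; `pad_collate` and the mask handling are just my own names/assumptions):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Pad every cloud in the batch up to the largest one, and return a mask
# so the padded points can be ignored later (e.g. in the loss).
def pad_collate(batch):
    lengths = torch.tensor([cloud.shape[0] for cloud in batch])
    padded = pad_sequence(batch, batch_first=True)                      # (B, N_max, 3)
    mask = torch.arange(padded.shape[1])[None, :] < lengths[:, None]    # (B, N_max)
    return padded, mask

padded, mask = pad_collate([torch.randn(1000, 3), torch.randn(1500, 3)])
print(padded.shape, mask.shape)  # torch.Size([2, 1500, 3]) torch.Size([2, 1500])
```

So with my 256-sample example above, every cloud would be padded out to 1 million points, which is exactly the memory blow-up I’m worried about.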
Another ‘solution’ would be to use a batch size of 1, but that brings its own problems. Perhaps a batch size of 1 combined with gradient accumulation is the answer? Even then, throughput is significantly reduced.
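This is the kind of accumulation pattern I have in mind (a minimal sketch; the tiny per-point MLP, random data, and `accum_steps` value are placeholders, not my actual setup):

```python
import torch
import torch.nn as nn

# Placeholder per-point model and random variable-sized clouds with per-point labels
model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

clouds = [torch.randn(torch.randint(500, 2000, (1,)).item(), 3) for _ in range(64)]
labels = [torch.randint(0, 10, (c.shape[0],)) for c in clouds]

accum_steps = 16
optimizer.zero_grad()
for step, (cloud, label) in enumerate(zip(clouds, labels)):
    logits = model(cloud)                          # (N_i, 10), N_i can vary freely
    loss = criterion(logits, label) / accum_steps  # scale so the sum matches one big batch
    loss.backward()                                # gradients accumulate in .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

This avoids padding entirely, but every cloud is processed one at a time, so GPU utilization and wall-clock speed suffer.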
The main question: how does one generally train on a dataset whose samples have varying, dynamic shapes? Any information would be great.
Thank you