NLP with torchtext on multiple CUDA devices: the best strategy for a large imbalanced dataset


I am working on a multiclass classification problem with a large imbalanced dataset. Over the next few weeks, I will have access to 4 CUDA devices. The dataset structure is simple: two columns, TEXT and LABEL.
What is the best strategy for using torchtext in this situation? Unfortunately, I could not find an example that trains on multiple CUDA devices with a dataset like this (large and imbalanced, 20 GB).

You should be able to use DistributedDataParallel to utilize all GPUs in your model training. I’m not aware of any torchtext-specific limitations for this.
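A minimal sketch of what a DistributedDataParallel setup could look like, assuming the TEXT column has already been converted into fixed-length, padded token-id tensors. The names `BagOfWordsClassifier` and `train_ddp`, the model architecture, and the hyperparameters are hypothetical and only for illustration. One process runs per GPU, `DistributedSampler` gives each process a disjoint shard of the data, and DDP averages gradients across processes during `backward()`. The gloo fallback only exists so the same code can also run on a CPU-only machine.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler


class BagOfWordsClassifier(nn.Module):
    """Hypothetical minimal text classifier: mean of token embeddings -> linear layer."""

    def __init__(self, vocab_size, num_classes, embed_dim=64):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) LongTensor; each row is treated as one bag.
        return self.fc(self.embedding(token_ids))


def train_ddp(rank, world_size, dataset, num_classes, vocab_size, epochs=1, batch_size=32):
    """Hypothetical per-process training function; run one copy per GPU."""
    use_cuda = torch.cuda.is_available()
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    # NCCL is the usual backend for GPUs; gloo lets the sketch run on CPU.
    dist.init_process_group("nccl" if use_cuda else "gloo", rank=rank, world_size=world_size)
    device = torch.device(f"cuda:{rank}" if use_cuda else "cpu")
    if use_cuda:
        torch.cuda.set_device(device)

    model = BagOfWordsClassifier(vocab_size, num_classes).to(device)
    ddp_model = DDP(model, device_ids=[rank] if use_cuda else None)

    # Each rank iterates over its own 1/world_size shard of the dataset.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank, shuffle=True)
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    last_loss = float("nan")
    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # different shuffle order every epoch
        for token_ids, labels in loader:
            token_ids, labels = token_ids.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = loss_fn(ddp_model(token_ids), labels)
            loss.backward()  # DDP all-reduces gradients across ranks here
            optimizer.step()
            last_loss = loss.item()

    dist.destroy_process_group()
    return last_loss
```

To use all 4 devices, one option is `torch.multiprocessing.spawn(train_ddp, args=(4, dataset, num_classes, vocab_size), nprocs=4)` inside an `if __name__ == "__main__":` guard; `spawn` passes the rank as the first argument. Note that the effective batch size per optimizer step becomes `batch_size * world_size`.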

I tried to find an example of multiclass classification on multiple CUDA devices, but without success. Do you know of any such examples?

You could start by writing a training script for multiclass classification on a single device and then add distributed training on top of it, e.g. using this tutorial.
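As a possible starting point, here is a minimal single-device sketch for TEXT, LABEL pairs. It deliberately avoids torchtext-specific APIs: `tokenize`, `build_vocab`, `make_collate`, `inverse_frequency_weights`, `TextClassifier`, and `train_single_device` are all hypothetical helpers, and torchtext's own tokenizer and vocab utilities could be swapped in. Variable-length texts are packed into one flat id tensor plus offsets for `nn.EmbeddingBag`. To address the imbalance, it assumes inverse-frequency class weights passed to `nn.CrossEntropyLoss(weight=...)`; this is one common choice, not the only one.

```python
from collections import Counter

import torch
import torch.nn as nn
from torch.utils.data import DataLoader


def tokenize(text):
    # Hypothetical whitespace tokenizer; replace with a real one in practice.
    return text.lower().split()


def build_vocab(texts):
    # Index 0 is reserved for unknown tokens.
    counts = Counter(tok for t in texts for tok in tokenize(t))
    itos = ["<unk>"] + sorted(counts)
    return {tok: i for i, tok in enumerate(itos)}


def make_collate(vocab):
    def collate(batch):
        # batch: list of (text, label) pairs, i.e. the TEXT and LABEL columns.
        ids, offsets, labels = [], [0], []
        for text, label in batch:
            toks = [vocab.get(t, 0) for t in tokenize(text)] or [0]
            ids.extend(toks)
            offsets.append(offsets[-1] + len(toks))  # start of the next text
            labels.append(label)
        return (torch.tensor(ids, dtype=torch.long),
                torch.tensor(offsets[:-1], dtype=torch.long),
                torch.tensor(labels, dtype=torch.long))
    return collate


def inverse_frequency_weights(labels, num_classes):
    # weight_c = N / (num_classes * count_c): rarer classes get larger weights.
    counts = torch.bincount(torch.tensor(labels), minlength=num_classes).float()
    return len(labels) / (num_classes * counts.clamp(min=1))


class TextClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes, embed_dim=64):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, ids, offsets):
        return self.fc(self.embedding(ids, offsets))


def train_single_device(pairs, num_classes, epochs=5, batch_size=16, device="cpu"):
    vocab = build_vocab([text for text, _ in pairs])
    model = TextClassifier(len(vocab), num_classes).to(device)
    weights = inverse_frequency_weights([label for _, label in pairs], num_classes)
    loss_fn = nn.CrossEntropyLoss(weight=weights.to(device))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    loader = DataLoader(pairs, batch_size=batch_size, shuffle=True,
                        collate_fn=make_collate(vocab))
    for _ in range(epochs):
        for ids, offsets, labels in loader:
            ids, offsets, labels = ids.to(device), offsets.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(ids, offsets), labels)
            loss.backward()
            optimizer.step()
    return model, vocab
```

Once this works on one GPU, the distributed version mainly wraps the model in `DistributedDataParallel` and replaces `shuffle=True` with a `DistributedSampler`. For a 20 GB file, you would probably stream rows from disk instead of holding `pairs` in memory; the in-memory list here is only for illustration. `torch.utils.data.WeightedRandomSampler` is an alternative to loss weighting that oversamples rare classes instead.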