Hello, I have a piece of code that uses a torch.utils.data.DataLoader with a custom BatchSampler to sample batches containing the same number of examples from each class.
from functools import partial
from typing import Any, Dict, List, Optional, Union
import numpy as np
import torch
from nemo import logging
from nemo.backends.pytorch import DataLayerNM
from nemo.collections.asr.parts.dataset import AudioLabelDataset, seq_collate_fn
from nemo.collections.asr.parts.features import WaveformFeaturizer
from nemo.collections.asr.parts.perturb import AudioAugmentor
from nemo.collections.asr.parts.perturb import perturbation_types
from nemo.core.neural_types import *
from torch.utils.data.sampler import BatchSampler
class BalancedBatchSampler(BatchSampler):
    """
    BatchSampler that, from an MNIST-like dataset, samples n_classes classes
    and, within each of those classes, samples n_samples elements.
    Returns batches of size n_classes * n_samples.
    """
    # ... (rest of the file truncated in the original post)
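For context, here is a minimal sketch of what such a balanced batch sampler typically looks like; the constructor arguments (labels, n_classes, n_samples) and the sampling loop are my own reconstruction from the docstring, not the truncated original:

import numpy as np
from torch.utils.data.sampler import BatchSampler

# Hypothetical reconstruction of the truncated class, inferred from its
# docstring; the actual implementation in the post may differ.
class BalancedBatchSampler(BatchSampler):
    def __init__(self, labels, n_classes, n_samples):
        self.labels = np.asarray(labels)
        self.classes = np.unique(self.labels)
        self.n_classes = n_classes
        self.n_samples = n_samples
        # Pre-compute the dataset indices belonging to each class.
        self.class_indices = {c: np.where(self.labels == c)[0] for c in self.classes}
        self.n_batches = len(self.labels) // (n_classes * n_samples)

    def __iter__(self):
        for _ in range(self.n_batches):
            # Draw n_classes distinct classes, then n_samples items from each
            # (assumes every class has at least n_samples examples).
            chosen = np.random.choice(self.classes, self.n_classes, replace=False)
            batch = []
            for c in chosen:
                batch.extend(
                    np.random.choice(self.class_indices[c], self.n_samples, replace=False).tolist()
                )
            yield batch

    def __len__(self):
        return self.n_batches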
I’m trying to use it in a multi-GPU scenario with the NeMo framework. By default, in multi-GPU mode the data layer does something like this:
if self._placement == DeviceType.AllGpu:
    sampler = torch.utils.data.distributed.DistributedSampler(self._dataset)
    self._dataloader = torch.utils.data.DataLoader(
        dataset=self._dataset,
        sampler=sampler,
        num_workers=num_workers,
    )
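Note that a BatchSampler cannot be passed through the sampler argument above; it has to go through DataLoader's batch_sampler argument, which is mutually exclusive with batch_size, shuffle, sampler, and drop_last. In the single-process case that would look roughly like this (labels and the 8/4 values below are placeholders):

self._dataloader = torch.utils.data.DataLoader(
    dataset=self._dataset,
    # batch_sampler replaces batch_size/shuffle/sampler/drop_last.
    batch_sampler=BalancedBatchSampler(labels, n_classes=8, n_samples=4),
    num_workers=num_workers,
)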
I’ve found some tricks for implementing a custom distributed sampler, but none of them work for a custom distributed batch sampler.
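For illustration, the most direct adaptation I can think of is to shard whole batches across ranks, along the lines of this hypothetical sketch (the class name and the round-robin scheme are my own, and it assumes torch.distributed is already initialized):

import torch.distributed as dist

class DistributedBalancedBatchSampler(BalancedBatchSampler):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Requires torch.distributed.init_process_group to have been called.
        self.rank = dist.get_rank()
        self.world_size = dist.get_world_size()

    def __iter__(self):
        for i, batch in enumerate(super().__iter__()):
            # Each rank keeps every world_size-th batch, offset by its rank.
            if i % self.world_size == self.rank:
                yield batch

    def __len__(self):
        return super().__len__() // self.world_size

What can I do?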
wayi (Yi Wang), May 12, 2021, 6:15pm

@VitalyFedyunin for data loader questions