When I use DataParallel, I find the first dim of outputs is batch_size * gpu_nums, and when I calculate the loss I get this error:
ValueError: Expected input batch_size (32) to match target batch_size (16).
my code is:
model = DataParallel(model, device_ids=gpus, output_device=gpus)
outputs = model(input_ids, token_type_ids, attention_mask)
loss = loss_fct(outputs, labels.cuda(config.cuda_id))
I think the first dim of outputs should be batch_size. Can anybody help me fix this? Thanks!
Thanks! Checking some other sources: ValueError: Expected input batch_size (1) to match target batch_size (64), https://stackoverflow.com/questions/56719867/pytorch-expected-input-batch-size-12-to-match-target-batch-size-64, and ValueError: Expected input batch_size (324) to match target batch_size (4), there is likely a bug in how you've defined the shapes in your forward pass.
If your model works without DataParallel but breaks with it, it's likely because your model implicitly hardcodes the batch size it expects, probably near the beginning of the forward pass (maybe somewhere in a `view` or `reshape` call).
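A minimal sketch of what that fix usually looks like (the `Net` module and its layer sizes here are made up for illustration, not taken from your code): DataParallel scatters each batch across the GPUs, so every replica's forward pass sees a chunk of roughly batch_size / num_gpus samples, and any hardcoded batch size inside forward produces shape mismatches like yours.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        # Wrong: hardcoding the batch size, e.g. x = x.view(16, -1),
        # breaks under DataParallel because each replica receives only
        # a chunk of the batch (batch_size / num_gpus samples).
        # Right: derive the batch dimension from the input itself:
        x = x.view(x.size(0), -1)
        return self.fc(x)

model = Net()
print(model(torch.randn(16, 10)).shape)  # full batch
print(model(torch.randn(8, 10)).shape)   # half-size chunk, as a replica would see
```

Because the batch dimension is read from `x.size(0)` instead of being fixed, the same forward pass works for the full batch and for the smaller per-GPU chunks, so the gathered outputs line up with your labels again.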