Size mismatch when using multiple GPUs (torch.nn.DataParallel)

Hi~ I’m using a custom model like this:

import torch
import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self, vectors_size, features_size, hidden_size=15, dropout_rate=0.1):
        super(SimpleNN, self).__init__()

        self.vectors_size = vectors_size
        self.features_size = features_size
        self.hidden_size = hidden_size
        self.dropout_rate = dropout_rate

        # maps the raw pair vectors down to features_size
        self.vectors_hidden = nn.Sequential(
            nn.Dropout(self.dropout_rate),
            nn.Linear(vectors_size, vectors_size // 2),
            nn.Tanh(),
            nn.Linear(vectors_size // 2, features_size),
            nn.Tanh()
        )
        self.hidden = nn.Sequential(
            nn.Linear(features_size * 2, hidden_size),
            nn.ReLU(),
        )

        self.output = nn.Linear(hidden_size, 2)

    def forward(self, pairs, features):
        """
        pairs: list of raw pairs, turned into vectors inside forward
        features: (n_samples, features_size)
        """
        # pairs2vectors, train_pub and device are defined elsewhere in my script
        vectors = pairs2vectors(train_pub, pairs).to(device)
        embedding_features = self.vectors_hidden(vectors)
        combined_features = torch.cat([features, embedding_features], dim=1)
        return self.output(self.hidden(combined_features))

This model works fine when I use a single GPU, but after wrapping it in DataParallel as shown below, it always complains that the sizes of features and embedding_features do not match. I noticed that the n_samples dimension of features is not what I expect; it looks as if features only contains part of the batch. I don't understand why this happens or how to fix it.

if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    model = nn.DataParallel(model)
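
To illustrate what I mean, here is a small probe module (just a sketch, assuming at least two GPUs are visible; Probe and the dummy inputs are made up for this example). My guess is that DataParallel splits tensor arguments along dim 0 but does not split a plain Python list like pairs the same way, so each replica sees the whole pairs list next to only part of the features batch:

import torch
import torch.nn as nn

class Probe(nn.Module):
    def forward(self, pairs, features):
        # each GPU replica prints what it actually received
        print("pairs:", len(pairs), "features:", tuple(features.shape))
        return features

if torch.cuda.device_count() > 1:
    probe = nn.DataParallel(Probe()).cuda()
    pairs = [("a", "b")] * 8                 # non-tensor input (list of pairs)
    features = torch.randn(8, 4).cuda()      # tensor input, batch dimension 8
    probe(pairs, features)                   # each replica reports a smaller features batch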

BTW, here is a screenshot of the error message: Error message
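
In case it matters, the workaround I am considering (untested sketch; SimpleNNV2 is just a renamed copy of my model) is to build the vectors tensor outside forward and pass it in, so that DataParallel can split both inputs along dim 0:

import torch
import torch.nn as nn

class SimpleNNV2(nn.Module):
    """Same layers as SimpleNN, but forward takes the precomputed vectors tensor."""
    def __init__(self, vectors_size, features_size, hidden_size=15, dropout_rate=0.1):
        super().__init__()
        self.vectors_hidden = nn.Sequential(
            nn.Dropout(dropout_rate),
            nn.Linear(vectors_size, vectors_size // 2),
            nn.Tanh(),
            nn.Linear(vectors_size // 2, features_size),
            nn.Tanh(),
        )
        self.hidden = nn.Sequential(
            nn.Linear(features_size * 2, hidden_size),
            nn.ReLU(),
        )
        self.output = nn.Linear(hidden_size, 2)

    def forward(self, vectors, features):
        # both inputs are tensors, so DataParallel splits them consistently along dim 0
        embedding_features = self.vectors_hidden(vectors)
        combined_features = torch.cat([features, embedding_features], dim=1)
        return self.output(self.hidden(combined_features))

# call site (pairs2vectors and train_pub come from my own code):
# vectors = pairs2vectors(train_pub, pairs)
# outputs = model(vectors.cuda(), features.cuda())

Would this be the right direction, or is there a way to keep the pairs-to-vectors step inside forward?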