1D convolution peak picking

Hello, I want to use a 1D convolution to pick a peak on every beat of a piece of music, and I would like to know whether I am doing it right or missing anything.

The input is a tensor of shape Batch * 441 * 1, where 441 is SampleRate/100 samples of audio, that is, 0.01 seconds of data. After the 1D convolution layers shown in the code below, the output has shape Batch * 2: one score each for Class 1 and Class 2.
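To make the shapes concrete, here is a minimal sketch of how `nn.Conv1d` reads this input. The batch size of 8 and the second layer (`out_channels=16`, `kernel_size=9`) are assumptions chosen only for illustration; they are not part of my model.

```python
import torch
import torch.nn as nn

batch = 8  # hypothetical batch size, for illustration only

# One 0.01 s frame is 441 samples, stored as (batch, 441, 1).
x = torch.randn(batch, 441, 1)

# nn.Conv1d reads its input as (batch, channels, length). With in_channels=441,
# the 441 samples act as channels and the length axis has size 1.
conv_as_channels = nn.Conv1d(in_channels=441, out_channels=881, kernel_size=1, stride=2)
y = conv_as_channels(x)
print(y.shape)  # torch.Size([8, 881, 1])

# For comparison (assumed layer sizes): move the samples onto the length axis,
# so the kernel slides along time.
x_time = x.permute(0, 2, 1)  # (batch, 1, 441)
conv_over_time = nn.Conv1d(in_channels=1, out_channels=16, kernel_size=9, stride=2)
z = conv_over_time(x_time)
print(z.shape)  # torch.Size([8, 16, 217])
```

In the first layout the kernel only ever sees a length of 1, so each output is a weighted sum over all 441 samples, much like a linear layer. In the second layout the kernel moves along the 441 samples.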

The target is a tensor of shape Batch * 2 that marks, for every 0.01-second frame, whether each of the two classes is true or false. Because the two classes need to be scored separately, I treat this as multi-label classification and use BCELoss as the loss function.
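As a small sketch of this loss setup, the snippet below scores two independent labels per frame. The batch of 4 and the target values are made up for illustration. It also shows `nn.BCEWithLogitsLoss`, which takes raw scores and applies the sigmoid internally; on the same data it should give the same value as sigmoid followed by `nn.BCELoss`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch = 4  # hypothetical batch size
logits = torch.randn(batch, 2)  # raw network scores, one per class

# Made-up multi-label targets: each class is independently 0 or 1 per frame.
target = torch.tensor([[1.0, 0.0],
                       [0.0, 0.0],
                       [1.0, 1.0],
                       [0.0, 1.0]])

# Option 1: sigmoid in the model, then BCELoss on probabilities.
probs = torch.sigmoid(logits)
loss_bce = nn.BCELoss()(probs, target)

# Option 2: return raw logits and let the loss apply the sigmoid.
loss_logits = nn.BCEWithLogitsLoss()(logits, target)

print(loss_bce.item(), loss_logits.item())
```

The target must be a float tensor with the same shape as the model output, here (batch, 2).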

Assuming there’s no problem with the training data, is there anything I’m missing, or is something wrong?

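import torch
import torch.nn as nn
import torch.nn.functional as F


class convNet(nn.Module):

    def __init__(self):
        super(convNet, self).__init__()
        # Conv1d expects (batch, channels, length). The input is (batch, 441, 1),
        # so the 441 samples are the channels and the length is 1.
        self.conv1 = nn.Conv1d(in_channels=441, out_channels=881, kernel_size=1, stride=2)
        self.conv2 = nn.Conv1d(in_channels=881, out_channels=1024, kernel_size=1, stride=2)
        # Fully connected layers on the flattened conv output.
        self.fc1 = nn.Linear(1024, 512)
        self.fc2 = nn.Linear(512, 256)
        # Two outputs, one score per class, to match the (batch, 2) target.
        self.fc3 = nn.Linear(256, 2)

    def forward(self, x, batch):
        # x: (batch, 441, 1) -> (batch, 881, 1) -> (batch, 1024, 1)
        x = F.max_pool1d(F.relu(self.conv1(x)), 1)
        x = F.max_pool1d(F.relu(self.conv2(x)), 1)
        # Flatten to (batch, 1024); pass training=self.training so that
        # dropout is switched off after model.eval().
        x = F.dropout(x.view(batch, -1), training=self.training)
        x = F.dropout(F.relu(self.fc1(x)), training=self.training)
        x = F.dropout(F.relu(self.fc2(x)), training=self.training)

        # Per-class probabilities in [0, 1], as BCELoss expects.
        return torch.sigmoid(self.fc3(x))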
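
For comparison, here is a minimal sketch of a variant that convolves along the 441 samples instead of treating them as channels. Everything in it is an assumption for illustration rather than a tested design: the class name `ConvNetSketch`, the channel counts, the kernel size of 9, and the choice to return raw logits for use with `nn.BCEWithLogitsLoss`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvNetSketch(nn.Module):
    """Hypothetical variant: the kernel slides over the 441 samples of a frame."""

    def __init__(self, n_classes=2):
        super().__init__()
        self.conv1 = nn.Conv1d(1, 16, kernel_size=9, stride=2)   # length 441 -> 217
        self.conv2 = nn.Conv1d(16, 32, kernel_size=9, stride=2)  # length 217 -> 105
        self.drop = nn.Dropout(p=0.5)  # module form follows train()/eval()
        self.fc1 = nn.Linear(32 * 105, 256)
        self.fc2 = nn.Linear(256, n_classes)

    def forward(self, x):
        # x arrives as (batch, 441, 1); Conv1d wants (batch, channels, length).
        x = x.permute(0, 2, 1)  # (batch, 1, 441)
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.drop(x.flatten(1))  # (batch, 32 * 105)
        x = self.drop(F.relu(self.fc1(x)))
        return self.fc2(x)  # raw logits, no sigmoid here


model = ConvNetSketch()
model.eval()  # disables dropout for inference
frames = torch.randn(8, 441, 1)  # hypothetical batch of 8 frames
with torch.no_grad():
    out = model(frames)
print(out.shape)  # torch.Size([8, 2])
```

Because the batch size is read from the tensor itself, `forward` does not need a separate `batch` argument.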