Two-Branch Network and Masking One Class

Suppose I have a network like the one below: an encoder followed by two branches.
One branch (isMusicNet) predicts whether the sample is music or not music; if the sample is music, the other branch (MusicTypeNet) predicts the type of music.

import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        self.enc = nn.Sequential()   # placeholder for some N encoder layers producing 64-dim features
    def forward(self, x):
        return self.enc(x)

class isMusicNet(nn.Module):
    def __init__(self):
        super(isMusicNet, self).__init__()
        self.fc = nn.Linear(64, 2)   # music vs. non-music
    def forward(self, x):
        return self.fc(x)

class MusicTypeNet(nn.Module):
    def __init__(self):
        super(MusicTypeNet, self).__init__()
        self.fc = nn.Linear(64, 4)   # 4 music types
    def forward(self, x):
        return self.fc(x)

class OverallNet(nn.Module):
    def __init__(self):
        super(OverallNet, self).__init__()
        self.enc = Encoder()
        self.musicnet = isMusicNet()
        self.musictypenet = MusicTypeNet()
    def forward(self, x):
        x = self.enc(x)
        x_m = self.musicnet(x)          # logits for music / non-music
        x_mtype = self.musictypenet(x)  # logits for the 4 music types
        return x_m, x_mtype

Now consider that I have 5 classes in total:
C1, C2, C3, C4, C5
Suppose C5 is non-music, so 4 of the classes are different types of music and the remaining one is not music.
What I want is a network with two branches: one branch differentiates between music and non-music [i.e. it has two classes], and the other differentiates between the music types [in this case it should have 4 classes].

I am getting confused in the training loop.
For isMusicNet I should use a binary cross-entropy loss, and for MusicTypeNet I should use a categorical cross-entropy loss to differentiate between the 4 music types.
But when the PyTorch DataLoader passes a batch, it contains both music and non-music samples. How do I mask out the non-music samples for MusicTypeNet and combine the music types into a single class for isMusicNet?
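This is roughly the label handling I have in mind (just a sketch, assuming the non-music class keeps index 4 as in my dataset):

import torch

# example labels straight from the dataloader: 0-3 are music types, 4 is non-music
batchlabels = torch.tensor([4, 4, 2, 4, 0, 3, 1, 4])

# target for isMusicNet: collapse the 4 music types into one class, non-music into the other
is_music = (batchlabels != 4).long()    # tensor([0, 0, 1, 0, 1, 1, 1, 0])

# target for MusicTypeNet: keep only the music samples, whose labels are already 0-3
music_mask = batchlabels != 4
type_labels = batchlabels[music_mask]   # tensor([2, 0, 3, 1])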

In addition, if the labels are

batchlabels = tensor([4, 4, 2, 4, 4, 0, 3, 4, 0, 1, 4, 0, 3, 2, 4, 4, 1, 4, 4, 2, 4, 0, 4, 0,
0, 0, 0, 4, 0, 4, 4, 4, 4, 4, 4, 4, 0, 0, 4, 4, 1, 4, 4, 1, 4, 0, 4, 1,
4, 4, 1, 4, 4, 3, 3, 3, 3, 4, 1, 0, 4, 0, 0, 4])

and I want

batchlabels[batchlabels != 4]
tensor([2, 0, 3, 0, 1, 0, 3, 2, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 3,
3, 3, 3, 1, 0, 0, 0]) as one class suppose class A.

and

batchlabels[batchlabels == 4]
tensor([4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4])
as class B

Since my original labels in the dataloader have 5 classes (0 to 4), how can I map these newly created labels in the loss functions?
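To put it in one place, this is the kind of training step I am trying to write (only a sketch: the optimizer, learning rate, and the loader variable are placeholders, and I am again assuming index 4 is non-music). Since isMusicNet has two output units, I have used a 2-class cross-entropy here, which I believe is equivalent to binary cross-entropy in this setup:

import torch
import torch.nn.functional as F

model = OverallNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for inputs, batchlabels in loader:      # loader yields the original 5-class labels 0-4
    optimizer.zero_grad()
    x_m, x_mtype = model(inputs)

    # branch 1: music vs. non-music on the collapsed binary labels
    is_music = (batchlabels != 4).long()
    loss_music = F.cross_entropy(x_m, is_music)

    # branch 2: music type, computed only on the music samples of the batch
    music_mask = batchlabels != 4
    if music_mask.any():
        loss_type = F.cross_entropy(x_mtype[music_mask], batchlabels[music_mask])
    else:
        loss_type = x_mtype.sum() * 0.0     # batch has no music samples, contribute nothing

    loss = loss_music + loss_type
    loss.backward()
    optimizer.step()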

Any help or suggestions would be appreciated.
Thank you