Using different modules for a condition

I have a bunch of values I'd like to predict, and each value belongs to a category with its own distribution. Instead of creating a separate model per category, I'd like to share most of the weights across categories and have the model apply a different fully connected layer depending on the category. What's the most efficient way to implement this?
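To make it concrete, here's a rough sketch of the kind of thing I mean (the trunk, the head sizes, and all the names are just placeholders):

```python
import torch
import torch.nn as nn

class SharedTrunkPerCategory(nn.Module):
    """Shared trunk plus one fully connected head per category (rough sketch)."""
    def __init__(self, in_dim, hidden_dim, num_categories):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, 1) for _ in range(num_categories)
        )

    def forward(self, x, category):
        # x: (B, in_dim) features, category: (B,) long tensor of category ids
        h = self.trunk(x)
        out = h.new_zeros(h.size(0))
        for c, head in enumerate(self.heads):
            mask = category == c
            if mask.any():  # route each sample through its category's head
                out[mask] = head(h[mask]).squeeze(-1)
        return out

model = SharedTrunkPerCategory(in_dim=16, hidden_dim=32, num_categories=8)
pred = model(torch.randn(4, 16), torch.tensor([0, 2, 2, 7]))  # (4,)
```

The loop over heads feels clunky, though, which is why I'm asking about efficiency.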

Could you explain your use case a bit?
How would you select the category? Do you have this information for each sample, i.e. for both training and test samples?

Sure. For context, this has to do with the CHAMPS competition on Kaggle.

I have a tensor of shape NxF, where N is the number of nodes in a graph and F is the feature dimension. Per graph, I have a set of targets I'm regressing against: scalar coupling constants. Each target involves a pair of nodes, and I'm given the coupling type for each target I'm trying to predict. Ideally, I'd like a model that runs convolutions over the entire graph, then applies a different fully connected network per coupling type. The only weights shared across coupling types would be the convolution/pooling part.
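Concretely, each target's input would be built from the embeddings of its two nodes, something like this (names and sizes are just illustrative):

```python
import torch

x = torch.randn(29, 64)          # (N, F) node embeddings from the shared conv part
idx0 = torch.tensor([0, 3, 11])  # first node of each labeled pair
idx1 = torch.tensor([5, 7, 20])  # second node of each labeled pair

pair_feat = torch.cat([x[idx0], x[idx1]], dim=-1)  # (num_pairs, 2F)
```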

Currently, I am doing this by concatenating the features of every possible pair of nodes, forming an NxNx2F tensor, then passing that through a fully connected network that outputs 8 values, one for every coupling type (giving NxNx8). Then, since I have the indices of each node pair I'm training on, I can index that tensor with torch.take to get the right values. However, I'm wondering if there's a more efficient way to do this: the pairwise concatenation of every node is very memory intensive.
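In code, the current approach looks roughly like this (the sizes are made up, and the small MLP stands in for the full fully connected network):

```python
import torch
import torch.nn as nn

N, F_in, T = 29, 64, 8    # nodes, feature size, number of coupling types

x = torch.randn(N, F_in)  # node features after the shared conv/pooling part

# concatenate the features of every possible node pair: (N, N, 2F)
pairs = torch.cat(
    [x.unsqueeze(1).expand(N, N, F_in),
     x.unsqueeze(0).expand(N, N, F_in)],
    dim=-1,
)

fc = nn.Sequential(nn.Linear(2 * F_in, 128), nn.ReLU(), nn.Linear(128, T))
out = fc(pairs)  # (N, N, T): one prediction per pair per coupling type

# pick the prediction for each labeled pair and its coupling type
idx0 = torch.tensor([0, 3, 11])   # first node of each labeled pair
idx1 = torch.tensor([5, 7, 20])   # second node of each labeled pair
ctype = torch.tensor([2, 6, 0])   # coupling type of each pair
flat = (idx0 * N + idx1) * T + ctype  # flat indices into (N, N, T)
pred = torch.take(out, flat)          # (num_pairs,)
```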

Do you have any suggestions? Thanks!