I want to produce a network like the diagram above. Some features will be grouped together before going to the fully connected layers (group 1), there may be some overlap between groups (2 and 3), whereas other features will not be grouped at all and will just feed directly into the fully connected layers. The diagram shows only one fully connected layer as an example, but there would actually be several.
I hope this makes sense.
I want to do this as we found that one engineered feature we had, which was made from a linear regression of a group of related features, was very effective and I’d like to generalise that into the network itself and try for other features we think are related.
Can you share more than just a figure? Perhaps what shape each group is? It seems like you have 3 feed-forward networks, one per group, that each map some input features to a scalar, and you then pass those 3 scalars (together with the ungrouped scalars) into another feed-forward network which maps them all to a single scalar. More detail on the input data (i.e. batch size, input shapes etc…) will make it easier for people to debug.
What you’ve described is accurate. It’s not a request for debugging, more a request as to whether such a network is feasible and any pointers before I expend the effort trying to make it.
As for the data, I’m currently working with multivariate tabular data. For the purposes of the illustration each leftmost node is a single numerical feature, and the target is a regression. How does batch size factor in?
You could just create those “group” networks as nn.Module objects, call them, and pass their outputs to the final network on the right-hand side of the figure. So, something like:
out1 = group1net(in1) # returns a batch of scalars, shape [B,1]
out2 = group2net(in2) # returns a batch of scalars, shape [B,1]
out3 = group3net(in3) # returns a batch of scalars, shape [B,1]
# ungroup is shape [B,2]
x = torch.cat([out1, out2, out3, ungroup], dim=-1) # concatenate into a single input tensor of shape [B,5]
output = finalnet(x) # pass to the fully connected network on the right (outputs one scalar per sample)
Given that group2 and group3 share an input feature, you can simply handle that when you create the in2 and in3 input tensors, and let your network handle the rest.
Make sure the batch size is consistent across all input features, since torch.cat requires every dimension other than the concatenation one to match!
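For reference, here is a minimal sketch of how the whole thing could be wrapped in a single nn.Module. It assumes all features arrive as one tensor of shape [B, num_features] and that each group is described by a list of column indices, so an overlapping feature is just an index that appears in two lists. The class name GroupedRegressor, the 9-feature layout, the hidden size and the choice of one linear layer per group (to mirror the linear-regression feature) are all illustrative assumptions, not something taken from the figure.

```python
import torch
import torch.nn as nn


class GroupedRegressor(nn.Module):
    """Hypothetical sketch: one small sub-network per feature group,
    followed by a shared fully connected head."""

    def __init__(self, groups, ungrouped, hidden=16):
        super().__init__()
        # groups: list of column-index lists; an index may appear in
        # more than one list, which is how overlapping groups are expressed.
        self.groups = [list(g) for g in groups]
        self.ungrouped = list(ungrouped)
        # One linear layer per group maps its features to a single scalar
        # (assumed here to mimic the hand-made linear-regression feature).
        self.group_nets = nn.ModuleList(
            [nn.Linear(len(g), 1) for g in self.groups]
        )
        head_in = len(self.groups) + len(self.ungrouped)
        # Two fully connected layers; add more as needed.
        self.head = nn.Sequential(
            nn.Linear(head_in, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        # x has shape [B, num_features]; indexing columns keeps the batch dim.
        outs = [net(x[:, idx]) for net, idx in zip(self.group_nets, self.groups)]
        outs.append(x[:, self.ungrouped])  # ungrouped features pass straight through
        return self.head(torch.cat(outs, dim=-1))  # shape [B, 1]


# Assumed layout: 9 features, column 4 shared by groups 2 and 3.
model = GroupedRegressor(groups=[[0, 1, 2], [3, 4], [4, 5, 6]], ungrouped=[7, 8])
batch = torch.randn(32, 9)  # batch of 32 rows of tabular data
pred = model(batch)
```

Because every group is sliced from the same [B, num_features] tensor, the batch size is consistent by construction, and the group layers are trained jointly with the head by ordinary backpropagation.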