Applying independent nn modules in segregated layers (parallelised)

I recommend that PyTorch provide a method to apply multiple independent nn modules in parallel, such as the sublayers of a segregated layer. At a minimum, I recommend providing a wrapper for the nn.Conv1d workaround used to parallelise the execution of independent nn.Linear layers. For example:

segregatedLinear = LinearSegregated(in_features=in_features, out_features=out_features, number_sublayers=linearSublayersNumber)

import torch
import torch.nn as nn

class LinearSegregated(nn.Module):
	def __init__(self, in_features, out_features, number_sublayers):
		super().__init__()
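		# groups=number_sublayers gives each sublayer its own independent weight and bias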
		self.segregatedLinear = nn.Conv1d(in_channels=in_features*number_sublayers, out_channels=out_features*number_sublayers, kernel_size=1, groups=number_sublayers)
		self.number_sublayers = number_sublayers
		
	def forward(self, x):
		# x.shape = batch_size, number_sublayers, in_features
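		# flatten the sublayer dimension into the channel dimension expected by nn.Conv1d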
		x = x.view(x.shape[0], x.shape[1]*x.shape[2], 1)
		x = self.segregatedLinear(x)
		x = x.view(x.shape[0], self.number_sublayers, x.shape[1]//self.number_sublayers)
		# x.shape = batch_size, number_sublayers, out_features
		return x
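
The workaround above should apply a separate weight matrix and bias per sublayer, rather than broadcasting a single nn.Linear over dim=1. A quick way to check this is to copy each group's parameters out of the grouped convolution into standalone nn.Linear modules and compare the outputs. This is only a sketch, using the class defined above and placeholder sizes (in_features=8, out_features=4, number_sublayers=3):

in_features, out_features, number_sublayers = 8, 4, 3
model = LinearSegregated(in_features=in_features, out_features=out_features, number_sublayers=number_sublayers)

x = torch.randn(2, number_sublayers, in_features)
out = model(x)  # shape [2, number_sublayers, out_features]

for g in range(number_sublayers):
	lin = nn.Linear(in_features, out_features)
	with torch.no_grad():
		# Conv1d weight has shape [out_channels, in_channels/groups, kernel_size];
		# group g owns output channels g*out_features:(g+1)*out_features
		lin.weight.copy_(model.segregatedLinear.weight[g*out_features:(g+1)*out_features, :, 0])
		lin.bias.copy_(model.segregatedLinear.bias[g*out_features:(g+1)*out_features])
	print((lin(x[:, g]) - out[:, g]).abs().max())
	# should print a (near-)zero difference for every sublayer, with distinct parameters per group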

nn.Linear layers already accept inputs with additional dimensions, i.e. of shape [batch_size, *, in_features], and will apply the layer to each sample in these additional dimensions.
I would assume nn.Conv1d with its groups argument would provide the same functionality.
If you have a good abstraction in mind, could you create a feature request on GitHub so that the code owners could discuss it, please?

I would assume that nn.Linear would apply (multiply by) the same parameters for each x subsample (along dim=1; number_sublayers). The example I provided is based on the nn.Conv1d workaround shown above.

Yes, as seen here:

lin = nn.Linear(10, 10)
x = torch.randn(2, 10, 10)
ref = lin(x)

outs = []
for idx in range(x.size(1)):
    x_ = x[:, idx]
    out = lin(x_)
    outs.append(out)
    
outs = torch.stack(outs, dim=1)
print((outs - ref).abs().max())
# tensor(0., grad_fn=<MaxBackward1>)
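
That confirms the parameters are shared across dim=1. For contrast, here is a minimal sketch (variable names are illustrative) of the unvectorised alternative, i.e. independent nn.Linear modules applied in a Python loop, which is what the grouped nn.Conv1d workaround parallelises:

sublayers = nn.ModuleList([nn.Linear(10, 10) for _ in range(x.size(1))])
outs_independent = torch.stack([sublayers[idx](x[:, idx]) for idx in range(x.size(1))], dim=1)
print(outs_independent.shape)
# torch.Size([2, 10, 10]); each index along dim=1 now uses its own weights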

Thank you for your support.