Dynamic Parallel Network

There are a couple of potentially confusing terms in this title, so let me start by defining what I mean.
Dynamic - the number of subnets is not preset; it is derived from the shape of the input data
Parallel - multiple independent subnets whose outputs feed into a single head
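
For instance, with the class shown further down, both of these calls build a different number of subnets purely from the shape argument, with no code changes (the shapes here are made up):

net_a = parMLPNet(input_size=(4, 28, 28))   # 4 subnets, each sized for 28*28 = 784 inputs
net_b = parMLPNet(input_size=(7, 16))       # 7 subnets, each sized for 16 inputs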

I have written a simple “parallel MLP.” The model takes the shape of the input data as a constructor argument, and from that shape an independent MLP is instantiated for each “layer” (each slice along the first dimension). I’m doing this instead of flattening the whole input, so that the different dimensions of my data stay disconnected from one another until the head, to see if my results improve. Because I want this to happen dynamically based on the shape argument rather than hard-coding the subnets, the forward pass loops over the input and sends each slice through its own network. The pieces of these subnets (Linear layers, activation functions, etc.) are stored in plain Python lists that are indexed into inside the loop.

The problem is that my validation accuracy does not change at all during training, so I am worried that training is not “reaching” the parameters stored in these lists the way it normally would, and is only updating the final few parameters that combine the outputs of the subnets.

I realize this is a loaded question, but what I really need help making sense of is whether PyTorch will interpret this model, and backpropagate through it, the way I think it will.
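
To make the question concrete, here is the kind of check I have in mind, written against a throwaway toy module rather than my real network (all of the sizes are arbitrary): if PyTorch treats list-stored layers the way I am hoping, they should show up in named_parameters() and end up with gradients after a backward pass.

import torch
import torch.nn as nn

class ToyListNet(nn.Module):
  def __init__(self):
    super(ToyListNet, self).__init__()
    self.subnets = [nn.Linear(8, 4) for _ in range(3)]  # layers kept in a plain Python list
    self.head = nn.Linear(3*4, 1)                       # layer assigned as a normal attribute

  def forward(self, x):
    parts = [net(x) for net in self.subnets]
    return self.head(torch.cat(parts, 1))

toy = ToyListNet()
toy(torch.randn(2, 8)).sum().backward()
for name, p in toy.named_parameters():
  print(name, tuple(p.shape), "has grad:", p.grad is not None)
print("parameters visible to an optimizer:", sum(p.numel() for p in toy.parameters()))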

# This is my network (torch and nn are imported above; numpy is used for the shape product)
import numpy as np

class parMLPNet(nn.Module):
  def __init__(self, input_size):
    super(parMLPNet, self).__init__()
    self.input_size = input_size
    self.dims = self.input_size[0]                    # number of subnets = size of the first dimension
    self.input_nodes = np.prod(self.input_size[1:])   # number of inputs each subnet sees

    # per-subnet layers, kept in plain Python lists
    self.fc1 = self.dims*[nn.Linear(self.input_nodes, 100)]
    self.out1 = self.dims*[nn.Linear(100, 1)]

    # head that combines the subnet outputs
    self.fc2 = nn.Linear(self.dims, 25)
    self.out = nn.Linear(25, 1)
    self.relu = nn.ReLU()
    self.sigm = nn.Sigmoid()

  def forward(self, x):
    xii = torch.tensor([])                 # accumulator for the per-slice outputs
    for ii in range(self.dims):
      xi = self.fc1[ii](x[:, ii])          # index the layer list and run slice ii through it
      xi = self.relu(xi)
      xi = self.out1[ii](xi)               # reduce slice ii to a single value
      xii = torch.cat((xii, xi), 1)        # append this slice's output
    x = self.fc2(xii)                      # head: combine the per-slice outputs
    x = self.relu(x)
    x = self.out(x)
    x = self.sigm(x)                       # probability for the binary target
    return x
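
For reference, this is the alternative registration I have been considering but have not switched to yet, because I would first like to understand whether the plain-list version above should already work. The only intended changes are that the per-subnet layers live in nn.ModuleList containers (built with a comprehension, so each entry is a separate layer) and that the per-slice outputs are collected in a Python list before being concatenated; everything else is meant to behave the same, and it reuses the same imports as the network above.

class parMLPNetModuleList(nn.Module):
  def __init__(self, input_size):
    super(parMLPNetModuleList, self).__init__()
    self.input_size = input_size
    self.dims = self.input_size[0]
    self.input_nodes = np.prod(self.input_size[1:])

    # per-subnet layers registered through nn.ModuleList instead of plain lists
    self.fc1 = nn.ModuleList([nn.Linear(self.input_nodes, 100) for _ in range(self.dims)])
    self.out1 = nn.ModuleList([nn.Linear(100, 1) for _ in range(self.dims)])

    self.fc2 = nn.Linear(self.dims, 25)
    self.out = nn.Linear(25, 1)
    self.relu = nn.ReLU()
    self.sigm = nn.Sigmoid()

  def forward(self, x):
    parts = []
    for ii in range(self.dims):
      xi = self.relu(self.fc1[ii](x[:, ii]))   # slice ii through the ii-th subnet
      parts.append(self.out1[ii](xi))
    x = self.relu(self.fc2(torch.cat(parts, 1)))
    return self.sigm(self.out(x))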

# This is my training function
def train_epoch(model, loader, loss_fun, optimizer):
  model.train()
  total_loss = 0
  for batch_idx, (x, target) in enumerate(loader):
    if torch.cuda.is_available():
      x, target = x.cuda(), target.cuda()
    optimizer.zero_grad()
    out = model(x)
    loss = loss_fun(out, target.unsqueeze(-1))
    total_loss += loss.item()
    loss.backward()
    optimizer.step()
  epoch_loss = total_loss/len(loader.sampler)
  return epoch_loss

# This is my evaluation function
def valid_epoch(model, loader, loss_fun):
  model.eval()
  with torch.no_grad():
    correct, total_loss = 0, 0
    for batch_idx, (x, target) in enumerate(loader):
      if torch.cuda.is_available():
        x, target = x.cuda(), target.cuda()
      out = model(x)
      loss = loss_fun(out, target.unsqueeze(-1))
      total_loss += loss.item()
      prediction = torch.round(out)
      correct += (prediction == target.unsqueeze(-1)).sum().item()
    val_loss = total_loss/len(loader.sampler)
    val_accu = correct/len(loader.sampler)
    return val_loss, val_accu
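
And for completeness, this is roughly how everything is wired together. The input shape, learning rate, loss function, and the train_loader / valid_loader names are placeholders for this post; the only part I think matters for the question is that the optimizer is built from model.parameters().

model = parMLPNet(input_size=(4, 16))                        # placeholder shape
if torch.cuda.is_available():
  model = model.cuda()
loss_fun = nn.BCELoss()                                      # placeholder binary loss to match the sigmoid output
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)    # placeholder optimizer and learning rate

for epoch in range(20):
  train_loss = train_epoch(model, train_loader, loss_fun, optimizer)   # train_loader: my DataLoader (not shown)
  val_loss, val_accu = valid_epoch(model, valid_loader, loss_fun)      # valid_loader: my DataLoader (not shown)
  print(f"epoch {epoch}: train loss {train_loss:.4f}, val loss {val_loss:.4f}, val acc {val_accu:.3f}")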

Thanks in advance to whoever can help!