Efficient Training for Neural Nets with Evolving Architecture

I’ve been experimenting with neural nets whose architecture can change over time, and as a result I’m trying to make my code support any directed acyclic graph (DAG) in which each node is a single neuron.

Right now, to evaluate the net on a data point, I evaluate the output neurons recursively, working backwards through the graph (and caching the values of visited nodes to avoid repeated work).
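
To make that concrete, here is a stripped-down sketch of the kind of structure and evaluation I mean. The names (`DAGNet`, `_eval_node`, etc.) are made up for this post and my real code is messier, but the idea is the same:

    import torch
    import torch.nn as nn

    class DAGNet(nn.Module):
        """An arbitrary DAG where each node is a single neuron with its own
        weights over its predecessor nodes."""

        def __init__(self, edges, input_nodes, output_nodes):
            super().__init__()
            self.edges = edges              # node -> list of predecessor nodes
            self.input_nodes = input_nodes  # nodes fed directly by the input vector
            self.output_nodes = output_nodes
            # one weight vector (with a bias as the last entry) per non-input node
            self.weights = nn.ParameterDict({
                str(n): nn.Parameter(0.1 * torch.randn(len(preds) + 1))
                for n, preds in edges.items()
            })

        def _eval_node(self, node, x, cache):
            # memoised recursion backwards through the graph
            if node in cache:
                return cache[node]
            if node in self.input_nodes:
                value = x[self.input_nodes.index(node)]
            else:
                w = self.weights[str(node)]
                pre = sum(w[i] * self._eval_node(p, x, cache)
                          for i, p in enumerate(self.edges[node]))
                value = torch.tanh(pre + w[-1])
            cache[node] = value
            return value

        def forward(self, x):
            cache = {}  # fresh cache per data point
            return torch.stack([self._eval_node(n, x, cache) for n in self.output_nodes])

    # e.g. a tiny diamond-shaped graph: inputs 0 and 1 both feed nodes 2 and 3,
    # which both feed the single output node 4
    edges = {2: [0, 1], 3: [0, 1], 4: [2, 3]}
    net = DAGNet(edges, input_nodes=[0, 1], output_nodes=[4])
    pred = net(torch.randn(2))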

The basic optimization code I’m using looks standard:

        criterion = nn.MSELoss()
        optimizer = torch.optim.SGD(self.parameters(), lr=0.1)
        for step in range(num_epochs):
            for target, vector in data:
                vector = vector[0]
                optimizer.zero_grad()
                pred = self(vector)
                loss = criterion(pred, target)
                loss.backward()
                optimizer.step()
                print(f'loss after {step} step optimization: ', loss.item())

except that evaluating the line `pred = self(vector)` calls my custom-built forward evaluation function (the one sketched above).

Does having something close to fully connected feedforward layers allow Torch to optimize better? Or should I just have a bunch of “layers” which each contain only one neuron (see the sketch below)? I wonder if I’m somehow making it repeat a lot of unnecessary work, for example by rebuilding the computational graph on every call in a way it normally wouldn’t.
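
For concreteness, this is roughly what I mean by the two options (toy shapes and variable names just for illustration, not my actual graph):

    import torch
    import torch.nn as nn

    x = torch.randn(4)

    # Option A: a bunch of "layers" which only have one neuron each,
    # wired together by hand
    node_a = nn.Linear(4, 1)   # reads the 4-dim input
    node_b = nn.Linear(4, 1)
    node_c = nn.Linear(2, 1)   # reads the outputs of node_a and node_b
    h = torch.cat([torch.tanh(node_a(x)), torch.tanh(node_b(x))])
    y_per_neuron = node_c(h)

    # Option B: something closer to fully connected feedforward layers,
    # where nodes at the same depth are grouped into one Linear
    layer1 = nn.Linear(4, 2)   # node_a and node_b together
    layer2 = nn.Linear(2, 1)   # node_c
    y_layered = layer2(torch.tanh(layer1(x)))

Option A matches my per-neuron graph more closely; Option B is the more standard shape.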

I’m somewhat of a beginner when it comes to how Torch works under the hood (before this experiment I was usually training more standard kinds of neural nets and basing my code on templates).

Thanks so much in advance!
Joel