Dynamically combining multiple models into one which will run in parallel

Suppose I have the following problem. I am running an iterative algorithm which, on iteration t, trains a neural network - call it “NN_t”. So after T iterations, I have T models. For every t1 != t2, NN_t1 and NN_t2 differ only in their weights - the architecture is the same for all of them. Every model is trained completely from scratch - NB, the training of NN_t is NOT a continuation of ( or related in any way to ) the training of NN_(t-1). To reiterate: all models share the same architecture, but have differing and totally independent weights. When all this training is done, I need to do inference in the following way for a given input ( each model actually takes two inputs, but just say x is a list containing both of them ):

multiOutputs = [ NN_t( x ) for t in range( T )]
CompareOutputs( multiOutputs ) #exactly what happens in here is unimportant

Ok that’s straightforward enough. Now say, for every t, NN_t takes 2 inputs ( two arrays of real numbers ) and gives 1 output ( a single real number ). Doing this in series ( as in the for loop above ) is extremely slow, since I need to do it A LOT. As in, A LOT A LOT - enough that I need this operation to be as fast as it can conceivably be. I am currently migrating from TensorFlow to PyTorch, and in TF I was doing it the following way:

Given T iterations of training, build a single large model consisting of all the layers ( with their trained weights ) from all T NN_t’s stacked on each other in a non-interacting way ( i.e. no interaction between any layers of NNs from different iterations ) - call this multi-model MM_T. MM_T takes 2T inputs and gives T outputs. The 2T inputs are really just T copies of the two inputs that would be given to a single NN_t on its own. Hence the for loop is eliminated, i.e. we do:

multiInputs = CopyInput( input=x,iters=T ) #|x| = 2, |multiInputs| = 2T 
multiOutputs = MM_T( multiInputs ) #|multiOutputs| = T
CompareOutputs( multiOutputs )
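On the PyTorch side, one way to get the same “T copies evaluated at once” behaviour without hand-stacking layers is `torch.func`’s model-ensembling utilities ( `stack_module_state` + `vmap`, available in torch >= 2.0 ). A minimal sketch, assuming a toy two-input `NN` that stands in for the real architecture - everything about this little network is made up for illustration:

```python
import torch
from torch import nn
from torch.func import stack_module_state, functional_call

class NN(nn.Module):
    """Hypothetical stand-in: two array inputs -> one scalar output."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(3, 8)
        self.out = nn.Linear(16, 1)

    def forward(self, a, b):
        h = torch.cat([torch.relu(self.fc1(a)), torch.relu(self.fc2(b))], dim=-1)
        return self.out(h)

T = 5
models = [NN() for _ in range(T)]  # stand-ins for the trained NN_t's

# Stack the T weight sets: each parameter tensor gains a leading T dimension.
params, buffers = stack_module_state(models)

# A stateless "template" on the meta device; actual weights come in via params.
base = NN().to("meta")

def call_one(p, b, a_in, b_in):
    return functional_call(base, (p, b), (a_in, b_in))

a_in = torch.randn(4)
b_in = torch.randn(3)

# vmap over the leading weight dimension; in_dims=None broadcasts the two
# inputs to every sub-model, so no CopyInput step is needed.
multi_outputs = torch.vmap(call_one, in_dims=(0, 0, None, None))(
    params, buffers, a_in, b_in
)
# multi_outputs has shape (T, 1): one output per sub-model
```

Under the hood this runs all T weight sets through one batched computation, which is essentially what the stacked MM_T was doing by hand.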

Turns out, this is significantly faster than simply running all the separate NN_t’s in series ( gee…what a surprise ). MM_T’s __init__ is simple enough:

class MultiModel( nn.Module ):

	def __init__( self,mFile,iterSpan,nnScale=1 ):

		super( MultiModel,self ).__init__()
		self.IterSpan = iterSpan

		for t in range( iterSpan ):

			NN_t = NN( modelIter=t,load_from_file=mFile,nnScale=nnScale )
			layerDict_t = NN_t.EnumerateLayers()

			#.items() so we unpack ( key,layer ) pairs; setattr registers
			#each layer ( and its parameters ) on the MultiModel
			for layerKey,layer in layerDict_t.items():
				setattr( self,layerKey,layer )
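A side note on that registration step: an nn.ModuleList keeps each iteration’s layers grouped per sub-network and registers all their parameters in one shot, which is handy later when forward needs to know which layers belong together. A minimal sketch, with a hypothetical nn.Sequential standing in for one NN_t’s layer stack:

```python
import torch
from torch import nn

class MultiModel(nn.Module):
    """Sketch of the same registration via nn.ModuleList; the Sequential
    below is a made-up placeholder for one NN_t's actual layers."""
    def __init__(self, iterSpan):
        super().__init__()
        self.IterSpan = iterSpan
        # One entry per training iteration; ModuleList registers every
        # sub-network's parameters on the MultiModel automatically.
        self.subnets = nn.ModuleList(
            nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
            for _ in range(iterSpan)
        )
```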

Ez pz. What I’m having trouble with here is how to go about doing MM_T.forward( multiInputs ) without resorting to a serial loop. Any takers?