Dynamically combining multiple models into one that runs in parallel

Suppose I have the following problem. I am running an iterative algorithm which, on iteration t, trains a neural network - call it “NN_t”. So after T iterations, I have T models. For every t1 != t2, NN_t1 and NN_t2 differ only in their weights - the architecture is the same for all of them. All models are trained completely from scratch - NB, the training for NN_t is NOT a continuation of ( or related in any way to ) the training for NN_(t-1). To reiterate: all models share the same architecture, but have differing and totally independent weights. When all this training is done, I need to do inference in the following way for a given input ( the model actually takes two inputs, but just say x is a list containing both of them ):

multiOutputs = [ models[ t ]( x ) for t in range( itersDone ) ] #models[ t ] holds the trained NN_t
CompareOutputs( multiOutputs ) #exactly what happens in here is unimportant

Ok, that’s straightforward enough. Now say, for every t, NN_t takes 2 inputs ( two arrays of real numbers ) and gives 1 output ( a single real number ). Doing this in series ( as in the for loop above ) is extremely slow, since I need to do it A LOT. As in, A LOT A LOT - enough that I need this operation to be as fast as it can conceivably be. I am currently migrating from TensorFlow to Torch, and in TF I was doing it the following way:

Given T iters of training, make a single large model consisting of all the layers ( with their trained weights ) from all T NN_t’s stacked side by side in a non-interacting way ( i.e. no interaction between any layers of any NNs from different iters ) - call this multi-model MM_T. MM_T takes 2T inputs and gives T outputs. These 2T inputs are really just T copies of the two inputs that a single NN_t would receive on its own. Hence the for loop is eliminated, i.e. we do:

multiInputs = CopyInput( input=x,iters=T ) #|x| = 2, |multiInputs| = 2T 
multiOutputs = MM_T( multiInputs ) #|multiOutputs| = T
CompareOutputs( multiOutputs )

Turns out, this is significantly faster than simply running all the separate NN_t’s in series ( gee…what a surprise ). MM_T.__init__ is simple enough:

import torch.nn as nn
#NN is my own model class, defined elsewhere

class MultiModel( nn.Module ):

	def __init__( self,mFile,iterSpan,nnScale=1 ):

		super( MultiModel,self ).__init__()
		self.IterSpan = iterSpan

		for t in range( iterSpan ):

			#load the weights trained on iter t into a fresh NN
			NN_t = NN( modelIter=t,load_from_file=mFile,nnScale=nnScale )
			layerDict_t = NN_t.EnumerateLayers()

			#register every layer on self so torch tracks its params
			for layerKey,layer in layerDict_t.items():
				setattr( self,layerKey,layer )
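
EnumerateLayers isn’t shown - all it needs to do is return a dict of ( layerKey,layer ) pairs whose keys are unique per iteration, otherwise the setattr calls above would clobber layers from earlier t’s. A purely hypothetical sketch, assuming NN stores its iter as self.ModelIter:

def EnumerateLayers( self ):

	#hypothetical - prefix each layer name with the iter so keys never collide across NN_t's
	return { "iter%d_%s" % ( self.ModelIter,name ): layer for name,layer in self.named_children() }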

Ez pz. What I’m having trouble with here is how to go about doing MM_T.forward( multiInputs ) without resorting to a serial loop. Any takers?

Did you find a solution to this? I’m interested in this as well. This would be really helpful for cross-entropy reinforcement learning techniques.

I’m thinking maybe this could be done with functorch?

Take your model and use make_functional (functorch.make_functional — functorch nightly documentation); then, when you want to run a batch of weights, use vmap?
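
Rough sketch of the idea (untested; it reuses your NN loader from above, the input shapes are placeholders, and I’m assuming NN.forward takes the two inputs as separate tensors):

import torch
from functorch import make_functional, vmap

# T trained models: identical architecture, independent weights
models = [NN(modelIter=t, load_from_file=mFile) for t in range(T)]

# One stateless ("functional") version of the shared architecture;
# the weights become an explicit argument instead of module state
fmodel, _ = make_functional(models[0])

# Stack each parameter across the T models along a new leading "model" dim
paramsPerModel = [make_functional(m)[1] for m in models]
stackedParams = tuple(torch.stack(p) for p in zip(*paramsPerModel))

# vmap over the model dim of the params only; x1 and x2 are shared by all
# models, so this evaluates all T networks in one parallel call - no Python loop
ensembleForward = vmap(fmodel, in_dims=(0, None, None))

x1 = torch.randn(64, 10)  # placeholder shapes for the two inputs
x2 = torch.randn(64, 10)
multiOutputs = ensembleForward(stackedParams, x1, x2)  # leading dim indexes the T models

functorch also has combine_state_for_ensemble, which does the stacking (buffers included) for you.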