How to create a hierarchy in PyTorch?

something like
—ab… …cd–

where a, b, c, d… are patterns learnt by a neural network,
and ab, cd are combinations of the patterns learnt at the lower level of the hierarchy, and so on.

If a has learnt patterns for lips, b for nose, c for eyes, and d for tongue, then abcd would be a face.

This looks like the idea behind the hierarchical features learned in CNNs as described in Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks.
The figures from this paper are quite popular in presentation slides and blog posts (Google search).

I think it is a bit different.

What I am describing is a whole neural network at the bottom that only knows how to predict lips, and similarly a whole neural network for the tongue.

These then combine to form the next level of the hierarchy.

But how do I combine two neural networks in PyTorch, so that if one of them can predict lips and the other the nose, the combination can predict (nose + lips)?
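One common way to combine two networks in PyTorch is to register both as submodules of a parent `nn.Module`, concatenate their outputs, and feed the result to a small head. The sketch below is only an illustration under assumptions: `PartNet`, `CombinedNet`, the layer sizes, and the input shape are hypothetical names and values, not something taken from this thread, and nothing here forces a sub-network to specialize on lips or nose by itself.

```python
import torch
import torch.nn as nn


class PartNet(nn.Module):
    """Hypothetical sub-network producing a feature vector for one face part."""

    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # (N, 16, H, W) -> (N, 16, 1, 1)
            nn.Flatten(),             # (N, 16, 1, 1) -> (N, 16)
            nn.Linear(16, feat_dim),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)


class CombinedNet(nn.Module):
    """Next level of the hierarchy: consumes the outputs of two sub-networks."""

    def __init__(self, net_a, net_b, feat_dim: int = 32, num_classes: int = 2):
        super().__init__()
        # Assigning modules as attributes registers their parameters,
        # so an optimizer built from self.parameters() sees all of them.
        self.net_a = net_a
        self.net_b = net_b
        self.head = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, x):
        # Concatenate the two feature vectors along the feature dimension.
        feats = torch.cat([self.net_a(x), self.net_b(x)], dim=1)
        return self.head(feats)


lips_net = PartNet()  # assumed to be (pre)trained on lips
nose_net = PartNet()  # assumed to be (pre)trained on noses
model = CombinedNet(lips_net, nose_net)

x = torch.randn(4, 3, 64, 64)  # dummy batch of 4 RGB images
out = model(x)
print(out.shape)  # torch.Size([4, 2])
```

If the lower-level networks should stay fixed while only the combining head is trained, one option is to call `requires_grad_(False)` on `lips_net` and `nose_net` before creating the optimizer.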

I’m not sure what your workflow would be or how you would force the submodels to predict only a part of the face.
Would you want to use different label sets, i.e. one model would only get nose / not-nose labels, while the others would get the mouth pairs etc.?
Do you have a dataset where only parts of faces are visible?
If the complete face is always visible, you wouldn’t be able to force a model to focus only on a particular part.
Or would you only use cropped patches of the face image?
In that case, I’m not sure what your validation/test set would look like.

This is related to one-shot meta-learning. I have a sample set and a query set in the meta-train set, and a sample set and a query set in the meta-test set. The task is to make the neural network classify after seeing only one image: show it one image from the sample of the meta-test set, and it should give the correct classification for the query of the meta-test set. To get there, we show the network images from the sample of the meta-train set and ask it to predict correctly on the query of the meta-train set. An incorrect prediction increases the loss, and the goal is to minimize that loss. The expectation is that a network with low loss on the meta-train set will then predict the query of the meta-test set correctly after seeing only one image from its sample.

What I think is that we want the neural network to learn a whole hierarchy during the meta-training stage (face, lips, tongue, eyes…). After meta-training we would have a network that has learnt the hierarchy, so on the meta-test set it would adapt quickly to the sample and predict the query correctly.
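To make the sample/query setup concrete, here is a minimal sketch of a single one-shot meta-training episode. It assumes a metric-based approach (classify each query image by its distance to the embedding of the one sample image per class, in the style of prototypical networks). That choice, the `embed` network, the `episode_loss` helper, and the random tensors standing in for data are all assumptions for illustration, not the method used in this thread.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical embedding network; any feature extractor could be used instead.
embed = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 28 * 28, 64),
    nn.ReLU(),
    nn.Linear(64, 32),
)
optimizer = torch.optim.SGD(embed.parameters(), lr=0.01)


def episode_loss(embed, sample_x, query_x, query_y):
    """Loss for one one-shot episode.

    sample_x: (n_way, C, H, W), exactly one image per class; row i is class i.
    query_x:  (n_query, C, H, W)
    query_y:  (n_query,) class indices in [0, n_way)
    """
    sample_emb = embed(sample_x)  # (n_way, D)
    query_emb = embed(query_x)    # (n_query, D)
    # Squared Euclidean distance between every query and every sample embedding.
    diff = query_emb[:, None, :] - sample_emb[None, :, :]  # (n_query, n_way, D)
    sq_dist = (diff ** 2).sum(dim=2)                       # (n_query, n_way)
    logits = -sq_dist  # a closer sample gets a higher score
    return F.cross_entropy(logits, query_y), logits


# Random tensors stand in for one meta-train episode (5-way, 1-shot).
n_way, n_query = 5, 10
sample_x = torch.randn(n_way, 3, 28, 28)
query_x = torch.randn(n_query, 3, 28, 28)
query_y = torch.randint(0, n_way, (n_query,))

loss, logits = episode_loss(embed, sample_x, query_x, query_y)
optimizer.zero_grad()
loss.backward()   # wrong query predictions raise the loss
optimizer.step()  # one update step on the embedding network

# At meta-test time the same function is used without the update step:
# the predicted class of each query is the index of its nearest sample.
pred = logits.argmax(dim=1)
```

In practice this step would be repeated over many episodes drawn from the meta-train set, with the meta-test episodes kept aside for evaluation only.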