Adding an MLP into the intermediate layer of Conv2D

Hi PyTorchers,
I am currently attempting to implement a network similar to Network in Network (M. Lin, Q. Chen; 2014). It takes the output of each sliding window of a 2D convolutional layer and passes it through multiple linear layers before the pooling (downsampling) step. The architecture looks like the following:
Is there a way to manually extract the features from each window as Conv2D builds its feature maps, so that I can apply my MLP (linear) layers to them before the downsampling?
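
To make the question concrete, here is a minimal sketch of the kind of thing I have in mind. It assumes `torch.nn.functional.unfold` is an acceptable way to pull out each sliding window, and that stride 1 is fine. The class name `WindowMLP`, the layer sizes, and the choice of max pooling are placeholders of mine, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WindowMLP(nn.Module):
    """Hypothetical block: run an MLP on every sliding window, then pool."""

    def __init__(self, in_channels, hidden, out_channels, kernel_size=3, padding=1):
        super().__init__()
        self.kernel_size = kernel_size
        self.padding = padding
        # Each window is flattened to in_channels * k * k values.
        patch_dim = in_channels * kernel_size * kernel_size
        self.mlp = nn.Sequential(
            nn.Linear(patch_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_channels),
            nn.ReLU(),
        )
        self.pool = nn.MaxPool2d(2)  # assumed downsampling step

    def forward(self, x):
        n, _, h, w = x.shape
        # (N, C*k*k, L): one column per sliding window, L = H_out * W_out.
        patches = F.unfold(x, kernel_size=self.kernel_size, padding=self.padding)
        # (N, L, C*k*k) so nn.Linear is applied to each window independently.
        feats = self.mlp(patches.transpose(1, 2))
        # Output size for stride 1.
        h_out = h + 2 * self.padding - self.kernel_size + 1
        w_out = w + 2 * self.padding - self.kernel_size + 1
        # Back to a feature map of shape (N, out_channels, H_out, W_out).
        fmap = feats.transpose(1, 2).reshape(n, -1, h_out, w_out)
        return self.pool(fmap)


if __name__ == "__main__":
    block = WindowMLP(in_channels=3, hidden=64, out_channels=16)
    out = block(torch.randn(2, 3, 32, 32))
    print(out.shape)
```

As far as I can tell, the first `nn.Linear` over the unfolded windows computes the same thing as an ordinary `nn.Conv2d` with the same kernel size, and the later `nn.Linear` layers behave like 1x1 convolutions, so a stack of `Conv2d(k)` followed by `Conv2d(1)` layers may be a cheaper way to get the same result without materialising every window.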