Match pixels of an image with corresponding features after encoding

Assume that we have an input image, a tensor inputImg of shape B,C,W,H (in our case it's B,3,256,128). After encoding it with a pre-trained VGG, a specific layer (let's say relu4_1) gives a feature tensor featTens of shape [B, 512, 16, 32]. Is there a way to directly map each pixel of featTens to its corresponding patch or region of inputImg?

The main reason is that I want to find the corresponding image pixels for the features that seem to give big errors during training.

You could either calculate the receptive fields manually (which might be cumbersome) or use a library such as Captum, which might have methods to visualize (large) output activations in the input tensor.
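For the manual route, here is a minimal sketch of the receptive-field arithmetic in plain Python. It assumes a VGG19-style stack up to relu4_1 (3x3 convs with padding 1, 2x2 max-pools with stride 2); the thread does not say which VGG variant is used, so adjust the layer list to your model. The names `receptive_field`, `input_region` and `VGG19_RELU4_1` are made up for this example, and ReLU layers are left out because they do not change the receptive field.

```python
import math

# Assumed layer layout up to relu4_1: (kernel, stride, padding) per layer.
CONV = (3, 1, 1)   # 3x3 convolution, stride 1, padding 1
POOL = (2, 2, 0)   # 2x2 max-pool, stride 2
VGG19_RELU4_1 = [CONV, CONV, POOL,               # block 1
                 CONV, CONV, POOL,               # block 2
                 CONV, CONV, CONV, CONV, POOL,   # block 3
                 CONV]                           # conv4_1 (+ relu4_1)


def receptive_field(layers):
    """Return (size, jump, center) along one spatial axis.

    size   - receptive field size in input pixels
    jump   - input pixels between two neighbouring features
    center - input coordinate of the center of feature 0
    """
    size, jump, center = 1, 1, 0.0
    for kernel, stride, padding in layers:
        size += (kernel - 1) * jump
        center += ((kernel - 1) / 2 - padding) * jump
        jump *= stride
    return size, jump, center


def input_region(index, input_size, layers):
    """Inclusive (lo, hi) input pixel range seen by feature `index`."""
    size, jump, center = receptive_field(layers)
    mid = center + index * jump
    half = (size - 1) / 2
    lo = max(0, math.ceil(mid - half))                # clip at the border
    hi = min(input_size - 1, math.floor(mid + half))
    return lo, hi


# Feature 5 along an axis of 128 input pixels
print(input_region(5, 128, VGG19_RELU4_1))
```

Apply `input_region` once per spatial axis (with that axis's input size) to get the patch for a feature location. Note that this is the theoretical receptive field, and neighbouring features overlap heavily, so the patches are much larger than the stride of 8 pixels.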