Depending on the model architecture, the intermediate activations can need a huge amount of memory, as @ConvolutionalAtom explained.
A simple conv layer is a good example, as it often doesn't reduce the spatial size significantly while increasing the number of channels in the output activation.
An 8GB input would thus create an even larger output activation, and that is for a single layer only.
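
To make the arithmetic concrete, here is a rough sketch in plain Python that applies the standard conv output-size formula and compares input and output memory. The layer settings (3 to 64 channels, 3x3 kernel, stride 1, padding 1), the 1024x1024 input, and float32 storage are my own assumptions for illustration, not values from your model:

```python
def conv2d_output_shape(n, c_in, h, w, c_out, kernel_size, stride=1, padding=0, dilation=1):
    """Output shape of a 2D convolution (c_in does not affect the output shape)."""
    def out_dim(size):
        return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
    return n, c_out, out_dim(h), out_dim(w)


def tensor_bytes(shape, bytes_per_element=4):
    """Memory needed to store a dense tensor; 4 bytes per element assumes float32."""
    numel = 1
    for dim in shape:
        numel *= dim
    return numel * bytes_per_element


# Hypothetical example: one RGB image passed through a "same"-padded conv layer.
input_shape = (1, 3, 1024, 1024)
output_shape = conv2d_output_shape(*input_shape, c_out=64, kernel_size=3, stride=1, padding=1)

input_mib = tensor_bytes(input_shape) / 1024**2
output_mib = tensor_bytes(output_shape) / 1024**2

print(f"input  {input_shape}: {input_mib:.0f} MiB")
print(f"output {output_shape}: {output_mib:.0f} MiB")
print(f"output / input: {output_mib / input_mib:.1f}x")
```

With these assumed settings the spatial size is unchanged, so the output is 64/3 (about 21x) the size of the input. At that ratio an 8GB input would produce an activation of roughly 170GB from this one layer alone, before counting anything stored for the backward pass.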
Take a look at this post, which estimates the memory usage of a ResNet.