Adding autoencoders to a large network causes memory issues

This is expected, depending on the input activation size: the intermediate forward activations, which are stored for the backward pass, can take up the majority of the memory. You could check this post for an estimate for a ResNet architecture.
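
As a rough back-of-the-envelope sketch of such an estimate: assuming float32 activations and that every layer output is kept for the backward pass, the activation memory is roughly the sum of all output element counts times 4 bytes. The layer shapes below are hypothetical (a small strided convolutional encoder) and are not taken from your model, and the helper names `conv_out_size` and `activation_bytes` are made up for illustration.

```python
from math import prod

BYTES_PER_FLOAT32 = 4


def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (no dilation, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def activation_bytes(output_shapes, batch_size, bytes_per_elem=BYTES_PER_FLOAT32):
    """Total bytes of all stored layer outputs for one forward pass."""
    return sum(batch_size * prod(shape) * bytes_per_elem for shape in output_shapes)


# Hypothetical encoder: 3x224x224 input, three stride-2 convs (kernel 3, padding 1).
size = 224
shapes = []
for channels in (64, 128, 256):
    size = conv_out_size(size, kernel=3, stride=2, padding=1)
    shapes.append((channels, size, size))  # (C, H, W) of this layer's output

total = activation_bytes(shapes, batch_size=32)
print(f"estimated activation memory: {total / 1024**2:.1f} MiB")
```

This only counts the convolution outputs, so it is a lower bound: outputs of activation and normalization layers, gradients, parameters, and optimizer state add to it. On a GPU you could compare the estimate against `torch.cuda.max_memory_allocated()` after a forward and backward pass.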