General question about training a FasterRCNN model with a custom feature extractor

I am working on object detection and I have a dataset containing images and their corresponding bounding boxes (ground-truth values).
I have built my own feature extractor, which takes an image as input and outputs a feature map. It is basically an encoder-decoder system whose final decoder output has 3 channels and the same spatial size as the input image. Now I want to feed this feature map as input to a FasterRCNN model for detection. I have two main questions at this point:

  1. Is it okay if I skip training the feature extractor and just train the FasterRCNN for detection?
  2. If I should also train the feature extractor, what labels should I use to train it?

I am a beginner in computer vision and I am using PyTorch. It would be really helpful if anyone could guide me on how to approach this problem.