Question about fine-tuning an fcn_resnet101 model with 2 classes

Most likely not. Based on your description, it seems you are using color images as masks.
For a typical segmentation use case, the mask should be a LongTensor with the shape [batch_size, height, width] containing class indices, not RGB color values.
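To illustrate the expected shapes, here is a minimal sketch using nn.CrossEntropyLoss. It assumes the model output is the raw logits tensor (for torchvision's fcn_resnet101 that would be model(images)["out"], shape [batch_size, num_classes, height, width]). Random logits stand in for the model so the snippet runs without downloading weights, and the sizes are arbitrary placeholders.

```python
import torch
import torch.nn as nn

# Arbitrary sizes for illustration only
batch_size, num_classes, height, width = 2, 2, 4, 4

# Stand-in for model(images)["out"]: raw logits with shape [N, C, H, W]
logits = torch.randn(batch_size, num_classes, height, width)

# Target mask: class indices in [0, num_classes - 1], dtype torch.long, shape [N, H, W]
target = torch.randint(0, num_classes, (batch_size, height, width), dtype=torch.long)

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)  # scalar loss averaged over all pixels

# Taking argmax over the class dimension yields predictions in the same layout as the target
pred = logits.argmax(dim=1)  # [N, H, W]
print(loss.item(), pred.shape)
```

Note that the target has no channel dimension: each pixel stores a single class index, while the class dimension only appears in the model output.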

Have a look at this post to see how a mapping between the color values and your class indices can be created.
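One possible way to build such a mapping is sketched below; it is not necessarily the approach from the linked post. The palette in COLOR_TO_CLASS (black for background, white for the object) and the helper name rgb_mask_to_class_indices are hypothetical, so replace the colors with the ones actually used in your masks. The function assumes a uint8 mask in [3, H, W] layout.

```python
import torch

# Hypothetical palette: adjust to the exact colors used in your masks
COLOR_TO_CLASS = {
    (0, 0, 0): 0,        # background
    (255, 255, 255): 1,  # foreground object
}

def rgb_mask_to_class_indices(mask_rgb: torch.Tensor, color_to_class: dict) -> torch.Tensor:
    """Convert a [3, H, W] uint8 color mask into a [H, W] LongTensor of class indices."""
    _, height, width = mask_rgb.shape
    target = torch.full((height, width), -1, dtype=torch.long)  # -1 marks unmapped pixels
    for color, class_idx in color_to_class.items():
        color_t = torch.tensor(color, dtype=mask_rgb.dtype).view(3, 1, 1)
        matches = (mask_rgb == color_t).all(dim=0)  # [H, W] bool: all 3 channels match
        target[matches] = class_idx
    if (target == -1).any():
        raise ValueError("mask contains colors not listed in color_to_class")
    return target

# Usage: a 2x2 mask where only the top-right pixel is white
mask = torch.zeros(3, 2, 2, dtype=torch.uint8)
mask[:, 0, 1] = 255
target = rgb_mask_to_class_indices(mask, COLOR_TO_CLASS)
print(target)  # tensor([[0, 1], [0, 0]])
```

If the masks are loaded with PIL and converted via NumPy, they will usually be in [H, W, 3] layout, so a permute(2, 0, 1) is needed first. Raising on unknown colors is a deliberate choice: lossy formats such as JPEG can introduce colors that are not in the palette, and silently mislabeling those pixels would corrupt the training targets.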