Hello, I recently followed the tutorial on the official PyTorch website (Link) to fine-tune a Mask R-CNN.
In the tutorial, the author only needs to separate pedestrians from the background, so the Dataset class is defined like this:
```python
import os
import numpy as np
import torch
from PIL import Image


class PennFudanDataset(object):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)
        # convert the PIL Image into a numpy array
        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]

        # split the color-encoded mask into a set
        # of binary masks
        masks = mask == obj_ids[:, None, None]

        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        # convert everything into a torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)

        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.imgs)
```
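For reference, this is roughly what the tutorial dataset returns for a single sample (the root path here is just a placeholder for wherever the data lives locally):

```python
# quick sanity check on one sample (path is a placeholder)
dataset = PennFudanDataset("PennFudanPed", transforms=None)
img, target = dataset[0]
print(target["labels"])       # every instance gets label 1, i.e. "pedestrian"
print(target["masks"].shape)  # (num_instances, H, W) binary masks
```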
However, in my case, I need to separate the nose, the mouth, and the background in face images.
I'd like to know how I should change the definition of the Dataset class (especially the `__getitem__` method) so that after training my model can separate the nose and the mouth as different classes.
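For context, here is a minimal sketch of what I imagine the changed part of `__getitem__` might look like, assuming my annotation masks encode the part class directly in the pixel value (0 = background, 1 = nose, 2 = mouth) with at most one instance of each part per face; this pixel-value convention is my own assumption, not something from the tutorial:

```python
# inside __getitem__, replacing the single-class label logic;
# assumes mask pixel values are 0 = background, 1 = nose, 2 = mouth
# (my own labelling convention, one instance of each part per image)
mask = np.array(Image.open(mask_path))
obj_ids = np.unique(mask)
obj_ids = obj_ids[1:]  # drop the background id (0)

# one binary mask per part, exactly as in the tutorial
masks = mask == obj_ids[:, None, None]

# instead of torch.ones(...), use the pixel value itself as the class label,
# so a nose region becomes label 1 and a mouth region label 2
labels = torch.as_tensor(obj_ids, dtype=torch.int64)
```

I also assume the model would then need to be built with num_classes = 3 (background + nose + mouth) instead of 2, but I'm not sure whether changing the labels this way is enough. Is this the right direction?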