Mask-RCNN

additional branch from Fast R CNN that predicts segmentation masks from a region. produces bounding boxes

faster-RCNN already has claass label and bounnding box offset

Mask-RCNN wants to also give you a mask, but requires fine spatial layouts

multi-task loss - classification, bounding box, and mask.

mask loss is sigmoid, binary cross entropy.

key

not semantic segmentation, which does a softmax and multinomail cross entropy loss on each pixel - instead, per-pixel sigmoid, doucpling mask and class prediction.

use a FCN for predicing an m x m mask.

ROI align

ROI pool extracts a small 7x7 feature map from each ROI

it quantizes a floating-number ROI to the granuarliy of feature map.

instead of quantization, using blinear interpolation on the feature map, no quanization/rounding

Last Reviewed: 10/28/2025