Mason Wang

R-CNN

R-CNN - region-based convolutional neural network ("regions with CNN features") - splits object detection into two stages - first propose regions, then classify the object in each region

get a bunch of region proposals using “selective search” (starts from an over-segmentation of the image and greedily merges similar neighboring regions)

resize (warp) each region to 227 x 227, after dilating the bounding box so the crop includes some surrounding context
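
A minimal sketch of the dilation step, assuming boxes are (x1, y1, x2, y2) in pixels and that the goal is to have pad pixels of context on each side after warping to 227 x 227. The name dilate_box and the value pad = 16 are assumptions for illustration. Clipping to the image bounds is left out.

```python
def dilate_box(box, out_size=227, pad=16):
    """Grow a box so the original box fills (out_size - 2*pad) pixels after warping.

    box is (x1, y1, x2, y2). pad=16 is an assumed amount of context.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    inner = out_size - 2 * pad  # pixels the original box occupies after warping
    # pad warped pixels correspond to pad * (w / inner) original pixels in x,
    # and pad * (h / inner) original pixels in y.
    dx = pad * w / inner
    dy = pad * h / inner
    return (x1 - dx, y1 - dy, x2 + dx, y2 + dy)
```

The x and y padding differ for non-square boxes because the warp to a square output scales each axis independently.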

selective search is run in its "fast mode" at test time

score each feature vector with the SVM trained for each class, then do non-maximum suppression per class to remove overlapping regions: reject a region if its IoU with a higher-scoring selected region exceeds a threshold
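
The suppression rule above can be sketched as greedy NMS for a single class. This is a plain-Python illustration, not the original implementation. Boxes are assumed to be (x1, y1, x2, y2), and the threshold of 0.3 is an assumed value.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS for one class; returns indices of kept boxes, best score first.

    iou_threshold=0.3 is an assumption for this sketch.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap too much with any
        # higher-scoring box that was already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Running nms once per class, on that class's SVM scores, means a region suppressed for one class can still survive for another.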

Training

selective search generates about 2000 candidate rectangles, and a linear regression predicts 4 offsets per rectangle to refine it
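
A sketch of what the 4 offsets can look like, assuming the usual center and log-scale parameterization: shifts of the box center in units of the proposal's width and height, plus log ratios of width and height. The helper names are hypothetical. The regressor that predicts the offsets from CNN features is not shown.

```python
import math


def to_center(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + w / 2, y1 + h / 2, w, h


def encode_offsets(proposal, gt):
    """Regression targets (tx, ty, tw, th) that map a proposal onto a ground-truth box."""
    px, py, pw, ph = to_center(proposal)
    gx, gy, gw, gh = to_center(gt)
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def apply_offsets(proposal, t):
    """Refine a proposal with predicted offsets; inverse of encode_offsets."""
    px, py, pw, ph = to_center(proposal)
    cx, cy = px + t[0] * pw, py + t[1] * ph
    w, h = pw * math.exp(t[2]), ph * math.exp(t[3])
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

Dividing the center shift by the proposal size and taking the log of the size ratio keeps the targets on a similar scale for small and large boxes.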

first, generate region proposals likely to contain objects (bounding boxes)

each region is cropped and resized to a fixed size (R-CNN warps the image crop itself; RoI pooling on the feature map came later, in Fast R-CNN)

feature extraction - CNN extracts a fixed-length feature vector from each warped region

object classification and bounding box regression - determine object class, and refine coordinates.

produces bounding boxes, not segmentation masks

Last Reviewed: 10/28/2025