R-CNN

R-CNN - regional propsoal network - split object detection into two stages - first propose region, then classify object

get a bunch of region proposals using “selective search” (merges simlar regions using algorithms)

regize each region to 227 x 227 size, after dilating the bounuding box

selective search has fast mode during test time

SVM on feature vectors trained for that class, do non-maximum suppression to remove overlapping regions, by rejecting a region with an IoU of a higher scoring region

Trainingss

strain the CNN on a dataset using image-level annotation (imagenet)
domain adapt using warped region proposals, replace linear layer
greater than 0.5 IoU with ground truth box as positivies, lower learning rate by 0.1, 32 positive windows, 96 background windows.
biased sampling toward positive windows.
0.3 IoU overlap thresehold between positive and negative examples, determined by validation set.
optimize one linear SVM per class
hard negative mining

selective search generates 2000 candidate recatngles, and linear regression prediccts 4 offsets

first, generate region proposals likely to contain objects (bounding boxes)

each region is cropped and resized (ROI polling)

feature extraction - CNN extracts features

object classification and bounding box regression - determine object class, and refine coordinates.

produces bounding boxes, not segmentation masks

Last Reviewed: 10/28/2025