MLP Interpretation - UDL
- Shallow MLPs compute linear functions of the input, clip them with ReLU, rescale, and sum them (see the shallow-network sketch after this list)
- With a 1D input, $D$ hidden units give at most $D$ joints and therefore at most $D+1$ linear regions
- With multivariate outputs, each output is a different linear combination of the same clipped hidden functions, so every output has its joints in the same places
- The book has a visualization for multivariate inputs, where the linear regions become convex polytopes
- Every ReLU MLP partitions its input space into linear regions; inside each region the network computes a linear function
- "Folding" interpretation: each layer folds the input space back onto itself, so the function built by later layers is replicated across the folds
- Adding a layer clips and recombines the function inside each existing linear region, subdividing the regions further
- Composing two shallow networks through a narrow bottleneck is equivalent to a deeper network whose middle weight matrix is restricted to an outer product (rank one)
- Depth efficiency: deep networks can create exponentially more linear regions than shallow networks with a comparable number of parameters (see the folding sketch after this list)
- Empirically, deeper networks also tend to train more easily and generalize better for a fixed parameter budget, up to a point
- Smooth activations like Swish mitigate the dying ReLU problem, since their gradient stays nonzero for negative pre-activations (see the activation-gradient sketch after this list)
- Because ReLU is positively homogeneous, a layer's weights and biases can be scaled by any $\lambda > 0$ as long as the next layer's weights are divided by $\lambda$; the network's function is unchanged (see the rescaling sketch after this list)
- Depth approximation theorem: universal approximation also holds for deep networks, so sufficiently deep networks can approximate any continuous function to arbitrary accuracy
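
A minimal sketch of the shallow-network picture (made-up random parameters, not code from the book): each hidden unit clips its own line, the output rescales and sums them, and for a 1D input the number of linear regions is at most $D+1$.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 3                                              # hidden units
w1, b1 = rng.normal(size=D), rng.normal(size=D)    # hidden weights / biases
w2, b2 = rng.normal(size=D), rng.normal()          # output weights / bias

def shallow(x):
    """1-input, 1-output shallow ReLU network evaluated on a batch x."""
    h = np.maximum(0.0, np.outer(x, w1) + b1)      # (N, D): D clipped lines
    return h @ w2 + b2                             # rescale and sum

x = np.linspace(-5.0, 5.0, 10_001)
y = shallow(x)                                     # piecewise-linear output

# The activation pattern (which units are "on") is constant inside a linear
# region, so counting distinct patterns counts the regions: at most D + 1.
patterns = np.outer(x, w1) + b1 > 0
print("linear regions:", len(np.unique(patterns, axis=0)), "<= D + 1 =", D + 1)
```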
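
A sketch of the folding / depth-efficiency point, using the standard tent-map construction (my own illustration, not code from the book): one small ReLU layer folds $[0,1]$ onto itself, and composing $k$ such layers produces a sawtooth with $2^k$ linear regions, whereas extra width only adds regions linearly.

```python
import numpy as np

def tent(x):
    # tent(x) = 2x on [0, 0.5] and 2 - 2x on [0.5, 1]: two hidden ReLU units
    return 2.0 * np.maximum(0.0, x) - 4.0 * np.maximum(0.0, x - 0.5)

x = np.linspace(0.0, 1.0, 100_001)
y = x.copy()
for depth in range(1, 5):
    y = tent(y)                                    # add one folding layer
    slopes = np.sign(np.diff(y))                   # +1 / -1 on each segment
    n_regions = 1 + np.count_nonzero(np.diff(slopes))
    print(f"depth {depth}: {n_regions} linear regions (2**depth = {2**depth})")
```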
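
A sketch of the dying-ReLU point (my own numbers): ReLU's gradient is exactly zero for negative pre-activations, so a unit stuck there gets no gradient signal, whereas Swish, $x \cdot \mathrm{sigmoid}(x)$, keeps a small nonzero gradient.

```python
import numpy as np

def relu_grad(z):
    return (z > 0).astype(float)

def swish_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s + z * s * (1.0 - s)            # d/dz [z * sigmoid(z)]

z = np.array([-4.0, -1.0, 0.5, 2.0])
print("relu grad :", relu_grad(z))          # zero wherever z <= 0
print("swish grad:", swish_grad(z))         # small but nonzero for negative z
```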
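
A sketch of the rescaling invariance (assumed shapes and random parameters): scale one layer's weights and biases by $\lambda > 0$, divide the next layer's weights by $\lambda$, and the output is unchanged because $\mathrm{ReLU}(\lambda z) = \lambda\,\mathrm{ReLU}(z)$.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)
relu = lambda z: np.maximum(0.0, z)

def net(x, W1, b1, W2, b2):
    return W2 @ relu(W1 @ x + b1) + b2

x = rng.normal(size=2)
lam = 3.7
original = net(x, W1, b1, W2, b2)
rescaled = net(x, lam * W1, lam * b1, W2 / lam, b2)   # compensate next layer
print(np.allclose(original, rescaled))                # True
```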
Last Reviewed: 11/1/24