Mason Wang

Concept Activation Vectors

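Procedure:
1. Specify a concept by collecting examples of it (e.g. images of stripes).
2. Train a linear classifier on a layer's activations for these examples vs. random examples or another concept group (e.g. stripes vs. dots).
3. Take the vector orthogonal to the classifier's decision boundary (its normal vector). This is the concept activation vector (CAV).
4. Compute the directional derivative of a class output (e.g. the zebra logit) with respect to the activations, along the CAV.
5. Use the sign and size of this derivative to tell which concepts inform the classifier's decision.
Other use cases: see notes.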
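
The steps above can be sketched on toy data. This is a minimal illustration, not the reference implementation: the activations are synthetic Gaussians, the linear classifier is assumed to be logistic regression fit by plain gradient descent, and `zebra_logit_grad` is a hypothetical stand-in for the gradient that a real network would supply via autodiff. The final score (fraction of class examples with a positive directional derivative) is a TCAV-style summary.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # size of the (toy) activation space at the chosen layer

# Step 1: concept examples vs. random examples, as layer activations.
# Assumption: the "stripes" concept shifts activations along axis 0.
concept_acts = rng.normal(size=(200, dim))
concept_acts[:, 0] += 3.0
random_acts = rng.normal(size=(200, dim))

def train_linear_classifier(pos, neg, lr=0.1, steps=500):
    """Fit logistic regression by gradient descent; return (weights, bias)."""
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(concept)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Steps 2-3: the CAV is the unit normal of the decision boundary,
# i.e. the normalized weight vector, pointing toward the concept side.
w, _ = train_linear_classifier(concept_acts, random_acts)
cav = w / np.linalg.norm(w)

# Hypothetical class head: zebra_logit(a) = v . tanh(a).
# Its gradient with respect to the activations a is v * (1 - tanh(a)^2).
v = np.zeros(dim)
v[0], v[1] = 2.0, -0.5

def zebra_logit_grad(a):
    return v * (1.0 - np.tanh(a) ** 2)

# Step 4: directional derivative of the class logit along the CAV,
# evaluated at the activations of (toy) zebra examples.
zebra_acts = rng.normal(size=(100, dim))
dir_derivs = np.array([zebra_logit_grad(a) @ cav for a in zebra_acts])

# Step 5: fraction of zebra examples whose logit increases along the CAV.
score = float(np.mean(dir_derivs > 0))
print("CAV component along the concept axis:", round(float(cav[0]), 3))
print("TCAV-style score:", score)
```

A score near 1 means that moving the activations toward the concept raises the class logit for most examples; a score near 0.5 suggests the concept has no consistent influence.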

Last Reviewed: 10/27/24
Reference # 1