Mason Wang

Similarity-Based Methods

Some data

use another network to extract some target

compare similarity between target and data,

SimCLR, MoCo, JEPA, BYOL, SWAV, SimSiam, Dino
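
A toy sketch of the recipe above, assuming a negative cosine similarity objective (the form SimSiam uses). The two linear maps `online_w` and `target_w` are hypothetical stand-ins for real networks, and the same input is fed to both for brevity; in practice the target usually comes from a different augmented view or a masked portion of the data.

```python
import math

def linear(weights, x):
    """Apply a tiny linear 'network': one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical weights, chosen by hand for illustration only.
online_w = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
target_w = [[0.9, 0.1, 0.4], [0.1, 0.8, -0.4]]

x = [1.0, 2.0, 3.0]               # some data
prediction = linear(online_w, x)  # embedding from the network being trained
target = linear(target_w, x)      # target extracted by the other network

# Training would minimize this: higher similarity gives a lower loss.
loss = -cosine(prediction, target)
```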

In depth

Want to organize data

what kind of inductive biases can we design?

Information retrieval

information retrieval - like a library - organized to find information faster

corpus of N documents - long pieces of text, PDFs. Goal is search, find top-k most relevant given query.

k ≪ N

success @ k - 1 if anything in the retrieved set is relevant

precision @ k - fraction of the retrieved top-k items that are relevant

recall @ k - number of relevant items in the top k / total number of known relevant items
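
A minimal sketch of these three metrics, assuming `ranked` is a list of document ids ordered by score and `relevant` is the set of known relevant ids. The function names and the example ids are made up for illustration.

```python
def hits_in_top_k(ranked, relevant, k):
    """Number of relevant documents among the first k results."""
    return sum(doc in relevant for doc in ranked[:k])

def success_at_k(ranked, relevant, k):
    """1.0 if at least one relevant document is in the top k, else 0.0."""
    return float(hits_in_top_k(ranked, relevant, k) > 0)

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return hits_in_top_k(ranked, relevant, k) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all known relevant documents that appear in the top k."""
    return hits_in_top_k(ranked, relevant, k) / len(relevant)

# Hypothetical query result: 5 ranked ids, 3 known relevant ids.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d4", "d8"}

for k in (3, 5):
    print(k, success_at_k(ranked, relevant, k),
          precision_at_k(ranked, relevant, k),
          recall_at_k(ranked, relevant, k))
```

With these ids, one relevant document is in the top 3 and two are in the top 5, so recall can never reach 1 here because "d8" is never retrieved.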

must have sub-linear latency

before 2019 - matching keywords

Transforms:

N-way regression? huge computation — too many corpus weights?

Need a more compositional approach

Decomposition:

Better latency?

Dual encoders
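
A toy sketch of the dual-encoder idea: documents and queries are embedded separately, and the score is a dot product between the two vectors. The `encode` function here is a hypothetical bag-of-words stand-in for a learned network, and the corpus is made up. The scan over all documents is linear in N; getting sub-linear latency would additionally need an approximate nearest-neighbor index, which is not shown.

```python
import heapq
import math

corpus = [
    "contrastive learning of visual representations",
    "keyword matching with inverted indexes",
    "dual encoders for dense passage retrieval",
    "baking sourdough bread at home",
]

# Toy vocabulary: one dimension per distinct corpus token.
vocab = {tok: i for i, tok in
         enumerate(sorted({t for doc in corpus for t in doc.split()}))}

def encode(text):
    """Stand-in encoder: L2-normalized bag of words (unknown tokens dropped)."""
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Document vectors are computed once, offline, independent of any query.
doc_vecs = [encode(doc) for doc in corpus]

def search(query, k=2):
    """Indices of the k documents with the highest dot-product score."""
    q = encode(query)
    scores = [dot(q, d) for d in doc_vecs]
    return heapq.nlargest(k, range(len(corpus)), key=scores.__getitem__)

top = search("dense retrieval with dual encoders", k=2)
print([corpus[i] for i in top])
```

The point of the split is that only the query has to be encoded at search time; everything on the document side is precomputed.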

sample negative documents randomly?
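
One way random negatives could enter training, sketched under assumptions: a softmax cross-entropy loss over one positive document and a handful of uniformly sampled negatives. The embeddings are random placeholders, and `contrastive_loss` is an illustrative name, not taken from the lecture.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def contrastive_loss(query, positive, negatives):
    """Cross-entropy over [positive] + negatives, with the positive as the label."""
    logits = [dot(query, positive)] + [dot(query, n) for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in logits))
    return log_z - logits[0]

rng = random.Random(0)
num_docs, dim, num_negatives = 100, 8, 5

# Placeholder document embeddings; a real system would use encoder outputs.
doc_vecs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_docs)]

positive_id = 17
query = list(doc_vecs[positive_id])  # assumption: query embeds near its positive

# Sample negatives uniformly from every document except the positive.
candidates = [i for i in range(num_docs) if i != positive_id]
neg_ids = rng.sample(candidates, num_negatives)

loss = contrastive_loss(query, doc_vecs[positive_id],
                        [doc_vecs[i] for i in neg_ids])
print(neg_ids, loss)
```

Uniformly sampled negatives are cheap, but most of them are easy to tell apart from the positive, which is presumably why the note ends in a question mark.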

proof

if the embedding dimension is big enough, and the encoders are expressive enough, a dual encoder can approximate any continuous scoring function.

dot products are very expressive

Dimensions

can design retrieval problems where vector representations only work with a very large embedding dimension (millions); below that, dual encoders are not even expressive enough
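
One simple way to see a dimension limit (not necessarily the construction from the lecture): with d-dimensional embeddings, the N x N matrix of dot-product scores has rank at most d. A target score matrix of higher rank, such as the identity ("query i matches only document i"), cannot be reproduced exactly. Retrieval only needs the right ordering, not exact scores, so this is a loose proxy. The sketch below checks the rank bound numerically with a small hand-written elimination routine.

```python
import random

def matrix_rank(rows, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank

rng = random.Random(0)
n, d = 6, 2  # 6 queries, 6 documents, 2-dimensional embeddings

queries = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
docs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

# Score matrix produced by a dual encoder: scores[i][j] = <query_i, doc_j>.
scores = [[sum(a * b for a, b in zip(q, doc)) for doc in docs] for q in queries]

# Target where each query matches exactly one document.
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

print(matrix_rank(scores), matrix_rank(identity))
```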

larger embedding dimensions are better, but the improvement is on a log scale

can get zero training loss, but not generalize.

Last Reviewed: 10/26/2025