Transformers
Transformer Basics
Rotary Embeddings (Review)
LayerNorm, projecting latent onto hypersphere
MQA, GQA
SwiGLU
Prenorm vs postnorm

Last Reviewed: 6/1/24
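The LayerNorm topic above ("projecting latent onto hypersphere") can be checked numerically. If the learned gain and bias are left out and eps is ignored, LayerNorm maps any non-constant vector x in R^d to a vector with zero mean and Euclidean norm sqrt(d). That places the output on a sphere of radius sqrt(d) inside the hyperplane orthogonal to the all-ones vector. The sketch below is a minimal pure-Python illustration under those assumptions. The function name `layer_norm` and the sample vector are hypothetical and not taken from any library. Real layers add a small eps (PyTorch's `nn.LayerNorm` defaults to 1e-5), which puts the norm slightly below sqrt(d). The learned elementwise gain and bias applied afterward generally move the output off this sphere.

```python
import math

def layer_norm(x, eps=0.0):
    """Hypothetical helper: LayerNorm with gain=1, bias=0 (no learned affine)."""
    d = len(x)
    mean = sum(x) / d
    # Biased (population) variance, as LayerNorm uses
    var = sum((v - mean) ** 2 for v in x) / d
    return [(v - mean) / math.sqrt(var + eps) for v in x]

x = [3.0, -1.0, 4.0, 1.5, -9.0, 2.6]
y = layer_norm(x)

# Zero mean: the output lies in the hyperplane orthogonal to (1, ..., 1)
# Norm sqrt(d): the output lies on a sphere of radius sqrt(d) in that hyperplane
norm = math.sqrt(sum(v * v for v in y))
print(f"sum = {sum(y):.2e}, norm = {norm:.6f}, sqrt(d) = {math.sqrt(len(x)):.6f}")
```

With eps = 0 the squared norm is d * var / var = d for any non-constant input, which is why the radius depends only on the dimension and not on the scale of x.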