Re-Bottleneck

codecs useful for compression, transmission, feature-extraction, latent-space generation.

reconstruction objectives might not be the best for latent structure.

re-bottleneck - latent space inner bottleneck used to instill user-defined structure

enforce an ordering on latent channels
align latents with semantic embeddings, use for downstream diffusion modeling
equivariance - filtering operation on waveform corresponds to a specific latent space operation.

codecs are used for next-token prediction and diffusion, and supports classification, enhancement, and source separation.

latent structure needed for these tasks, but does not emerge from reconstruction loss.

adapting from this means

doing coarse-to-fine diffusion
special token prediction order
retrain autoencoder from scratch to include these properties, task-specific tokenizers, designing models for semantic alignment - this is costlhy computationally, and require substantial architectural modifications.

structured representations help latent diffusion in other domains

semantic alignment in audio or equivariance is promising but requires retraining, what a waste of pretrained codecs.

Method

train an inner autoencoder, with a reconstruction objective on the latent, along with a discriminator

avoid computation/tuning with waveform losses - only latent domain losses

Last Reviewed: 7/16/2025