I am an EECS PhD student at MIT CSAIL, where I work with Professor Anna Huang. My research lies at the intersection of audio, machine learning, and signal processing.
You can contact me at ycda [at] stanford [dot] edu, or find me on Twitter and LinkedIn.
Research Interests
Capturing Real Auditory Scenes: My recent work focuses on virtualizing real auditory scenes and acoustic spaces. For instance, imagine capturing a video of a concert and then moving freely around the concert space. Imagine taking several videos of a fireworks show and compiling them into an interactive 3D experience. Or imagine capturing the intrinsic acoustic properties of your living room in a way that lets you listen to your favorite artist there.
Differentiable and Inverse Audio Rendering: Audio renderers often rely on slow, non-differentiable techniques. This makes them difficult to fit to real scenes via gradient-based optimization, and the resulting audio simulations are often not faithful to the real-world sounds they attempt to replicate. Inspired by visual inverse rendering and capture techniques, I believe that combining physical inductive biases with machine learning can help us fit simulations to real scenes and thus make them more accurate.
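As a toy sketch of what this kind of gradient-based fitting could look like (purely illustrative, in PyTorch; the renderer, parameters, and values below are hypothetical rather than any specific system of mine):

import torch

# Toy differentiable "renderer": an impulse response modeled as an
# exponentially decaying envelope with a learnable gain and decay rate.
def render_rir(gain, decay, n_samples=8000, sr=16000):
    t = torch.arange(n_samples) / sr
    return gain * torch.exp(-decay * t)

# Stand-in for a measured impulse response (synthesized here with known
# parameters plus a little noise).
with torch.no_grad():
    measured = render_rir(torch.tensor(0.7), torch.tensor(35.0))
    measured += 0.01 * torch.randn_like(measured)

# Physical parameters to recover, fit to the measurement by gradient descent.
gain = torch.tensor(1.0, requires_grad=True)
decay = torch.tensor(10.0, requires_grad=True)
optimizer = torch.optim.Adam([gain, decay], lr=0.1)

for step in range(1000):
    optimizer.zero_grad()
    loss = torch.mean((render_rir(gain, decay) - measured) ** 2)
    loss.backward()
    optimizer.step()

print(f"estimated gain={gain.item():.3f}, decay={decay.item():.1f}")

Real scenes of course call for far richer physical models, but the fitting loop takes the same shape.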
AI-assisted Sound Design and Music-making: Making music involves many steps: writing melodies and themes, chord progressions, arrangement, sound design, mixing, mastering, and more. Some musicians gravitate toward certain parts of this process more than others. My goal is to provide musical artists with controllable assistance for the parts of the music-making process they are less familiar with.
We introduce a method for capturing real acoustic spaces from 12 RIR measurements, letting us play any audio signal in the room and listen from any location and orientation. We develop an audio inverse-rendering framework that synthesizes the room's acoustics (monaural and binaural RIRs) at novel locations, enabling immersive auditory experiences such as simulated music playback.
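The core listening operation in this sort of pipeline can be sketched as convolving a dry signal with an impulse response for the chosen listener position (a minimal illustration with a synthetic RIR, not the actual framework):

import numpy as np
from scipy.signal import fftconvolve

sr = 16000
# Synthetic placeholder RIR: exponentially decaying noise standing in for an
# impulse response synthesized (or measured) at the desired listener position.
t = np.linspace(0.0, 0.5, sr // 2)
rir = np.random.randn(sr // 2) * np.exp(-6.0 * t)

dry = np.random.randn(2 * sr)      # placeholder for a dry music recording
wet = fftconvolve(dry, rir)        # the signal as heard at that position
wet /= np.max(np.abs(wet))         # normalize to avoid clipping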
Humans induce subtle changes in a room's acoustic properties. We can observe these changes (explicitly via RIR measurements, or by playing and recording music in the room) to determine a person's location, presence, and identity.
Everyday objects possess distinct sonic characteristics determined by their shape and material. RealImpact is the largest dataset of object impact sounds to date, comprising 150,000 recordings from 50 objects of varying shapes and materials.