SoundCam: A Dataset for Finding Humans Using Room Acoustics

NeurIPS 2023 Datasets and Benchmarks



A room’s acoustic properties are a product of its geometry, the objects within it, and their specific positions. These properties can be characterized by the room impulse response (RIR) between a source and a listener location, or roughly inferred from recordings of natural signals present in the room. We present SoundCam, the largest dataset of unique RIRs from in-the-wild rooms publicly released to date. It includes 5,000 10-channel real-world measurements of room impulse responses and 2,000 10-channel recordings of music in three different rooms (a controlled acoustic lab, an in-the-wild living room, and a conference room), with different humans in positions throughout each room. We show that these measurements can be used for tasks such as detecting and identifying a human and tracking their position.
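An RIR relates what a source emits to what a microphone records: under the usual linear time-invariant assumption, each microphone's recording is the source signal convolved with the RIR for that microphone. The sketch below illustrates this relationship on synthetic data. The function name `render_recording`, the 48 kHz sample rate, and the `(n_channels, rir_length)` array layout are assumptions for illustration only, not the dataset's file format or the authors' code.

```python
import numpy as np

def render_recording(dry_signal: np.ndarray, rirs: np.ndarray) -> np.ndarray:
    """Convolve one source signal with one RIR per microphone.

    dry_signal: 1-D array, the signal emitted by the source (e.g. music).
    rirs: 2-D array of shape (n_channels, rir_length), one RIR per microphone.
    Returns shape (n_channels, len(dry_signal) + rir_length - 1), i.e. a full
    linear convolution for every channel, computed via zero-padded FFTs.
    """
    n = dry_signal.shape[-1] + rirs.shape[-1] - 1
    spectrum = np.fft.rfft(dry_signal, n)          # source spectrum
    transfer = np.fft.rfft(rirs, n, axis=-1)       # per-channel transfer functions
    return np.fft.irfft(spectrum * transfer, n, axis=-1)

# Toy example with synthetic data (not SoundCam measurements).
rng = np.random.default_rng(0)
sample_rate = 48_000                      # assumed sample rate
dry = rng.standard_normal(sample_rate)    # 1 s of noise standing in for music
t = np.arange(sample_rate // 2) / sample_rate
rirs = rng.standard_normal((10, t.size)) * np.exp(-t / 0.1)  # 10 decaying RIRs
recordings = render_recording(dry, rirs)
print(recordings.shape)  # (10, 71999)
```

Zero-padding both FFTs to the full output length makes the circular FFT product equal to a linear convolution, which is much faster than direct convolution for second-long signals.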


Dark Room

Living Room

Conference Room
In each room, we collect 1,000 measurements of the room's acoustic impulse response while varying the location, presence, and identity of a human in the room. Each impulse response is recorded by 10 microphones.
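Because a human in the room changes the measured impulse responses, one simple way to think about presence detection is to compare each measurement with a reference measured in the empty room. The sketch below scores each 10-channel RIR by its relative deviation from the mean empty-room RIR and thresholds that score. This is a hypothetical illustration, not the baseline used in the paper; the function names, the `(n_measurements, n_channels, n_samples)` layout, and the threshold are all assumptions.

```python
import numpy as np

def presence_scores(rirs: np.ndarray, empty_rirs: np.ndarray) -> np.ndarray:
    """Relative deviation of each RIR from the mean empty-room RIR.

    rirs:       (n_measurements, n_channels, n_samples), e.g. n_channels = 10
    empty_rirs: (n_empty, n_channels, n_samples), measured with no human present
    Returns one score per measurement; larger means a larger change.
    """
    reference = empty_rirs.mean(axis=0)      # (n_channels, n_samples)
    diff = rirs - reference                  # broadcasts over measurements
    energy = (reference ** 2).sum()          # normalizes by reference energy
    return np.sqrt((diff ** 2).sum(axis=(1, 2)) / energy)

def detect_presence(rirs: np.ndarray, empty_rirs: np.ndarray,
                    threshold: float) -> np.ndarray:
    """Flag measurements whose score exceeds a threshold (hypothetical)."""
    return presence_scores(rirs, empty_rirs) > threshold
```

In practice the threshold would be tuned on held-out measurements, and real RIRs may need time alignment and noise handling before such a comparison is meaningful.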


The SoundCam dataset can be used to evaluate methods which:

Below, we show some results from our best-performing baseline for localization using a single RIR in the acoustically treated room.



The dataset is hosted by the Stanford Data Repository:

The compressed archives include both raw recordings and preprocessed impulse responses for all the subdatasets used in our experiments. Subdatasets are sorted by room, with some rooms' archives including recordings and data from more than one distinct experiment. 3Dscans.tar.gz includes textured 3D scans of each room, along with 3D scans of each human in the dataset (untextured to preserve anonymity).
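The archives are gzip-compressed tarballs, so they can be inspected and unpacked with Python's standard `tarfile` module. The sketch below lists an archive's members without extracting it and then extracts it to a chosen folder. The helper names `list_archive` and `extract_archive` and the local paths are hypothetical; apart from `3Dscans.tar.gz`, archive names are not given here.

```python
import tarfile
from pathlib import Path

def list_archive(path, limit=20):
    """Return up to `limit` member names of a .tar.gz archive without extracting."""
    names = []
    with tarfile.open(path, mode="r:gz") as archive:
        for member in archive:          # streams members in order
            names.append(member.name)
            if len(names) >= limit:
                break
    return names

def extract_archive(path, dest):
    """Extract a .tar.gz archive into `dest`, creating the folder if needed."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(path, mode="r:gz") as archive:
        # Use the safe "data" filter where this Python version supports it.
        if hasattr(tarfile, "data_filter"):
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return dest
```

For example, after downloading, `list_archive("3Dscans.tar.gz")` gives a quick look at the scan files before committing disk space to a full extraction.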

Sample Dataset

We provide a small downloadable sample dataset: Download TreatedRoomSmallSet. It contains preprocessed files from the Treated Room, with a significantly reduced number of data points. The data's organization is described below.

Dataset Organization

The preprocessed data will serve most use cases. Its organization is as follows:


Each subdataset file contains

Preprocessed Files

Each data folder contains some or all of these files:

Raw Files

The raw files are provided for completeness. Each folder contains the raw recordings from each recording channel, as well as skeletal poses from each camera and depth maps.


Mason Wang and Samuel Clarke are maintaining the dataset. Mason Wang can be contacted at, and Samuel Clarke can be contacted at

Please contact us if you notice any errors in the dataset. Errors we identify will be fixed and the dataset updated, and previous versions of the dataset will be maintained. Errors and previous versions will be posted below.


    title={SoundCam: A Dataset for Finding Humans Using Room Acoustics},
    author={Mason Wang and Samuel Clarke and Jui-Hsien Wang and Ruohan Gao and Jiajun Wu},
    booktitle={Advances in Neural Information Processing Systems},