A room’s acoustic properties are a product of the room’s geometry and of the objects within the room and their specific positions. These properties can be characterized by the room impulse response (RIR) between a source and listener location, or inferred roughly from recordings of natural signals present in the room. We present SoundCam, the largest dataset of unique RIRs from in-the-wild rooms publicly released to date. It includes 5,000 10-channel real-world measurements of room impulse responses and 2,000 10-channel recordings of music in three different rooms: a controlled acoustic lab, an in-the-wild living room, and a conference room, with different humans in positions throughout each room. We show that these measurements can be used for interesting tasks, such as detecting and identifying humans and tracking their positions.
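As an illustration of what an RIR captures, the sketch below convolves a dry (anechoic) signal with a measured impulse response to simulate how that signal would sound in the room. The file names and the 48 kHz sample rate are hypothetical placeholders, not paths or parameters from the dataset.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

# Hypothetical file names; see the data organization described below.
dry = np.load("dry_speech.npy")    # anechoic source signal, shape (T,)
rir = np.load("example_rir.npy")   # single-channel RIR, shape (L,)

# Convolving the dry signal with the RIR simulates playing it in the room.
wet = fftconvolve(dry, rir, mode="full")

# Normalize to avoid clipping, then write a 16-bit WAV (48 kHz assumed).
wet /= np.max(np.abs(wet)) + 1e-12
wavfile.write("wet.wav", 48000, (wet * 32767).astype(np.int16))
```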
The SoundCam dataset can be used to evaluate methods that, from acoustic measurements alone, detect whether a human is present in a room, identify who that human is, and localize or track the human’s position.
Below, we show results from our best-performing baseline for localization from a single RIR in the acoustically treated room.
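For readers looking for a starting point, here is a minimal nearest-neighbor localization sketch over RIR spectra. This is an illustrative baseline, not the model behind the results above, and the (N, 10, T) array shapes are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def rir_features(rirs: np.ndarray) -> np.ndarray:
    """Flatten log-magnitude spectra of multi-channel RIRs.
    rirs: (N, C, T) array; the (N, 10, T) shape is an assumption."""
    spec = np.abs(np.fft.rfft(rirs, axis=-1))
    return np.log1p(spec).reshape(len(rirs), -1)

def nn_localize(train_rirs, train_xy, test_rirs):
    """Predict each test position as the 2D position paired with the
    nearest training RIR in feature space."""
    d = cdist(rir_features(test_rirs), rir_features(train_rirs))
    return train_xy[d.argmin(axis=1)]
```

A lookup like this leans on the repeatability of the measurements, so one would expect it to fare best in the acoustically treated room.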
The dataset is hosted by the Stanford Data Repository: https://purl.stanford.edu/xq364hd5023
The compressed archives include both raw recordings and preprocessed impulse responses for all the subdatasets used in our experiments. Subdatasets are sorted by room, and some rooms’ archives include recordings and data from more than one distinct experiment. 3Dscans.tar.gz includes textured 3D scans of each room, along with 3D scans of each human in the dataset (untextured to preserve anonymity).
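The archives can be unpacked with standard tools; a minimal sketch using Python's tarfile module is below (3Dscans.tar.gz is named above, and any per-room archives would extract the same way once downloaded).

```python
import tarfile
from pathlib import Path

out = Path("soundcam")
out.mkdir(exist_ok=True)

# Extract the 3D scans archive named in the text above.
with tarfile.open("3Dscans.tar.gz", "r:gz") as tar:
    tar.extractall(out)
```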
We provide a small downloadable sample dataset, TreatedRoomSmallSet. Its files are preprocessed data from the Treated Room, but the number of data points has been significantly reduced. Information on the data’s organization is included below.
The preprocessed data will serve most use cases. Its organization is as follows:
Each subdataset file contains:
Each data folder contains some or all of these files:
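As a sketch of loading one preprocessed data folder, the snippet below assumes the measurements are stored as NumPy arrays named rirs.npy and positions.npy; these names and shapes are hypothetical placeholders, not the dataset's actual file names.

```python
import numpy as np
from pathlib import Path

def load_folder(folder: str):
    """Load one data folder. The file names here are hypothetical
    stand-ins for the files described above."""
    folder = Path(folder)
    rirs = np.load(folder / "rirs.npy")            # e.g. (N, 10, T)
    positions = np.load(folder / "positions.npy")  # e.g. (N, 2) human x, y
    assert len(rirs) == len(positions)
    return rirs, positions
```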
The raw files are provided for completeness. Each folder contains raw recordings from each recording channel, as well as the skeletal poses from each camera and depth maps.
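Most users will not need to reprocess the raw recordings, but for reference, impulse responses are commonly recovered from recordings of a known excitation signal (such as a sine sweep) by regularized frequency-domain deconvolution. The sketch below shows that generic textbook method; it is not necessarily SoundCam's exact preprocessing pipeline.

```python
import numpy as np

def deconvolve_rir(recording: np.ndarray, sweep: np.ndarray, eps: float = 1e-8):
    """Estimate an impulse response by frequency-domain deconvolution.
    recording: microphone signal captured while the sweep was played.
    sweep: the excitation signal that was played."""
    n = len(recording) + len(sweep) - 1
    R = np.fft.rfft(recording, n)
    S = np.fft.rfft(sweep, n)
    # Regularized division avoids blowing up where the sweep has no energy.
    H = R * np.conj(S) / (np.abs(S) ** 2 + eps)
    return np.fft.irfft(H, n)
```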
Mason Wang and Samuel Clarke are maintaining the dataset. Mason Wang can be contacted at ycda@stanford.edu, and Samuel Clarke can be contacted at spclarke@stanford.edu.
Please contact us if you notice any errors in the dataset. As we become aware of errors, we will fix them and update the dataset. Previous versions of the dataset will be maintained, and errors and previous versions will be posted below.
@inproceedings{wang2023soundcam,
  title={SoundCam: A Dataset for Finding Humans Using Room Acoustics},
  author={Mason Wang and Samuel Clarke and Jui-Hsien Wang and Ruohan Gao and Jiajun Wu},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023}
}