DARai is a multimodal, multi-view dataset capturing human daily activities in home-like environments, leveraging 20 heterogeneous modalities focused on sensing humans and their surroundings, synchronized with visual data.
The Activities
Our data collection spans four primary scenes: living room, office, kitchen, and dining area. Activities within these settings include office tasks, entertainment, chores, exercise, rest, and cooking. Subjects engaged in diverse actions to enable cross-domain activity analysis in authentic home-like environments. Participants transitioned naturally between daily tasks without imposed start commands or strict time constraints. Activities were categorized into control and counterfactual sets, with two recording sessions per environment to introduce variability in task execution.
The Annotations
Our hierarchical data structure has multiple levels of annotation, as follows (a code sketch of this hierarchy appears after the list):
- Level 1, Activities, represent high-level concepts that are generally recognized as independent tasks.
- Level 2, Actions, are recurring patterns found across multiple activities.
- Level 3, Procedures, distinguish between different instances of an action, addressing “how” and “what” questions about the actions.
- Level 4, Interactions, describe relationships between one or two objects, often connected and characterized by a verb. Free-form language descriptions are provided by annotators.
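To make the hierarchy concrete, here is a minimal Python sketch of one possible nested representation. The class and field names are our own illustration and are not part of the DARai distribution:

```python
# Hypothetical nested representation of the four annotation levels.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Interaction:   # Level 4
    description: str  # free-form language description from annotators

@dataclass
class Procedure:     # Level 3
    name: str         # "how"/"what" variant of an action
    interactions: List[Interaction] = field(default_factory=list)

@dataclass
class Action:        # Level 2
    name: str         # recurring pattern shared across activities
    procedures: List[Procedure] = field(default_factory=list)

@dataclass
class Activity:      # Level 1
    name: str         # high-level, independently recognized task
    actions: List[Action] = field(default_factory=list)
```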
Annotation format: {Local Annotation Path}/{L1 Activity Folder}/{L1 Activity name}_Sxx_sessionxx_Level_x_Annotations, e.g., Writing_S11_session01_Level_2_Annotations
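For programmatic access, the convention above can be reproduced with a small helper. This is a sketch, not an official DARai utility: annotation_path and the /data/darai/annotations root are hypothetical, and it assumes the Level 1 activity folder shares the activity name.

```python
from pathlib import Path

def annotation_path(root: str, activity: str, subject: int,
                    session: int, level: int) -> Path:
    """Build an annotation path following the stated naming convention."""
    name = (f"{activity}_S{subject:02d}_session{session:02d}"
            f"_Level_{level}_Annotations")
    return Path(root) / activity / name

# Reproduces the example Writing_S11_session01_Level_2_Annotations
print(annotation_path("/data/darai/annotations", "Writing", 11, 1, 2))
```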
Folder Structure and File Formats
For easier access, we organized our folder structure with data modalities as the top-level folders. Under each modality, there is one folder per activity label (Level 1).
Naming convention: The identifier for sample files follows the format {2-digit subject id}_{session id}.{file format}, e.g., 01_3.csv. Separate sample files are maintained for each view, activity, and data modality.
Since vision models typically extract frames from video as a preprocessing step, we already share the extracted frames for faster processing within the community. For visual data, a 5-digit frame number is therefore appended to the sample identifier, e.g., 01_03_00000.jpg
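To work with these identifiers programmatically, a single regular expression can cover both raw samples and extracted frames. This is a minimal sketch; the pattern and function name are our own, not part of an official DARai toolkit.

```python
import re

# {2-digit subject}_{session}.{ext}, with an optional 5-digit frame
# number appended for extracted visual frames.
SAMPLE_RE = re.compile(
    r"^(?P<subject>\d{2})_(?P<session>\d+)(?:_(?P<frame>\d{5}))?\.(?P<ext>\w+)$"
)

def parse_sample(filename: str) -> dict:
    m = SAMPLE_RE.match(filename)
    if m is None:
        raise ValueError(f"Unrecognized sample name: {filename}")
    return m.groupdict()

print(parse_sample("01_3.csv"))         # timeseries sample
print(parse_sample("01_03_00000.jpg"))  # extracted visual frame
```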
| Data Modality | RGB | Depth | IR | Depth Confidence | Audio | Timeseries |
|---------------|-----|-------|----|------------------|-------|------------|
| File Format   | jpg | png   | png | png             | wav   | csv        |
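The modality-to-extension mapping in the table can drive a simple file enumerator over the folder layout described above. This is a sketch under those assumptions; MODALITY_EXT, list_samples, and the /data/darai root are illustrative names, and the folder names on disk may differ.

```python
from pathlib import Path

# File extension per modality, taken from the table above.
MODALITY_EXT = {
    "RGB": "jpg",
    "Depth": "png",
    "IR": "png",
    "Depth Confidence": "png",
    "Audio": "wav",
    "Timeseries": "csv",
}

def list_samples(root: str, modality: str, activity: str):
    """Enumerate sample files for one modality and Level 1 activity,
    assuming modality folders at the top level with one subfolder
    per activity label."""
    ext = MODALITY_EXT[modality]
    return sorted(Path(root, modality, activity).glob(f"*.{ext}"))

for f in list_samples("/data/darai", "Timeseries", "Writing"):
    print(f.name)  # e.g., 01_3.csv
```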
Ethics and Data Collection Process
We obtained approval from an Institutional Review Board (IRB) to conduct this study and collect data from human subjects.
Our recording setup includes two separate environments designed to mimic home-like living room, home office, kitchen, and dining spaces, with six varying environmental conditions (light, time of day, air conditioning, and background noise).
Each pair of recording sessions is divided into a control session and a counterfactual session, in which subjects perform the same activity with enforced variations, such as moving a heavy box instead of a light one or playing a speed-test game versus a reaction-test game.
Citation and Usage
BibTeX
@data{ecnr-hy49-24,
  doi       = {10.21227/ecnr-hy49},
  url       = {https://dx.doi.org/10.21227/ecnr-hy49},
  author    = {Kaviani, Ghazal and Yarici, Yavuz and Prabhushankar, Mohit and AlRegib, Ghassan and Solh, Mashhour and Patil, Ameya},
  publisher = {IEEE Dataport},
  title     = {DARai: Daily Activity Recordings for AI and ML applications},
  year      = {2024}
}
Acknowledgments
This work is supported by Amazon Lab126.