Speaking Memories: A Multimodal Adaptive Dialogue Framework for Reminiscence Robotics

Sep 28, 2025
Zachary Zhao
Abstract
We present Speaking Memories, a multimodal adaptive dialogue framework for reminiscence robotics. The framework integrates auditory, visual, and textual inputs to support emotion-aware, personalized conversations with older adults, particularly those living with dementia. We redesign the system around an edge-based interaction server that replaces the robot's native hardware, yielding lower latency, improved robustness, and enhanced privacy. The new design couples a secure reminiscence media portal, a locally executed multimodal inference pipeline, and an embodied robotic agent into a scalable distributed system. We evaluate the framework through real-world deployments with older adults and caregivers, measuring quantitative metrics, including end-to-end latency, dialogue coherence, and interaction stability, alongside caregiver and participant evaluations of usability and engagement. The results demonstrate end-to-end latency under 6 seconds, high contextual coherence across dialogue turns, and consistently positive feedback from users and caregivers. We also release a structured multimodal dataset that synchronizes user inputs, affective cues, and system outputs to support future fine-tuning of vision-language and dialogue models. Together, these contributions establish Speaking Memories as a deployable and generalizable platform for adaptive reminiscence robotics.
Type
Publication
IEEE Transactions on Robotics (T-RO)

This work builds on the results of my previous paper on HRI.