This document discusses multimodal learning analytics (MmLA), which analyzes learning traces from multiple data sources to understand learning. The rationale for MmLA is that clickstream data alone provides only limited insight into how learning unfolds. The MmLA process involves sensing data with devices such as cameras and microphones; perceiving learning traces with techniques such as pose detection and speech recognition; extracting features; fusing the data; and providing feedback. Examples of MmLA systems that give automatic feedback on oral presentations are presented. Future directions for MmLA include deploying non-intrusive sensors at scale, using artificial sensing to augment human perception, and connecting MmLA to educational research to gain deeper insight into learning contexts.
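The sense → perceive → extract → fuse → feed back pipeline described above can be sketched in code. This is a minimal, hypothetical illustration: every function name, the toy "perception" output, and the fusion weights are assumptions for demonstration, not part of any real MmLA system (real systems would run actual pose-detection and speech-recognition models at the perceive step).

```python
from dataclasses import dataclass


@dataclass
class MultimodalSample:
    """Raw sensed data: stand-ins for camera frames and microphone audio."""
    video_frames: list
    audio_chunks: list


def perceive(sample):
    """Turn raw signals into learning traces (pose labels, transcripts).
    Toy stand-in for pose detection and speech recognition."""
    poses = ["facing_audience" for _ in sample.video_frames]
    transcripts = [chunk.split() for chunk in sample.audio_chunks]
    return poses, transcripts


def extract_features(poses, transcripts):
    """Summarize learning traces as per-modality features."""
    facing_ratio = poses.count("facing_audience") / max(len(poses), 1)
    word_count = sum(len(words) for words in transcripts)
    return {"facing_ratio": facing_ratio, "word_count": word_count}


def fuse(features):
    """Combine modality features into one score (illustrative weighted sum)."""
    return 0.5 * features["facing_ratio"] + 0.5 * min(features["word_count"] / 100, 1.0)


def feedback(score):
    """Map the fused score to a human-readable message for the presenter."""
    if score > 0.6:
        return "Good delivery: you face the audience and speak fluently"
    return "Try facing the audience more and elaborating your points"


# Run the pipeline end to end on a tiny fabricated sample.
sample = MultimodalSample(
    video_frames=["frame1", "frame2", "frame3"],
    audio_chunks=["today I will present", "our results on learning analytics"],
)
poses, transcripts = perceive(sample)
features = extract_features(poses, transcripts)
message = feedback(fuse(features))
```

The point of the sketch is the shape of the pipeline, not the arithmetic: each stage consumes the previous stage's output, so individual perception models or fusion strategies can be swapped without changing the overall flow.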