1. Introduction In this paper, we review the principles, methodologies, and applications of gaze-based intention estimation, with a particular focus on Human-Robot Interaction (HRI). Estimating human intentions has become a crucial topic for artificial systems designed to assist humans and coordinate with their actions. By predicting human intentions, a system can reduce the cognitive load on humans and understand their goals without explicit instructions. Gaze serves as a reliable behavioral indicator that helps predict individual steps during task execution. This paper aims to connect insights from psychological studies on oculomotor control with technical applications of gaze-based intention recognition, with a specific focus on remote operation robot systems and assistive robot systems. It also examines important challenges in the design of gaze-based intention recognition systems.
2. Intention, Gaze, and Behavior Terms such as intention recognition, estimation, prediction, and inference, as well as action prediction, activity recognition, and task prediction, all refer to the same basic concept: recognizing and predicting human behavior. These systems infer human intentions from observable behavioral indicators. Gaze is a crucial cue, especially for physical actions, because it allows intentions to be predicted before the action begins. Intentions play a central role in action generation and agency: they explain actions and are associated with preparing, initiating, and controlling actions over time. Intentions are categorized into three levels (distal intentions, proximal intentions, and motor intentions), each operating on a different time scale.
2.2 Prediction Based on Gaze and Hand-Eye Coordination in Natural Tasks Tasks influence attention and shape gaze patterns. In interactions with objects, intentional actions affect the performance of visual exploration. Motor control and attention are closely related, with visual information processing and action selection being carried out by shared mechanisms. To prepare and control actions, the visual system utilizes past experiences and knowledge to predict crucial information related to actions. In tasks involving hand-eye coordination for object manipulation, gaze plays a vital role in predicting hand movements. Gaze predicts contact points on objects, helps in devising appropriate grasping plans, and guides hand movements.
3. Gaze Features and Models for Intention Prediction Gaze features encompass various aspects such as gaze position, gaze movements, gaze sequences, and semantic features. Gaze position is used for detecting gaze events like fixations and saccades, which are linked to visual inputs in visual processing. Gaze sequences are employed for recognizing and predicting tasks or activities, and transition patterns within sequences can be learned using Markov models. Intention recognition systems input gaze features into classifiers to output the most probable intentions.
Intention Recognition Intention recognition is often modeled using probabilistic graphical models such as Dynamic Bayesian Networks and Hidden Markov Models.
- I see, intentions are like hidden nodes.
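To make the "hidden nodes" idea concrete, here is a minimal sketch of Hidden Markov Model forward filtering, where the intention is the hidden state and the fixated object is the observation. The intention names, object labels, and all probabilities are made-up assumptions for illustration; they are not taken from the paper.

```python
# Toy HMM: hidden state = intention, observation = fixated object.
# All names and numbers below are illustrative assumptions.
INTENTIONS = ["grasp_cup", "grasp_bottle"]

PRIOR = {"grasp_cup": 0.5, "grasp_bottle": 0.5}

# P(next intention | current intention): intentions tend to persist.
TRANSITION = {
    "grasp_cup": {"grasp_cup": 0.9, "grasp_bottle": 0.1},
    "grasp_bottle": {"grasp_cup": 0.1, "grasp_bottle": 0.9},
}

# P(fixated object | intention): people mostly look at what they will act on.
EMISSION = {
    "grasp_cup": {"cup": 0.7, "bottle": 0.2, "elsewhere": 0.1},
    "grasp_bottle": {"cup": 0.2, "bottle": 0.7, "elsewhere": 0.1},
}


def forward_filter(observations):
    """Return P(intention at t | observations up to t) for every step t."""
    belief = dict(PRIOR)
    history = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Prediction step: propagate the belief through the transition model.
            belief = {
                j: sum(belief[i] * TRANSITION[i][j] for i in INTENTIONS)
                for j in INTENTIONS
            }
        # Update step: weight by the observation likelihood, then normalize.
        unnormalized = {s: belief[s] * EMISSION[s][obs] for s in INTENTIONS}
        total = sum(unnormalized.values())
        belief = {s: p / total for s, p in unnormalized.items()}
        history.append(belief)
    return history


if __name__ == "__main__":
    for step in forward_filter(["cup", "elsewhere", "cup", "cup"]):
        print(max(step, key=step.get), step)
```

The belief is updated after every fixation, so the most probable intention is available before the action itself starts, which is the point of using gaze as an early cue.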
Recurrent Neural Networks, especially LSTMs, are well-suited for processing gaze data that changes over time.
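Before any of these models can be applied, raw gaze positions have to be turned into the fixation and saccade events mentioned above. A common simple approach is a velocity threshold: samples moving slower than the threshold are grouped into fixations, and faster movement is treated as a saccade. The sketch below assumes gaze samples in degrees of visual angle; the function name, the 30 deg/s threshold, and the 0.1 s minimum duration are illustrative choices, not values from the paper.

```python
import math


def detect_fixations(samples, velocity_threshold=30.0, min_duration=0.1):
    """Group slow-moving gaze samples into fixations (velocity-threshold sketch).

    samples: list of (t_seconds, x_deg, y_deg) with strictly increasing t.
    Returns a list of (start_t, end_t, mean_x, mean_y) tuples.
    """
    fixations = []
    current = []  # samples belonging to the current low-velocity run

    def flush():
        # Keep the run only if it lasted long enough to count as a fixation.
        if len(current) >= 2 and current[-1][0] - current[0][0] >= min_duration:
            n = len(current)
            fixations.append((
                current[0][0],
                current[-1][0],
                sum(s[1] for s in current) / n,
                sum(s[2] for s in current) / n,
            ))
        current.clear()

    for prev, cur in zip(samples, samples[1:]):
        dt = cur[0] - prev[0]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Point-to-point angular velocity in degrees per second.
        velocity = math.hypot(cur[1] - prev[1], cur[2] - prev[2]) / dt
        if velocity < velocity_threshold:
            if not current:
                current.append(prev)
            current.append(cur)
        else:
            flush()  # a fast movement (saccade) ends the current run
    flush()
    return fixations


if __name__ == "__main__":
    # 100 Hz toy recording: 0.2 s at (0, 0), then a jump to (10, 0) for 0.2 s.
    data = [(i * 0.01, 0.0, 0.0) for i in range(20)]
    data += [(i * 0.01, 10.0, 0.0) for i in range(20, 40)]
    print(detect_fixations(data))
```

The resulting fixation list (with positions mapped to objects or regions) is the kind of sequence that Markov models or LSTMs would then consume.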
4. Gaze-Based Intention Estimation in Technological Fields In computer vision, gaze features are mainly used for activity recognition and task prediction. In Human-Robot Interaction (HRI), gaze is used to anticipate human actions so that robots can plan their movements in advance and assist humans.
- Is this like interaction with robots as if they were another person? Predicting intentions by following the other party’s gaze, similar to what humans do?
In collaborative work and social robotics, robots collaborate by inferring human intentions to assist in tasks. In assistive technology and smart remote operations, gaze guides robot movements to reduce human cognitive load.
- This seems closer to FMRG.
In Advanced Driver-Assistance Systems (ADAS), gaze is used to predict drivers’ intentions and enhance safety.
- This is also interesting.
4.1 Computer Vision and Human-Computer Interaction Gaze is used to predict tasks when viewing images or videos. Gaze is utilized to recognize object manipulation tasks in Virtual Reality (VR) and interactive games. In VR games, gaze is used to predict players’ goals and personalize gameplay.
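As a rough illustration of predicting a task from a gaze sequence with the Markov transition models noted in section 3, one can fit one transition matrix per task and pick the task under which an observed sequence is most likely. This is only a sketch: the area-of-interest labels (`text`, `image`, `menu`), the task names, and the training sequences are invented for the example.

```python
import math


def fit_transitions(sequences, states, smoothing=1.0):
    """Estimate first-order transition probabilities with additive smoothing."""
    counts = {a: {b: smoothing for b in states} for a in states}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in states}
        for a in states
    }


def log_likelihood(seq, transitions):
    """Log-probability of the transitions in seq under one task's model."""
    return sum(math.log(transitions[a][b]) for a, b in zip(seq, seq[1:]))


def predict_task(seq, models):
    """Return the task whose transition model best explains the sequence."""
    return max(models, key=lambda task: log_likelihood(seq, models[task]))


# Hypothetical areas of interest and toy training data per task.
STATES = ["text", "image", "menu"]
TRAINING = {
    "reading": [
        ["text", "text", "text", "image", "text", "text"],
        ["text", "text", "text", "text", "menu"],
    ],
    "searching": [
        ["image", "menu", "image", "text", "menu", "image"],
        ["menu", "image", "menu", "image", "text"],
    ],
}
MODELS = {task: fit_transitions(seqs, STATES) for task, seqs in TRAINING.items()}

if __name__ == "__main__":
    print(predict_task(["text", "text", "text"], MODELS))
    print(predict_task(["menu", "image", "menu", "image"], MODELS))
```

Smoothing keeps unseen transitions from getting zero probability, which would otherwise make the log-likelihood undefined for new sequences.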
4.2 Human-Robot Interaction In human-robot collaborative work, robots infer human intentions to support collaborative tasks. Gaze helps robots understand human intentions and plan appropriate actions. In assistive technology and smart remote operations, robots infer human intentions to assist in operations. Gaze is used to control robot arm movements based on human intentions. Gaze reduces human cognitive load and improves operational efficiency.
4.3 Driver Intention Estimation in ADAS
- ADAS = Advanced Driver-Assistance Systems
ADAS monitors the driver's cognitive state to enhance safety. Gaze is used to predict driver intentions and assist in operations like lane changes and braking. In semi-autonomous driving systems, gaze is used to align driver intentions with system operations.
5. Current Limitations, Challenges, and Future Tasks Gaze serves several purposes at once (it has a multitasking nature), which makes it challenging to predict human intentions from it. While gaze is used as a proxy for attention, the information targeted by gaze may not always be consciously recognized. Gaze-based intention estimation systems may themselves influence human gaze patterns, leading to issues of user adaptation and trust, so systems need to adjust their operation as users adapt. Accurately acquiring gaze data in 3D space and associating it with real-world objects is crucial.
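One simple way to picture the last point, associating 3D gaze with real-world objects, is to pick the object whose center lies closest in angle to the gaze ray, within some tolerance. The sketch below assumes the eye position, gaze direction, and object centers are already expressed in one shared coordinate frame; the function names, the object names, and the 5 degree tolerance are illustrative assumptions, not the paper's method.

```python
import math


def angle_between(u, v):
    """Angle in radians between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def gazed_object(eye, gaze_dir, objects, max_angle_deg=5.0):
    """Return the name of the object closest in angle to the gaze ray.

    eye: (x, y, z) eye position; gaze_dir: non-zero gaze direction vector;
    objects: dict mapping name -> (x, y, z) center, same frame as eye.
    Returns None if no object lies within max_angle_deg of the gaze ray.
    """
    best_name = None
    best_angle = math.radians(max_angle_deg)
    for name, center in objects.items():
        to_object = tuple(c - e for c, e in zip(center, eye))
        if all(x == 0 for x in to_object):
            continue  # object center coincides with the eye; direction undefined
        angle = angle_between(gaze_dir, to_object)
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name


if __name__ == "__main__":
    scene = {"cup": (0.02, 0.0, 1.0), "bottle": (0.5, 0.0, 1.0)}
    print(gazed_object((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), scene))
```

An angular tolerance is used instead of an exact ray-object intersection because gaze estimates are noisy; a real system would also need to account for object size and occlusion.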
6. Conclusion Gaze-based intention estimation is a crucial technology that links human goals and actions. Gaze serves as a vital cue for making human-machine interaction more effective and natural. The application of gaze-based intention estimation is expected to expand into various fields in the future. To develop the technology further, it is important to understand in depth the relationship between human cognitive processes and gaze patterns.