AI-Integrated Unmanned Ground Rover with Visual Perception and Autonomous Decision-Making (Empowered with AI | Vision | Emotions | Protection)
Author(s): 1. Nellaturu Praveen, 2. Professor S. Swarnalatha
Authors Affiliations:
1. Master of Technology, 2. Professor (Ph.D.),
Department of Electronics and Communication Engineering, Sri Venkateswara University College of Engineering, Tirupati, Andhra Pradesh, India
DOI: 10.2015/IJIRMF/202511020     |     Paper ID: IJIRMF202511020

Abstract: This paper presents a next-generation humanoid smart rover that seamlessly integrates Artificial Intelligence (AI), Machine Learning (ML), real-time vision, and voice-driven interaction. It serves not only as a vigilant security companion capable of patrolling and surveillance but also as an emotionally aware entertainer to lift the morale of soldiers at outposts. The system combines a dual-module architecture: an ESP32 master module handles sensory intelligence, mobility, and voice synthesis, while an AI-Thinker ESP32-CAM slave module delivers live video streaming and pan/tilt camera control.
Enhancements include the AI-Thinker VC-02 voice-assistant module, enabling natural voice commands and conversational interaction, and an AI/ML module that adds smart patrolling and autonomous area learning. Designed for soldiers at remote outposts, the platform:
- Patrols & Protects: learns its environment, automatically follows safe routes, and raises alerts if something unusual happens.
- Sees & Understands: streams real-time video via the ESP32-CAM and can pan or tilt its camera to "look" around corners or track targets.
- Speaks & Listens: uses the AI-Thinker VC-02 module to recognize spoken commands ("Start patrol," "Show me the map," "Dance!") and to reply naturally in voice.
- Learns & Adapts: employs an AI/ML model on the ESP32 master to improve its patrol routes over time and to trigger a playful dance routine when duty is done.
- Feels & Expresses: displays simple emotive icons on a small OLED ("😊" when all clear, "⚠️" on alert) and modulates its voice tone to match the situation: reassuring in emergencies, upbeat when entertaining.
By merging these capabilities—secure patrols, environmental sensing (BMP280, MQ135), GPS location, AI-driven voice dialogue, and emotion-aware feedback—this rover becomes both a reliable guardian and a cheerful companion.
Nellaturu Praveen and S. Swarnalatha (2025); AI-Integrated Unmanned Ground Rover with Visual Perception and Autonomous Decision-Making (Empowered with AI | Vision | Emotions | Protection), International Journal for Innovative Research in Multidisciplinary Field, ISSN(O): 2455-0620, Vol. 11, Issue 11, pp. 121-131. Available at https://www.ijirmf.com/

