Embodied-AI for Aerial Robots: What do we need for full autonomy?

Organizers and Lecturers

  • Nitin J. Sanket
    Worcester Polytechnic Institute
  • Guanrui Li
    Worcester Polytechnic Institute
  • Sihao Sun
    Delft University of Technology
  • Melissa Greeff
    Queen’s University

Summary
Aerial robots present a peculiar challenge for autonomy: they demand high-rate processing despite severely limited computation and sensing. They push the limits of what is possible with onboard sensing and computation in robotics. Solving complex autonomy tasks on aerial robots therefore requires ingenuity and creativity, breaking away from the approaches used on other types of robots with substantially more computing and sensing capability. Furthermore, with the advent of the Artificial Intelligence (AI) revolution in robotics, the utilization of embodiment (knowledge of self) becomes pivotal to the success of AI-based approaches on resource-constrained aerial robots.

Data for Generalized Embodied AI Models:

Creating generalized embodied AI models for aerial robots requires data that reflects the complexities and challenges unique to this domain. Unlike ground-based robots or manipulators, aerial robots usually navigate in three-dimensional, often unstructured environments such as forests, deserts, or mountainous areas. These conditions demand that data collection for aerial robots must extend beyond traditional sources, which are often constrained to urban or controlled settings designed for cars or manipulators. Furthermore, current open-source datasets are predominantly designed for other platforms, making the transfer of knowledge to aerial systems inefficient and incomplete.

The proposed workshop seeks to address this gap by fostering global, cross-institutional collaboration aimed at creating large-scale, multimodal datasets that capture the unique operational environments of aerial robots. These datasets should span a range of conditions, including varying weather, altitudes, and geographical terrains, to ensure models can generalize effectively to real-world tasks. Collecting data from dynamic, unstructured environments such as disaster zones, agricultural fields, and remote wilderness areas will be critical for improving model robustness.

In addition to real-world data collection, the workshop will emphasize the complementary role of high-fidelity simulations in generating diverse training data. Simulation platforms can provide scalable, safe, and cost-effective means of testing aerial robots in complex environments that may be difficult or dangerous to access in the real world. By focusing on strategies such as data augmentation and transfer learning, the workshop aims to improve model flexibility and adaptability across a range of deployment scenarios.
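To make the augmentation strategy mentioned above concrete, here is a minimal, illustrative sketch (not drawn from any specific dataset pipeline) of how a single camera frame might be expanded into additional training samples via random flips and brightness jitter. The function names and the 0–255 grayscale representation are assumptions for illustration only.

```python
import random

def horizontal_flip(image):
    """Mirror an image (a list of rows of pixel values) left to right."""
    return [row[::-1] for row in image]

def brightness_jitter(image, max_delta=20, rng=None):
    """Shift all pixel values by one random offset, clamped to [0, 255]."""
    rng = rng or random.Random()
    delta = rng.randint(-max_delta, max_delta)
    return [[min(255, max(0, p + delta)) for p in row] for row in image]

def augment(image, rng=None):
    """Compose a random flip with brightness jitter to yield one new sample."""
    rng = rng or random.Random()
    out = image
    if rng.random() < 0.5:  # flip roughly half the time
        out = horizontal_flip(out)
    return brightness_jitter(out, rng=rng)

# One real frame can yield many augmented variants for training.
frame = [[0, 128, 255], [10, 20, 30]]
variants = [augment(frame, rng=random.Random(seed)) for seed in range(4)]
```

Real pipelines would add rotations, crops, and photometric changes tuned to flight imagery (e.g., altitude-dependent scale changes), but the compositional structure is the same.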

Efficient Neural Networks for Aerial Systems:

Designing neural network models for tiny aerial robots (< 1 kg All-Up Weight) under reduced Size, Weight, Area and Power (SWAP) constraints requires a delicate balance between computational efficiency and high-level intelligence. These robots operate under strict computational and power budgets, making it essential to design network architectures that process data in real time with minimal latency and low energy consumption. In the proposed workshop, we will explore and discuss recent developments in architectures such as Spiking Neural Networks (SNNs), MobileSAM, and DINOv2, which can be optimized for deployment on resource-constrained aerial robot platforms.
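One standard trick behind SWAP-aware architectures (used, for example, in the MobileNet family that MobileSAM builds on) is replacing full convolutions with depthwise-separable ones. The short sketch below only counts parameters to show the savings; the layer sizes are illustrative assumptions, not taken from any particular model.

```python
def conv_params(c_in, c_out, k):
    """Parameters in a standard k x k 2-D convolution (bias omitted)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1x1 pointwise convolution
    (bias omitted): c_in * k * k depthwise weights + c_in * c_out pointwise."""
    return c_in * k * k + c_in * c_out

# Illustrative layer: 64 -> 128 channels with a 3x3 kernel.
standard = conv_params(64, 128, 3)                 # 64 * 128 * 9 = 73728
separable = depthwise_separable_params(64, 128, 3) # 576 + 8192   = 8768
ratio = standard / separable                       # roughly 8x fewer weights
```

Fewer weights mean fewer multiply-accumulates per frame, which translates directly into lower latency and energy draw on an embedded flight computer.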

The co-design of hardware and AI is equally important to this topic. By leveraging hardware designed specifically for AI acceleration, such as edge computing modules with integrated CPUs, Graphics Processing Units (GPUs), Neural Processing Units (NPUs), and neuromorphic chips, aerial robots can achieve higher performance on demanding tasks like obstacle avoidance and decision-making while maintaining low energy consumption.

In addition to architectural and hardware considerations, the selection of input and output modalities is a key factor in designing efficient neural networks for aerial systems. Multimodal inputs, such as visual data, inertial measurements, and proprioception, are essential for enabling robust navigation and autonomy in dynamic and unpredictable environments. Efficient processing of these inputs supports critical tasks, including obstacle avoidance, human-object interaction, and mission-specific autonomy. The choice of output modalities should be context-driven, emphasizing fast and accurate decision-making to meet the specific needs of the task at hand.

Furthermore, embedding an awareness of the robot’s physical state and limitations into neural network models can significantly improve resilience in challenging conditions, such as sensory failures or environmental disruptions. This embodied intelligence is vital for ensuring real-time, precise decision-making in mission-critical applications such as search and rescue, environmental monitoring, and disaster response. By integrating these advanced neural network designs with hardware optimization and multimodal processing, we can push the boundaries of autonomous aerial systems.

Generalized World Intelligence for Flexible Adaptation:

Embodied AI requires not only advanced perception but also the ability to abstract and generalize world models for real-world adaptability. This capability is especially critical for tiny aerial robots to navigate dynamic and unpredictable environments. These robots need minimal yet highly flexible world models that allow for rapid adaptation to new or changing conditions without requiring significant computational resources. By developing such models, we enable aerial robots to generalize across a wide spectrum of environments, facilitating seamless operations in diverse, real-world scenarios.

In this workshop, we will examine the methods and frameworks required to build these generalized world intelligence systems, with a focus on their role in enhancing the adaptability of tiny aerial robots to novel or zero-shot environments. These models allow robots to switch between tasks and domains without extensive retraining, thereby improving efficiency and scalability. Moreover, they must be capable of handling uncertainty, such as incomplete or noisy sensory data, which is common in complex, real-world environments.

Additionally, the workshop will explore the compositionality of foundational models—how simpler models can be combined and adapted to solve complex, multi-step tasks in real-world settings. By addressing these challenges, we aim to push the boundaries of embodied AI, enabling aerial robots to operate autonomously across a wider array of applications, including search and rescue, environmental monitoring, and infrastructure inspection.

This workshop focuses on the emerging need for onboard embodied AI in tiny aerial robots (< 1 kg All-Up Weight), particularly those with tight constraints on computational resources. The expected impact is to drive the development of more efficient, scalable AI models that can power the next generation of autonomous aerial robots. This work will have broad implications for applications such as search and rescue, plant pollination, environmental monitoring, and smart infrastructure inspection.

Workshop Objectives
The objective of the workshop is to encourage discussion among the participants to identify focus areas for embodied aerial autonomy in the near and far future. Our invited talks will feature subject-matter experts from various sub-fields and from diverse cultural and technical backgrounds, bringing different perspectives on the common problems in embodied AI for aerial robots. This will ensure that we provide and collate the highest-quality content and enable both young and seasoned researchers to think outside the box, revisiting classical problems in a new light with the latest toolkits and frameworks. Our panel discussions will involve questions that probe the curiosity of the audience and take them on a thrilling philosophical adventure through research questions for the future.

Key Topics

  1. High-fidelity simulation for data generation
  2. Sim2real transfer
  3. Reinforcement learning for aerial robots
  4. Multi-modal learning for aerial autonomy
  5. Embodied AI models for aerial robots
  6. Size, Weight, Area and Power (SWAP)-aware design
  7. Foundational models for autonomy under SWAP constraints
  8. Bio-inspired navigation
  9. Efficient neural network designs
  10. Spiking neural networks
  11. Generalized world intelligence for flexible adaptation
  12. Online adaptation learning
  13. Zero-shot generalization for navigation, action and recognition
  14. Evaluation and benchmarking methods for aerial robot autonomy
  15. Light-weight sensor fusion for autonomy, SLAM and odometry

Workshop Format
The workshop will consist of invited talks and panel discussions.

Target Audience
This workshop is suitable for young researchers, graduate students conducting research in related areas, scientists, and engineers, as well as developers and manufacturers of autonomous UAVs.

Tentative Outcome
It is expected that participants will obtain a deep understanding of the embodied AI challenges, tools, and methods that must be addressed and overcome to advance autonomy. The discussions will unravel and set the stage for short- and long-term research goals for the community at large.