Research at MPL
The Mobile Perception Lab develops 3D perception solutions for emerging smart mobile systems. Important examples include robots (e.g. service robots, factory robots), smart vehicles, and intelligence augmentation devices such as mixed or augmented reality headsets. Our focus lies on visual sensors as the primary exteroceptive sensing modality through which such systems are enabled to perceive and understand our complex everyday environments.
An important solution to the visual 3D perception problem for smart mobile devices is Simultaneous Localization And Mapping (SLAM). SLAM reconstructs a 3D reference model of the environment while simultaneously keeping the employed exteroceptive sensor(s) localized with respect to this representation. In its simplest form, the generated representation of the environment is a sparse set of 3D landmarks (i.e. a point cloud) that can be stably redetected in images, thus enabling the continuous localization or even relocalization of any body that carries the sensing device. Such a representation, however, supports little more than mere localization. A more advanced form of environment representation is produced by dense visual SLAM algorithms, which aim at representing all surfaces and structural boundaries in a given 3D environment. While still enabling localization, such representations can also be used to, for example, evaluate the traversability of certain areas and, more generally, apply path planning algorithms.

At MPL, we aim at going one step further by developing novel visual SLAM solutions in which the generated models of the environment cover everything from low-level geometry to higher-level aspects such as scene composition, object poses and shapes, dynamics, and semantic meaning. We believe that this will provide tomorrow's smart mobile devices with the required robust, accurate, and efficient localization, 3D scene geometry perception, and, most importantly, scene understanding capabilities. Joint geometric-semantic representations will enable machines not only to perform collision-free navigation in an environment, but also to execute complex tasks that simultaneously require an understanding of the object-level scene composition and its semantic interpretation.
Specific examples of devices that require either virtual or real interaction with the environment are an AR headset performing complex scene augmentation, or a service robot that must find and collect all objects of a certain kind.
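The core idea of localizing against a sparse landmark map can be conveyed with a deliberately simplified, hypothetical sketch: given known landmark positions and range measurements to them, a few Gauss-Newton iterations recover the sensor position. All numbers and the 2D range-based setting are illustrative assumptions; a real visual SLAM system instead minimizes image reprojection errors over 6-DoF poses.

```python
import math

# Toy 2D illustration (not MPL's actual pipeline): localize a sensor
# against a sparse map of known landmarks from range measurements,
# using Gauss-Newton on the residuals r_i = ||p - L_i|| - range_i.
landmarks = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]  # mapped landmark positions
true_pos = (1.0, 1.0)                             # ground truth (for simulation only)
ranges = [math.dist(true_pos, lm) for lm in landmarks]

x, y = 3.0, 3.0  # initial guess
for _ in range(10):
    # Accumulate the 2x2 normal equations J^T J * delta = -J^T r,
    # where each Jacobian row is (p - L_i) / ||p - L_i||.
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (lx, ly), r in zip(landmarks, ranges):
        d = math.dist((x, y), (lx, ly))
        jx, jy = (x - lx) / d, (y - ly) / d
        res = d - r
        a11 += jx * jx; a12 += jx * jy; a22 += jy * jy
        b1 -= jx * res; b2 -= jy * res
    det = a11 * a22 - a12 * a12
    x += (a22 * b1 - a12 * b2) / det  # solve the 2x2 system by Cramer's rule
    y += (a11 * b2 - a12 * b1) / det

print(x, y)  # converges to the true position near (1.0, 1.0)
```

The same iterative least-squares machinery, scaled up to thousands of landmarks and full camera poses, underlies the optimization back-ends of most modern SLAM systems.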
Our research often happens at the intersection of "Sensors", "Algorithms", and "Applications":
- Sensors: While our research often relies on the input of a single regular or consumer depth camera, we also explore the potential of more exotic camera architectures, such as multi-perspective camera arrays, light-field cameras, and event-based sensors. The latter in particular promise to help in difficult illumination or high-speed conditions, and currently represent a major focus of our research.
- Algorithms: Our perception solutions often combine classical model- and optimization-based methods with modern, data-driven techniques, a paradigm that extends traditional SLAM and is also known as Spatial AI. This combination aims at applying prior knowledge through network predictions while relying on bottom-up first principles (e.g. physical constraints, geometric or photometric consistency) to continuously verify the plausibility of the generated forward predictions. In our research, we furthermore often make use of convex optimization, global optimization, algebraic geometry, and continuous-time optimization for solving geometric calibration and registration problems. Though not all of these techniques satisfy the computational efficiency demands of real-time applications, they are important in many offline scenarios and help us gain a better understanding of the underlying problems.
- Applications: General purpose solutions often fail to deliver high robustness in all situations. A substantial improvement can be achieved by including application-specific constraints in the design of algorithms. For example, the motion of a platform may obey a specific motion model, which in turn can be included as a regularization term in simultaneous localization and mapping paradigms. We may also find that certain environments contain a very specific set of stably detectable features, and thus train and employ higher-level detectors for such features, or choose low-dimensional parametric models in order to optimize for their geometry. An important part of our research looks into how such application-specific cues can be used to develop tailored solutions with substantial advantages over general purpose methods.
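The idea of a motion model acting as a regularization term can be illustrated with a minimal, hypothetical sketch: a noisy 1D position trajectory is smoothed by least squares, where a constant-velocity prior (penalizing discrete acceleration) is weighted against the raw measurements. The setting, weights, and 1D simplification are illustrative assumptions, not our actual formulations.

```python
import random

# Toy sketch (hypothetical): fit a trajectory x to noisy measurements while
# regularizing with a constant-velocity motion model, i.e. minimize
#   sum_t (x_t - z_t)^2  +  lam * sum_t (x_{t-1} - 2 x_t + x_{t+1})^2
random.seed(0)
T = 50
truth = [0.05 * t for t in range(T)]                 # constant-velocity ground truth
meas = [p + random.gauss(0.0, 0.2) for p in truth]   # noisy position measurements

lam = 5.0        # weight of the motion-model regularizer
x = meas[:]      # initialize the estimate with the raw measurements
for _ in range(500):  # plain gradient descent on the joint cost
    grad = [2.0 * (xi - zi) for xi, zi in zip(x, meas)]
    for t in range(1, T - 1):
        acc = x[t - 1] - 2.0 * x[t] + x[t + 1]  # discrete acceleration
        grad[t - 1] += 2.0 * lam * acc
        grad[t]     -= 4.0 * lam * acc
        grad[t + 1] += 2.0 * lam * acc
    x = [xi - 0.005 * gi for xi, gi in zip(x, grad)]

def acc_penalty(seq):
    """Sum of squared discrete accelerations along a trajectory."""
    return sum((seq[t - 1] - 2.0 * seq[t] + seq[t + 1]) ** 2
               for t in range(1, len(seq) - 1))

# The regularized estimate respects the motion model far better than the raw data.
print(acc_penalty(x) < acc_penalty(meas))
```

In a full SLAM system, the same principle appears as an extra residual block in the factor graph: pose variables are tied together by the platform's motion model rather than estimated independently.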
You may also browse MPL's research by topic for further information. Note that many works fall into multiple categories, so you will find them appearing in several contexts:
- Geometric camera pose calculation and algebraic geometry
- Camera architectures
- 360 Multi-perspective cameras
- Globally optimal methods
- Visual SLAM
- Geometry for dynamic vision sensors
- Vision for Ackermann vehicles
- Point-set registration methods
- Spatial AI
- Camera calibration
- Continuous-time representations