Motion estimation and mapping with event cameras
Regular cameras suffer from disadvantages under certain conditions. For example, they are unable to capture blur-free images in highly dynamic or low-illumination conditions. They also fail to produce clear images when the camera faces parts of a scene with substantially different illumination (i.e. their dynamic range is limited). At MPL, we have therefore started to investigate a still relatively new, bio-inspired visual sensor called an event camera or dynamic vision sensor. Our aim is to push the envelope of visual perception solutions towards highly challenging applications, such as autonomous race car driving or the tracking of highly dynamic objects.

Event cameras are bio-inspired sensors that perform well in challenging illumination conditions and have very high temporal resolution. Rather than measuring frame by frame, the pixels of an event camera operate independently and asynchronously. Each pixel measures changes of logarithmic brightness and reports them in the highly discretised form of time-stamped events, each indicating a relative change by a fixed quantity since the last event at that pixel.
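As a minimal illustration of this measurement model (a toy sketch with made-up numbers, not tied to any particular sensor driver), the raw output can be thought of as a stream of tuples (x, y, t, polarity), which is often summarized into frame-like representations for downstream processing:

```python
import numpy as np

# Toy event stream: each event is (x, y, t, p).
# x, y : coordinates of the pixel that fired
# t    : timestamp in seconds (microsecond-level resolution in practice)
# p    : +1 if log-brightness increased by the contrast threshold, -1 if it decreased
events = np.array([
    (120, 64, 0.000012, +1),
    (121, 64, 0.000018, +1),
    ( 95, 30, 0.000021, -1),
], dtype=[('x', 'u2'), ('y', 'u2'), ('t', 'f8'), ('p', 'i1')])

# A common frame-like summary: accumulate event polarities over a short time window.
H, W = 180, 240
frame = np.zeros((H, W), dtype=np.int32)
np.add.at(frame, (events['y'], events['x']), events['p'])
```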


Though the potential of event cameras in highly dynamic or challenging illumination conditions is clear, the complicated nature of the sensor data makes reliable, real-time SLAM a particularly hard problem to solve. MPL has contributed novel algorithms for event camera calibration, mapping, pose estimation, and event-based SLAM.
Event camera calibration

Camera calibration is an important prerequisite for solving 3D computer vision problems. Traditional methods rely on static images of a calibration pattern. This poses an interesting challenge for the practical use of event cameras, which require image change in order to produce measurements at all. The current standard for event camera calibration therefore consists of using flashing patterns. These have the advantage of simultaneously triggering events at all reprojected pattern feature locations, but such patterns are difficult to construct or use in the field. We present the first dynamic event camera calibration algorithm. It calibrates directly from events captured during relative motion between camera and calibration pattern. The method is propelled by a novel feature extraction mechanism for calibration patterns, and leverages existing calibration tools before optimizing all parameters through a multi-segment continuous-time formulation. The resulting calibration method is highly convenient and reliably calibrates from data sequences spanning less than 10 seconds.
K. Huang, Y. Wang, and L. Kneip. Dynamic Event Camera Calibration. In Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS), 2021. [pdf] [code] [youtube] [bilibili]
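For orientation only, the conventional pattern-based pipeline that frame-like event reconstructions can feed into might be sketched as below. This is a simplified, hypothetical baseline using OpenCV's standard circle-grid detection on frames of accumulated events; it is not the continuous-time method of the paper above.

```python
import cv2
import numpy as np

def calibrate_from_event_frames(event_frames, pattern_size=(4, 11), spacing=0.02):
    """Standard intrinsic calibration on frame-like event reconstructions
    (hypothetical baseline, not the paper's continuous-time formulation)."""
    # Physical coordinates of an asymmetric circle grid (spacing in metres).
    objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
    for i in range(pattern_size[1]):
        for j in range(pattern_size[0]):
            objp[i * pattern_size[0] + j] = [(2 * j + i % 2) * spacing, i * spacing, 0]

    obj_pts, img_pts = [], []
    for frame in event_frames:  # 8-bit images of accumulated events
        found, centers = cv2.findCirclesGrid(
            frame, pattern_size, flags=cv2.CALIB_CB_ASYMMETRIC_GRID)
        if found:
            obj_pts.append(objp)
            img_pts.append(centers)

    # Standard intrinsic calibration over all successful detections.
    rms, K, dist, _, _ = cv2.calibrateCamera(
        obj_pts, img_pts, event_frames[0].shape[::-1], None, None)
    return rms, K, dist
```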
Stereo depth estimation and mapping

In one of our collaborations, we developed a solution to the problem of 3D reconstruction from data captured by a stereo event-camera rig moving in a static scene, such as in the context of stereo Simultaneous Localization and Mapping (SLAM). The proposed method optimizes an energy function designed to exploit the small-baseline spatio-temporal consistency of events triggered across both stereo image planes. In another, more recent work, we revisit the stereo depth estimation problem, this time with a hybrid RGB-event camera setup. The method relies on deep learning and employs an attention module.
Y. Zhou, G. Gallego, H. Rebecq, L. Kneip, H. Li, and D. Scaramuzza. Semi-Dense 3D Reconstruction with a Stereo Event Camera. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, September 2018. [pdf]
Y.-F. Zuo, L. Cui, X. Peng, Y. Xu, S. Gao, X. Wang, and L. Kneip. Accurate Depth Estimation from a Hybrid Event-RGB Stereo Setup. In Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS), 2021.
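To give a flavour of the kind of spatio-temporal consistency such stereo methods exploit, the toy sketch below matches patches of so-called time surfaces (per-pixel maps of the most recent event time) across the two views. This is an illustrative simplification under assumed rectified cameras, not the energy formulation of either paper.

```python
import numpy as np

def time_surface(events, shape, t_ref, tau=0.03):
    """Exponentially decayed map of the latest event time at every pixel.
    events: iterable of (x, y, t, polarity), sorted by time, with t <= t_ref."""
    last_t = np.full(shape, -np.inf)
    for x, y, t, _ in events:
        last_t[y, x] = t
    return np.exp(-(t_ref - last_t) / tau)  # values in (0, 1], 0 where nothing fired

def match_disparity(ts_left, ts_right, x, y, max_disp=40, radius=3):
    """Pick the disparity whose right time-surface patch best matches the left one."""
    ref = ts_left[y - radius:y + radius + 1, x - radius:x + radius + 1]
    costs = []
    for d in range(max_disp):
        if x - d - radius < 0:
            break
        cand = ts_right[y - radius:y + radius + 1,
                        x - d - radius:x - d + radius + 1]
        costs.append(np.sum((ref - cand) ** 2))
    return int(np.argmin(costs))  # disparity in pixels; depth = f * b / disparity
```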
Globally optimal contrast maximization and contrast maximization in 3D
The works below address several motion estimation problems with event cameras. The flow of the events is hereby modelled by a general homographic warping in a space-time volume, and the objective is formulated as the maximisation of contrast within the image of warped events. The following problems have been solved:
- Camera rotation estimation
- Planar motion estimation with a downward facing camera
- Optical flow

The core contribution of these works is a globally optimal solution to these generally non-convex problems, which removes the dependency on a good initial guess that plagues prior local optimization methods. The methods rely on branch-and-bound optimisation and employ novel, efficient, recursive upper and lower bounds derived for six different contrast estimation functions. The basic principle is illustrated in the figure above.
X. Peng, Y. Wang, L. Gao, and L. Kneip. Globally-optimal event camera motion estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, August 2020 [pdf] [youtube] [bilibili]
X. Peng, L. Gao, Y. Wang, and L. Kneip. Globally-Optimal Contrast Maximisation for Event Cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2021.
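The underlying contrast maximisation principle is compact enough to sketch. The toy example below warps events under a candidate constant optical flow and scores the candidate by the variance of the image of warped events (IWE); it uses a naive exhaustive search, whereas the papers above replace this with a branch-and-bound scheme that comes with global optimality guarantees.

```python
import numpy as np

def contrast(events, flow, shape, t0=0.0):
    """Variance of the image of warped events (IWE) for a candidate flow (vx, vy).
    events: structured array with fields 'x', 'y', 't' (pixels and seconds)."""
    x, y, t = events['x'], events['y'], events['t']
    xw = np.round(x - (t - t0) * flow[0]).astype(int)  # warp events back to time t0
    yw = np.round(y - (t - t0) * flow[1]).astype(int)
    ok = (xw >= 0) & (xw < shape[1]) & (yw >= 0) & (yw < shape[0])
    iwe = np.zeros(shape)
    np.add.at(iwe, (yw[ok], xw[ok]), 1.0)
    return iwe.var()  # motion-compensated (sharp) IWEs have high contrast

def grid_search_flow(events, shape, v_range=np.linspace(-200.0, 200.0, 41)):
    """Naive exhaustive search over candidate flows (pixels per second)."""
    return max(((vx, vy) for vx in v_range for vy in v_range),
               key=lambda v: contrast(events, v, shape))
```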

In order to solve the case of full 3D motion estimation in arbitrary environments with a monocular event camera, MPL presents a further approach that performs contrast maximization in 3D. The 3D location of the rays cast for each event is smoothly varied as a function of a continuous-time 3D trajectory parametrization, and the optimal parameters are found by maximizing the contrast of a volumetric ray density field. Interestingly, the method is the first and only one to date to perform genuine joint optimization over both motion and structure (whereas existing methods rely on less elegant alternating optimization frameworks).
Y. Wang, J. Yang, X. Peng, P. Wu, L. Gao, K. Huang, J. Chen, and L. Kneip. Visual Odometry with an Event Camera Using Continuous Ray Warping and Volumetric Contrast Maximization. MDPI Sensors, 22(15):5687, 2022. Special issue. [pdf] [video]
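The 3D analogue of the previous sketch could, under strong simplifications, look as follows: assuming a constant-velocity (rather than continuous-time spline) trajectory and a fixed world-frame voxel volume, every event casts a ray from the camera position at its timestamp, ray samples are splatted into a density grid, and the candidate motion is scored by the contrast of that grid. This is a toy illustration, not the parametrization used in the paper.

```python
import numpy as np

def ray_density_contrast(events, velocity, K, grid_res=64,
                         depths=np.linspace(0.5, 3.0, 32)):
    """Contrast (variance) of a volumetric ray density field for one candidate
    constant velocity. events: iterable of (x, y, t, polarity)."""
    Kinv = np.linalg.inv(K)
    grid = np.zeros((grid_res,) * 3)
    for x, y, t, _ in events:
        cam_pos = velocity * t                        # constant-velocity trajectory
        ray_dir = Kinv @ np.array([x, y, 1.0])        # bearing vector of the event pixel
        pts = cam_pos + depths[:, None] * ray_dir     # samples along the ray (world frame)
        idx = np.floor((pts + 2.0) / 4.0 * grid_res).astype(int)  # map [-2, 2]^3 to voxels
        ok = np.all((idx >= 0) & (idx < grid_res), axis=1)
        np.add.at(grid, tuple(idx[ok].T), 1.0)
    return grid.var()  # rays concentrate around true structure for the correct motion
```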
Eventail: Sparse, spatio-temporal geometric solvers for event camera motion estimation

Calculating the relative displacement of an event camera is a challenging topic, as it is difficult to pursue classical solution strategies that rely on feature extraction and matching followed by robust geometric fitting (e.g. RANSAC). The reason is that events simply do not form quasi-instantaneous frames in which one could easily detect corners or lines. Contrast maximization has therefore become a popular alternative. However, contrast maximization is not a low-level approach that relies on individual events; it requires dense (or at least semi-dense) image processing, a paradigm that is not necessarily in line with the original spirit of the event camera. There are certainly methods that bypass this issue via the creation of frame-like representations (such as frames of accumulated events or time surface maps), but these methods tend to discard temporal information somewhere along the way and are therefore inexact.
At MPL, we have initiated a new line of research in which we characterize exact geometric models that explain the location in space and time of individual events under certain conditions. For example, if a camera observes a straight line under locally constant linear velocity, the locations of the events generated by this line can be described by an exact parametric model that depends on the relative line geometry as well as partial velocity parameters. It is a beautiful theory that not only enables exact motion estimation through a single, parametric event clustering technique, but also a general handling of spatio-temporally sampling sensors.
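The setting can be made concrete with a small forward simulation (a toy sketch, not one of the solvers listed below): a 3D line observed by a camera translating with constant velocity produces events whose (x, y, t) coordinates trace a smooth, low-dimensional parametric surface in space-time.

```python
import numpy as np

def simulate_line_events(p0, d, v, K, n_events=2000, t_max=0.05, seed=0):
    """Sample events generated by a 3D line (point p0, direction d) observed by a
    camera translating with constant velocity v. Returns an (N, 3) array (x, y, t)."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, t_max, n_events)            # event timestamps
    s = rng.uniform(-1.0, 1.0, n_events)             # position along the line
    pts_cam = p0 + s[:, None] * d - t[:, None] * v   # line points in the moving camera frame
    proj = (K @ pts_cam.T).T
    xy = proj[:, :2] / proj[:, 2:3]                  # pinhole projection
    return np.column_stack([xy, t])                  # samples of the event surface

K = np.array([[320.0, 0.0, 160.0], [0.0, 320.0, 120.0], [0.0, 0.0, 1.0]])
events_xyt = simulate_line_events(p0=np.array([0.2, 0.0, 2.0]),
                                  d=np.array([0.0, 1.0, 0.1]),
                                  v=np.array([1.5, 0.0, 0.0]), K=K)
```

The sparse solvers referenced below go the other way: given a handful of raw events, they recover the parameters of such models, and thereby line geometry and (partial) camera velocity.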
X. Peng, W. Xu, J. Yang, and L. Kneip. Continuous Event-Line Constraint for Closed-Form Velocity Initialization. In Proceedings of the British Machine Vision Conference (BMVC), 2021. [pdf]
L. Gao, H. Su, D. Gehrig, M. Cannici, D. Scaramuzza, and L. Kneip. A 5-Point Minimal Solver for Event Camera Relative Motion Estimation. In Proceedings of the International Conference on Computer Vision (ICCV), 2023. Oral Presentation. [pdf] [video] [code]
L. Gao, D. Gehrig, H. Su, D. Scaramuzza, and L. Kneip. An n-point linear solver for line and motion estimation with event cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024. Oral presentation (0.8% acceptance rate!). [pdf] [code] [video]
W. Xu, X. Peng, and L. Kneip. Tight Fusion of Events and Inertial Measurements for Direct Velocity Estimation. IEEE Transactions on Robotics (T-RO), 40:240–256, 2023. [pdf]
The following method addresses a special case in which the motion of the camera is assumed to obey the non-holonomic constraints of a ground vehicle:
W. Xu, S. Zhang, L. Cui, X. Peng, and L. Kneip. Event-based visual odometry on non-holonomic ground vehicles. In Proceedings of the International Conference on 3D Vision (3DV), 2024. [pdf] [code] [video]
The following paper also presents a series of novel sparse geometric solvers, which additionally make use of normal flow.
Z. Ren, B. Liao, D. Kong, J. Li, P. Liu, L. Kneip, G. Gallego, and Y. Zhou. Motion and structure from event-based normal flow. In Proceedings of the European Conference on Computer Vision (ECCV), 2024. [pdf]
Cross-modal map-based tracking of event cameras under challenging conditions
An event camera is rather naturally a sensor for estimating motion, as no measurements are generated unless either the camera or elements in the scene move. Furthermore, owing to the noise properties of an event camera, it is very challenging to use it as a mapping sensor. In order to make optimal use of its properties, at MPL we have therefore started to investigate cross-modal, prior-map-based tracking of event cameras more actively. Building on previous methods for regular cameras (i.e. Canny-VO, see the page on Visual SLAM), we have started with semi-dense map representations, which can be created using off-the-shelf frameworks such as ORB-SLAM. Both purely monocular and camera-inertial solutions have been proposed. More recently, MPL has also investigated wire-frame model-based object tracking as well as tracking against Gaussian Splatting representations. Owing to its ability to quickly render realistic frames, the latter approach works under more general conditions.
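A much-simplified sketch of the underlying 3D-2D registration idea is given below: semi-dense map points are reprojected into the current event view, and a 6-DoF pose is refined by minimizing their distance to the nearest event pixels, read from a distance transform. Assumed here are known intrinsics K and a binary image of accumulated events; the actual systems use more elaborate event representations, robust costs, and windowed optimization.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, map_coordinates
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(pose, map_points, dist_field, K):
    """Distance of every reprojected map point to the nearest event pixel.
    pose = [rotation vector (3), translation (3)], world-to-camera."""
    R = Rotation.from_rotvec(pose[:3]).as_matrix()
    pts = (R @ map_points.T).T + pose[3:]
    uv = (K @ pts.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    # Bilinear lookup in the distance field (rows first, then columns).
    return map_coordinates(dist_field, [uv[:, 1], uv[:, 0]], order=1, mode='nearest')

def track(event_image, map_points, K, pose_init):
    """One registration step against the events of a short time window."""
    dist_field = distance_transform_edt(event_image == 0)  # 0 at event pixels
    sol = least_squares(residuals, pose_init, args=(map_points, dist_field, K))
    return sol.x
```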

Y. Zuo, J. Yang, J. Chen, X. Wang, Y. Wang, and L. Kneip. DEVO: Depth-Event Camera Visual Odometry in Challenging Conditions. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2022. [pdf] [youtube]
R. Yuan, T. Liu, Z. Dai, Y.-F. Zuo, and L. Kneip. EVIT: Event-Based Visual-Inertial Tracking in Semi-Dense Maps Using Windowed Nonlinear Optimization. In Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS), 2024. [pdf] [youtube] [code]
Y. Zuo, W. Xu, X. Wang, Y. Wang, and L. Kneip. Cross-Modal Semi-Dense 6-DoF Tracking of an Event Camera in Challenging Conditions. IEEE Transactions on Robotics (T-RO), 40:1600–1616, 2024. [pdf] [code]
Z. Liu, B. Guan, Y. Shang, Q. Yu, and L. Kneip. Line-based 6-DoF Object Pose Estimation and Tracking With an Event Camera. IEEE Transactions on Image Processing (TIP), 33:4765–4780, 2024. [pdf]
T. Liu, R. Yuan, Y. Ju, X. Xu, J. Yang, X. Meng, X. Lagorce, and L. Kneip. GS-EVT: Cross-Modal Event Camera Tracking based on Gaussian Splatting. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2025. [pdf] [code] [youtube]
Simultaneous Tracking and Optical Camera Communication with Event Cameras

As the ubiquity of smart mobile devices continues to rise, Optical Camera Communication (OCC) systems have gained attention as a solution for efficient and private data streaming. Such systems use optical cameras to receive data from digital screens via visible light. Despite their promise, most of them are hindered by dynamic factors such as screen refreshing and rapid camera motion. CMOS cameras, which often serve as the receivers, suffer from limited frame rates and motion-induced image blur, which degrade overall performance. To address these challenges, our work presents a novel system that utilizes event cameras. We introduce a dynamic visual marker and design event-based tracking algorithms to achieve fast localization and data streaming. Remarkably, the event camera's unique capabilities mitigate issues related to screen refresh rates and camera motion, enabling a high throughput of up to 114 Kbps in static conditions, and 1 cm localization accuracy with a 1% bit error rate under various camera motions.
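As a loose illustration of the receiving side only, the toy decoder below assumes a hypothetical on-off keying scheme in which a '1' bit makes the tracked marker region flicker during its symbol slot (producing an event burst) while a '0' bit keeps it constant (producing almost no events, since event cameras only respond to change). The actual system uses a dedicated dynamic visual marker and event-based tracking, which are not reproduced here.

```python
import numpy as np

def decode_ook(event_times, t_start, symbol_rate=1000.0, n_bits=64, thresh=5):
    """Decode bits from the timestamps of events observed in a tracked marker
    region: a flickering ('1') symbol slot yields a burst of events, a constant
    ('0') slot yields almost none."""
    slot = 1.0 / symbol_rate
    bits = []
    for k in range(n_bits):
        lo, hi = t_start + k * slot, t_start + (k + 1) * slot
        count = np.count_nonzero((event_times >= lo) & (event_times < hi))
        bits.append(1 if count >= thresh else 0)
    return bits
```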

H. Su, L. Gao, T. Liu, and L. Kneip. Motion-aware optical camera communication with event cameras. Robotics and Automation Letters (RAL), 10(2):1385–1392, 2025. [pdf] [video] [code]
Miscellaneous
Below is a list of further MPL contributions on event-based vision, in particular on datasets and image processing problems. Feel free to explore those works, too, and head over to the datasets page for a list of all our datasets.
L. Gao, Y. Liang, J. Yang, S. Wu, C. Wang, J. Chen, and L. Kneip. VECtor: A Versatile Event-Centric Benchmark for Multi-Sensor SLAM. Robotics and Automation Letters (RAL), 7(3): 8217–8224, 2022. [pdf] [supplementary] [code] [youtube]
J. Chen, Y. Zhu, D. Lian, J. Yang, Y. Wang, R. Zhang, X. Liu, S. Qian, L. Kneip, and S. Gao. Event-based video frame interpolation. In Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS), 2023. [pdf]
X. Ji, J. Wei, Y. Wang, H. Shang, and L. Kneip. Cross-Modal Place Recognition in Image Databases Using Event-Based Sensors. ArXiv e-prints, 2023. [pdf]