Skip to main content

A Baseline for 3D Multi-Object Tracking

-By Xinshuo Weng, Kris Kitani Carnegie Mellon University


Abstract
 3D multi-object tracking (MOT) is an essential component technology for many real-time applications such as autonomous driving or assistive robotics. However, recent works for 3D MOT tend to focus more on developing accurate systems giving less regard to computational cost and system complexity. In contrast, this work proposes a simple yet accurate real-time baseline 3D MOT system. We use an off-the-shelf 3D object detector to obtain oriented 3D bounding boxes from the LiDAR point cloud. Then, a combination of 3D Kalman filter and Hungarian algorithm is used for state estimation and data association. Although our baseline system is a straightforward combination of standard methods, we obtain the state-of-the-art results. To evaluate our baseline system, we propose a new 3D MOT extension to the official KITTI 2D MOT evaluation along with two new metrics. Our proposed baseline method for 3D MOT establishes new state-of-theart performance on 3D MOT for KITTI, improving the 3D MOTA from 72.23 of prior art to 76.47. Surprisingly, by projecting our 3D tracking results to the 2D image plane and compare against published 2D MOT methods, our system places 2nd on the official KITTI leaderboard. Also, our proposed 3D MOT method runs at a rate of 214.7 FPS, 65× faster than the state-of-the-art 2D MOT system.



Multi-object tracking (MOT) is an essential component technology for many vision applications such as autonomous driving , robot collision prediction and video face alignment. Due to the significant advance in object detection, there has been much progress on MOT. For example, for the car class on the KITTI MOT benchmark, the MOTA (multi-object tracking accuracy) has improved from 57.03 to 84.24 in two years. Although the accuracy has been significantly improved, it has come at the cost of increasing system complexity and computational cost. Complex systems make modular analysis challenging and it is not always clear which part of the system contributes the most to performance. For example, leading works have substantial different system pipelines but only minor differences in performance. Also, the adverse effect of increased computational cost is obvious. Despite having excellent accuracy, real-time tracking is out of reach. In contrast to prior work which tends to focus more on accuracy over system complexity and computational cost, this work aims to develop an accurate, simple and real-time 3D MOT system. We show that our proposed system which combines the minimal components for 3D MOT works extremely well. On the KITTI dataset, our system establishes new state-of-the-art performance on 3D MOT. Surprisingly, if we project our 3D tracking results to the 2D image plane and compare against all published 2D MOT methods, our system places 2nd on the official KITTI leaderboard . In addition, due to the simplicity of our system, it can run at a rate of 214.7 FPS on KITTI test set, 65 times faster than the state-of-the-art MOT system BeyondPixels. When comparing against other real-time MOT systems such as Complexer-YOLO, LP-SSVM, 3D-CNN/PMBM, and MCMOT-CPD, our system is at least twice as fast and achieves much higher accuracy.

Related Work



2D Multi-Object Tracking. 

Recent 2D MOT systems can be mostly split into two categories based on the data association: batch and online methods. The batch methods attempt to find the global optimal solution from the entire sequence. They often create a network flow graph and can be solved by the min-cost flow algorithms. On the other hand, the online methods consider only the detection at the current frame and are usually efficient for real-time application. These methods often formulate the data association as a bipartite graph matching problem and solve it using the Hungarian algorithm. Beyond using the Hungarian algorithm in a post-processing step, modern online methods design the deep association networks that are able to construct the association using neural networks. Our MOT system also belongs to online methods. For simplicity and real-time efficiency, we adopt the original Hungarian algorithm without using neural networks. Independent of the data association, designing a proper cost function for affinity measure is also crucial to the MOT system. Early works employ hand-crafted features such as spatial distance and colour histograms as the cost function. Instead, modern methods apply the motion model and learn the appearance features. In contrast to prior works which combine both appearance and motion models in a complicated way, we choose to employ only the simplest motion model, i.e., constant velocity, without using an appearance model.

3D Multi-Object Tracking. 

Most 3D MOT systems share the same components with the 2D MOT systems. The only distinction lies in that the detection boxes are in 3D space instead of the image plane. Therefore, it has the potential to design the motion and appearance models in 3D space without perspective distortion. proposes an image-based method which estimates the location of objects in image space and also their distance to the camera in 3D. Then a Poisson multi-Bernoulli mixture filter is used to estimate the 3D velocity of the objects. [30] applies an unscented Kalman filter in the bird’s eye view to estimate not only the 3D velocity but also the angular velocity. proposes a 2D-3D Kalman filter to jointly utilize the observation from the image and 3D world. Instead of using the hand-crafted filters, Design Siamese networks to learn the filters from data. Unlike previous works which use complicated filters, our proposed system employs only the original Kalman filter for simplicity, but extends its state space to full 3D domain, including not only 3D velocity but also the 3D size, 3D location and heading angle of the objects.

3D Object Detection. 

As an indispensable component of the 3D MOT, the quality of the detected 3D bounding box matters. Prior works mainly focus on processing the LiDAR point cloud inputs. Divide the point cloud into equally-spaced 3D voxels and apply 3D CNNs for 3D bounding box prediction. Converts the point cloud to a bird’s eye view representation for efficiency and exploit 2D convolutions. Other works such as directly process the point cloud inputs using PointNet++ for 3D detection. In addition, instead of using the point cloud, achieve the 3D detection from only a single image.


Please refer to paper for implementation and metrics.


Comments

Popular posts from this blog

ABOD and its PyOD python module

Angle based detection By  Hans-Peter Kriegel, Matthias Schubert, Arthur Zimek  Ludwig-Maximilians-Universität München  Oettingenstr. 67, 80538 München, Germany Ref Link PyOD By  Yue Zhao   Zain Nasrullah   Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada  Zheng Li jk  Northeastern University Toronto, Toronto, ON M5X 1E2, Canada I am combining two papers to summarize Anomaly detection. First one is Angle Based Outlier Detection (ABOD) and other one is python module that  uses ABOD along with over 20 other apis (PyOD) . This is third part in the series of Anomaly detection. First article exhibits survey that covered length and breadth of subject, Second article highlighted on data preparation and pre-processing.  Angle Based Outlier Detection. Angles are more stable than distances in high dimensional spaces for example the popularity of cosine-based sim...

Ownership at Large

 Open Problems and Challenges in Ownership Management -By John Ahlgren, Maria Eugenia Berezin, Kinga Bojarczuk, Elena Dulskyte, Inna Dvortsova, Johann George, Natalija Gucevska, Mark Harman, Shan He, Ralf Lämmel, Erik Meijer, Silvia Sapora, and Justin Spahr-Summers Facebook Inc.  Software-intensive organizations rely on large numbers of software assets of different types, e.g., source-code files, tables in the data warehouse, and software configurations. Who is the most suitable owner of a given asset changes over time, e.g., due to reorganization and individual function changes. New forms of automation can help suggest more suitable owners for any given asset at a given point in time. By such efforts on ownership health, accountability of ownership is increased. The problem of finding the most suitable owners for an asset is essentially a program comprehension problem: how do we automatically determine who would be best placed to understand, maintain, ev...

Hybrid Approach to Automation, RPA and Machine Learning

- By Wiesław Kopec´, Kinga Skorupska, Piotr Gago, Krzysztof Marasek  Polish-Japanese Academy of Information Technology Paper Link Courtesy DZone   Abstract One of the more prominent trends within Industry 4.0 is the drive to employ Robotic Process Automation (RPA), especially as one of the elements of the Lean approach.     The full implementation of RPA is riddled with challenges relating both to the reality of everyday business operations, from SMEs to SSCs and beyond, and the social effects of the changing job market. To successfully address these points there is a need to develop a solution that would adjust to the existing business operations and at the same time lower the negative social impact of the automation process. To achieve these goals we propose a hybrid, human-centred approach to the development of software robots. This design and  implementation method combines the Living Lab approach with empowerment through part...