By
Hans-Peter Kriegel, Matthias Schubert, Arthur Zimek
Ludwig-Maximilians-Universität München
Oettingenstr. 67, 80538 München, Germany
By
Yue Zhao
Zain Nasrullah
Department of Computer Science, University of Toronto,
Toronto, ON M5S 2E4, Canada
Zheng Li
Northeastern University Toronto, Toronto, ON M5X 1E2, Canada
I am combining two papers to summarize anomaly detection. The first is Angle-Based Outlier Detection (ABOD); the second describes PyOD, a Python module that implements ABOD among its more than 20 detection algorithms. This is the third part in a series on anomaly detection. The first article was a survey covering the length and breadth of the subject; the second focused on data preparation and pre-processing.
Angle-Based Outlier Detection (ABOD)
Angles are more stable than distances in high-dimensional spaces; this is one reason cosine-based similarity measures are popular for text data.
An object o is an outlier if most other objects lie in similar directions from it.
An object o is not an outlier if many other objects lie in varying directions from it.
ABOD therefore measures the variance of the angle spectrum: the angles, seen from o, between every pair of other objects.
The angles are weighted by the corresponding distances, which matters for lower-dimensional data sets where angles alone are less reliable.
Note: a small ABOD score indicates an outlier.
A high ABOD score indicates no outlier.
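To make the idea concrete, here is a minimal pure-Python sketch of the distance-weighted angle variance. It assumes the weighting I understand the ABOD paper to use: for a point A and every pair of other points B and C, the value is the dot product of AB and AC divided by the squared lengths of both vectors, and each value is weighted by 1 / (|AB| * |AC|). The function names `abof` and `abod_scores` and the toy data are my own, chosen for illustration; this is not the authors' reference code and it ignores the faster approximate variants.

```python
from itertools import combinations


def abof(point, others):
    """Weighted variance of the angle spectrum of `point` (illustrative sketch)."""
    weight_sum = 0.0
    weighted_sum = 0.0
    weighted_sq_sum = 0.0
    for b, c in combinations(others, 2):
        # Difference vectors from the point to each member of the pair.
        ab = [bi - ai for ai, bi in zip(point, b)]
        ac = [ci - ai for ai, ci in zip(point, c)]
        norm_ab_sq = sum(x * x for x in ab)
        norm_ac_sq = sum(x * x for x in ac)
        if norm_ab_sq == 0.0 or norm_ac_sq == 0.0:
            continue  # skip exact duplicates of the point
        dot = sum(x * y for x, y in zip(ab, ac))
        # Angle term scaled by the squared distances.
        value = dot / (norm_ab_sq * norm_ac_sq)
        # Pairs that are far away get less weight.
        weight = 1.0 / (norm_ab_sq ** 0.5 * norm_ac_sq ** 0.5)
        weight_sum += weight
        weighted_sum += weight * value
        weighted_sq_sum += weight * value * value
    if weight_sum == 0.0:
        return 0.0
    mean = weighted_sum / weight_sum
    return weighted_sq_sum / weight_sum - mean * mean


def abod_scores(data):
    """Score every point against all remaining points (cubic cost overall)."""
    return [abof(p, data[:i] + data[i + 1:]) for i, p in enumerate(data)]


# Toy data (made up): a 3x3 grid of inliers plus one distant point.
data = [(float(x), float(y)) for x in range(3) for y in range(3)]
data.append((10.0, 10.0))

scores = abod_scores(data)
most_outlying = min(range(len(scores)), key=scores.__getitem__)
print(most_outlying, data[most_outlying])
```

The distant point sees all other points in nearly the same direction, so its angle variance is the smallest in the set, which matches the rule that a small ABOD score marks an outlier. Scoring every point against every pair is cubic in the number of points, so this sketch only suits small data sets.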
Results
Comparison
Outcome
Python Module PyOD
Abstract
PyOD is an open-source Python toolbox for performing scalable outlier detection on multivariate data. Uniquely, it provides access to a wide range of outlier detection algorithms, including established outlier ensembles and more recent neural network-based approaches, under a single, well-documented API designed for use by both practitioners and researchers. With robustness and scalability in mind, best practices such as unit testing, continuous integration, code coverage, maintainability checks, interactive examples and parallelization are emphasized as core components in the toolbox’s development. PyOD is compatible with both Python 2 and 3 and can be installed through Python Package Index (PyPI) or https://github.com/yzhao062/pyod.
Outlier detection modules in PyOD
Compared to existing libraries, PyOD has six distinct advantages. Firstly, it contains more than 20 algorithms which cover both classical techniques such as local outlier factor and recent neural network architectures such as autoencoders or adversarial models. Secondly, PyOD implements combination methods for merging the results of multiple detectors and outlier ensembles which are an emerging set of models. Thirdly, PyOD includes a unified API, detailed documentation and interactive examples across all algorithms for clarity and ease of use. Fourthly, all models are covered by unit testing with cross platform continuous integration, code coverage and code maintainability checks. Fifthly, optimization instruments are employed when possible: just-in-time (JIT) compilation and parallelization are enabled in select models for scalable outlier detection. Lastly, PyOD is compatible with both Python 2 and 3 across major operating systems (Windows, Linux and MacOS).
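The second advantage, merging the results of multiple detectors, can be illustrated with a small standard-library sketch. It assumes two common combination strategies, averaging and taking the maximum of standardized scores, and it assumes the convention that a higher score means more outlying (raw ABOD scores would have to be negated first). The helpers `standardize` and `combine` and the score values are hypothetical; this is not PyOD's own implementation.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Z-score one detector's output so detectors on different scales are comparable."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def combine(score_lists, how="average"):
    """Merge per-sample scores from several detectors (higher = more outlying)."""
    standardized = [standardize(s) for s in score_lists]
    per_sample = list(zip(*standardized))  # one tuple of scores per sample
    if how == "average":
        return [mean(col) for col in per_sample]
    if how == "maximization":
        return [max(col) for col in per_sample]
    raise ValueError(f"unknown combination method: {how}")


# Made-up scores from two hypothetical detectors for the same four samples.
detector_a = [0.10, 0.20, 0.15, 0.90]
detector_b = [10.0, 12.0, 11.0, 40.0]

avg_scores = combine([detector_a, detector_b], how="average")
max_scores = combine([detector_a, detector_b], how="maximization")
print(avg_scores)
print(max_scores)
```

Standardizing first matters because detectors report scores on unrelated scales; without it, the detector with the largest raw values would dominate both the average and the maximum.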
Conclusion
This paper presents PyOD, a comprehensive toolbox built in Python for scalable outlier detection. It includes more than 20 classical and emerging detection algorithms and is being used in both academic and commercial projects. As avenues for future work, we plan to enhance the toolbox by implementing models that work well with time series and geo-spatial data, improving computational efficiency through distributed computing and addressing engineering challenges such as handling sparse matrices or memory limitations.