
ABOD and the PyOD Python Module



By 

Hans-Peter Kriegel, Matthias Schubert, Arthur Zimek
 Ludwig-Maximilians-Universität München 
Oettingenstr. 67, 80538 München, Germany


By 
Yue Zhao  
Zain Nasrullah  
Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada 
Zheng Li
Northeastern University Toronto, Toronto, ON M5X 1E2, Canada


I am combining two papers to summarize anomaly detection. The first is Angle-Based Outlier Detection (ABOD); the second describes PyOD, a Python module that offers ABOD as one of its more than 20 detection algorithms. This is the third part in my series on anomaly detection. The first article was a survey covering the length and breadth of the subject, and the second focused on data preparation and pre-processing.


Angle-Based Outlier Detection (ABOD)


Angles are more stable than distances in high-dimensional spaces, which is one reason cosine-based similarity measures are popular for text data.


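An object o is an outlier if most other objects are located in similar directions as seen from o.

An object o is not an outlier if many other objects are located in varying directions.

ABOD therefore measures the variance of the angle spectrum, that is, of the angles between the difference vectors from o to all pairs of other objects.

The angles are weighted by the corresponding distances, which matters for lower-dimensional data sets where angles are less reliable.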





Note:  Small ABOD  => Outlier
        High ABOD => no Outlier
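
To make the note concrete, here is a minimal NumPy sketch of a naive angle-based outlier factor. It follows the distance-weighted variance definition as I understand it from the ABOD paper; the function name `abof`, the toy data and the seed are my own assumptions for illustration, and the code assumes there are no duplicate points (a zero distance would cause a division by zero). It is not the authors' implementation and it makes no attempt to be fast.

```python
import numpy as np

def abof(points, index):
    """Naive angle-based outlier factor of points[index].

    Weighted variance, over all pairs (B, C) of other points, of
    <AB, AC> / (|AB|^2 * |AC|^2), with weights 1 / (|AB| * |AC|).
    """
    a = points[index]
    others = np.delete(points, index, axis=0)
    diffs = others - a                         # vectors AB for every other point B
    norms = np.linalg.norm(diffs, axis=1)      # |AB|
    i, j = np.triu_indices(len(others), k=1)   # each unordered pair (B, C) once
    dots = np.einsum("ij,ij->i", diffs[i], diffs[j])   # <AB, AC>
    values = dots / (norms[i] ** 2 * norms[j] ** 2)    # distance-scaled angle term
    weights = 1.0 / (norms[i] * norms[j])
    mean = np.average(values, weights=weights)
    return np.average((values - mean) ** 2, weights=weights)

# Toy data (assumption): one Gaussian cluster plus a single far-away point.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(50, 2))
points = np.vstack([inliers, [[8.0, 8.0]]])

scores = np.array([abof(points, k) for k in range(len(points))])
most_outlying = int(np.argmin(scores))   # smallest ABOF = strongest outlier candidate
```

Seen from the far-away point, every other point lies in nearly the same direction, so its angle variance is the smallest in the data set, which matches "Small ABOD => Outlier".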

Results



Comparison



Outcome






Python Module PyOD



Abstract 

PyOD is an open-source Python toolbox for performing scalable outlier detection on multivariate data. Uniquely, it provides access to a wide range of outlier detection algorithms, including established outlier ensembles and more recent neural network-based approaches, under a single, well-documented API designed for use by both practitioners and researchers. With robustness and scalability in mind, best practices such as unit testing, continuous integration, code coverage, maintainability checks, interactive examples and parallelization are emphasized as core components in the toolbox’s development. PyOD is compatible with both Python 2 and 3 and can be installed through Python Package Index (PyPI) or https://github.com/yzhao062/pyod.


Outlier detection modules in PyOD



Compared to existing libraries, PyOD has six distinct advantages. Firstly, it contains more than 20 algorithms which cover both classical techniques such as local outlier factor and recent neural network architectures such as autoencoders or adversarial models. Secondly, PyOD implements combination methods for merging the results of multiple detectors and outlier ensembles which are an emerging set of models. Thirdly, PyOD includes a unified API, detailed documentation and interactive examples across all algorithms for clarity and ease of use. Fourthly, all models are covered by unit testing with cross platform continuous integration, code coverage and code maintainability checks. Fifthly, optimization instruments are employed when possible: just-in-time (JIT) compilation and parallelization are enabled in select models for scalable outlier detection. Lastly, PyOD is compatible with both Python 2 and 3 across major operating systems (Windows, Linux and MacOS).
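
The second advantage, merging the results of multiple detectors, can be sketched with plain NumPy. The helpers below (`zscore`, `combine_average`, `combine_maximum`) are hypothetical stand-ins that I wrote for illustration; they are not PyOD's own combination functions, and the score matrix is made up. The idea is to put every detector's scores on a common scale first and then combine them per sample.

```python
import numpy as np

def zscore(scores):
    """Standardize each detector's scores (one column per detector)."""
    return (scores - scores.mean(axis=0)) / scores.std(axis=0)

def combine_average(scores):
    """Average of the standardized scores across detectors."""
    return zscore(scores).mean(axis=1)

def combine_maximum(scores):
    """Maximum of the standardized scores across detectors."""
    return zscore(scores).max(axis=1)

# Made-up scores: rows are samples, columns are two detectors on different scales.
raw_scores = np.array([
    [0.10, 12.0],
    [0.20, 15.0],
    [0.15, 11.0],
    [0.90, 40.0],   # both detectors rate this sample as most anomalous
])

avg_scores = combine_average(raw_scores)
max_scores = combine_maximum(raw_scores)
```

Standardizing first matters because detectors report scores in different units; without it, the detector with the largest numeric range would dominate the combination.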




Conclusion

This paper presents PyOD, a comprehensive toolbox built in Python for scalable outlier detection. It includes more than 20 classical and emerging detection algorithms and is being used in both academic and commercial projects. As avenues for future work, we plan to enhance the toolbox by implementing models that work well with time series and geo-spatial data, improving computational efficiency through distributed computing and addressing engineering challenges such as handling sparse matrices or memory limitations.

