
Ownership at Large

 Open Problems and Challenges in Ownership Management

-By John Ahlgren, Maria Eugenia Berezin, Kinga Bojarczuk, Elena Dulskyte, Inna Dvortsova, Johann George, Natalija Gucevska, Mark Harman, Shan He, Ralf Lämmel, Erik Meijer, Silvia Sapora, and Justin Spahr-Summers

Facebook Inc. 



Software-intensive organizations rely on large numbers of software assets of different types, e.g., source-code files, tables in the data warehouse, and software configurations. The most suitable owner of a given asset changes over time, e.g., due to reorganization and changes in individual roles. New forms of automation can help suggest more suitable owners for any given asset at a given point in time. Such efforts on ownership health increase the accountability of ownership.

The problem of finding the most suitable owners for an asset is essentially a program comprehension problem: how do we automatically determine who would be best placed to understand, maintain, evolve (and thereby assume ownership of) a given asset? This paper introduces the Facebook Ownesty system, which uses a combination of ultra large scale data mining and machine learning and has been deployed at Facebook as part of the company’s ownership management approach.

Managing software asset ownership in any organization is important. Many pressing industrial concerns such as security, reliability, and integrity depend crucially on well-defined ownership so that there are clear lines of responsibility for maintenance tasks, code review, incident response, and others. Ownership management requires and connects research on a wide variety of topics including program comprehension, and more generally, software engineering, programming languages, and machine learning.

Work at Facebook on the problem of ownership management has focused on ultra large scale data mining and machine learning, in collaboration with other teams that focus on additional aspects such as tooling and workflow integration. This work has resulted in the Ownesty system.

THE OWNESTY SYSTEM

The term asset refers to any sort of entity that is a part of a system or is possessed by a company of interest. (Hardware is not considered here for simplicity.) Examples of assets include a file in the repository for a system, a database that is part of the system, a VM to run the system, or a configuration of the VM. Ownership is also lifted to compound or distributed entities such as components, products, apps, or the scattered implementation of a logging feature.

The term owner candidate refers to any sort of individual or group entity which is associated with the system (or company) of interest and could possibly be accountable for any number of assets in this scope. 

Ownesty at Facebook considers the following types of owner candidates:
  • Individual owner, 
  • Team (supported by a director), 
  • Reporting team (supported by a manager), 
  • Oncall rotation (some sort of response team type). 


In engineering practice, the individual owner and oncall rotation types are particularly important.

A special part of a system is its asset-to-owner attribution mapping, or just attribution, which maps assets to owner candidates; the candidates that an asset is mapped to are referred to as its owners. Individuals or processes with appropriate permissions may modify the mapping. In particular, mapping an asset to a new owner may be referred to as ownership transfer.
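
The terminology above can be made concrete with a small data model. The following is only a sketch under assumptions: the class names, the asset-type labels, and the example identifiers are hypothetical and are not taken from Ownesty, and permission checks are left out.

```python
from dataclasses import dataclass
from enum import Enum


class OwnerType(Enum):
    # The four owner-candidate types listed above.
    INDIVIDUAL = "individual"
    TEAM = "team"
    REPORTING_TEAM = "reporting_team"
    ONCALL_ROTATION = "oncall_rotation"


@dataclass(frozen=True)
class Asset:
    asset_type: str   # e.g. "file", "table", "config" (illustrative labels)
    identifier: str   # hypothetical path or name


@dataclass(frozen=True)
class OwnerCandidate:
    owner_type: OwnerType
    name: str


class Attribution:
    """Asset-to-owner attribution mapping that also records transfers."""

    def __init__(self):
        self._owners = {}
        self.transfers = []  # (asset, previous owner, new owner)

    def owner_of(self, asset):
        return self._owners.get(asset)

    def assign(self, asset, new_owner):
        previous = self._owners.get(asset)
        self._owners[asset] = new_owner
        # Only a change from one owner to a different one counts as a transfer.
        if previous is not None and previous != new_owner:
            self.transfers.append((asset, previous, new_owner))
        return previous


# Hypothetical example data.
build_file = Asset("file", "services/logging/BUILD")
alice = OwnerCandidate(OwnerType.INDIVIDUAL, "alice")
logging_oncall = OwnerCandidate(OwnerType.ONCALL_ROTATION, "logging_oncall")

attribution = Attribution()
attribution.assign(build_file, alice)           # initial attribution
attribution.assign(build_file, logging_oncall)  # ownership transfer
```

Keeping assets and owner candidates as immutable values lets them serve directly as dictionary keys, which matches the view of attribution as a plain mapping.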

The main purpose of a system like Ownesty is to recommend suitable owners and thereby also to validate ownership health, i.e., the suitability of the currently attributed owners. To this end, machine learning and heuristics are leveraged. Humans may be in the loop for the purpose of confirmation, also depending on the degree of confidence for the available recommendations.
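
How confidence might steer the human-in-the-loop step can be sketched as follows. The thresholds, the decision labels, and the health measure are assumptions made for illustration; the source does not state how Ownesty sets confidence cut-offs or quantifies ownership health.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    asset: str
    candidate: str
    confidence: float  # model score in [0, 1]


# Hypothetical thresholds, not values from the paper.
AUTO_APPLY = 0.9
ASK_HUMAN = 0.6


def triage(current_owner, rec):
    """Decide what to do with one recommendation for one asset."""
    if rec.candidate == current_owner:
        return "healthy"      # the recommendation confirms the attribution
    if rec.confidence >= AUTO_APPLY:
        return "transfer"     # confident enough to act without confirmation
    if rec.confidence >= ASK_HUMAN:
        return "ask_human"    # surface to a human for accept/reject
    return "ignore"           # too uncertain to act on


def health_ratio(current_owners, recommendations):
    """One possible health measure: share of assets whose current owner
    matches the recommended owner."""
    if not recommendations:
        return 1.0
    healthy = sum(
        1 for r in recommendations if current_owners.get(r.asset) == r.candidate
    )
    return healthy / len(recommendations)


# Hypothetical example data.
current_owners = {"a": "alice", "b": "bob", "c": "carol", "d": "dan"}
recommendations = [
    Recommendation("a", "alice", 0.80),
    Recommendation("b", "bea", 0.95),
    Recommendation("c", "cy", 0.70),
    Recommendation("d", "dee", 0.20),
]
decisions = [triage(current_owners[r.asset], r) for r in recommendations]
```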


The ML Architecture




In Figure 1, the arrows denote data flow (computation). The gray shapes and arrows (on the left) exist regardless of Ownesty. Several of the arrows are supported by metadata, which we do not detail further here for brevity. The gray arrows on the left express that the asset-to-owner attribution mapping is partially encoded in the assets themselves, such as by annotation within files or a metastore for tables, in which case extraction can be applied to assets (1) or possibly to logs (2) that record the owners ‘in action’.

Ownesty extracts features from the available logs (3) that record some relevant form of interaction between assets and owner candidates. (For instance, a log for a database admin tool would record who took what administrative action and when.) This is a data and feature engineering challenge because of the plethora of logs and the fact that they were not designed with ownership management in mind. Feature extraction also involves assets and attribution (4–5), e.g., features obtained by source-code analysis. (For instance, we may extract a feature regarding an oncall annotation from a build file.) The individually extracted features are composed into feature vectors (6); these are specific to the asset type.

Ownesty leverages supervised learning and thus relies on labeled data for positive and negative attribution. To this end, so-called ‘labeling events’ are extracted from the logs (7), e.g., events that recorded reliable human decisions to accept or reject owner recommendations for attributing assets to owner candidates. The labeled training and test data is then obtained by joining the labeling events with the feature vectors for those events (8–9). We build interpretable models and provide prediction sets (10–12) for the various asset types.

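
The join of labeling events with feature vectors (steps 8–9) can be illustrated with a minimal sketch. The feature names, the event fields, and the choice to drop events that have no feature vector are assumptions for illustration, not details of the Ownesty pipeline.

```python
# Feature vectors keyed by (asset, owner candidate); feature names are hypothetical.
feature_vectors = {
    ("table_a", "alice"): {"admin_actions_90d": 12, "has_oncall_annotation": 0},
    ("table_a", "bob"): {"admin_actions_90d": 1, "has_oncall_annotation": 0},
    ("build_x", "infra_oncall"): {"admin_actions_90d": 0, "has_oncall_annotation": 1},
}

# Labeling events: a human accepted (True) or rejected (False) a recommendation.
labeling_events = [
    {"asset": "table_a", "candidate": "alice", "accepted": True},
    {"asset": "table_a", "candidate": "bob", "accepted": False},
    {"asset": "table_z", "candidate": "carol", "accepted": True},  # no features
]


def build_labeled_data(events, vectors):
    """Inner-join labeling events with feature vectors on (asset, candidate)."""
    rows = []
    for event in events:
        key = (event["asset"], event["candidate"])
        features = vectors.get(key)
        if features is None:
            continue  # assumption: events without a feature vector are dropped
        # Accepted recommendations become positive labels, rejected ones negative.
        rows.append({**features, "label": int(event["accepted"])})
    return rows


labeled = build_labeled_data(labeling_events, feature_vectors)
```

Each resulting row is a feature vector plus a 0/1 label, which is the shape a supervised learner expects for training and test data.
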
Interpretable or explainable models (e.g., basic decision trees or linear models lifted to scoring systems) are essential because the predictions and the underlying models need to be understood by humans. Subject to further metadata (e.g., documentation for the features), predictions are mapped to actionable ‘explanations’ and surfaced through project/ownership management tooling (13) so that humans in the loop can accept or reject, thereby modifying the asset-to-owner attribution mapping (14) (and providing more labeled data eventually). 
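
To show why a scoring system is easy to explain, here is a minimal sketch of a linear model with integer weights whose per-feature contributions are turned into a human-readable explanation. The weights, threshold, feature names, and documentation strings are invented for illustration; in practice the weights would be learned from the labeled data.

```python
# Hypothetical integer weights of a linear scoring system.
WEIGHTS = {
    "recent_commits": 2,
    "has_oncall_annotation": 5,
    "left_team": -4,
}

# Hypothetical feature documentation used to phrase explanations.
FEATURE_DOCS = {
    "recent_commits": "committed to this asset recently",
    "has_oncall_annotation": "is named in the asset's oncall annotation",
    "left_team": "is no longer on the owning team",
}

THRESHOLD = 5  # hypothetical cut-off for recommending a candidate


def score(features):
    """Return the total score and the non-zero contribution of each feature."""
    contributions = {
        name: WEIGHTS[name] * value
        for name, value in features.items()
        if name in WEIGHTS and value
    }
    return sum(contributions.values()), contributions


def explain(candidate, features):
    """Map a prediction to an explanation, largest contribution first."""
    total, contributions = score(features)
    ranked = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))
    reasons = [f"{FEATURE_DOCS[name]} ({value:+d})" for name, value in ranked]
    verdict = "recommend" if total >= THRESHOLD else "do not recommend"
    return f"{verdict} {candidate}: score {total}; " + "; ".join(reasons)


features = {"recent_commits": 1, "has_oncall_annotation": 1, "left_team": 0}
total, contributions = score(features)
message = explain("infra_oncall", features)
```

Because every point in the score traces back to one documented feature, a reviewer can see which evidence drove a recommendation before accepting or rejecting it.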


Conclusion

This paper characterizes the general notion of ownership management and the specific aspects of using ownership recommendation for attributing assets to owners and measuring the health of any such attribution for large and complex projects and systems. The recommendation of suitable owners and the assessment of ownership health rely on data extracted from assets (per-asset data as well as asset dependencies), workflows, and organizational structures. We hope to stimulate interest and activity in this exciting area. We have introduced the Facebook Ownesty system to illustrate the practical industrial relevance of the accompanying ownership research agenda. We also set out open problems and challenges and their relationships to existing research activities and communities. We are keen to collaborate with the research communities working on software engineering, programming languages, and machine learning on these open problems and challenges.




