Skip to main content

Ownership at Large

 Open Problems and Challenges in Ownership Management

-By John Ahlgren, Maria Eugenia Berezin, Kinga Bojarczuk, Elena Dulskyte, Inna Dvortsova, Johann George, Natalija Gucevska, Mark Harman, Shan He, Ralf Lämmel, Erik Meijer, Silvia Sapora, and Justin Spahr-Summers

Facebook Inc. 



Software-intensive organizations rely on large numbers of software assets of different types, e.g., source-code files, tables in the data warehouse, and software configurations. Who is the most suitable owner of a given asset changes over time, e.g., due to reorganization and individual function changes. New forms of automation can help suggest more suitable owners for any given asset at a given point in time. By such efforts on ownership health, accountability of ownership is increased.

The problem of finding the most suitable owners for an asset is essentially a program comprehension problem: how do we automatically determine who would be best placed to understand, maintain, evolve (and thereby assume ownership of) a given asset. This paper introduces the Facebook Ownesty system, which uses a combination of ultra large scale data mining and machine learning and has been deployed at Facebook as part of the company’s ownership management approach.

Managing software asset ownership in any organization is important. Many pressing industrial concerns such as security, reliability, and integrity depend crucially on well-defined ownership so that there are clear lines of responsibility for maintenance tasks, code review, incident response, and others. Ownership management requires and connects research on a wide variety of topics including program comprehension, and more generally, software engineering, programming languages, and machine learning.

Work at Facebook on the problem of ownership management with a focus on ultra large scale data mining and machine learning, subject to collaboration with other teams focusing on additional aspects such as tooling and workflow integration. This work has resulted in the Ownesty system.

THE OWNESTY SYSTEM

The term asset refers to any sort of entity that is a part of a system or is possessed by a company of interest. (hardware not considerd here for simplicity.) Examples of assets are these: a file in the repository for a system, a database that is part of the system, a VM to run the system, or a configuration of the VM. Ownership is also lifted to compound or distributed entities such as components, products, apps, or the scattered implementation of a logging feature. 

The term owner candidate refers to any sort of individual or group entity which is associated with the system (or company) of interest and could possibly be accountable for any number of assets in this scope. 

Ownesty at Facebook considers types of owner candidates: 
  • Individual owner, 
  • Team (supported by a director), 
  • Reporting team (supported by a manager), 
  • Oncall rotation (some sort of response team type). 


In the engineering practice, the types individual owner and oncall rotation are particularly important. 

Special part of a system: its asset-to-owner attribution mapping or just attribution, which maps assets to owner candidates, which are thus referred to as owners. Individuals or processes with appropriate permissions may modify the mapping. In particular, when an asset is mapped to a new owner, then this may be referred to as ownership transfer. 

The main purpose of a system like Ownesty is to recommend suitable owners and thereby also to validate ownership health, i.e., the suitability of the currently attributed owners. To this end, machine learning and heuristics are leveraged. Humans may be in the loop for the purpose of confirmation, also depending on the degree of confidence for the available recommendations.


The ML Architecture




In Figure 1, the arrows denote data flow (computation). The gray shapes and arrows (see on the left) exist regardless of Ownesty. Several of the arrows are supported by metadata, which we do not further detail here for brevity. The gray arrows on the left express that the asset-to-owner attribution mapping is partially encoded in the assets themselves such as by annotation within files or a metastore for tables, in which case extraction can be applied to assets (1) or possibly to logs (2) that record the owners ‘in action’. Ownesty extracts features from the available logs (3) that record some relevant form of interactions between assets and owner candidates. (For instance, a log for a database admin tool would record who was taking what administrative action when.) This is a data and feature engineering challenge because of the plethora of logs and the fact that they were not designed with ownership management in mind. Feature extraction also involves assets and attribution (4–5), e.g., features obtained by source-code analysis. (For instance, we may extract a feature regarding an oncall annotation from a build file.) The individually extracted features are composed into feature vectors (6) – these are specific to the asset type. Ownesty leverages supervised learning and thus relies on labeled data for positive and negative attribution. To this end, so-called ‘labeling events’ are extracted from the logs (7), e.g., events that recorded reliable human decisions to accept or reject owner recommendations for attributing assets to owner candidates. The labeled data for training and test is then obtained by joining labeled events with the feature vectors for those events (8–9). We build interpretable models and provide prediction sets (10– 12) for the various asset types. Interpretable or explainable models (e.g., basic decision trees or linear models lifted to scoring systems) are essential because the predictions and the underlying models need to be understood by humans. Subject to further metadata (e.g., documentation for the features), predictions are mapped to actionable ‘explanations’ and surfaced through project/ownership management tooling (13) so that humans in the loop can accept or reject, thereby modifying the asset-to-owner attribution mapping (14) (and providing more labeled data eventually). 


Conclusion

This paper characterizes the general notion of ownership management and the specific aspects of using ownership recommendation for attributing assets to owners and measuring the health of any such attribution for large and complex projects and systems. The recommendation of suitable owners and the assessment of ownership health relies on data extracted from assets (per-asset data as well as asset dependencies), workflows and organizational structures. We hope to stimulate interest and activity in this exciting area. We have introduced the Facebook Ownesty system to illustrate the practical industrial relevance of the accompanying ownership research agenda. We also set out open problems and challenges and their relationships to existing research activities and communities. We are keen to collaborate with the research communities working on software engineering, programming languages, and machine learning on these open problems and challenges.





Comments

Unknown said…
Great Work Sir !! Keep it up !!
Jagadeesh said…
Really liked the way it was explained with the examples. It helps... Thanks

Popular posts from this blog

ABOD and its PyOD python module

Angle based detection By  Hans-Peter Kriegel, Matthias Schubert, Arthur Zimek  Ludwig-Maximilians-Universität München  Oettingenstr. 67, 80538 München, Germany Ref Link PyOD By  Yue Zhao   Zain Nasrullah   Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada  Zheng Li jk  Northeastern University Toronto, Toronto, ON M5X 1E2, Canada I am combining two papers to summarize Anomaly detection. First one is Angle Based Outlier Detection (ABOD) and other one is python module that  uses ABOD along with over 20 other apis (PyOD) . This is third part in the series of Anomaly detection. First article exhibits survey that covered length and breadth of subject, Second article highlighted on data preparation and pre-processing.  Angle Based Outlier Detection. Angles are more stable than distances in high dimensional spaces for example the popularity of cosine-based sim...

Cybersecurity Threats in Connected and Automated Vehicles based Federated Learning Systems

  Ranwa Al Mallah , Godwin Badu-Marfo , Bilal Farooq image Courtesy: Comparitech Abstract Federated learning (FL) is a machine learning technique that aims at training an algorithm across decentralized entities holding their local data private. Wireless mobile networks allow users to communicate with other fixed or mobile users. The road traffic network represents an infrastructure-based configuration of a wireless mobile network where the Connected and Automated Vehicles (CAV) represent the communicating entities. Applying FL in a wireless mobile network setting gives rise to a new threat in the mobile environment that is very different from the traditional fixed networks. The threat is due to the intrinsic characteristics of the wireless medium and is caused by the characteristics of the vehicular networks such as high node-mobility and rapidly changing topology. Most cyber defense techniques depend on highly reliable and connected networks. This paper explores falsified informat...

MLOps Drivenby Data Quality using ease.ml techniques

 Cedric Renggli, Luka Rimanic, Nezihe Merve Gurel, Bojan Karlas, Wentao Wu, Ce Zhang ETH Zurich Microsoft Research Paper Link ease.ml reference paper link Image courtesy 99designes Developing machine learning models can be seen as a process similar to the one established for traditional software development. A key difference between the two lies in the strong dependency between the quality of a machine learning model and the quality of the data used to train or perform evaluations. In this work, we demonstrate how different aspects of data quality propagate through various stages of machine learning development. By performing joint analysis of the impact of well-known data quality dimensions and the downstream machine learning process, we show that different components of a typical MLOps pipeline can be efficiently designed, providing both a technical and theoretical perspective. Courtesy: google The term “MLOps” is used when this DevOps process is specifically applied to ML. Diffe...