Open Problems and Challenges in Ownership Management
By John Ahlgren, Maria Eugenia Berezin, Kinga Bojarczuk, Elena Dulskyte, Inna Dvortsova, Johann George, Natalija Gucevska, Mark Harman, Shan He, Ralf Lämmel, Erik Meijer, Silvia Sapora, and Justin Spahr-Summers
Facebook Inc.
Software-intensive organizations rely on large numbers of software assets of different types, e.g., source-code files, tables in the data warehouse, and software configurations. Who is the most suitable owner of a given asset changes over time, e.g., due to reorganization or changes in an individual's role. New forms of automation can help suggest more suitable owners for any given asset at a given point in time. Such efforts on ownership health increase the accountability of ownership.

The problem of finding the most suitable owners for an asset is essentially a program comprehension problem: how do we automatically determine who would be best placed to understand, maintain, and evolve (and thereby assume ownership of) a given asset? This paper introduces the Facebook Ownesty system, which uses a combination of ultra large scale data mining and machine learning and has been deployed at Facebook as part of the company's ownership management approach.
Managing software asset ownership in any organization is important. Many pressing industrial concerns, such as security, reliability, and integrity, depend crucially on well-defined ownership so that there are clear lines of responsibility for maintenance tasks, code review, incident response, and other activities. Ownership management requires and connects research on a wide variety of topics including program comprehension and, more generally, software engineering, programming languages, and machine learning.
This paper reports on work at Facebook on the problem of ownership management, with a focus on ultra large scale data mining and machine learning, carried out in collaboration with other teams focusing on additional aspects such as tooling and workflow integration. This work has resulted in the Ownesty system.
THE OWNESTY SYSTEM
The term asset refers to any sort of entity that is part of a system or is possessed by a company of interest. (Hardware is not considered here for simplicity.) Examples of assets include a file in the repository of a system, a database that is part of the system, a VM used to run the system, or a configuration of that VM. Ownership is also lifted to compound or distributed entities such as components, products, apps, or the scattered implementation of a logging feature.
The term owner candidate refers to any sort of individual or group entity that is associated with the system (or company) of interest and could possibly be accountable for any number of assets in this scope.

Ownesty at Facebook considers the following types of owner candidates:
- Individual owner,
- Team (supported by a director),
- Reporting team (supported by a manager),
- Oncall rotation (a type of response team).
In engineering practice, the individual owner and oncall rotation types are particularly important.
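As a rough illustration of this taxonomy, the sketch below captures the candidate types as a small data model; all class and field names are hypothetical and not taken from Ownesty itself.

```python
from dataclasses import dataclass
from enum import Enum


class OwnerKind(Enum):
    """Owner candidate types considered by Ownesty."""
    INDIVIDUAL = "individual"            # a single engineer
    TEAM = "team"                        # supported by a director
    REPORTING_TEAM = "reporting_team"    # supported by a manager
    ONCALL_ROTATION = "oncall_rotation"  # a type of response team


@dataclass(frozen=True)
class OwnerCandidate:
    """An individual or group entity that could be accountable for assets."""
    kind: OwnerKind
    identifier: str  # e.g., a username, a team name, or a rotation name
```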
A special part of a system is its asset-to-owner attribution mapping, or just attribution, which maps assets to owner candidates; the mapped candidates are thus referred to as owners. Individuals or processes with appropriate permissions may modify the mapping. In particular, when an asset is mapped to a new owner, this may be referred to as an ownership transfer.
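A minimal sketch of such an attribution mapping is shown below, assuming a simple in-memory representation (in practice the mapping is encoded in annotations, metastores, and tooling rather than in a single data structure); the `Asset` fields and the permission check are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Any owned entity: a source file, a warehouse table, a VM configuration, ..."""
    asset_type: str  # e.g., "source_file", "table", "vm_config"
    identifier: str  # e.g., a repository path or a table name


class Attribution:
    """The asset-to-owner attribution mapping; mapped candidates are 'owners'."""

    def __init__(self) -> None:
        self._owners: dict[Asset, str] = {}  # owner candidate id per asset

    def owner_of(self, asset: Asset) -> str | None:
        return self._owners.get(asset)

    def transfer(self, asset: Asset, new_owner: str, actor_may_modify: bool) -> None:
        """Ownership transfer: remap the asset to a new owner, subject to permissions."""
        if not actor_may_modify:
            raise PermissionError("caller may not modify the attribution mapping")
        self._owners[asset] = new_owner
```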
The main purpose of a system like Ownesty is to recommend suitable owners and thereby also to validate ownership health, i.e., the suitability of the currently attributed owners. To this end, machine learning and heuristics are leveraged. Humans may be kept in the loop for confirmation, depending on the degree of confidence of the available recommendations.
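One plausible reading of this confidence-dependent, human-in-the-loop step is sketched below; the scores, threshold, and routing labels are assumptions for illustration, not Ownesty's actual policy.

```python
def attribution_health(current_owner: str,
                       scores: dict[str, float],
                       confidence_threshold: float = 0.8) -> str:
    """Assess the current attribution of one asset, given per-candidate suitability scores.

    `scores` maps owner-candidate ids to a score in [0, 1]; the threshold is an
    assumed cut-off for surfacing a recommendation without extra review.
    """
    best_candidate = max(scores, key=scores.get)
    if best_candidate == current_owner:
        return "healthy"              # current owner is also the top recommendation
    if scores[best_candidate] >= confidence_threshold:
        return "recommend_transfer"   # confident enough to surface for confirmation
    return "needs_human_review"       # low confidence: keep a human in the loop


print(attribution_health("alice", {"alice": 0.35, "bob": 0.92}))  # recommend_transfer
```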
The ML Architecture
In Figure 1, the arrows denote data flow (computation). The gray shapes and arrows (on the left) exist regardless of Ownesty. Several of the arrows are supported by metadata, which we do not further detail here for brevity. The gray arrows on the left express that the asset-to-owner attribution mapping is partially encoded in the assets themselves such as by annotation within files or a metastore for tables, in which case extraction can be applied to assets (1) or possibly to logs (2) that record the owners 'in action'.

Ownesty extracts features from the available logs (3) that record some relevant form of interactions between assets and owner candidates. (For instance, a log for a database admin tool would record who was taking what administrative action when.) This is a data and feature engineering challenge because of the plethora of logs and the fact that they were not designed with ownership management in mind. Feature extraction also involves assets and attribution (4–5), e.g., features obtained by source-code analysis. (For instance, we may extract a feature regarding an oncall annotation from a build file.) The individually extracted features are composed into feature vectors (6) – these are specific to the asset type.

Ownesty leverages supervised learning and thus relies on labeled data for positive and negative attribution. To this end, so-called 'labeling events' are extracted from the logs (7), e.g., events that recorded reliable human decisions to accept or reject owner recommendations for attributing assets to owner candidates. The labeled data for training and test is then obtained by joining labeling events with the feature vectors for those events (8–9).

We build interpretable models and provide prediction sets (10–12) for the various asset types. Interpretable or explainable models (e.g., basic decision trees or linear models lifted to scoring systems) are essential because the predictions and the underlying models need to be understood by humans. Subject to further metadata (e.g., documentation for the features), predictions are mapped to actionable 'explanations' and surfaced through project/ownership management tooling (13) so that humans in the loop can accept or reject, thereby modifying the asset-to-owner attribution mapping (14) (and providing more labeled data eventually).
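To make the data flow concrete, the sketch below mimics steps (3)–(12) on toy data: features are derived from hypothetical interaction logs, joined with labeling events, and used to train a small interpretable model (scikit-learn's decision tree stands in here for whatever Ownesty actually uses). The log schema, feature names, and labels are invented for illustration.

```python
from collections import defaultdict
from sklearn.tree import DecisionTreeClassifier, export_text

# (3) Hypothetical interaction logs: who did what to which asset.
interaction_logs = [
    {"asset": "warehouse.users", "candidate": "alice", "action": "schema_change"},
    {"asset": "warehouse.users", "candidate": "alice", "action": "query"},
    {"asset": "warehouse.users", "candidate": "bob",   "action": "query"},
    {"asset": "repo/service.py", "candidate": "bob",   "action": "code_review"},
    {"asset": "repo/service.py", "candidate": "bob",   "action": "commit"},
    {"asset": "repo/service.py", "candidate": "carol", "action": "commit"},
]

# (4-6) Compose per-(asset, candidate) feature vectors: counts per action type.
ACTIONS = ["schema_change", "query", "code_review", "commit"]
counts = defaultdict(lambda: defaultdict(int))
for entry in interaction_logs:
    counts[(entry["asset"], entry["candidate"])][entry["action"]] += 1
feature_vectors = {
    pair: [actions[a] for a in ACTIONS] for pair, actions in counts.items()
}

# (7) Labeling events: human accept/reject decisions for (asset, candidate) pairs.
labeling_events = {
    ("warehouse.users", "alice"): 1,   # accepted as owner
    ("warehouse.users", "bob"): 0,     # rejected
    ("repo/service.py", "bob"): 1,
    ("repo/service.py", "carol"): 0,
}

# (8-9) Join labeling events with the corresponding feature vectors.
X = [feature_vectors[pair] for pair in labeling_events]
y = list(labeling_events.values())

# (10-12) Train a small, interpretable model and inspect it as human-readable rules.
model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(model, feature_names=ACTIONS))

# Score a new (asset, candidate) pair: probability that it is a suitable owner.
print(model.predict_proba([feature_vectors[("warehouse.users", "bob")]]))
```

Printing the tree with `export_text` hints at why a shallow, interpretable model is preferred: its rules can be turned into the actionable explanations that are surfaced to humans in the loop (13).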
Conclusion
This paper characterizes the general notion of ownership management and the specific aspects of using ownership recommendation for attributing assets to owners and measuring the health of any such attribution for large and complex projects and systems. The recommendation of suitable owners and the assessment of ownership health rely on data extracted from assets (per-asset data as well as asset dependencies), workflows, and organizational structures. We hope to stimulate interest and activity in this exciting area. We have introduced the Facebook Ownesty system to illustrate the practical industrial relevance of the accompanying ownership research agenda. We also set out open problems and challenges and their relationships to existing research activities and communities. We are keen to collaborate with the research communities working on software engineering, programming languages, and machine learning on these open problems and challenges.