
Why AI or ML Software Projects need Heroes

 - By Suvodeep Majumder, IEEE Member, Joymallya Chakraborty, Amritanshu Agrawal, Tim Menzies, IEEE Fellow


I felt that executing an ML or AI project is not purely technical; clear communication between team members and the size of the team also matter. Hence I picked up this topic.

Heroes are those who participate in 80% (or more) of the communications associated with a commit.

Abstract

A “hero” project is one where 80% or more of the contributions are made by 20% of the developers. In the literature, such projects are deprecated since they might cause bottlenecks in development and communication. However, there is little empirical evidence on this matter. Further, recent studies show that such hero projects are very prevalent. Accordingly, this paper explores the effect of having heroes in projects from a code quality perspective. We identify hero developer communities in 1100+ open-source GitHub projects. Based on that analysis, we find that (a) the overwhelming majority of projects are hero projects, and (b) commits from “hero developers” (those who contribute most to the code) result in far fewer bugs than commits from other developers. That is, contrary to the literature, heroes are a standard and very useful part of modern open-source projects.


A “hero” project is one where 80% or more of the contributions come from 20% of the developers. In the literature, such hero projects are deprecated since, it is said, they are bottlenecks that slow project development and cause information loss. Recent studies have motivated a re-examination of the implications of heroes. In 2018, Agrawal et al. studied 661 open-source projects and 171 in-house proprietary projects. In that sample, over 89% of all projects were hero-based. Only in small open-source projects (with under 15 core developers) were non-hero projects more prevalent. To say the least, this widespread prevalence of heroes is at odds with established wisdom in the SE literature. Hence, it is now an open and pressing issue to understand why so many projects are hero-based. To that end, this paper checks the Agrawal et al. result. All of the project data was collected from scratch, from double the number of open-source projects (over 1100) used by Agrawal et al. Also, we use a different method for recognizing a hero project: Agrawal et al. just counted the number of commits made by each developer, whereas in this study we say heroes are those who participate in 80% (or more) of the communications associated with a commit.


  1. We clearly demonstrate the benefits of hero-based development, which is contrary to much prior pessimism. 
  2. Our conclusions come from 1100+ projects, whereas prior work commented on heroes using data from just a handful of projects. 
  3. Our conclusions come from very recent projects instead of decades-old data. 
  4. We show curves that precisely illustrate the effects on code quality of different levels of communication. This differs from prior work, which only offered general qualitative principles. 
  5. This paper makes its conclusions using more metrics than prior work. Not only do we observe an effect (using process and resource metrics to report the frequency of developer contribution), but we also report the consequence of that effect (by joining these to product metrics that reveal software quality). 
  6. Instead of just reporting an effect (that heroes are common, as done by Agrawal et al.), we can explain that effect (heroes are those who communicate more, and that communication leads to fewer bugs).
  7. As a service to other researchers, all the scripts and data of this study can be downloaded from https://github.com/ai-se/Git_miner



Firstly, when we say 1100+ projects, that is shorthand for the following: our results used the intersection of a code interaction graph (of who writes what code) from 1327 projects with a social interaction graph (of who discusses what commits) from 1173 projects. Secondly, by code interaction graphs and social interaction graphs, we mean the following. Each graph has its own nodes and edges {N, E}.

For code interaction graphs:

  • Individual developers have their own node Na; 
  • An edge Eb connects two nodes and indicates whether one developer has ever changed another developer’s code. 

For social interaction graphs (like Figure 1):

  • A node Nc is created for each individual who has created or commented on an issue; 
  • An edge Ed indicates communication between two individuals (as recorded in the issue-tracking system). If this happens N times, then the weight Wd = N. 

Thirdly, our definition of “hero” is not “writes 80% of the software”, since such a definition is hard to operationalize for modern agile projects (where many people might lend a hand to much of the code). Instead, we say heroes are those who “participate in 80% of the discussions prior to the commits”.






Table 1


Software Quality Metrics


Table 1 shows that most papers do not use a wide range of metrics. Xenos distinguishes these kinds of metrics as follows. Product metrics are directly related to the product itself (such as code statements, delivered executables, and manuals) and strive to measure product quality, or attributes of the product that can be related to product quality. Process metrics focus on the process of software development and measure process characteristics, aiming to detect problems or to push forward successful practices. Lastly, personnel metrics (a.k.a. resource metrics) are those related to the resources required for software development and their performance. The capability and experience of each programmer, and the communication among all the programmers, are related to product quality.


  • The code interaction graph is a process metric; 
  • The social interaction graph is a personnel metric; 
  • Defect counts are product metrics.


This paper combines all three kinds of metrics and applies the combination to exploring the effects of heroism on software development. There are many previous studies that explore one or two of these types of metrics. Fig 2 summarizes Table 1 and shows that, in that sample, very few papers on software metrics and code quality combine insights from product, process, and personnel metrics.


Fig 2




We applied some of our own engineering judgement to filter our data, as follows (a sketch of these filters appears after this list):

  • Collaboration: refers to the number of pull requests. This is indicative of how many other peripheral developers work on this project. We required all projects to have at least one pull request. 
  • Commits: The project must contain more than 20 commits. 
  • Duration: The project must contain software development activity of at least 50 weeks. 
  • Issues: The project must contain more than 10 issues. 
  • Personal Purpose: The project must not be used and maintained by one person. The project must have at least eight contributors. 
  • Software Development: The repository must contain software development source code. 
  • Project Documentation Followed: The project should follow proper documentation standards, logging commit comments and issue events so that commits can be linked to issues. 
  • Social Network Validation: The social network that is built should have at least 8 connected nodes in both the communication graph and the code interaction graph. 
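
To make the filters concrete, here is a minimal Python sketch. The dictionary keys (pull_requests, commits, duration_weeks, issues, contributors) are hypothetical names for project metadata mined from the GitHub API beforehand; they are not part of the original study's scripts.

```python
# Minimal sketch of the sanity filters above. All key names are
# hypothetical; real metadata would be mined from the GitHub API.
def passes_filters(project: dict) -> bool:
    return (
        project["pull_requests"] >= 1        # Collaboration
        and project["commits"] > 20          # Commits
        and project["duration_weeks"] >= 50  # Duration
        and project["issues"] > 10           # Issues
        and project["contributors"] >= 8     # Personal Purpose
    )

example = {"pull_requests": 12, "commits": 340, "duration_weeks": 120,
           "issues": 45, "contributors": 19}
print(passes_filters(example))  # True
```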


Target project selection


  • Release: Releases (based on Git tags) mark specific points in a repository’s history. The number of releases indicates how many versions have been published, which signifies a considerable amount of change between versions. 
  • Duration: The length of the project from its inception to the current date (or the project’s archive date). This signifies how long a project has been running and in an active development phase. 
  • Stars: The number of people who like a project, or who bookmark it so they can follow what is going on with the project later. 
  • Forks: A fork is a copy of a repository. Forking a repository allows users to freely experiment with changes without affecting the original project. This signifies how many people are interested in the repository and actively thinking about modifying the original version. 
  • Watchers: Watchers are GitHub users who have asked to be notified of activity in a repository but have not become collaborators. This represents people actively monitoring a project, because of possible interest or dependency. 
  • Developers: Developers are the contributors to a project, who work on the code and submit it to the codebase via commits. The number of developers signifies the interest of developers in actively participating in the project, and the volume of the work.



Fig 4


Process Metrics


  • Project commits were extracted from each branch in the git history. 
  • Commits are extracted from the git log and stored in a file system.
  • To access the file changes in each commit, we recreate the files that were modified in each commit by continuously moving the git head chronologically along each branch. Changes were then identified by running git diff on two consecutive commits. 
  • The graph is created by going through each commit and adding a node for the committer. Then we use git blame on the changed lines to find the previous commits, following a process similar to the SZZ algorithm. We identify all the developers of those commits from git blame and add them as nodes as well. 
  • After the nodes are created, edges are drawn between the developer who changed the code and the developer whose code was changed. These edges are weighted by the size of the changes between the two people (see the sketch after this list).
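
The construction of the code interaction graph can be sketched as follows, assuming the (changer, original author, lines changed) triples have already been extracted with git diff and git blame. This is a minimal illustration using networkx, not the study's actual mining script:

```python
import networkx as nx

# Each tuple is assumed to have been extracted beforehand with
# `git diff` + `git blame`:
# (developer who changed the code, developer whose code was changed, lines changed)
changes = [
    ("alice", "bob", 12),
    ("alice", "bob", 3),
    ("carol", "alice", 7),
]

G = nx.Graph()
for changer, original_author, n_lines in changes:
    if G.has_edge(changer, original_author):
        # accumulate the size of changes between the two people
        G[changer][original_author]["weight"] += n_lines
    else:
        G.add_edge(changer, original_author, weight=n_lines)

print(G.edges(data=True))
```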



Personnel Metrics


  • A node is created for the person who created the issue; then another set of nodes is created for each person who commented on the issue. So, essentially, each node in the social interaction graph is any person (developer or non-developer) who ever created an issue or commented on one. 
  • The nodes are connected by edges, which are created by (a) connecting the person who created the issue to all the people who commented on that issue and (b) creating edges between all the people who commented on the issue, including the person who created it. 
  • The edges are weighted by the number of comments between two people. 
  • The weights are updated using the entire history of the project. The creation and weight updates are similar to Figure 5 (see the sketch after this list).
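
A minimal sketch of the social interaction graph construction, assuming each issue has already been mined into a (creator, list-of-commenters) pair; the data here is illustrative only:

```python
import itertools
import networkx as nx

# Each issue is assumed to be a (creator, [commenters...]) pair
# mined from the issue tracker; names are illustrative.
issues = [
    ("alice", ["bob", "carol", "bob"]),
    ("bob",   ["alice"]),
]

G = nx.Graph()
for creator, commenters in issues:
    participants = [creator] + commenters
    # connect every pair of participants; edge weight counts the
    # comment interactions between the two people
    for u, v in itertools.combinations(participants, 2):
        if u == v:
            continue
        if G.has_edge(u, v):
            G[u][v]["weight"] += 1
        else:
            G.add_edge(u, v, weight=1)

print(G.edges(data=True))
```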


Fig 5

Product Metrics


• It starts with all the commits from the git log and extracts the commit messages, as these are often an excellent source of information about what a commit is for.
• Then, to use the commit messages for labelling, it applies a natural-language processor, which includes stemming and other nltk preprocessors, to normalize the commit messages.
• Then, to identify commit messages that represent bug/issue-fixing commits, a list of words and phrases extracted from previous studies of 1000+ projects (open source and enterprise) is used. The system checks for these words and phrases in the commit messages and, if found, marks those commits as ones that fixed bugs (a sketch of this labelling step appears after this list).
• As a sanity check, a portion of the commits was manually verified using random sampling from different projects.
• These labeled commits are then processed to extract the file changes, using the process described under process metrics.
• Next, git blame is used to trace each changed line in each file back through the git history to identify the commit where that line was last created or changed.
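
The labelling step can be sketched as below. The keyword set is a small illustrative subset (already in stemmed form); the study used words and phrases drawn from earlier analyses of 1000+ projects:

```python
import re
from nltk.stem import PorterStemmer

# Illustrative subset of bug-fix keywords, in stemmed form;
# the study's list was extracted from prior work on 1000+ projects.
BUG_KEYWORDS = {"bug", "fix", "error", "issu", "fault", "defect", "patch"}

stemmer = PorterStemmer()

def is_bug_fix(commit_message: str) -> bool:
    """Normalize the message (lowercase, tokenize, stem) and look for bug-fix words."""
    tokens = re.findall(r"[a-z]+", commit_message.lower())
    stems = {stemmer.stem(t) for t in tokens}
    return bool(stems & BUG_KEYWORDS)

print(is_bug_fix("Fixed a null-pointer bug in the parser"))  # True
print(is_bug_fix("Add README badges"))                       # False
```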

Finally, top contributors (or heroes) and non-heroes were defined as:

$$D(N_i) = \sum_{j=1}^{n} a_{ij} \tag{1}$$

$$\text{Hero}: \ \text{Rank}(D(N_i)) > \frac{P}{100}(N + 1) \tag{2}$$

$$\text{Non-Hero}: \ \text{Rank}(D(N_i)) < \frac{P}{100}(N + 1) \tag{3}$$

where:

  • N = number of developers; 
  • P = percentile (here, 95); 
  • Rank() = the percentile rank of a score, i.e., the percentage of scores in its frequency distribution that are equal to or lower than it; 
  • a = the adjacency matrix of the graph, where $a_{ij} > 0$ denotes a connection.


Categorization of the developers into two groups (see the sketch after this list):

• The hero developers: the core group of developers of a project, who make regular changes to the codebase. In this study, these are the developers whose node degree is above the 95th percentile of node degrees (in the developer communication and code interaction graphs).
• The non-hero developers: all other developers, i.e., developers associated with nodes whose degree is below the 95th percentile. This study compares the performance of these two sets of developers using the percentage of bugs they introduced into the codebase.
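
Equations (1)-(3) and the categorization above can be sketched as follows, using networkx's weighted degree for D(N_i) and numpy for the percentile cut; the graph here is a toy example:

```python
import numpy as np
import networkx as nx

# Toy graph standing in for either the code interaction or the
# social interaction graph built earlier.
G = nx.Graph()
G.add_weighted_edges_from([
    ("alice", "bob", 5), ("alice", "carol", 3), ("alice", "dave", 9),
    ("bob", "carol", 1), ("dave", "erin", 2),
])

degrees = dict(G.degree(weight="weight"))          # D(N_i) = sum_j a_ij
threshold = np.percentile(list(degrees.values()), 95)  # 95th percentile cut

heroes     = {d for d, deg in degrees.items() if deg > threshold}
non_heroes = set(degrees) - heroes
print(heroes, non_heroes)
```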


Analysis


RQ1 How common are hero projects? 

We say a project is a “hero project” if, when we isolate the developers who handle 95% (or more) of the interactions, we see only 5% (or less) of the developers. By “interaction” we mean the weighted in-degree count of each vertex. The top group comprises all vertices with a count above min + 0.2 * (max − min) (where min and max are the smallest and largest counts). This definition can be applied to either the code interaction graph or the social interaction graph; regardless of the source, the observed pattern is the same. Measured in terms of either code or social interaction, hero projects comprise over 80% of our sample. A small sketch of this test follows.
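
A minimal sketch of this test, with illustrative in-degree counts (a real project has many more developers, so a single top developer can fall under the 5% cutoff):

```python
# Minimal sketch of the RQ1 hero-project test. `indegree` maps each
# developer to their weighted in-degree in one interaction graph;
# the numbers here are illustrative.
indegree = {"alice": 120, "bob": 4, "carol": 7, "dave": 2, "erin": 3}

lo, hi = min(indegree.values()), max(indegree.values())
cutoff = lo + 0.2 * (hi - lo)              # min + 0.2 * (max - min)

top = [d for d, c in indegree.items() if c > cutoff]
is_hero_project = len(top) / len(indegree) <= 0.05
print(top, is_hero_project)                # ['alice'] False (only 5 devs here)
```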

RQ2 What impact does heroism have on code quality? 


RQ2 explores what kind of effect heroism has on code quality. To explore this, we created the developer social interaction graph and the developer code interaction graph, then identified the developers responsible for introducing bugs into the codebase. We then compute the percentage of buggy commits introduced by those developers by checking
(a) the number of buggy commits introduced by those developers and
(b) their total number of commits.
A sketch of this measurement follows.
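
A minimal sketch of this computation; the commit labels and the hero set are assumed to come from the earlier labelling and categorization steps:

```python
# (developer, introduced_bug) pairs, assumed to come from the
# SZZ-style blame step above; data here is illustrative.
commits = [
    ("alice", False), ("alice", True), ("alice", False),
    ("bob", True), ("bob", True),
]
heroes = {"alice"}  # assumed output of the 95th-percentile categorization

def buggy_percentage(group):
    owned = [bug for dev, bug in commits if dev in group]
    return 100.0 * sum(owned) / len(owned)

all_devs = {dev for dev, _ in commits}
print(buggy_percentage(heroes))             # hero bug rate
print(buggy_percentage(all_devs - heroes))  # non-hero bug rate
```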

RQ3: Does team size alter the above results?


Projects are sectioned into three categories (see the sketch after this list):
• Small: A project is considered small if its number of developers is greater than 8 but less than 15.
• Medium: A project is considered medium if its number of developers is greater than 15 but less than 30.
• Large: A project is considered large if its number of developers is greater than 30.
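
As a small sketch (the boundary cases of exactly 15 or 30 developers are not assigned by the prose above; here they fall into the larger bucket, as an assumption):

```python
def team_size_category(n_developers: int) -> str:
    # Boundary values (exactly 15 or 30) are assigned to the larger
    # bucket here as an assumption; the prose leaves them undefined.
    if 8 < n_developers < 15:
        return "small"
    if 15 <= n_developers < 30:
        return "medium"
    if n_developers >= 30:
        return "large"
    return "excluded"  # projects with 8 or fewer developers were filtered out

print(team_size_category(12), team_size_category(22), team_size_category(40))
```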

Critical for success of projects


Chief Programmer 


One strange feature of our results is that what is old is now new. Our results (that heroes are important) echo a decades-old concept. In 1975, Fred Brooks wrote of “surgical teams” and the “chief programmer” [108]. He argued that:


  • much as a surgical team during surgery is led by one surgeon performing the most critical work, while directing the team to assist with the less critical parts, 
  • software projects should be led by one “chief programmer” who develops the critical system components, while the rest of the team provides what is needed at the right time.


Brooks conjectured that “good” programmers are generally much more productive than mediocre ones. This can be seen in our results: hero programmers are much more productive and less likely to introduce bugs into the codebase. Heroes are born when developers become so skilled at what they do that they assume a central position in a project. In our view, organizations need to acknowledge their dependence on such heroes, perhaps altering their human resource policies to manage these people more efficiently and retain them.


 CONCLUSION 


The established wisdom in the literature is to deprecate “heroes”, i.e., the small percentage of the staff who are responsible for most of the progress on a project. But, based on a study of 1100+ open-source GitHub projects, we assert:


  • Overwhelmingly, most projects are hero projects. This result holds true for small, medium, and large projects. 
  • Hero developers are far less likely to introduce bugs into the codebase than their non-hero counterparts. Thus, having heroes in projects significantly affects code quality. 

Our empirical results call for a revision of a long-held truism in software engineering: software heroes are far more common and valuable than suggested by the literature, particularly from a code quality perspective. Organizations should reflect on better ways to find and retain more of these software heroes.

