- Open Access
Who drives company-owned OSS projects: internal or external members?
Journal of the Brazilian Computer Society, volume 24, Article number: 16 (2018)
Open-source software (OSS) communities leverage the workforce of volunteers to keep the projects sustainable. Some companies support OSS projects by paying developers to contribute to them, while others share their products under OSS licenses, keeping their employees in charge of maintaining the projects. In this paper, we investigate the activity of internal (employees) and external (volunteers) developers in this kind of setting. We conducted a case study using a convenience sample of five well-known OSS projects: atom, electron, hubot, git-lfs, and linguist. Analyzing a rich set of ∼ 12K contributions performed by means of pull requests to these projects, complemented with a manual analysis of ∼ 500 accepted pull requests, we derived a list of findings. For instance, we found that both internal and external developers are rather active when it comes to submitting pull requests and that the studied projects are receptive to external developers. Considering all the projects, internal developers are responsible for 43.3% of the pull requests performed (external developers account for 56.7%). We also found that even with high support from the external community, employees still play the central roles in the project. The majority of the external developers are casual contributors (developers who made only a single contribution to the project). However, we also observed that some external members play core roles (in addition to submitting code), like triaging bugs, reviewing, and integrating code into the main branch. Finally, when manually inspecting some code changes, we observed that external developers’ contributions range from documentation to complex code. Our results can benefit companies willing to open-source their code and developers who want to take part and actively contribute to company-owned code.
Open-source software (OSS) is one of the cornerstones of modern software development practice. Many existing software projects rely on OSS solutions either at compile time (e.g., build tools or testing tools) or at runtime (e.g., web servers or databases). In spite of its ubiquity, several OSS projects rely on a single contributor to perform most of their needed tasks. Due to this grim scenario, it is not uncommon to see core developers becoming tired and abandoning their own software projects.
To alleviate this situation, many software companies have recently started to support open-source activities. For instance, open-source programming languages such as Swift (Footnote 1) and Scala (Footnote 2) have their development process primarily driven by employees of a software company (Apple and Typesafe, respectively). In fact, there is a recurrent belief that most OSS contributions are made by paid developers. As a recent article pointed out, “More than 80 percent of [the Linux] kernel development is done by developers who are being paid for their work” (Footnote 3). While commercial contributions to the Linux kernel have been widely acknowledged, in a large-scale study of more than 9000 OSS projects, Riehle and colleagues observed that about 50% of the OSS contributors are actually paid ones. However, in their work, the authors consider as “paid developers” those who performed commits from 9am to 5pm, local time. Using this simple rule, students, unemployed people, or workers with flexible schedules could be wrongly classified as “paid developers.” Therefore, we believe that more systematic approaches should be employed to shed additional light on the proportion of paid/non-paid developers. There are at least two reasons that support our claim:
If there are, indeed, too many paid developers, OSS communities may need to make better use of this workforce. For instance, instead of concentrating too many paid developers in a single OSS project, OSS communities could try to direct some paid developers to OSS projects that are more in need.
On the other hand, if there are too few paid developers, this finding might not only refute previous studies but could also be used to better motivate software companies to support OSS projects.
It is important to note that the source of payment can vary greatly. For instance, one can get paid to fix a bug via a crowdsourcing system, whereas others can be full-time OSS contributors. In this study, we pay particular attention to developers that contribute to company-owned OSS projects.
|Company-owned OSS projects: This term refers to OSS projects that software companies started and curated in a private environment, but later decided to open-source. Therefore, an OSS project that was previously restricted to the company’s employees can now potentially receive contributions from contributors who are not affiliated in any way with the given company.|
This transition from proprietary to open-source is particularly relevant to our work. Although the projects were proprietary from birth, the software companies that built them perceived benefits that motivated them to open-source their software. In order to differentiate the developers who are paid by the companies to work on the OSS projects from the OSS contributors who contribute for personal reasons, throughout this paper, we refer to contributors who work for the company that open-sourced the project as “internal developers.” Developers who do not work for the company that open-sourced the given OSS project are referred to as “external developers.” More technical details on how we differentiate external and internal developers can be found in the “Internal and external classification” section.
In this paper, we extend a previous analysis with a multi-case study investigating the contribution behavior of pull requests provided by internal and external developers in OSS projects. We used a convenience sample composed of five GitHub-owned projects: atom, electron, hubot, git-lfs, and linguist. We chose these projects because they were initially developed by (and are maintained at) GitHub; therefore, we could take advantage of GitHub features to understand whether a contributor is an internal or an external one (more details in the “Method” section). Through a set of quantitative and qualitative analyses, this paper makes the following contributions:
We provide evidence that there is a workforce of developers, external to the company that opened the code, contributing to the project and creating a community that extends the boundaries of the company. The number of external developers can be up to 32 × greater than the number of internal ones.
We show that, although the external community is engaged, external members have a hard time getting a contribution accepted. In 4 out of the 5 studied projects, most of the rejected pull requests were made by external developers. In terms of time taken to process a pull request, pull requests from externals take, on average, 11.37 days to be processed. Pull requests from internals, on the other hand, take 2.61 days.
We find that internal developers still play a crucial role in the project, playing the integrator role in two of the analyzed projects. However, external members are also acquiring this role. In project hubot, for instance, ∼ 80% of the team of integrators is composed of external developers.
We provide an in-depth investigation of the contributions (i.e., pull requests) made to five well-known OSS projects. They are as follows:
git-lfs, a git extension for versioning large files. It has ∼ 6300 commits, ∼ 1300 pull requests, 99 contributors, ∼ 5300 stars, and ∼ 900 forks. It is mostly written in Go and has ∼ 5 years of historical records. GitHub started its development in September 2013 (Footnote 10) and open-sourced it in April 2015 (Footnote 11).
linguist, a library to detect blob languages. It has 5600 commits, ∼ 2400 pull requests, 684 source code contributors, ∼ 5400 stars, and ∼ 2000 forks. It is mostly written in Ruby and has ∼ 7 years of historical records. GitHub started its development in May 2011 (Footnote 12) and open-sourced it in October 2015 (Footnote 13).
When analyzing the software history of these projects, we perceived that all of them but linguist started as stand-alone software projects. linguist, on the other hand, started as a unification of code scattered around the whole software system. Such a pattern of open-sourcing software projects was already reported elsewhere.
Figure 1 shows a distribution of additional characteristics of these projects.
We followed a mixed-methods approach, combining quantitative and qualitative research methods. In this section, we present the common ground for all the research questions (including pull request data collection and how internal and external developers are classified) and, afterwards, we dive into the details of each specific RQ.
Pull request collection
The data reported in this paper is based on pull requests that were performed from the very beginning of the studied projects, up to January 2018—when we collected data. All data used in this study is available online at https://github.com/fronchetti/JBCS-2018.
We started our study by investigating all performed pull requests. A pull request can be found in three different stages:
open: waiting for code reviews and/or a final decision.
closed: the code reviews were done, but the pull request was not accepted (the status in GitHub is closed/unmerged).
merged: the code reviews were done, and the pull request was accepted (the status in GitHub is closed/merged).
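The three stages above can be derived from two fields of a pull request record. The following is a minimal sketch, assuming records shaped like GitHub REST API pull request objects, where `state` is either "open" or "closed" and `merged_at` is null unless the pull request was merged. The function name and the sample records are illustrative and are not taken from the study's scripts.

```python
def pull_request_stage(pr: dict) -> str:
    """Map a pull request record to one of the three stages used in the study."""
    if pr["state"] == "open":
        return "open"
    # GitHub reports accepted and rejected pull requests alike as "closed";
    # a non-null merged_at timestamp is what tells the two apart.
    return "merged" if pr.get("merged_at") else "closed"


# Hypothetical records, for illustration only.
examples = [
    {"number": 1, "state": "open", "merged_at": None},
    {"number": 2, "state": "closed", "merged_at": None},
    {"number": 3, "state": "closed", "merged_at": "2017-06-01T12:00:00Z"},
]

for pr in examples:
    print(pr["number"], pull_request_stage(pr))
```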
We studied the contribution behavior of internal and external developers taking into account each possible stage of a pull request. Additionally, we investigated other characteristics associated with the pull request, such as:
The time taken to process a pull request
The number of comments during the code reviews per pull request
The number of commits per pull request
The number of changes (e.g., additions/deletions) per pull request
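The characteristics listed above can be read from, or computed on, a single pull request record. The sketch below assumes the field names that the GitHub REST API uses for a pull request (`created_at`, `closed_at`, `merged_at`, `comments`, `commits`, `additions`, `deletions`) and measures the time to process in whole days. The helper names and the sample record are hypothetical.

```python
from datetime import datetime

GITHUB_TS = "%Y-%m-%dT%H:%M:%SZ"  # timestamp layout used by the GitHub API


def days_to_process(pr: dict):
    """Whole days between submission and decision, or None if still open."""
    decided = pr.get("merged_at") or pr.get("closed_at")
    if decided is None:
        return None
    opened = datetime.strptime(pr["created_at"], GITHUB_TS)
    return (datetime.strptime(decided, GITHUB_TS) - opened).days


def summarize(pr: dict) -> dict:
    """Collect the per-pull-request characteristics in one place."""
    return {
        "days_to_process": days_to_process(pr),
        "comments": pr["comments"],
        "commits": pr["commits"],
        "changes": pr["additions"] + pr["deletions"],
    }


# Hypothetical record, for illustration only.
sample = {
    "created_at": "2017-01-01T00:00:00Z",
    "closed_at": "2017-01-12T09:00:00Z",
    "merged_at": "2017-01-12T09:00:00Z",
    "comments": 4,
    "commits": 2,
    "additions": 30,
    "deletions": 5,
}
print(summarize(sample))
```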
Internal and external classification
Since the analyzed projects are developed by (and maintained at) GitHub, we reduce false positives by taking advantage of GitHub features used to identify developers’ roles. Within GitHub organizations, one coordinator can set the site_admin flag to true for another user. If enabled, this flag promotes an ordinary user to site administrator. According to the official GitHub documentation, a site administrator can “manage high-level application and VM settings, all users and organization account settings, and repository data” (Footnote 14). Therefore, for each pull request investigated, we verified whether the author has the site_admin flag enabled. If so, we marked the author as internal; otherwise, as external.
To avoid false negatives (a paid developer who does not have the site_admin flag enabled), we analyzed the public profiles (e.g., GitHub affiliation, LinkedIn information, personal web page, among other sources) of the top 10 contributors (either internal or external) of each project. From the 48 profiles analyzed (2 members appeared in 2 different projects), we found 12 who had previously worked for GitHub but were not categorized as staff members. We manually identified these users as internal developers for our analysis. This misidentification is a potential threat and is further described in the “Limitations” section.
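The two-step classification described above (the site_admin flag first, then a manually curated list) could be sketched as follows. The `site_admin` and `login` fields follow the user objects returned by the GitHub API; the function name and the logins in the manual list are hypothetical placeholders, not the accounts identified in the study.

```python
# Hypothetical logins standing in for the manually identified employees.
MANUALLY_IDENTIFIED_INTERNALS = frozenset({"example-former-employee"})


def classify_author(user: dict, manual_internals=MANUALLY_IDENTIFIED_INTERNALS) -> str:
    """Return "internal" or "external" for the author of a pull request."""
    # Step 1: the site_admin flag marks staff accounts.
    if user.get("site_admin"):
        return "internal"
    # Step 2: the manual list covers employees whose flag is not enabled.
    if user["login"] in manual_internals:
        return "internal"
    return "external"


print(classify_author({"login": "staff-member", "site_admin": True}))
print(classify_author({"login": "example-former-employee", "site_admin": False}))
print(classify_author({"login": "volunteer", "site_admin": False}))
```

Keeping the manual list separate from the flag check makes it easy to rerun the analysis with and without the manual corrections.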
To guide our research, we investigated the following important but overlooked research questions:
|RQ1. Are OSS contributions mostly made by internal developers?|
Rationale: This exploratory research question guides our case study on GitHub company-owned OSS projects. It also provides evidence to understand the role that the external developers play in this kind of endeavor.
Approach: To answer this RQ, we quantitatively compared the number of internal and external contributors, as well as the number of pull requests submitted by them. In addition to characterizing and discussing the values using descriptive statistics, we compared the evolution of the number of pull requests submitted monthly by external and internal members, on a per-project basis. It is important to mention that we computed the number of pull requests submitted per state (open, closed, merged). Since we compared the number of pull requests per month for two different samples, we applied the Wilcoxon signed-rank test for paired samples to perform this comparison. We used Cliff’s delta to verify how often values in one distribution are larger than values in another distribution. The thresholds, applied to the absolute value of delta, are defined as follows: |delta| < 0.147 (negligible), |delta| < 0.33 (small), |delta| < 0.474 (medium), and |delta| >= 0.474 (large).
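To make the effect size concrete, the sketch below computes Cliff’s delta from its definition (the share of pairs in which a value from the first sample is larger, minus the share in which it is smaller) and maps the result to the thresholds given above. It is a plain-Python illustration with made-up monthly counts, not the statistical tooling used in the study.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta of xs relative to ys, in the range [-1, 1]."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def magnitude(delta):
    """Label an effect size using the thresholds quoted in the text."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


# Made-up monthly pull request counts, for illustration only.
internal_per_month = [3, 4, 5, 6]
external_per_month = [1, 2, 3, 7]

delta = cliffs_delta(internal_per_month, external_per_month)
print(delta, magnitude(delta))
```

Passing the internal sample first means that a positive delta favors internals and a negative delta favors externals.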
We also graphically reported the distributions to enable the visualization of the temporal evolution of contributions. The results for this question are presented throughout the “RQ1. Are OSS contributions mostly made by internal developers?” section.
|RQ1.1. Are internals the top contributors of company-owned OSS projects?|
Rationale: In this question, we aim to provide a fine-grained perspective on the involvement of the contributors of company-owned OSS projects. Answers to this question will further substantiate the role that our subjects play.
Approach: To answer this question, we first analyzed the top 10 contributors for each project to check how many of them are internal and how many are external members. Then, we compared internals and externals in terms of the number of pull requests per contributor in each project.
|RQ2. Who faces a harder time to get the contributions accepted?|
Rationale: In this research question, we focus on understanding how pull requests from internal and external developers are received. We focused on (i) acceptance and rejection rates and (ii) the priority given to the pull requests. As the literature suggests, it is not always easy to contribute to open-source projects. We, therefore, explore whether external developers are facing a harder time in terms of rejections and time to process when compared to their internal peers. If that is the case, answers to this question might help improve how company-owned OSS projects treat external developers.
Approach: We built upon the results of the comparisons made for RQ1 to understand the acceptance rate (number of merged pull requests versus the total submitted pull requests) for internal and external members. To compare the time to process, we first computed the number of days from the submission date until the decision date (when the pull request was closed or merged). Then, we compared this characteristic for pull requests submitted by internal and external members. We also used Mann-Whitney-Wilcoxon (MWW) tests and, as for RQ1, Cliff’s delta effect size measures to perform this comparison.
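The acceptance rate defined above (merged pull requests over all submitted pull requests) can be computed per group as in the sketch below. The input format, a list of (group, stage) pairs, and the sample data are assumptions made for this illustration.

```python
from collections import defaultdict


def acceptance_rates(pull_requests):
    """Return {group: merged / submitted} for (group, stage) pairs."""
    submitted = defaultdict(int)
    merged = defaultdict(int)
    for group, stage in pull_requests:
        submitted[group] += 1
        if stage == "merged":
            merged[group] += 1
    return {group: merged[group] / submitted[group] for group in submitted}


# Made-up data, for illustration only.
sample_prs = [
    ("internal", "merged"),
    ("internal", "merged"),
    ("internal", "merged"),
    ("internal", "closed"),
    ("external", "merged"),
    ("external", "closed"),
]
print(acceptance_rates(sample_prs))
```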
|RQ3. Are externals more participative in the pull request review cycle?|
Rationale: This research question explores the degree of involvement of externals and internals in company-owned OSS projects in terms of (1) commenting/discussing pull requests and (2) playing the integrator role. Commits or pull requests are not the only ways to measure participation. In fact, contributors might provide comments on pull requests under review as an attempt to contribute to the project. On GitHub, anyone can freely comment on a pull request, regardless of whether the GitHub user has contributed to the project before. Another facet of participation regards integrating pull requests. Since processing pull requests is a notorious activity that only experienced contributors are willing to perform, it is more likely that internal developers conduct this process. However, if external developers are also playing this role, this might indicate that the company-owned OSS project succeeds in lowering the barriers for external developers to join the project.
Approach: To analyze these two participation perspectives of internal and external members in the code review cycle, we collected the number of comments per pull request, classifying them as comments made by internal or external members. We also verified who was responsible for integrating the pull requests submitted: internal or external members. We characterized this aspect in terms of the number of pull requests integrated by internal and external members, and the number of internal and external members who played the integrator role. We used descriptive statistics and graphics, in addition to the MWW test and Cliff’s delta effect size, to compare the involvement of internal and external members as commenters and integrators.
|RQ4. What are the characteristics of the contributions made by external developers?|
Rationale: We intend to understand the kinds of contributions performed by external members. We complemented this analysis with an investigation of the differences between pull requests placed by internal and external members in terms of the size of the commits, including the number of files changed and code churn. Answers to this question might enable companies to have a better picture of what to expect from the external community. Moreover, the literature is particularly rich when it comes to changes made by internal developers [11–13].
Approach: We selected a representative sample of pull requests that reflects the larger population. We selected 334 random pull requests made at atom for manual analysis, which represents a confidence level of 95% with a ± 5% confidence interval. We also validated this analysis with another manual analysis of a random sample of 150 pull requests accepted at hubot. The qualitative analysis was conducted in parallel by two researchers, who investigated the pull requests individually. We also quantitatively compared the characteristics of the pull requests placed by internal and external members in terms of number of files changed, added lines, deleted lines, and the number of commits per pull request. We considered each pull request as an observation and, once again, we used MWW tests and Cliff’s delta effect size measures to compare the groups. The results for this question are presented throughout the “RQ4. What are the contributions’ characteristics made by externals?” section.
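One common way to arrive at a sample size for a 95% confidence level and a ± 5% interval is Cochran’s formula with a finite-population correction. The source does not state which formula or population size was used, so the sketch below is an assumption on both counts: the function name is illustrative, and the population of 2,500 pull requests is a hypothetical value.

```python
from math import ceil


def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Cochran's sample size with finite-population correction.

    z is the normal quantile for the confidence level (1.96 for 95%),
    margin is the half-width of the interval, and p = 0.5 is the most
    conservative guess for the proportion being estimated.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # size for an unbounded population
    return ceil(n0 / (1 + (n0 - 1) / population))


# Hypothetical population size, for illustration only.
print(sample_size(2500))
```

Under these assumptions, a population of roughly 2,500 pull requests yields a sample of 334, which is consistent with the sample size reported above, although the source does not report the population it sampled from.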
In this section, we discuss the results of our study organized in terms of the research questions.
RQ1. Are OSS contributions mostly made by internal developers?
Generally speaking, both internal and external developers are rather active when it comes to submitting pull requests, as can be observed in Table 1. On the one hand, internals contribute more pull requests on atom and git-lfs; on the other hand, external developers made a higher number of pull requests in electron, hubot, and linguist. For hubot and linguist, external developers are responsible for more than 75% of the pull requests in the project. If we consider all projects, we found 5895 pull requests provided by internal developers (43.3%) and 6266 by external ones (56.7%). However, the number of contributors differs greatly between internals and externals, as can be observed in Table 1. As an extreme case, project electron has 681 external contributors and only 21 internal ones (while the number of contributions made by external developers is almost two times greater than the number made by internal developers). That is, although the number of external developers is up to 32 × greater than the number of internal ones, most external developers perform few contributions.
To provide a more detailed perspective, Fig. 2 depicts the evolution of pull requests, grouped by their states (open, closed, and merged) at collection time (Jan 2018). Each observation corresponds to the total number of pull requests submitted per month by each type of member (internal and external). The same data was used to statistically compare (p values and effect size values) submissions by internal and external members, as shown in Table 2. Our effect size test follows the order internals, then externals. Therefore, negative values indicate a larger effect for external developers, and positive values a larger effect for internal developers.
From the figures, it is first possible to see that the maintainers do a great job in processing pull requests, given the small number of pull requests kept open. Projects electron, git-lfs, and hubot present low rates of open pull requests, 0.88, 0.13, and 0.08%(!), respectively. For the latter, at the time of data collection, only 3 pull requests were left open.
RQ1.1. Are internals the top contributors of company-owned OSS projects?
By analyzing the top 10 contributors for each project, we could observe that the “top contributor” of every project is an internal developer. As can be observed in Table 3, in only one of the projects (git-lfs) is the number of external developers greater than the number of internal developers in the top 10 (6 externals, 4 internals). This finding suggests that externals participate actively. However, even in this case, by analyzing the code churn, the top 2 developers (both internal) are by far the main contributors of the git-lfs project (top 1: 124,197 additions and 75,831 deletions; top 2: 89,065 additions and 74,576 deletions; sum of top 3 to top 5: ≈61,300 additions and ≈33,600 deletions).
Finally, we also analyzed the number of pull requests placed per contributor, as shown in Fig. 3. It is possible to observe that the small number of internal contributors place a higher number of pull requests than external developers. It is also straightforward to notice that the external contributors’ population is mostly composed of casual contributors [14, 15], that is, developers who contributed only once to the project. Table 4 brings the absolute number and the percentage of internal and external casual contributors per project. Overall, 76% of the external contributors of the analyzed projects made only one pull request to the project. This finding complements the study of Pinto and colleagues, which suggests that casual contributors are responsible for 49% of the whole population of contributors. More interestingly, we observe that there are internal developers who only contributed once (e.g., for hubot, 65% of the internals are casual).
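Using the definition above (a casual contributor is a developer with exactly one pull request in the project), the share of casual contributors can be computed from the list of pull request authors. This is a minimal sketch; the function name and the author logins are made up for illustration.

```python
from collections import Counter


def casual_contributors(pr_authors):
    """Return (number of casual contributors, their share of all contributors)."""
    per_author = Counter(pr_authors)
    casual = sum(1 for count in per_author.values() if count == 1)
    return casual, casual / len(per_author)


# One entry per pull request; made-up logins, for illustration only.
authors = ["ana", "bob", "bob", "carl", "bob", "dana"]
print(casual_contributors(authors))
```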
RQ2. Who faces a harder time to get the contributions accepted?
As mentioned, we focus on understanding how pull requests from internal and external developers are received. We analyzed the reception in terms of acceptance rate and time to process the pull requests from internal and external members.
Regarding the acceptance rate, when studying the merged pull requests (the accepted ones) in RQ1, we can see that both groups are also fairly active in all the five projects analyzed. We can observe, though, different patterns depending on the project. For example, for linguist, we can see that the number of pull requests from externals exceeds the number from employees by far, and for every month. However, analyzing the closed but unmerged pull requests (the ones that were not accepted), we could notice that many external developers are having a hard time getting their contributions accepted. This is noticeable in the second column of graphics in Fig. 2. In Table 2, we could confirm that most of the unmerged (closed) pull requests were made by external developers for 4 out of 5 projects (p value ≤ 0.001), with a medium or large (negative) effect size. A possible explanation is that employees work on critical tasks and follow project directions (defined inside the company), while external submissions are sometimes motivated by specific needs, not necessarily aligned with the project’s direction.
In terms of time to process the pull requests, we analyzed the number of days between the opening of a pull request and its merge. On average, pull requests filed by externals take 11.37 days to be processed (min: 0, max: 1144, 3rd quartile: 5, std deviation: 55). In comparison, pull requests from internals take 2.61 days (min: 0, max: 558, 3rd quartile: 1, std deviation: 18). Figure 4 shows the average number of days for each studied project.
As we can see in the figure, for all studied projects, on average, pull requests submitted by internals are processed faster than the ones submitted by externals; a small effect size confirmed this trend (p value = 0.001, delta = 0.243). In particular, projects hubot and linguist are the ones that take more time to process pull requests, either from internals (333 and 426 days for hubot and linguist, respectively) or externals (1144 and 832 days for hubot and linguist, respectively). To better understand why these pull requests made by externals take so long to be processed, we investigated the ones that lasted the longest.
Pull request #678, submitted to the hubot project, aimed to improve the documentation (it adds 32 lines to a Markdown file); five commits had been made to this pull request. Although project maintainers needed some time to review the contribution (the final modification suggested came about 300 days after the pull request was created), it seems that the pull request was forgotten, and only 2 years after the last change was made did another project maintainer revisit the pull request and merge the patch. On the other hand, pull request #2070, submitted to the linguist project, is a bit more complex. It aimed to introduce support for PEP8, the code convention for writing Python code. Similar to the previous pull request, in this one, the maintainers also seem to have forgotten to follow up with the code review. The external member brought attention back to this pull request, mentioning: “I’m recalling this pull request has been open for over a year now (wow, nearly two, time flies), is there anything I can do to help it being merged into master aside from fixing the conflicts that have arisen since its opening?”. Four months after this message, another maintainer provided additional comments, and 1 month later the pull request was merged.
RQ3. Are externals more participative in the pull request review cycle?
In this RQ, we are interested in exploring how internal and external members participate in the process by both commenting/discussing pull requests and acting as integrators.
We first investigated how internal and external contributors differ in terms of the number of comments received during the code review of a pull request. Figure 5 shows the distribution of this metric.
As we can see, both groups receive comments on their pull requests, with external developers receiving more in most of the projects. Although internal developers might be more aware of project domain, the integration process, and their peers, they face a similar pull request review process (in terms of receiving comments), when compared to external developers. By analyzing Table 5, it is possible to confirm what is shown in Fig. 5: external developers receive more comments than internal developers (p value < 0.01 for four out of 5 projects, with small and medium effect size). This finding, to some extent, shows that our studied projects welcome external developers, by providing comments, which might be used for reviewing, guiding, and supporting developers getting their changes merged.
To understand the participation in the review cycle, we also studied whether the integrator role (the role played by the developer who integrates a pull request) is performed by an internal or by an external member. Figure 6 shows the percentage of pull requests processed by internal and external members.
As we can see, the majority of the pull requests submitted to projects atom and electron are processed by internal developers (83 and 94%, respectively). However, for the remaining projects, the number of pull requests processed by external developers is indeed greater than the ones processed by internal developers. In particular, project linguist is an extreme example, with 78% of the pull requests being processed by external developers. However, after a closer look at the data, we found that few integrators are responsible for processing the majority of the pull requests. For instance, two internal integrators processed 85% of the pull requests submitted to project electron. Figure 7 shows a different perspective: the percentage of unique integrators that are internal or external developers.
The number of unique integrators for both kinds of contributors is roughly similar in four out of the five analyzed projects (e.g., the linguist project has 15 internal integrators and 17 external). The only exception to this trend is the project hubot, in which 11 (78%) of the integrators are external developers (which corroborates the findings of the “RQ1.1. Are internals the top contributors of company-owned OSS projects?” section, which indicate that a large proportion of internals are casual contributors for this particular project). Regarding the amount of work performed by each kind of contributor (either internal or external), we observed that internal integrators processed more pull requests on projects atom, hubot, and electron. In particular, internal integrators of project electron processed 28 × more pull requests than their counterparts. Moreover, although the project hubot has more unique external integrators (11 externals and 3 internals), internal integrators are responsible for managing the majority of the pull requests (internal integrators processed 3 × more than external ones). On the other hand, on projects linguist and git-lfs, external integrators processed more pull requests than internals (3.26 × and 1.82 ×, respectively).
Additionally, we also investigated the proportion of pull requests submitted by internals that are also processed by internals (and vice-versa). We observed that 86.4% of the pull requests submitted by internals are also processed by internals. In comparison, 55.4% of the pull requests submitted by externals are also processed by externals.
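The proportion described above (pull requests submitted by one group that are also processed by the same group) could be computed as in the sketch below. The record layout, with an `author` group and an `integrator` group per pull request, is an assumption made for this illustration, as are the sample records.

```python
def same_group_share(pull_requests, group):
    """Share of the group's pull requests that were integrated by the same group."""
    submitted = [pr for pr in pull_requests if pr["author"] == group]
    if not submitted:
        return None  # the group submitted nothing, so the share is undefined
    same = sum(1 for pr in submitted if pr["integrator"] == group)
    return same / len(submitted)


# Made-up records, for illustration only.
processed = [
    {"author": "internal", "integrator": "internal"},
    {"author": "internal", "integrator": "internal"},
    {"author": "internal", "integrator": "internal"},
    {"author": "internal", "integrator": "external"},
    {"author": "external", "integrator": "external"},
    {"author": "external", "integrator": "internal"},
]
print(same_group_share(processed, "internal"))
print(same_group_share(processed, "external"))
```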
RQ4. What are the contributions’ characteristics made by externals?
To better understand the characteristics of the accepted contributions, we conducted a qualitative analysis aimed at investigating the reasons for pull request acceptance, in particular, the ones proposed by external members.
For the atom project, before creating a pull request, internal developers create an issue that describes what the project needs. Therefore, most of the pull requests proposed are accepted because internal developers were expecting them. For externals, pull requests that fix documentation problems are the most common ones (we found 27 instances of them). Some examples include broken URLs (Footnote 15), not enough information (Footnote 16), and code comments (Footnote 17). Notwithstanding, non-trivial code changes often come with a detailed description (images are common). We found a similar pattern for hubot. Most of the pull requests from external developers are related to documentation issues (Footnote 18), although complex code changes exist (Footnote 19). Finally, these two projects seem to welcome external users: they not only answer most of the requests from external developers, but they also guide their contributions to an acceptable state (as mentioned before, by providing comments to improve the pull request).
In addition, as presented in Fig. 8, contributions from external developers are, in general, slightly shorter than internal ones in terms of lines added, lines removed, and files changed. For electron, for example, internal developers added 173,319 lines in total (mean = 130.51 lines per pull request, median = 19.5, q3 = 71.25, stdev = 630.50) and changed 10,092 files (mean = 7.60 files per pull request, median = 3, q3 = 6, stdev = 20.57), while external added 150,667 lines (mean = 75.30 lines per pull request, median = 12, q3 = 52, stdev = 267.56) and changed a total of 8067 files (mean = 4.03 files per pull request, median = 1, q3 = 4, stdev = 10.52).
As one can observe in Table 6, in general, internal developers indeed include more files than external ones in all analyzed projects. For the number of deleted lines, this does not hold true for the hubot project; for additions, there is no statistical significance for either hubot or linguist. Overall, we can see that both internal and external contributions are small (few files, and small additions and deletions). As noted elsewhere, smaller changes are more likely to be accepted and can also reduce the chance of breaking the continuous integration build.
By observing Fig. 9, we notice that external developers’ pull requests are also smaller in terms of the number of commits. Single-commit pull requests are rather common, accounting for more than 50% of the pull requests received from externals (overall, and for each project). This is expected since shorter contributions (mainly documentation and typo fixes) are made in single files. For internals, we can observe a higher number of commits per pull request, which can be noticed by comparing the median and the whiskers. This was statistically confirmed for all projects (p values ≪ 0.01), with a small effect size for all projects except hubot, for which we found a medium effect size (delta = 0.350). Pull requests with a large number of commits are not common. This finding suggests that both groups follow well-known guidelines for contributing to OSS (small commits and few commits per pull request [10, 17]).
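To make the effect-size figures easier to interpret, the sketch below computes a non-parametric effect size between two groups and maps it to a magnitude label. It assumes that the reported delta is Cliff's delta and that the labels follow the thresholds commonly attributed to Romano et al. (0.147, 0.33, and 0.474); neither assumption is stated in the paragraph above. The function names and the sample commit counts are hypothetical.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x, y)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def magnitude(delta):
    """Label |delta| with the assumed thresholds 0.147, 0.33, and 0.474."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


# Hypothetical commits-per-pull-request samples (not data from the study).
internal_commits = [1, 2, 3, 4, 6]
external_commits = [1, 1, 1, 2, 3]

delta = cliffs_delta(internal_commits, external_commits)
print(delta, magnitude(delta))
```

Under these thresholds, the value reported for hubot (delta = 0.350) falls in the medium band, which matches the classification given in the text.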
The main takeaways
External developers are welcome. Our results showed that the external community supports the companies maintaining the projects by contributing to them. In particular, we found cases in which external members play crucial roles in the projects, such as reviewing and integrating pull requests. This could only be possible because the studied projects welcome external members (which is not always the case in open-source software). We further support this claim by inspecting the welcoming-community features (footnote 20) available in the studied projects. All of the studied projects present a description, a README.md file, a Code of Conduct file, a CONTRIBUTING.md file, and a license file.
External developers still need guidance. Some projects tag the issues to make it easier for externals to find a task to solve (including atom, electron, and linguist, which provide specific tags for newcomer-friendly tasks). However, given the high number of unmerged pull requests from external developers (Fig. 2), external developers have to understand the project’s direction and follow its guidelines when submitting a pull request; otherwise, their contributions are more likely not to be accepted.
Few external developers become long-term contributors. Even though we found external developers supporting the studied projects, few of them have a long-term contribution history (the only exceptions are the outliers). As one can observe in Fig. 3, the majority of external developers place a single contribution to the projects and never show up again. For some projects (hubot and linguist, in particular), even internal developers do not place many pull requests. However, when looking from a different perspective, the total number of pull requests placed by external developers is greater than the number submitted by employees, as can be seen in Table 1. Similarly, there are projects with small participation from employees (although the company keeps contributing to them). This result might indicate that the company-owned project is now a community effort.
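The notion of a casual contributor used here (a developer who placed a single contribution) can be expressed in a few lines. The following Python sketch is only an illustration: the record layout with an `author` key, the function name, and the sample data are assumptions made for this example, not the format of the study's dataset.

```python
from collections import Counter


def casual_contributors(pull_requests):
    """Return the set of authors who submitted exactly one pull request."""
    counts = Counter(pr["author"] for pr in pull_requests)
    return {author for author, n in counts.items() if n == 1}


# Hypothetical pull request records (not data from the study).
pull_requests = [
    {"author": "alice"},
    {"author": "bob"},
    {"author": "alice"},
    {"author": "carol"},
]

casual = casual_contributors(pull_requests)
all_authors = {pr["author"] for pr in pull_requests}
share = len(casual) / len(all_authors)
print(sorted(casual), round(share, 2))
```

Counting authors rather than pull requests is what separates the two perspectives discussed above: most authors can be casual even when the group as a whole submits the larger share of pull requests.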
External developers can wear the integrator hat. Although integrators are usually employees, we also found externals that play this role, which indicates a high involvement from the external community in company-owned OSS projects. However, when analyzing atom, we could find external developers who are in charge of triaging and commenting on issues (who are also among the top contributors). These externals describe themselves as “@atom community volunteer” or “@atom maintainer.” Therefore, further research is needed to understand what the actual roles played by external and internal developers in this kind of project are. Figuring out the boundaries of responsibilities is an interesting future direction for this research that can benefit companies and communities.
From previous studies on casual contributors and quasi-contributors, we found out that the main reason for a developer to place a contribution to a project is to “scratch his/her own itch.” In many cases, this motivation was triggered by the company where the developer worked. We hypothesize that this can be the case for many contributors to these company-owned projects. Interestingly, we found cases in which developers voluntarily contribute for a long period. It is the case of one of the top 10 contributors of atom, who, on his personal home page, mentions that “In my free time I contribute to Atom, GitHub’s text editor, as one of the community maintainers of the project.” We found similar cases when analyzing the top contributors of git-lfs and electron. This might suggest that altruism is still present in open-source communities.
However, we are not aware of the motivations that drive external contributors who volunteer to these projects. One can hypothesize that this can be a way to showcase their skills to the project maintainers, so they can be hired by the company. However, an interesting point of discussion is whether the company is indeed interested in hiring key or highly productive members of the external community. From the hiring perspective, observing potential candidates contributing to the project can be seen as a live screening process, in which the company can cherry-pick good contributors. From a community perspective, taking “core external contributors” can harm the external community’s structure, since the role they play outside the company can change. Moreover, it is also important to understand the goals of the company when it opens its code and whether it is willing to pay for someone who is already contributing voluntarily to the project. Although we did not investigate this specific point (using the community contributions as a hiring area), we believe that our findings might encourage other researchers to conduct more research, especially from the perspective of the company willing to make that move.
In this section, we discuss some of the studies that relate to the scope of this work.
Commercial involvement/paid developers in OSS projects
It is possible to notice an increase in the participation of companies in OSS and in the contributions of employees paid to work on OSS projects [3, 20]. Homscheid and Schaarschmidt investigated the role of external developers who are paid by third-party companies (“firm-sponsored developers”). By conducting a survey with Linux developers, they found that the perceived external reputation of the employing organization reduces turnover intention towards the company, and the perceived own reputation dampens turnover intention towards the OSS community. Atiq and Tripathi explored how the developers perceive the differences in rewards in OSS projects, by analyzing their opinion on how the project’s financial resources influence the progress of the project. By analyzing an open question sent to OSS developers, they found that OSS projects where only some people get directly paid may fail if they are mismanaged.
Riehle et al. analyzed more than 5000 active OSS projects, from 2000 to 2007, and found that around 50% of all contributions have been paid work. Their perspective is that any contribution made from Monday to Friday, between 9am and 5pm, is a paid contribution. However, as highlighted by Crowston, even employed developers are not paid directly by the projects to which they contribute, so from the project perspective, they are volunteers. Thus, differently from Riehle and colleagues, we analyzed the amount of effort put in by the developers of the company that open-sourced the project (directly paid by the “owner”), comparing it with the contributions made by any external developer. Our results showed that, for the analyzed projects, 43.3% of the pull requests are placed by internal developers (GitHub employees). The results seem to be in line with previous work, except for the fact that the concept of paid developers used previously is not the same as the concept of internal developers applied here.
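The working-hours heuristic attributed to Riehle et al. above can be sketched as a simple predicate over a contribution timestamp. The Python example below is a minimal illustration of that rule only, not their implementation; the function name is hypothetical, and it assumes the timestamp is already expressed in the contributor's local time, which the paragraph above does not specify.

```python
from datetime import datetime


def in_working_hours(timestamp):
    """True if timestamp falls Monday to Friday, from 9:00 up to (not including) 17:00."""
    # weekday() returns 0 for Monday through 6 for Sunday.
    is_weekday = timestamp.weekday() < 5
    is_office_time = 9 <= timestamp.hour < 17
    return is_weekday and is_office_time


# Hypothetical commit timestamps, assumed to be in the contributor's local time.
monday_morning = datetime(2018, 6, 4, 10, 30)
monday_evening = datetime(2018, 6, 4, 20, 0)
saturday_morning = datetime(2018, 6, 9, 10, 30)

print(in_working_hours(monday_morning))
print(in_working_hours(monday_evening))
print(in_working_hours(saturday_morning))
```

A rule of this kind classifies by time of day rather than by affiliation, which is why it is not directly comparable with the internal/external distinction used in this study.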
In a previous work, we studied the challenges that software companies face when open-sourcing their software products. In that work, we studied eight well-known proprietary projects that kept their software history while transitioning to open source. Two of these eight projects were also studied in the current work: atom and hubot. Analyzing the software history, we observed that external developers often onboard company-owned OSS projects in the very first weeks after open-sourcing, but abandon them a few commits later (the so-called newcomers’ wave). In this work, we also observed that the majority of external contributors are casual ones (e.g., have contributed at most one commit). We also observed a burst in the number of issues and pull requests right after open-sourcing the software project. In a follow-up study, we studied the reasons that motivated 50 company-owned OSS projects to delete their software history before going open-source. Among the reasons, we observed that code containing sensitive information (e.g., user credentials) is one of the most common reasons for deleting the history, although other, so far uncommon, reasons, such as lawyers having to inspect each commit, were also observed.
Casual contributors’ phenomenon
In a study such as this, there are always many limitations and threats to validity.
Our first limitation regards the number of projects studied. Although one might consider it difficult to draw conclusions based on five projects owned by the same software company, we argue that our intention was not to picture a definitive landscape of company-owned OSS projects. Instead, our intention was two-fold: (1) to call attention to this relevant yet not fully understood problem and (2) to help us better evaluate the approach used to classify the contributors manually. With our approach, we expect that similar analyses can be conducted in the future when other aspects of company-owned open-source projects become relevant.
Moreover, we rely on our inference algorithm to verify whether a contributor is an internal or external one. We made use of a flag (site_admin) made available in the pull request to make this decision. We acknowledge that this can be a threat, since even relying on this flag, it is possible that some developers had left the company previously, so they would be incorrectly identified. Still, we got in touch with GitHub support regarding this issue and they mentioned that “not every employee will have that flag set as some employees choose not to make their affiliation with the company known.” To minimize this threat, we analyzed the profile of the top 10 external contributors (in terms of numbers of pull requests) and found that 12 of them left GitHub and were working in other companies. We classified these developers as external to conduct our analysis, reducing the threats.
For those classified as internal developers, all listed themselves as GitHub staff in their profile. Still, we got in touch with GitHub representatives to ask whether this flag can be employed in other OSS projects, and they answered that “The site_admin flag is only true for GitHub employees.”
One might argue that we could differentiate paid and non-paid developers by looking at the email address used for their contributions (if it is a company email, then the developer is a paid one). We argue that many developers are free to choose whichever email account they want to use in the git repository. Therefore, a paid developer can also contribute with her personal email account (which would represent a false positive). We use the site_admin flag to mitigate this threat.
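The classification rule discussed in this section can be sketched as follows. The example reads the `site_admin` field of the `user` object that accompanies a pull request in the GitHub API payload and applies a table of manual corrections on top of it. This is a minimal sketch under stated assumptions, not the inference algorithm used in the study: the function names, the `overrides` mapping, and the logins are hypothetical.

```python
def classify_author(pull_request):
    """Label the author 'internal' if user.site_admin is true, else 'external'."""
    user = pull_request.get("user") or {}
    return "internal" if user.get("site_admin") else "external"


def classify_with_overrides(pull_request, overrides):
    """Apply manual corrections (login -> label) before falling back to the flag."""
    login = (pull_request.get("user") or {}).get("login")
    return overrides.get(login, classify_author(pull_request))


# Hypothetical payload fragments; the logins are made up for illustration.
pr_staff = {"number": 1, "user": {"login": "staff-dev", "site_admin": True}}
pr_volunteer = {"number": 2, "user": {"login": "volunteer", "site_admin": False}}

# Hypothetical correction recorded after manually inspecting a profile.
overrides = {"volunteer": "external"}

print(classify_with_overrides(pr_staff, overrides))
print(classify_with_overrides(pr_volunteer, overrides))
```

Keeping the manual corrections in a separate mapping makes it explicit which labels come from the flag and which come from profile inspection.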
Another limitation is related to the GitHub API. We found some inconsistencies while mining data and metadata of the studied projects. For instance, in the API, some pull requests appear with strange characteristics such as zero additions and zero deletions in zero files (footnote 21), even though the original pull request on the web interface does have additions and deletions (footnote 22). We found 1107 pull requests with this characteristic. Instead of discarding them, we manually verified the number of changes in the web interface and fixed these numbers in our dataset. However, we also found 8 pull requests with zero changes both in the GitHub API and on its web interface. We removed these pull requests.
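A check for the inconsistency described above can be sketched as a filter over pull request payloads. The example uses the `additions`, `deletions`, and `changed_files` fields of the GitHub API pull request object; the function names and the sample records are hypothetical, and the sketch only flags candidates for the manual verification step, which cannot be automated this way.

```python
def has_no_recorded_changes(pull_request):
    """True when the payload reports zero additions, deletions, and changed files."""
    # A missing field is treated as zero so that incomplete records are also flagged.
    return all(
        pull_request.get(field, 0) == 0
        for field in ("additions", "deletions", "changed_files")
    )


def partition_for_review(pull_requests):
    """Split pull requests into (needs_manual_check, looks_consistent)."""
    suspicious = [pr for pr in pull_requests if has_no_recorded_changes(pr)]
    consistent = [pr for pr in pull_requests if not has_no_recorded_changes(pr)]
    return suspicious, consistent


# Hypothetical payload fragments (not data from the study).
sample = [
    {"number": 1, "additions": 0, "deletions": 0, "changed_files": 0},
    {"number": 2, "additions": 10, "deletions": 2, "changed_files": 1},
]

suspicious, consistent = partition_for_review(sample)
print([pr["number"] for pr in suspicious])
```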
In this paper, we analyzed the contribution behavior of internal and external developers of five well-known company-owned open-source projects: atom, electron, git-lfs, linguist, and hubot. We found that these projects are very receptive to external developers: many externals play important roles in the studied projects, such as reviewing and integrating pull requests. Considering all the projects, internal developers are responsible for 43.3% of the pull requests performed (external developers placed 56.7%). Analyzing just the hubot project, we observed that only 18% of the pull requests had been placed by internal developers. However, the absolute number of external members is many times greater than that of internal ones. As a consequence, many externals are casual contributors (i.e., developers that contributed only once), although we also identified internals that are casual contributors.
These differences indicate that it is necessary to analyze each project individually to better understand this phenomenon, since there can be different factors influencing the behavior, like the priority the company is giving to the project, the project attractiveness, and vendors who make use of the project. We also noticed that contributions from external developers are shorter than those sent by internal ones and that external developers contribute more documentation-related pull requests, although we also found complex code pull requests.
This study can be a fruitful research area which can benefit companies willing to open-source their code and developers who are afraid of contributing to recently open-sourced projects. For future work, we plan to expand the scope of this study by investigating additional OSS projects. In addition, we plan to conduct surveys and interviews with developers in order to cross-validate the findings from the repositories.
Avelino G, Passos LT, Hora AC, Valente MT (2016) A novel approach for estimating truck factors. In: 24th IEEE International Conference on Program Comprehension, ICPC 2016, Austin, TX, USA, May 16-17, 2016, 1–10. IEEE, Washington, DC.
Coelho J, Valente MT (2017) Why modern open source projects fail. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, Paderborn, Germany, September 4-8, 2017, 186–196. IEEE, Washington, DC.
Riehle D, Riemer P, Kolassa C, Schmidt M (2014) Paid vs. volunteer work in open source. In: HICSS ’14, 3286–3295. IEEE, Washington, DC. https://doi.org/10.1109/HICSS.2014.407.
Pinto G, Steinmacher I, Gerosa MA (2018) Leaving behind the software history when transitioning to open source: reasons and implications. In: Open Source Systems: Enterprise Software and Solutions - 14th IFIP WG 2.13 International Conference, OSS 2018, Athens, June 8-10, 2018, Proceedings, 50–60.
Dias LF, Santos J, Steinmacher I, Pinto G (2017) Who drives company-owned OSS projects: employees or volunteers? In: V Workshop on Software Visualization, Evolution and Maintenance, 10. Sociedade Brasileira de Computação, Porto Alegre.
Wilks DS (2011) Statistical methods in the atmospheric sciences. Academic Press, Cambridge. https://books.google.com.br/books?id=IJuCVtQ0ySIC.
Romano J, Kromrey J, Coraggio J, Skowronek J (2006) Should we really be using t-test and Cohen’s d for evaluating group differences on the NSSE and other surveys? In: Annual Meeting of the Florida Association of Institutional Research.
Steinmacher I, Wiese IS, Conte T, Gerosa MA, Redmiles D (2014) The hard life of open source software project newcomers. In: Proceedings of the International Workshop on Cooperative and Human Aspects of Software Engineering, CHASE ’14, 72–78. ACM, New York.
Grissom RJ, Kim JJ (2005) Effect sizes for research: univariate and multivariate applications. Routledge, Abingdon.
Gousios G, Zaidman A, Storey MD, van Deursen A (2015) Work practices and challenges in pull-based development: the integrator’s perspective. In: 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, May 16-24, 2015, Volume 1, 358–368. IEEE Press, Piscataway.
Guo PJ, Zimmermann T, Nagappan N, Murphy B (2010) Characterizing and predicting which bugs get fixed: an empirical study of Microsoft Windows. In: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1, ICSE 2010, 1-8 May 2010, 495–504. ACM, New York.
Potvin R, Levenberg J (2016) Why Google stores billions of lines of code in a single repository. Commun ACM 59(7):78–87.
Mockus A, Fielding RT, Herbsleb JD (2002) Two case studies of open source software development: Apache and Mozilla. ACM Trans Softw Eng Methodol 11(3):309–346.
Pinto G, Steinmacher I, Gerosa MA (2016) More common than you think: an in-depth study of casual contributors. In: IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016, March 14-18, 2016 - Volume 1, 112–123. IEEE Press, Piscataway.
Lee A, Carver JC, Bosu A (2017) Understanding the impressions, motivations, and barriers of one time code contributors to FLOSS projects: a survey. In: Proceedings of the 39th International Conference on Software Engineering, ICSE 2017, May 20-28, 2017, 187–197, Buenos Aires.
Rebouças M, Santos RO, Pinto G, Castor F (2017) How does contributors’ involvement influence the build status of an open-source software project? In: Proceedings of the 14th International Conference on Mining Software Repositories, MSR ’17, 475–478. IEEE Press, Piscataway.
Gousios G, Pinzger M, van Deursen A (2014) An exploratory study of the pull-based software development model. In: 36th International Conference on Software Engineering, ICSE ’14, May 31 - June 07, 2014, 345–355. ACM, New York.
Dias LF, Steinmacher I, Pinto G, da Costa DA, Gerosa MA (2016) How does the shift to GitHub impact project collaboration? In: 2016 IEEE International Conference on Software Maintenance and Evolution, ICSME 2016, Raleigh, NC, USA, October 2-7, 2016, 473–477. IEEE, Washington, DC.
Steinmacher I, Pinto G, Wiese IS, Gerosa MA (2018) Almost there: a study on quasi-contributors in open-source software projects. In: Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, May 27 - June 03, 2018, 256–266. ACM, New York.
Zhou M, Mockus A, Ma X, Zhang L, Mei H (2016) Inflow and retention in OSS communities with commercial involvement: a case study of three hybrid projects. ACM TOSEM 25(2):13.
Homscheid D, Schaarschmidt M (2016) Between organization and community: investigating turnover intention factors of firm-sponsored open source software developers. In: WebSci ’16, 336–337. ACM, New York.
Atiq A, Tripathi A (2016) Impact of financial benefits on open source software sustainability. In: 37th International Conference on Information Systems (ICIS 2016), 10. Association for Information Systems, Atlanta.
Crowston K (2016) Open source technology development. In: Bainbridge W, Roco M (eds) Handbook of Science and Technology Convergence, 475–486. Springer, Cham.
Pinto G, Steinmacher I, Dias LF, Gerosa M (2018) On the challenges of open-sourcing proprietary software projects. Empir Softw Eng. https://doi.org/10.1007/s10664-018-9609-6.
Pham R, Singer L, Liskin O, Figueira Filho F, Schneider K (2013) Creating a shared understanding of testing culture on a social coding site. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, 112–121. IEEE, Washington, DC.
Pham R, Singer L, Schneider K (2013) Building test suites in social coding sites by leveraging drive-by commits. In: 35th International Conference on Software Engineering, ICSE ’13, San Francisco, CA, USA, May 18-26, 2013, 1209–1212. IEEE, Washington, DC.
Vasilescu B, Filkov V, Serebrenik A (2015) Perceptions of diversity on GitHub: a user survey. In: 8th IEEE/ACM International Workshop on Cooperative and Human Aspects of Software Engineering, CHASE 2015, May 18, 2015, 50–56. IEEE, Washington, DC.
Lee A, Carver JC (2017) Are one-time contributors different? A comparison to core and periphery developers in FLOSS repositories. In: Proceedings of the 11th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 1–10. IEEE Press, Piscataway. https://doi.org/10.1109/ESEM.2017.7.
Barcomb A (2016) Episodic volunteering in open source communities. In: Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering, EASE ’16, 3–133. ACM, New York. http://doi.acm.org/10.1145/2915970.2915972.
We thank the reviewers for their valuable comments.
This work is supported by the CNPq (grants nos. 406308/2016-0 and 430642/2016-4), PROPESP/UFPA, and FAPESP (grant no. 2015/24527-3).
Availability of data and materials
All data used in this paper can be found online at https://github.com/fronchetti/JBCS-2018.
The authors declare that they have no competing interests.