Characterizing the hyperspecialists in the context of crowdsourcing software development

de Neira, Anderson Bergamini; Steinmacher, Igor; Wiese, Igor Scaliante

doi:10.1186/s13173-018-0082-2

Research
Open access
Published: 22 December 2018

Anderson Bergamini de Neira¹,
Igor Steinmacher²,³ &
Igor Scaliante Wiese²

Journal of the Brazilian Computer Society volume 24, Article number: 17 (2018)

Abstract

Companies around the world use crowdsourcing platforms to complete simple tasks, collect product ideas, and launch advertising campaigns. Recently, crowdsourcing has also been used for software development to run tests, fix small defects, or perform small coding tasks. Among the pillars upholding the crowdsourcing business model are the platform participants, as they are responsible for accomplishing the requested tasks. Since successful crowdsourcing heavily relies on attracting and retaining participants, it is essential to understand how they behave. This exploratory study aims to understand a specific contributor profile: hyperspecialists. We analyzed developers’ participation in challenges in two ways. First, we analyzed the type of challenge that 664 Topcoder platform developers participated in during the first 18 months of their participation. Second, we focused on the profile of users who had more collaborations in the development challenges. After quantitative analysis, we observed that, in general, users who do not stop participating have behavioral traits that indicate hyperspecialization, since they mostly participate in the same type of challenge. An interesting, though troubling, finding was the high dropout rate on the platform: 66% of participants discontinued their participation during the study period. The results also showed that hyperspecialization can be observed in terms of technologies required in the development challenges. We found that, for 60% of the 2086 developers analyzed, at least 75% of the challenges they participated in required the same technology. We found that hyperspecialists and non-specialists differ significantly in behavior and characteristics, including hyperspecialists’ lower winning rate when compared to non-specialists.

Introduction

A new business model is gaining steam in the software development industry and drawing the attention of companies [1, 2], developers [3, 4], and researchers [5, 6]. Crowdsourcing for software development benefits from the pool of globally distributed developers to accomplish tasks for companies from all around the world [7]. Crowdsourcing provides engaged participants with a way to earn money, notoriety, and even professional opportunities [8]. Companies find in crowdsourcing an economical and reliable way to develop software, relying on the “wisdom of crowds” [9] to accomplish tasks.

To maintain a prosperous and advantageous environment for all those involved in crowdsourcing, a high level of interaction must occur among companies (that need software artifacts), developers (who are able to produce these artifacts), and platforms (that manage the needs) [10]. To create a sustainable environment, the onboarding and retention of new developers to these platforms must be ensured. Due to the importance of the developers to the success of the crowdsourcing model, researchers have been discussing characteristics of contributions and the profile of crowdsourcing developers [10, 11]. Although the studies found in the literature analyze some characteristics about the different ways to contribute [1, 8], and there is an increasing number of studies related to crowdsourcing for software [12], much is still unknown about the contributor profile and the behavior of developers working in this type of environment.

In 2011, Malone et al. [13] predicted that we would enter the age of hyperspecialization. For these authors, hyperspecialization means “breaking work previously done by one person into more specialized pieces done by several people.” According to the authors, participants would follow this concept of hyperspecialization, that is, in-depth knowledge of specific subjects, which contrasts with the current full-stack developer profile. For example, full-stack developers may have to perform a task in which they are not completely proficient, which may lead to delays and lower quality solutions. On the other hand, with available hyperspecialists, more qualified people can handle these tasks, delivering faster and higher quality results. Developers with this hyperspecialist profile benefit from the growth of crowdsourcing; they make it possible for companies to count on a global pool of specialists at a low cost, since it is not always possible to find professionals with specific skills in the region, or the high cost can make it unfeasible to hire them.

The goal of this work is to characterize the hyperspecialist profile in crowdsourcing software development environments. The Topcoder platform was chosen as our case study because it is one of the largest crowdsourcing-based software development platforms in the world, with more than one million registered participants^{Footnote 1}. It has received more than 22,000 assignments and distributed more than 80 million dollars in rewards since its founding^{Footnote 2}. Companies with international reputations, such as NASA, IBM, eBay, and Honeywell, use the Topcoder platform.

In Topcoder, the companies create the tasks that reflect their development needs, providing details about the problem, deadlines, and reward value. After that, the tasks are made available in the platform and the developers can register to work on them. The registered developers may then work on producing the artifacts, ultimately submitting them to accomplish the task. The artifacts are reviewed following a predefined criteria set, and the results are then published. After the appeal period, the owner of the winning submission is asked to follow up, providing potential changes and revisions on their artifacts. Revising and delivering a new version of the artifacts with the suggested fixes is part of the process of guaranteeing the quality of the deliverable. During the whole process, companies can opt to hire experienced members of the platform, the co-pilots. These co-pilots support the interaction between the company and developers, helping developers throughout the task and reviewing and following up the process on behalf of the company.

Based on this, in this study, our goal was to answer the following main research question (RQ):

RQ. How is the hyperspecialization phenomenon observed in the Topcoder platform?

Our study was conducted in two phases. For both phases, we relied on data collected from Topcoder using their public API (application programming interface). In the first phase, we investigated the initial 18 months of 664 developers to verify whether we could identify the hyperspecialist phenomenon in terms of the type of challenges the developers participate in. In Topcoder, the challenges are classified into three different types: development, design, and data science. We used this classification to conduct our analysis. Preliminary results indicate that 94% of the users who contributed throughout the analyzed period continued to contribute to the same type of challenge, indicating the possible existence of the hyperspecialization mentioned by Malone et al. [13]. Another important result was the high dropout rate: about 66% of the participants who took part in at least one challenge on the platform stopped collaborating.

In the second phase, we decided to further explore the phenomenon by focusing on the challenges classified as “development.” We chose this specific type since companies propose these challenges (as opposed to data science challenges, which are proposed by Topcoder and are mainly marathon-like challenges) and give financial rewards. We analyzed all challenges between August 2003 and September 2016 in the “development” category, resulting in a total of 18,659 challenges, with the participation of 2086 developers. The obtained data was quantitatively analyzed. The results indicate that 60% of the 2086 developers were specialists (since at least 75% of the challenges they submitted to require the same technology). A great majority of the specialists contributed only to challenges requiring the technology in which they specialize. We also found a high correlation between the number of challenges available for a technology and the number of specialists attracted by those challenges. Therefore, technologies that are required in most of the tasks at Topcoder, like Java, JavaScript, .NET, HTML (HyperText Markup Language), and iOS, also present a high number of specialist participants. Interestingly, we could not identify hyperspecialists for important technologies like MySQL, PostgreSQL, or Docker.

The main contribution of this work is the characterization of the hyperspecialization phenomenon in the context of software development crowdsourcing, considering different actions of specialists on the Topcoder platform. We believe that our results can help crowdsourcing-based software engineering stakeholders better understand how crowdsourcing users interact with these platforms and how to benefit from the hyperspecialists. By understanding how the hyperspecialists behave, platforms could create challenges that attract specialists, which can ultimately improve the quality of the software artifacts received. Analysis of historical data could also inform decisions on what kind of challenges would inspire contributions by specialists.

The rest of this paper is structured as follows. The “Related work” section presents the related work. In the “Overall research setting” section, we report the high-level description of the research method. In the “Phase 1: A high-level analysis of hyperspecialization” section, we present the details of phase 1 of this study, including method and results, while in the “Phase 2: Hyperspecialization in development challenges” section, we present the details of phase 2. A discussion of our results is presented in the “Discussions” section. The “Limitations and threats to validity” section presents the limitations and potential threats to validity, and in the “Conclusion” section, we draw conclusions.

Related work

In many domains, crowdsourcing has become an advantageous option for completing tasks that generally require intensive human interaction. In the scope of software development, this phenomenon has also been observed in recent years. Mao et al. [7] affirm that crowdsourcing for software development can be understood as the action of undertaking any task in the field of software engineering outside the company. These demands are made available to a generally large group of people, enabling them to decide on which tasks to work.

For LaToza and van der Hoek [10], platforms that implement crowdsourcing for software development can operate in three distinct ways: peer production, competition, and the micro-task model. In the first approach, the participants collaborate to build a single artifact and, in general, receive no payment for these collaborations. One example of this approach is the open-source model. In the competition model, companies describe their needs in the form of tasks and open a public competition for the best contribution, for which they usually pay the developer. Finally, in the micro-task model, the needs of the client company are broken down into small (micro) tasks that can be completed in minutes. The owner of each contribution receives the reward offered for the work, which is later combined with the results of the other tasks to complete the company’s demand.

According to Hosseine et al. [14], the success of crowdsourcing rests on four pillars: the company, the platform, the tasks, and the workers. Despite the importance of workers, many aspects related to the interaction and behavior of users on the platforms are not yet understood.

Some recent studies analyze the behavioral traces of developers in crowdsourcing platforms. For example, Gadiraju [15] suggested analyzing and classifying users who break the rules of the platform, or who are unable to provide good contributions. In a different line, Gray et al. [16] describe cases in which crowdworkers help each other, collaborating to keep the crowd motivated to continue contributing to the platform. Although these papers analyze behavior in crowdsourcing, none of them focuses on better understanding the hyperspecialization phenomenon.

The crowdsourcing model leverages community members’ diversity of experiences and knowledge to attract companies that invest time and money in providing tasks for participants to complete. In order to improve the quality of the submissions received in a task, some authors focus their efforts on creating recommendation approaches to suggest the most appropriate users to participate in a task [1, 4, 17]. These approaches use information such as reward value, required skills, task description, and creation and closure dates to build member participation profiles, which allow the model to recommend the best fit for the tasks.

Despite the importance of recommending the most appropriate people for a task, Karim et al. [4] focused on a way to identify the people who would not win a challenge. Doing this saves time and effort of participants and reviewers, reduces competition, and helps participants to be available to work on tasks that are a better fit for them. When recommending winners, Karim et al. [4] achieved a recall of 94.07%; their goal was to predict a participant who would be among the most well-suited for the task. The authors also showed that using the recommendation in a 30-day period, it would be possible to save about 3.5 days for more experienced members and about 4.6 days for less experienced members. Similar to the aforementioned studies, we leverage data extracted from crowdsourcing platforms, like skills, challenge participation, number of winning submissions, etc. However, in contrast to the literature, we focus on analyzing one specific profile: the hyperspecialist.

Other studies analyze the contribution profile of software crowdsourcing participants. For example, Saremi and Yang [8] mention that more experienced members of the Topcoder platform are more prone to work on tasks from internationally renowned companies or with high rewards. They point out that more experienced people produce more, increasing their odds of winning challenges. Mao et al. [1] report that the most qualified members of Topcoder register as soon as the task is made available, which ultimately inhibits the registration of other top-level competitors.

In line with the previously mentioned studies, which refer to the characteristics of users and tasks of the platform, this work also aims to identify characteristics of users of crowdsourcing. However, the phenomenon of hyperspecialization foreseen by Malone et al. [13] has been neglected by the existing literature. Therefore, to complement the state of the art, in this paper, we are interested in examining the behavior and characteristics of the so-called hyperspecialists. We believe that this classification can help in improving the existing mechanisms of recommendation, as well as benefit companies and platform maintainers who can better understand this specific profile.

Overall research setting

As mentioned in the “Introduction” section, this work was conducted in two phases, each of which involved its own data collection, curating, and analysis. Figure 1 presents a high-level snapshot of the method followed for both phases, which this section describes, including details on data collection and analysis for each phase (“Phase 1: A high-level analysis of hyperspecialization” section for phase 1 and “Phase 2: Hyperspecialization in development challenges” section for phase 2).

In phase 1, we conducted an exploratory study to examine the hyperspecialist phenomenon at a high level, analyzing the behavior of the participants in terms of types of challenges chosen. We collected data from more than 350,000 Topcoder users and randomly sampled 664 to conduct phase 1. For each user, we focused on their first 18 months of interaction, starting from the date of each user’s first challenge. We split the 18-month period into three 6-month periods. We counted the number of challenges that each user registered for in each of these periods, classifying the participation by type (in Topcoder there are three main types of challenge: design, development, and data science). We then analyzed the hyperspecialization phenomenon in terms of type of challenge, verifying whether their participation changed or not during the three periods, answering RQ1 and RQ2.

Given the promising results of phase 1, we decided to explore one specific kind of challenge in more depth. Therefore, in phase 2, we conducted a more in-depth analysis of the development challenges focusing on competition tasks with financial rewards. We analyzed all the developers who submitted responses to at least three challenges classified as development. We collected all the technologies that had been required by the challenges that these developers submitted to. Then, we analyzed whether the developers participated in challenges recurrently requiring a specific technology (hyperspecialists) or not and compared the groups to answer RQ3–RQ6.
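The phase 2 selection and grouping can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each developer is represented by a list of technology sets (one set per development challenge submitted to) and applies the 75% threshold stated in the “Introduction” section. The function name, constants, and data layout are illustrative assumptions, not the authors’ implementation.

```python
from collections import Counter

MIN_SUBMISSIONS = 3   # developers with fewer submissions are outside the analyzed population
THRESHOLD = 0.75      # share of challenges that must require the same technology


def specialties(challenge_technologies):
    """Return the technologies a developer specializes in.

    challenge_technologies: list of sets, one set of required technologies
    per challenge the developer submitted to (hypothetical data layout).
    Returns None if the developer has too few submissions, otherwise a
    sorted list of technologies (an empty list means non-specialist).
    """
    total = len(challenge_technologies)
    if total < MIN_SUBMISSIONS:
        return None
    # Count, for each technology, how many challenges required it.
    counts = Counter(tech for techs in challenge_technologies for tech in set(techs))
    return sorted(tech for tech, n in counts.items() if n / total >= THRESHOLD)


# Hypothetical developer: Java is required in 3 of 4 challenges (75%).
developer = [{"Java"}, {"Java", "HTML"}, {"Java"}, {"HTML", "JavaScript"}]
print(specialties(developer))
```

Because a challenge may require several technologies, this sketch lets a developer qualify for more than one technology; whether the study allowed that is not stated in this section.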

For the sake of readability, we present the method and results for each of the phases separately: phase 1 in the next section and phase 2 in the “Phase 2: Hyperspecialization in development challenges” section.

Phase 1: A high-level analysis of hyperspecialization

The goal of this phase was to preliminarily explore the phenomenon of hyperspecialization in crowdsourcing for software development. In this phase, we aimed to verify the possible manifestation of hyperspecialization in a broad context. To achieve this, we analyzed how Topcoder users’ participation evolved over time according to the type of challenge. The details of the method are presented in the following.

Research method

We defined the following research questions to guide this phase:

RQ1. Is it possible to identify hyperspecialists based on the challenge type?
RQ2. What is the relationship between the number of challenges and participant abandonment?

The method followed in this phase to answer the research questions is presented in Fig. 2. Data collection and filtering (steps 1–2) and analysis (steps 3–4) are specified in the subsections below.

Data collection and filtering

The data collection (Fig. 2, Step 1) was performed using a public API offered by the Topcoder platform. Firstly, we queried the challenges’ API^{Footnote 3} to retrieve the information from past challenges and collect the usernames, so it would be possible to obtain detailed information about each challenge in which the users participated. By querying the users’ API using the usernames previously collected, we obtained data about the users’ participation in the challenges^{Footnote 4} and other user information^{Footnote 5}.

In this phase, we collected and used the relationship between users and challenges, that is, the users’ participation in the challenges. All necessary data was stored in a local database to facilitate analysis. Data collection was performed from August 2016 to January 2017. The database stored all the participation in the various types of challenges of about 350,000 platform users.

In step 2 (Fig. 2), we defined our population, imposing the following criteria: (i) users should have participated in at least one challenge and (ii) the first challenge date should be at least 18 months before the beginning of the data collection, since this was the timeframe in which we analyzed the users. For each of these users, we collected the first 18 months of their participation, i.e., each participant has a specific 18-month timeline. Among the users that met the criteria, 664 were randomly sampled. The sample size was defined with a confidence level of 99% and a margin of error of 5%.
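The text does not say which formula was used to derive the sample size. As an illustration only, the sketch below applies Cochran’s formula for estimating a proportion in a large population, assuming the most conservative proportion p = 0.5; under these assumptions, a 99% confidence level and a 5% margin of error yield 664. The function name and defaults are hypothetical.

```python
import math
from statistics import NormalDist


def cochran_sample_size(confidence, margin_of_error, proportion=0.5):
    """Minimum sample size for estimating a proportion in a large population.

    Assumes simple random sampling and no finite population correction.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    return math.ceil(n)


print(cochran_sample_size(0.99, 0.05))  # prints 664
```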

Data analysis

The data analysis included two other steps. In step 3, we counted the number of challenges in which each user in our sample participated. Since it was a preliminary analysis, submissions to the challenges were not mandatory; we only analyzed the registration in the tasks. After this, we analyzed whether hyperspecialization was observed, considering the types of tasks chosen by the users. On the Topcoder platform, tasks were classified into three major types: development, design, and data science. The purpose of this step was to conduct a temporal analysis for each user, checking the number of challenges that each user participated in, according to the type (development, design, or data science). This classification served as the basis for the analysis of the existence (or not) of hyperspecialization over time. We individually defined the timeline of each participant, considering the date on which the user participated in their very first challenge. This timeline was set to 18 months, starting with the date of the first challenge. To analyze hyperspecialization, this period was split into three 6-month periods, as depicted in Fig. 2.
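A minimal sketch of the per-user timeline split follows. It assumes calendar-month arithmetic anchored on the day of the first challenge; the paper does not specify how the 6-month boundaries were computed, so the function name and the boundary rule are assumptions.

```python
from datetime import date


def period_index(first_challenge, challenge_date):
    """Return 0, 1, or 2 for the 6-month period a challenge falls in.

    Returns None when the challenge lies outside the user's 18-month timeline.
    """
    months = (challenge_date.year - first_challenge.year) * 12 \
        + (challenge_date.month - first_challenge.month)
    if challenge_date.day < first_challenge.day:
        months -= 1  # the current month of the timeline is not yet complete
    if months < 0 or months >= 18:
        return None
    return months // 6


# Hypothetical user whose first challenge was on 15 March 2014.
start = date(2014, 3, 15)
print(period_index(start, date(2014, 9, 15)))
```

Anchoring on the day of the month is only an approximation when the first challenge falls on a day that later months do not have (for example, the 31st).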

Based on the users’ timeline (split into 6-month periods), we classified each user as hyperspecialist or non-specialist for each period analyzed. We considered a participant a hyperspecialist if at least 75% of the challenges in which they participated were of the same type. If the 75% threshold was not achieved, the participant was classified as “non-specialist.” In addition, we classified those users who did not participate in any challenge in a given 6-month period as “no contribution.” This only occurred in the second or third analysis period, since we only sampled developers with at least one challenge. The 75% threshold was defined by the authors, since we found no values in the literature that could be used for this purpose. We determined this value as fair to study the phenomenon in this preliminary work, since it is based on the distribution of the developers in the platform, yet we understand that this may pose a threat to validity.
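The classification rule for a single 6-month period can be sketched as follows. The input is assumed to be the list of challenge types a user registered for in that period; the function name and label strings are illustrative, while the 75% threshold and the three categories come from the text.

```python
from collections import Counter

THRESHOLD = 0.75  # minimum share of challenges of the same type


def classify_period(challenge_types):
    """Classify one user in one 6-month period.

    challenge_types: list of type labels ("development", "design",
    "data science"), one per challenge the user registered for.
    """
    if not challenge_types:
        return "no contribution"
    top_type, top_count = Counter(challenge_types).most_common(1)[0]
    if top_count / len(challenge_types) >= THRESHOLD:
        return f"hyperspecialist ({top_type})"
    return "non-specialist"


# Hypothetical period: 3 of 4 challenges (75%) are data science.
print(classify_period(["data science", "data science", "design", "data science"]))
```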

In step 4, we analyzed user classification according to hyperspecialization over the three time periods. We verified whether there was a “change” or a “maintenance” of the participants’ specialty, comparing the initial 6-month period with the following periods. Through this comparison, a descriptive analysis of the data was conducted in order to verify if the hyperspecialization phenomenon was observed.

We also verified whether those users who kept contributing over the three periods varied in the number of challenges in which they participated. For this analysis, we used the one-way repeated-measures ANOVA statistical test, which compares three measurements taken on the same group of subjects. In the context of this study, all the users who participated in the three periods (including non-specialist users) were selected. We tested the following null hypothesis (H₀): participation in the three periods is equal regarding the number of challenges.
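To make the test concrete, the sketch below computes the F statistic of a one-way repeated-measures ANOVA from first principles, partitioning the total variation into between-period, between-user, and residual components. The data are hypothetical and the function name is an assumption; obtaining the p value additionally requires the F distribution, for which a statistics package would normally be used.

```python
def repeated_measures_anova(scores):
    """One-way repeated-measures ANOVA.

    scores: one row per user, one column per period (challenge counts).
    Returns (F statistic, df for periods, df for error).
    """
    n = len(scores)        # number of users
    k = len(scores[0])     # number of periods
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    period_means = [sum(row[j] for row in scores) / n for j in range(k)]
    user_means = [sum(row) / k for row in scores]

    ss_periods = n * sum((m - grand_mean) ** 2 for m in period_means)
    ss_users = k * sum((m - grand_mean) ** 2 for m in user_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_periods - ss_users  # residual variation

    df_periods = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_periods / df_periods) / (ss_error / df_error)
    return f_stat, df_periods, df_error


# Hypothetical challenge counts for three users over three periods.
example = [[5, 3, 1], [6, 3, 3], [7, 6, 2]]
f_stat, df1, df2 = repeated_measures_anova(example)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```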

We used the chi-square test to evaluate whether a low number of challenges is related to abandonment of or permanence on the platform. For this test, two groups were created. The first group comprised the numbers of challenges of the participants who continued on the platform in the second semester, and the second group comprised those of the participants who abandoned the platform (did not take part in any challenge) in the second semester. The null hypothesis (H₀) is: the amount of participation in the challenges is not associated with the permanence or abandonment of the users in the platform.
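A chi-square test of independence can be sketched as below. The text does not describe how challenge counts were binned, so the 2×2 table (abandoned or remained versus a low or higher number of challenges) and its counts are purely hypothetical. The p value helper is valid only for one degree of freedom, and no continuity correction is applied.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_one_dof(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: rows = abandoned / remained,
# columns = few challenges / more challenges in the first period.
observed = [[20, 10], [10, 20]]
stat, dof = chi_square(observed)
print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p_value_one_dof(stat):.4f}")
```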

Results

In this section, we present the results of phase 1, organized according to the research questions.

RQ1. Is it possible to identify hyperspecialists based on the type of the challenges?

Among the 664 participants analyzed, only 98 (14% of the sampled population) continued contributing throughout the three periods (18 months) of the study. Of these 98, 92 (93.8% of the participants who remained) kept contributing to the same type of challenge over the 18-month period. This result indicates that hyperspecialization may manifest itself in relation to the type of challenge. The number of hyperspecialists in the three periods according to the type of challenge was: 8 hyperspecialist users in development, 4 in design, and 80 in data science.

In addition to the above, we observed that 35 participants were absent in the second period of the analysis, but returned in the third. Among these participants, only 3 (8.6%) changed their specialty from the first to the third period; all of them were classified as data science specialists in the first semester, whereas in the third they were classified as development (2) and design (1) specialists. The remaining 32 (91.4%) were classified with the same specialty in both periods.

Interestingly, the users who kept contributing throughout the three periods differed in the number of contributions they made. To analyze this characteristic, we used the ANOVA test, comparing the number of challenges they participated in over the three periods. The result shows a difference in the number of challenges (F = 6.07; p value = 0.003), rejecting H₀. The p values of the multiple comparisons were adjusted using the Tukey method, which showed that the number of challenges entered in the first period differed from that in the second period (t-ratio = 2.48; p value = 0.04). There was also a difference in the number of challenges between the first and third periods (t-ratio = 3.36; p value = 0.002). However, there is no evidence that the values for the second and third periods differ (t-ratio = 0.881; p value = 0.653). This analysis shows that the number of contributions by users who are active in all periods peaks in the first period, drops in the second, and shows no significant change in the third (Fig. 3).

RQ2. What is the relationship between the number of challenges and participant abandonment?

The fact that about 66% of our sample only contributed in the first analyzed period (defined from the first user participation) indicates the possible abandonment of the platform. This fact corroborates the results presented by Zanatta et al. [5], in which the authors present the barriers that newcomers face while attempting to participate in challenges in the platform. In our study, we observed that 272 of the 441 users who stopped contributing participated in only one challenge. The other 169 users who quit participated in between two and 28 challenges (median = 3, standard deviation = 3.44). This characteristic may indicate at least five potential situations: (i) users did not adapt themselves to the platform standards (difficulty finding appropriate tasks, problems interacting with the platform and with other users, among others), (ii) users lacked the knowledge to complete tasks, (iii) the training methods proposed by the platform were inefficient, (iv) users had already achieved their goal of participating/training in some specific technology or of earning a given amount of money, or (v) users looking to make money quickly did not win the first challenges and thus invested their time without the expected “return on investment.”

In Fig. 4, it is possible to observe the distribution of the amount of participation of our sample in the first period (outliers are not presented for a better visualization). Looking at the boxplots, it is possible to notice that, in general, users who stop contributing (abandon) participate in fewer challenges than users who remain active on the platform in the following periods.

To validate the analysis of the boxplots and verify if there is a relationship between the participation amount and the abandonment/permanence in the platform, we performed a chi-square test. The result of the test indicates that the number of challenges that a user participated in is an indication of abandonment/permanence, rejecting H₀ (X²=197.18, p value = 0.001). However, this is only a preliminary analysis. Other studies still need to be conducted to better understand this phenomenon.

Phase 2: Hyperspecialization in development challenges

Given the promising results of phase 1, in which we evidenced the hyperspecialization phenomenon in terms of the type of challenge, in phase 2, we decided to further explore the hyperspecialization in a more specific context. We decided to focus on the challenges classified as “development.” We chose this specific type since these challenges are proposed by companies (as opposed to data science, which are mainly related to marathon-like challenges) and offer financial rewards. Thus, our goal in this phase was to characterize the hyperspecialists in the context of development challenges in Topcoder.