Industrial and OSS developers’ profiles: a family of experiments to evaluate a pioneering neuro-linguistic method for preferred representational systems automatic detection

Software projects use mailing lists as the primary tool for collaboration and coordination. Mailing lists can be an important source for extracting behavioral patterns in software development. A new approach to this is the use of neurolinguistic theory to determine the Preferred Representational cognitive System (PRS) of software engineers in that specific context. Developers use different resources and cognitive channels to achieve software understanding. An important question on this matter is: what types of representational systems are preferred by software engineers? This paper presents a psychometrically based neurolinguistic method to identify the PRS of software developers. The approach was evaluated in three experiments that assessed the PRS of developers on industrial and OSS (Apache server and Postgresql) mailing lists. For the OSS projects, the results showed that the PRS scores of the top committers clearly differ from those of the general population of the projects. For industry, the experiment showed that the developers indeed have a PRS. Finally, for both scenarios, the qualitative analysis indicated that the PRS scores obtained are aligned with the developers' profiles. This alignment is essential to effective communication within the team and enhances the development process through better software comprehension.


Introduction
Developing and maintaining software systems is an arduous task. Large systems are complex and difficult to understand. In order to understand them, the developer must construct a mental model of how the software works and how it is structured, i.e., its domain, architecture, and execution flow [1].
In the comprehension process, developers use different resources and representational systems, such as (1) examples, analogies, and code execution; (2) visual descriptions, diagrams, and graphic models of the system; and (3) textual descriptions and source code analyses. Clearly, these resources are complementary and may be combined. However, is there a Context-Specific Preferred Representational System (PRS)? Or, is there a preferred order or combination of the representational systems in the understanding process?
Visual resources, like diagrams and non-conventional visualization metaphors, are being increasingly used in software engineering [2]. Studies show that the way software engineers process those resources affects the success of that processing [3], for both text [4] and diagrams [5]. However, we are not aware of comprehensive studies that evaluate which types of representational systems software engineers prefer.
This is a broad question in the sense that different people may have different preferences in different contexts. Indeed, the notion that cognition relies on different representational modes is well accepted in psychology [6][7][8]. This notion has given rise to newer theories such as Neurolinguistics, which proposes the use of a PRS in specific contexts [9]. Internal mental processes such as problem-solving, memory, and language consist of visual, auditory, and kinesthetic representations that are engaged when people think about or engage in problems, tasks, or activities. Internal sensory representations are constantly being formed and activated. Whether making conversation, writing about a problem, or reading a book, internal representations have an impact on one's performance. The Preferred Representational System is the one that a person tends to use more than the others to create his/her internal representation.
Bandler and Grinder, champions of Neuro-linguistic Programming (NLP), claim that people say sensory-based words and phrases, or verbal cues, which indicate context-specific visual, kinesthetic, or auditory processing [9,10]. These claims divide researchers in cognitive psychology. Some have not found evidence for them [11] and were, in turn, criticized for misunderstanding the concepts involved [12]. Meanwhile, others have presented empirical evidence and argued that the research needs to be expanded [13,14].
Thus, motivated by the psychometric text analysis presented by Rigby and Hassan [15], we developed a psychometrically based neurolinguistic analysis tool. Our tool, NEUROMINER, uses Linguistic Inquiry and Word Count (LIWC) to classify developers' Preferred Representational Systems (PRS) from the mailing lists of their projects. NEUROMINER combines text mining and statistical analysis techniques with NLP sensory-based words in order to classify programmers.
NEUROMINER was used in three experiments which analyzed top committers and subjects of two large-scale OSS projects (Apache Server and Postgresql), as well as industry developers of closed-source projects. For OSS projects, the results showed that the measured PRS scores can indeed differentiate top committers from the general population. For industry, the developers indeed have a PRS. Finally, for both scenarios, the qualitative analysis also indicated that the PRS scores obtained are aligned with the developers' profiles.
The rest of this paper is organized as follows. The next section presents the cognitive and learning styles. The Neuro-linguistic programming section introduces NLP. The Text mining basis section reports text mining definitions used throughout the article. The Linguistic Inquiry and Word Count for Neuro-linguistic section describes our approach to LIWC and to mining software development mailing lists. In The family of experiments section, we detail the experimental evaluation of our approach. Related works is the seventh section. Finally, the Conclusion and future work section closes the paper.

Cognitive and learning styles
In the scientific community focused on cognitive research, it is widely accepted that the way people choose, or tend to choose, to learn has an impact on the learning performance [16]. There is a wide range of definitions, theories, models, interpretations, and measures dealing with the learning process. Among them, two items have led to several valuable insights in many research fields: learning styles [16] and cognitive styles [17]. This article does not aim to theorize on these concepts or even create new definitions for them. It will just use them to build the necessary theoretical background for the work presented here.
The terms learning style and cognitive style have been defined in different ways by different researchers. Allport [18] described cognitive style as a common habit or a notably personal way of solving, thinking, noticing, and recalling problems. Garity [19] noticed that a cognitive style has been used to define the cognitive process of thinking, noticing, and recalling. Cognitive style is how subjects process information and prefer to learn. Badenoch [20], in his study about "personality type, preference of learning style and instructional strategies", claims that the learning style theory intends to investigate the learning process and product, in order to understand the interactions in the learning environment. In his opinion, the type of cognitive personality, however, is a classification of the theory of the learning style. Hartley [21] defines "cognitive styles" as the ways that the subjects lead their cognitive tasks and "learning styles" as the ways that the subjects lead their learning tasks.
In order to cope with these concepts, avoiding those many and sometimes confusing definitions, this paper considers that learning styles are ways that each individual uses to process information and to understand some subject, and cognitive styles are common ways in which individuals process information, transform it into internalized knowledge, and recall it when necessary. Each individual uses learning styles to understand and comprehend a subject; however, in order to process and transform the learned information into readily recoverable knowledge, or even to better assimilate and optimize the subject, cognitive styles are used. In summary, there is a tenuous difference between learning and apprehending, between learning and cognition, between styles of "acquiring" knowledge and styles of "using, optimizing, and transforming" knowledge.
We emphasize that learning styles are cited here only because they commonly appear in contexts involving representational systems. This work is based solely on cognitive styles, which individuals use to process information based on their perception of reality and previous experiences (see the Neuro-linguistic programming section); therefore, the dimensions of learning styles are outside the scope of this work.

Representative dimensions and measurements of the cognitive styles
In the 1970s, Cognitive Psychology researchers intensified discussions about how to measure subjects' intellectual abilities. Hunt et al. [22] proposed using laboratory tests to investigate the construction and usage of such human abilities. Estes [23] proposed tests to measure cognitive abilities and as a means of finding ways to improve cognitive performance. Underwood [24] proposed applying tests to detect the differences between subjects and using this information as the basis for nomethetics, a psychology/psychiatry theory that analyzes the influence of a patient's cognition about a disease on his/her self-healing process.
Cognitive style measurement is value based. This so called ability-factor model initially relied heavily on measuring how much as opposed to how come [25]. However, from the cognitive style point of view, it is almost useless to measure "how many hours one spent on a task" and not measuring "how the task was done". Style measurement should focus on "how it was done" as opposed to "how much was done". For that, Lohman and Bosma [25] propose the following principles: (a) Apply tasks in which the individual differences are clearly reflected, mostly in measurements of "how it was done", i.e., of the way the tasks that each one solves are evaluated according to different strategic solutions taken; (b) Have some guidance in order to make clear the inferences about the strategies from the answers given for each task. Even when facing different ways to solve some tasks, the ways one finds to solve them will be associated with the dependent measures to each subject, such as speed of reasoning and reflex; (c) Have a Measurement Model that captures both the subject profile, his strategies for problem resolution and correlated measures [26]. In order to guarantee consistency, this measurement model needs to be validated and packaged for reuse; (d) It is necessary that the Measurement Model supports association and relationship analysis between different strategies.
When trying to measure a cognitive style, the strategies assessed may not be classified as belonging to a category of a specific style. For instance, the fact that a Software Engineer prefers to use diagrams in a situation in which he/she needs to comprehend a class, does not imply that he/she has visual preference in general. Not all strategies represent a style with the same clarity [25].
The Cognitive Style Analysis (CSA) champions, Riding and Cheema [17], describe that cognitive styles have two fundamental and independent bipolar dimensions: the wholist-analytical (WA) and the verbalizer-imagery (VI). The wholist-analytical dimension of cognitive style means the habitual way in which an individual processes and organizes information. Some individuals process and organize information into its component parts (analytics), others retain a global or overall view of information (wholists).
The verbalizer-imagery dimension describes the common ways in which the subject represents information in memory while he/she thinks. For Riding [27], verbalizers convert the information they read, see, or listen to into words or verbal associations. Imagers, on the other hand, convert the information they read, see, or listen to into spontaneous and frequent mental pictures.
According to Riding [28], the validity of the cognitive model proposed in the Cognitive Styles Analysis approach is supported by the evidence that the WA and VI dimensions are independent from each other, separated and independent from the intelligence, but they interact with personality and are related to behaviors such as learning performance, learning preferences, subject preferences, and social behavior.
The CSA approach is the basis for the work of Fleming [29], which is one of the main influences on our work. Fleming's work proposes a questionnaire to cluster subjects according to their main sensorial abilities in creating learning styles. Fleming [29] was influenced by neurolinguistics [9], correlating words or expressions used by subjects with their preferred ways of representing information in memory.
Based on neurolinguistics, our work considers that the words or expressions used by subjects are related to their preferred ways of representing information in memory. This paper proposes a method specifically conceived to capture the cognitive preferred representational system (PRS) of software engineers. The PRS is the system that a person, in specific contexts, prefers to use to communicate and learn [6][7][8].
In the next section, the relevant concepts on Neurolinguistics and PRS will be presented.

Neuro-linguistic programming
History and some concepts
Neuro-linguistic programming (NLP), created in the 1970s, consists of a set of techniques in which neurological processes, behavioral patterns, and a person's language are used and organized to achieve better communication and personal development. The term NLP is broadly adopted in the education, management, and training fields. However, although evidence for NLP as a model for comprehension and learning has been published [30], few academic works exist on the subject.
NLP claims that people are intrinsically creative and capable, acting according to how they understand and represent the world, instead of how the world is. The literature constantly cites Korzybski's statement [31] "the map is not the territory", a reference to the individual understanding (mental model) that everyone has according to his/her experience, beliefs, culture, knowledge, and values.
For Tosey and Mathison [13], members of an NLP scientific research group, NLP is presented as an epistemological perspective, with scientific principles that are not usually presented. The first works published by Bandler and Grinder [9,10] were based on the models of Fritz Perls, Gestalt founder, Virginia Satir, researcher in family therapy, and Milton Erickson, doctor in medicine, master in psychology, and hypnotherapist recognized worldwide. As a consequence, the epistemological view of NLP presents a roadmap to develop the necessary scientific basis to support its beliefs. The research reported in this paper explores this path by scientifically characterizing the use of preferred representational systems for cognition.
This representational system (or internal representation) is highly dependent on context (i.e., it varies with the situation) [12]. This way, some people, in specific contexts, may prefer to use one or more basic systems to communicate and learn [6][7][8]. Most authors in the area recognize the following basic systems [6-8, 10, 12]: (a) Visual, that involves internal image creation and the use of seen or observed things, including pictures, diagrams, demonstrations, displays, handouts, films, and flip charts; (b) Auditory, that involves sound reminders and information transferred through listening; and (c) Kinesthetic, that involves internal feelings of touch, emotions, and physical experience (holding and doing practical hands-on experiences).
We use all of our senses all of the time and, depending on the circumstances, we may focus on one or more of them. For instance, when listening to a favorite piece of music, we may close our eyes to more fully listen and to experience certain feelings. In order to see things more clearly, we might need to close our eyes and visualize the situation, person, or place.
So, we all use each of the senses and each of us also has a Preferred Representational System (PRS), one that we use most when we speak, learn, or communicate in any way. For example, when learning something new, some of us may prefer to see it or imagine it performed, others need to hear how to do it, others need to get a feeling for it, and yet others have to make sense of it. In general, one system is not better than another, and sometimes, it depends on the situation or task that we are learning or doing as to which one or more representational systems might be more effective than another.
Supporters of NLP believe that word predicates reveal a person's state of consciousness. They believe that specific, sensory-based word predicates are chosen when a person is using a specific representational system. The predicates indicate what portion of their internal representations people bring into awareness [10]. Such predicates may be identified and used, for example, to improve communication among the analyzed subjects.
One of the major problems in communication, be it informal or technical, is the difficulty of arousing interest on the receiving end, i.e., in the person who is reading or listening to the message. Many times, the person who receives the message does not assimilate what is being transmitted, be it a simple message or a technical diagram. NLP can then be one approach to improve communication. The challenge lies in identifying the representational system that is being used by the subject and matching the same system to build empathy. Empathy is an emotional response to another person through sharing the other's affective state, as well as a cognitive capability to take another person's perspective [32,33]. The matching consists of identifying the predicates that indicate a representational system and using them, or other predicates that belong to the same system, for communication [10].
In order to exemplify this matching process, consider the following question "have you seen the logic of the algorithms that I showed you?", and the following answer "not yet, I am going to examine them carefully, once I get a clear picture of the whole system." This is a coherent answer to the question from the sensory system matching perspective. The sensory-based words "seen" and "showed" in the first phrase indicate a visual processing, and the response used the same system through the visual sensory words "examine them" and "clear picture".
In this context, detecting the developers' representational preferences may enhance the empathy in the team communication, i.e., each member may be more stimulated in his/her Preferred Representational System, enhancing the effectiveness of communication, software comprehension, and the solution of activities of development and maintenance.
Allocating a person in a task, considering his/her technical abilities as well as his/her personality, is essential for the success of any software project. The productivity secret is to adjust the project needs with its members' personalities. Detecting, for instance, that a system analyst barely uses his/her visual representational system may help solve his/her difficulties with project diagrams or stimulate his/her reallocation to another activity. Many times, a member is lost because of wrong job allocation. A good programmer may become a not-so-good analyst. In other situations, a person's preferential cognitive system may not match his/her colleagues' profile, or the way the organization works.
Our research deals with the identification of sensory-based words used by developers in discussion lists. We then use these words to characterize the preferred representational systems of the developers and analyze these against their profile and role in the projects.

Neurolinguistic criticism
NLP experimental research basis is insufficient. The literature in academic journals is minimal, and Thompson et al.'s study [34] is a good example. There has been virtually no published investigation into how NLP is used in practice. The experimental research consists largely of laboratory-based studies from the 1980's and 1990's, which investigated two particular notions from within NLP, the 'eye movement' model, and the notion of PRS.
Heap [35], in particular, has argued that on the basis of the existent studies, these particular claims of NLP cannot be accepted. Heap conducted a meta-analysis of these and appears entirely justified in criticizing the unequivocal claims made in NLP literature. It is notable, however, that Heap's meta-analysis included many postgraduate dissertations. His bibliography refers only to sources of abstracts of those dissertation studies, not to the dissertations themselves. Thus, his meta-analysis appears based on the reported outcomes of these studies, not on critical appraisal of their methodology or validity.
Einspruch and Forman [12] and Bostic St.Clair and Grinder [36] have also argued that the types of study reviewed by Heap are characterized by problems affecting their reliability, including inaccurate understanding of NLP claims and invalid procedures due to (for example) the inadequate training of interviewers, who therefore may not have been competent at the NLP techniques being tested. Heap himself offers only an 'interim verdict' and acknowledges Einspruch and Forman's view that 'the effectiveness of NLP therapy undertaken in authentic clinical contexts of trained practitioners has not yet been properly investigated' [35].
Given these concerns, Tosey and Mathison [13] suggest that the existing body of experimental research cannot support definitive conclusions about NLP. It seems clear that there is no substantive support for NLP in this body of experimental research, yet the same research also seems insufficient to dismiss NLP.
Our study does not test NLP techniques, but rather shows an association between NLP-based measures and developers' roles and profiles.

Text mining basis
Our work is based on text mining (TM), a technology for analysis of large collections of unstructured documents, aiming to extract patterns or interesting and non-trivial knowledge from text [37].

Preprocessing
Similar to conventional data mining, text mining consists of phases that are inherent to knowledge discovery process [38]. Classification of knowledge discovery phases may vary for different authors, but most comprise at least data selection, preprocessing, mining, and assimilation. Text mining pays special attention to preprocessing because its data is unstructured for computer analysis. In other words, after setting the base with texts to be mined, it is necessary to convert each document to a format suitable for a computational algorithm.
One may structure the information of a text document for computational analysis in three different ways: Boolean, probabilistic, or vector-based models. The vector model utilizes geometry in order to represent documents. Introduced by Salton, Wong, and Yang [39], this model was developed to be used in a retrieval system called SMART. According to the vector model approach, each document is represented as a term vector, and each term receives a weight that indicates its importance in the document [39].
In more formal terms, each document is then represented as a vector, which is composed of elements organized as a tuple of values: $d_j = (w_{1j}, \ldots, w_{tj})$, where $d_j$ represents a document, and $w_{ij}$ represents the weight associated with the $i$-th indexed term of a set of $t$ terms of the document. For each element of the term vector, a dimensional coordinate is considered. This way, the documents can be placed in a Euclidean space of $t$ dimensions (where $t$ is the number of terms), and the position of the document in each dimension is given by the term weight in this dimension.
In this model, queries are also represented by vectors. This way, the document vectors can be compared with the query vector, and the similarity between them can be easily computed. The most similar documents (those whose vectors are closest to the query vector) are considered relevant and returned as a response to the user. Likewise, documents whose vectors are nearest to that of a target document can be considered similar to it.
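The comparison between a query vector and document vectors can be sketched as follows. This is a minimal illustration, not the implementation used by NEUROMINER: the text does not name a similarity measure, so cosine similarity is assumed here, and the term weights and document ids are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_documents(query, documents):
    """Return document ids ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in documents.items()]
    # Sort by descending similarity, breaking ties by document id.
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical weights over the terms ("diagram", "listen", "build").
documents = {
    "d1": [2.0, 0.0, 1.0],
    "d2": [0.0, 3.0, 0.0],
    "d3": [1.0, 1.0, 1.0],
}
query = [1.0, 0.0, 0.0]
ranking = rank_documents(query, documents)
```

Cosine similarity ignores vector length, so a long and a short message with the same term proportions are treated as equally close to the query.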
A term vector is built by the following steps.

Term extraction
Researchers from the information retrieval field claim that the main difference between data and information retrieval is exactly the relevance of the information obtained [40].
In general, not all terms that compose a document are relevant when one intends to extract high-level information. So, in order to compose a term vector for a text, it is necessary to identify words with high semantic content, selecting only those that are meaningful for the objective at hand.
The task of term extraction from a document consists of various steps, all of them contributing to the final purpose of producing a vector with high semantic content [41]. They are described as follows:
(a) Lexical analysis: The original document is not always represented in a purely textual format. Therefore, it is necessary to convert it to a standardized format, eliminating any presentation formatting attributes.
(b) Character conversion to uppercase or lowercase: This procedure allows identical words written with different capitalization to be treated as the same term; for example, neuro and Neuro may be interpreted as the same term.
(c) The use of a list of words to be ignored, commonly called stopwords: This list consists of words that have no significant semantic content (e.g., prepositions, conjunctions, articles, numerals, etc.) and consequently are not relevant for text analysis.
(d) Morphological normalization: This step aims to cluster terms with the same conceptual meaning, e.g., the words compute and computation. An algorithm that converts terms to their stems may be applied in this case. In the example, the words "compute" and "computation" have the same stem "comput", so they can be reduced to this term.
(e) Selection of simple or compound words: In some cases, during the preprocessing of a document, several joint words (phrases) may be managed as a single term. This selection can be done using predefined word lists or statistical and syntactic techniques.
(f) Normalization of synonyms: Words with the same meaning can be reduced to a specific term; for example, the acronym SEL and the phrase Software Engineering Lab both have the same meaning.
(g) Structural analysis: This step consists of associating with each term information about its position in the document structure, in order to distinguish it from a homonym term situated in another position.
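Several of the steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the stopword list, the synonym table, and the suffix-stripping stemmer are toy stand-ins (a real system would use an established stemming algorithm), synonym normalization is applied before tokenization for simplicity, and compound-word selection and structural analysis are omitted.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "we"}  # tiny illustrative list
SYNONYMS = {"software engineering lab": "sel"}  # hypothetical synonym table
SUFFIXES = ("ation", "ing", "es", "e", "s")  # toy stemmer, not a real stemming algorithm


def strip_markup(text):
    """Lexical analysis: drop HTML-like tags so that only plain text remains."""
    return re.sub(r"<[^>]+>", " ", text)


def stem(word):
    """Remove the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_terms(text):
    """Apply markup removal, lowercasing, synonym and stopword handling, then stemming."""
    text = strip_markup(text).lower()
    for phrase, canonical in SYNONYMS.items():
        text = text.replace(phrase, canonical)
    tokens = re.findall(r"[a-z]+", text)
    return [stem(token) for token in tokens if token not in STOPWORDS]


terms = extract_terms("<p>We compute the Computation in the Software Engineering Lab</p>")
```

In this sketch "compute" and "computation" both reduce to "comput", and the phrase "Software Engineering Lab" collapses to the single term "sel", mirroring the examples given in the text.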

Assigning weights
The process of associating numeric values with each term previously extracted is known as assigning weights. In general, the weight of a term in a document can be determined according to two principles [42]: (a) The more a term appears in the document, the more relevant the term is to the document subject; (b) The more a term occurs among all documents of a collection, the less important the term is to distinguish between documents.
This calculation can be done in two ways: (a) Binary or Boolean: the values 0 and 1 are used to represent, respectively, the absence or presence of a term in the document. (b) Numeric: it is based on statistical techniques regarding the term frequency in the document.

The numeric weights can be represented by measures such as the following.
Term frequency (tf): a simple measure consisting of the number of times that a term $t_i$ occurs in a document $d$. This measure is based on the premise that the term frequency in the document provides useful information about the relevance of this term for the document.
Document frequency (df): the number of documents in which the term $t_i$ occurs at least once.
Inverse document frequency (idf): it defines the relevance of a term in a set of documents. The bigger this index is, the more important the term is to the document in which it occurs. The formula to calculate idf is as follows:
$$\mathrm{idf}_i = \log \frac{|D|}{|\{d : t_i \in d\}|}$$
where $|D|$ represents the total number of documents, and $|\{d : t_i \in d\}|$ represents the number of documents in which the term $t_i$ appears.
tf-idf: it combines the term frequency with the inverse document frequency, in order to obtain a better index of the term's representativeness. The formula to calculate the tf-idf weight is as follows:
$$\text{tf-idf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i$$
where $\mathrm{tf}_{i,j}$ is the number of times the term $t_i$ occurs in document $d_j$.
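The weighting measures can be illustrated with a short sketch. It assumes the common form in which idf is the logarithm of the total number of documents divided by the document frequency, and tf-idf is the product of raw term frequency and idf; the natural logarithm and the sample messages are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Map each document id to a {term: tf-idf weight} dictionary."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for terms in documents.values() for term in set(terms))
    weights = {}
    for doc_id, terms in documents.items():
        tf = Counter(terms)  # raw count of each term in this document
        weights[doc_id] = {
            term: count * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return weights


# Hypothetical, already preprocessed messages.
docs = {
    "m1": ["see", "diagram", "see"],
    "m2": ["hear", "diagram"],
}
w = tf_idf(docs)
```

Here "diagram" occurs in every message, so its idf (and therefore its weight) is zero, whereas "see" is both frequent in m1 and absent elsewhere, so it receives the highest weight.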

Grammatical classes and noun phrases
To further strengthen the semantic meaning of the structured data, our work uses word composition. Words that have similar semantic and syntactic behaviors can be clustered in the same class, creating syntactic or grammatical categories, more commonly named parts of speech (POS). The three main ones are noun, verb, and adjective. Nouns refer to people, animals, concepts, and things. Verbs are used to express action in a sentence, whereas adjectives express noun properties. POS detection is important because, in specific contexts, two or more words with different grammatical categories may have one unique meaning. Such a semantic composition of words is known as a noun phrase [43]. Noun phrases (NPs) cluster words in a context, and their detection can improve search accuracy in texts. Usually, a noun is the central element (head part) which determines the syntactic character of an NP, and a verb or an adjective modifies this noun (mod part).
In order to implement NP detection, it is necessary that a dictionary specifies which words can appear together. In general, it is not necessary to store words in a compound way because this process demands time and does not enhance the system efficiency significantly. What can be done is to store information about the distance between words, leaving the query technique responsible for evaluating whether words are adjacent or not. NEUROMINER, the tool discussed in this article, uses the vector space model, transforming the developers' emails into vectors, classifying the words grammatically and identifying NPs, as well as assigning weights to the extracted terms.
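The idea of storing word positions and checking adjacency at query time can be sketched with a simple positional index. This is a generic illustration rather than a description of how NEUROMINER stores its data; the function names and the sample sentence are hypothetical.

```python
from collections import defaultdict


def build_positions(tokens):
    """Map each word to the list of positions at which it occurs."""
    index = defaultdict(list)
    for position, token in enumerate(tokens):
        index[token].append(position)
    return index


def is_adjacent(index, modifier, head):
    """Return True if `modifier` occurs immediately before `head` somewhere."""
    head_positions = set(index.get(head, []))
    return any(p + 1 in head_positions for p in index.get(modifier, []))


tokens = "this is a brilliant algorithm with a clear design".split()
index = build_positions(tokens)
found = is_adjacent(index, "brilliant", "algorithm")
```

Only single words are stored; the candidate noun phrase "brilliant algorithm" is recognized at query time because the two positions differ by exactly one.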

Motivation
We identified research that tries to pinpoint people's preferred representational systems, but such studies are confined to psychology and to domains like sports and education [29]. We also found some software engineering papers that use text mining to identify developers' general emotional content. However, these papers do not try to relate the developer's personality, or other psychological aspects, to the software engineering activities themselves [15,44]. This knowledge gap stimulated us to use text mining to investigate the association between a psychological concept (PRS) and software development roles and activities.
Our tool, NEUROMINER, uses Linguistic Inquiry and Word Count (LIWC) to classify the Preferred Representational Systems (PRS) of developers in a given context. We could not find any tools that make automated neurolinguistic text analysis and, as discussed later, our LIWC approach can be adapted to other domains.
Finally, due to the scarcity of scientific research about NLP itself, this paper provides the opportunity to show empirical results of applying one of its principles to our human-intensive domain.
Neurominer
NEUROMINER combines statistical and text mining techniques with sensory predicates of NLP, aiming to classify programmers' PRS.
The basic characteristics of NEUROMINER are: (a) Use of a neurolinguistic dictionary; (b) Use of ANOVA for PRS classification. ANOVA (analysis of variance) analyzes the variation present in an experiment. It is a test of the hypothesis that the variation in an experiment is no greater than that due to normal variation of individuals' characteristics and error in their measurement. In this way, ANOVA was used to classify the PRS of developers, statistically analyzing the differences between the means of each Representational System (RS) for each individual; (c) Use of an ontology to identify Software Engineering and neurolinguistic terms combined in noun phrases; (d) Use of synonym normalization resources with dictionaries for Brazilian Portuguese [45,46] and for English [47,48].
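To make the role of ANOVA concrete, the sketch below computes the one-way ANOVA F statistic by hand for one developer. It is an assumption-laden illustration: the text does not detail the exact ANOVA design, so each representational system is treated as one group of per-message scores, and the scores themselves are invented.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples.

    Assumes at least two groups and some variation inside the groups
    (otherwise the within-group mean square would be zero).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by differences between the group means.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Residual variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-message scores of one developer for each representational system.
visual = [4.0, 5.0, 6.0]
auditory = [1.0, 2.0, 3.0]
kinesthetic = [1.0, 2.0, 3.0]
f_stat, df_b, df_w = one_way_anova_f([visual, auditory, kinesthetic])
```

A large F value indicates that the means of the representational systems differ by more than the within-system variation would suggest; deciding significance additionally requires comparing F against the F distribution with the returned degrees of freedom, which a statistics library can provide.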
This paper will not focus on Neurominer internal architectural, but rather in its NLP and PRS classification approach.

Building and using an NLP dictionary
According to NLP, the words a person chooses to describe a situation, when they are specific to a representational system (i.e., sensory-based), reveal what is in his/her consciousness. These predicates indicate what portion of the internal representations the person brings into awareness [10].
The goal of our work is to identify the most used RS and the percentage of use of the others. For this, we have adopted a LIWC approach similar to the one presented by Rigby and Hassan [15]. As shown in Table 1, it uses an NLP dictionary with four basic dimensions composed of sensory-based words or phrases [10,14].
The Concept dimension was created to increase contextual classification power. A noun phrase (NP) such as 'brilliant algorithm' indicates a visual PRS cue used in the context of software engineering. The tag column of Table 1 indicates whether the dimension acts as the modifier (PRS) or the head (SE context) of the NP. In this simple way, NPs formed with SE ontological concepts have their score multiplied by a bonus in our text mining approach.
The concepts were extracted from the software document ontology discussed by Witte [49] and described by Wongthongtham [50], which is based on various programming domains, including programming languages, algorithms, data structures, and design decisions such as design patterns and software architectures. Our goal is to verify the direct relation between sensory-based words and the Software Engineering context. This way, we can find noun phrases formed with ontological concepts and sensory-based words or phrases, our first innovation.
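To make the modifier/head idea concrete, the following Python sketch scores a candidate noun phrase. It is only an illustration under stated assumptions: the word lists and the function name `score_noun_phrase` are hypothetical placeholders rather than the actual dictionary or NEUROMINER code, and treating only phrases whose head is an SE concept as bonus-earning is an assumption of the sketch.

```python
# Hypothetical mini-dictionary; the real NLP dictionary (Table 1) is far larger.
SENSORY_MODIFIERS = {
    "brilliant": "visual",
    "loud": "auditory",
    "rough": "kinesthetic",
}
# Hypothetical SE concepts standing in for the ontology terms.
SE_CONCEPTS = {"algorithm", "module", "server", "compiler"}


def score_noun_phrase(phrase):
    """Return (representational_system, bonus), or None if no sensory cue is found."""
    words = phrase.lower().split()
    if not words:
        return None
    head, modifiers = words[-1], words[:-1]
    # Assumption: a sensory modifier attached to an SE head concept earns bonus 2.
    if head in SE_CONCEPTS:
        for word in modifiers:
            if word in SENSORY_MODIFIERS:
                return SENSORY_MODIFIERS[word], 2
    # Otherwise any sensory word is scored as a simple term with bonus 1.
    for word in words:
        if word in SENSORY_MODIFIERS:
            return SENSORY_MODIFIERS[word], 1
    return None
```

With these placeholder lists, `score_noun_phrase("brilliant algorithm")` yields a visual cue with the higher bonus, while a phrase without any sensory word yields no cue at all.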
Email mining with Neurominer
Figure 1 summarizes the main text mining steps. The approach is summarized only briefly, since details about preprocessing [49] and message cleaning [15,51] have already been published.
Step 1 includes tasks such as stemming, part-of-speech tagging, and noun-phrase detection. For noun-phrase detection, we use the MuNPEx approach (Multi-Lingual Noun Phrase Extractor) [52].
After downloading the email archives, the system parses each email for metadata, as discussed by Bird [51], and places the relevant information into a data mart [53]. This data mart was designed based on a software engineering data warehousing architecture that we proposed in previous papers, our first innovation [54,55].
The process only uses the text actually written by the sender and its timestamp. It removes all diffs, attachments, quoted replies, signatures, code, and HTML that is not part of a diff. We adopted a daily frequency-based cumulative approach. In step 2, the system finds and counts the senders' sensory-based words and phrases by month, considering the NLP dimensions in the dictionary. In step 3, the system uses a text mining approach for the NLP classification of individuals, instead of the traditional document classification, our second innovation. In it, the set of all emails written by a developer is treated as a 'big text' to be classified. A simple approach would be to count all the words found in all emails of a developer and compute the percentage of each representational system. However, to support more detailed analyses of how profiles evolve, the system considers the daily frequencies of the words.
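The cleaning described above can be illustrated with a small Python sketch that keeps only the lines written by the sender. It is a simplified illustration, not the published cleaning procedure [15,51]: the function name `clean_body` is hypothetical, it assumes the common conventions of ">" for quoted lines and a "-- " line as signature separator, and it does not handle diffs, attachments, code, or HTML.

```python
import re

# Assumed pattern for reply attribution lines such as "On Mon, Bob wrote:".
ATTRIBUTION = re.compile(r"^On .* wrote:$")


def clean_body(body):
    """Keep only lines presumably written by the sender of a plain-text email."""
    kept = []
    for line in body.splitlines():
        # Everything after the conventional signature separator is dropped.
        if line.rstrip() == "--":
            break
        # Quoted replies conventionally start with ">".
        if line.lstrip().startswith(">"):
            continue
        if ATTRIBUTION.match(line.strip()):
            continue
        kept.append(line)
    return "\n".join(kept).strip()
```

A production pipeline would need additional rules for each mailing list, since quoting and signature habits vary between mail clients.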
Our alternative to the basic tf-idf formulation (see Text mining section) computes weights or scores for sensory-based words. The values are positive numbers, so they capture the presence or absence of the word in a month. Equation (3) indicates that the neuro weight assigned to a word j is the term frequency tf(j) (i.e., the ratio between the word count and the total number of occurrences of all words) modified by a scale factor for the importance of the word. In our approach, the scale factor is called daily frequency df(j), which is the ratio between the number of days containing word j and the number of loaded days. Thus, when a word appears on many days, it is considered more important and its scale factor increases.
In addition, the measure is multiplied by a bonus b. The bonus is 2 if the term is an NP or phrase, and 1 if the term is a simple word. This bonus is defined in the LIWC dictionary used to classify individuals' PRS. The weight of 2 for NPs and sensory phrases was chosen to highlight their importance relative to simple terms.
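Reading the description above as weight(j) = tf(j) × df(j) × b, the computation can be sketched in Python as follows. This is a minimal sketch under that reading, not the NEUROMINER implementation: the function name `neuro_weights`, the input layout (one token list per loaded day), and the choice to count all tokens in the tf denominator are assumptions.

```python
from collections import Counter


def neuro_weights(daily_tokens, bonus):
    """Compute tf(j) * df(j) * b for every dictionary term j.

    daily_tokens: one list of tokens per loaded day (all words, not only sensory ones).
    bonus: maps each dictionary term to its bonus b (1 for a word, 2 for an NP or phrase).
    """
    loaded_days = len(daily_tokens)
    counts = Counter(token for day in daily_tokens for token in day)
    total_occurrences = sum(counts.values())
    if loaded_days == 0 or total_occurrences == 0:
        return {}
    weights = {}
    for term, b in bonus.items():
        if counts[term] == 0:
            continue  # term never used in this period
        # Term frequency: occurrences of the term over occurrences of all words.
        tf = counts[term] / total_occurrences
        # Daily frequency: days containing the term over loaded days.
        days_with_term = sum(1 for day in daily_tokens if term in day)
        df = days_with_term / loaded_days
        weights[term] = tf * df * b
    return weights
```

For example, a word that accounts for half of all occurrences and appears on two of three loaded days receives a weight of 0.5 × 2/3 × 1 = 1/3 as a simple word, or twice that if it were a phrase.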
At the end of each month, the term weights are recalculated and a general total of weights (final weight) is stored for each representational system. Lastly, the monthly mean of each representational system is computed.
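The monthly aggregation can be sketched as below. The sketch assumes that each month is already available as a mapping from term to weight and that a second mapping assigns each term to a representational system; both structures and the name `monthly_means` are hypothetical, chosen only for illustration.

```python
from statistics import mean


def monthly_means(monthly_term_weights, term_to_rs):
    """Sum term weights per representational system each month, then average over months."""
    systems = sorted(set(term_to_rs.values()))
    totals = {rs: [] for rs in systems}
    for month_weights in monthly_term_weights:
        # Final weight of the month for each representational system.
        per_rs = dict.fromkeys(systems, 0.0)
        for term, weight in month_weights.items():
            rs = term_to_rs.get(term)
            if rs is not None:
                per_rs[rs] += weight
        for rs in systems:
            totals[rs].append(per_rs[rs])
    if not monthly_term_weights:
        return {}
    return {rs: mean(values) for rs, values in totals.items()}
```

Months in which a system has no matching terms contribute a zero to its mean, which is a design choice of this sketch rather than something stated in the text.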
In step 4, we use ANOVA (analysis of variance) to determine whether the monthly means of the different RS are statistically different.
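As an illustration of this step, the following Python sketch computes the F statistic of a one-way ANOVA over the monthly weights of the three representational systems. It assumes a plain one-way design with one group per RS, which is a simplification; the function name `one_way_anova_f` is hypothetical, and the p value is deliberately left to a statistics package, since it requires the F distribution.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA.

    groups: one list of monthly weights per representational system.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [mean(g) for g in groups]
    # Variation explained by differences between the group means.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Residual variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F value relative to the F distribution with (df_between, df_within) degrees of freedom leads to rejecting the hypothesis of equal monthly means. The sketch does not guard against groups with zero within-group variation.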

The family of experiments
The rest of this paper describes an experimental evaluation of our approach. The experimental processes follow the guidelines by Wohlin et al. [56]. For each experiment, the first sections focus on the experiment definition and planning, and the following sections present the experimental results.

First and second experiments
This section focuses on the first and second experiments, conducted on two large-scale OSS projects: Apache Server and Postgresql.

Goal definition
The main goal of our study is to evaluate if OSS top committers have a PRS. This goal is formalized using the GQM goal template proposed by Basili and Weiss [57] and presented by Solingen and Berghout [58]: Analyze Project top committers with the purpose of evaluation with respect to NLP context-specific Preferred Representational Systems from the point of view of software engineering researchers in the context of development mailing lists of OSS projects.

Planning
For the context selection, the experiment will target OSS projects.
Hypothesis formulation
The issues we are trying to explore are as follows: (o) We are interested in verifying if OSS top committers have a PRS.
(p) Besides that, we believe top committers are more kinesthetic than auditory and visual. Our belief is that experienced programmers of the OSS community rely heavily on their experiences, and are less dependent on visual and auditory artifacts than the general population of OSS software engineers.
Given the arduous manual work of searching for valid emails used by top committers and the resulting small sample size, due to the low number of top committers, a formal statistical test will not be performed for the second issue.
However, considering the large number of emails that will be mined, the test for the existence of a PRS for each selected top committer will have high power. We will also do a detailed qualitative analysis of the top committers' profiles in order to sanity check the NEUROMINER measures.
NEUROMINER will be used to calculate the final weights for each representational system, as well as representational systems monthly means (see Email mining section).
Formally, the hypothesis we are testing is:
Null hypothesis H0: OSS top committers have the same representational system monthly mean.
Those are the same developers studied by Rigby and Hassan [15]. For Postgresql, we analyzed the body of all email messages between 1997 and 2006 (57,159 messages) and also selected the four developers who had the greatest number of commits. In both projects, two top committers still contribute to the project, and the others have already left.
We also created clusters of all other developers for both projects. During data reporting, we will refer to these general population measures as the cluster. The analysis is completely non-intrusive to developers, as the data was drawn directly from the project mailing lists. For each developer and cluster, once a month, we calculated the PRS using the method described in the Email mining section. At the end, we had one data point of mined emails per month for each subject. Clusters were mined for 3 years (36 months). Top committers were mined for the last 10 years, but data points were produced only for those months in which they posted at least one email to the project discussion list. NEUROMINER then tested the population distribution and calculated the analysis of variance of the monthly PRS scores for each participant (all calculations were double-checked using SPSS [61]). The population distribution for each sample is normal.
Instrumentation: Neurominer OLAP module
The OLAP module was developed to provide chart generation and analytical navigation over the data that describe the developers' profiles. Some OLAP features are presented next, using results from the Apache project.
Looking at the accumulated scores in Fig. 2a (the chart on the left), the predominant profile for top committer B, one of the Apache top committers, is visual. The chart on the right (Fig. 2b) shows the evolution of this developer's profile from January to December 1999.
Another important aspect is the analysis of the terms that scored highest. For this, a drill-down can be combined with a ranking, which gives access to the 10 terms or the 10 phrases that achieved the highest scores. The result is shown in Fig. 3, in which it is interesting to note the presence of Software Engineering concepts, such as Server, Compiler, and Module, combined with sensory words. It is also interesting to note that the term pain is among the predominant ones. As top committer B has already left the project, this may be evidence of dissatisfaction in his latest posts, since the term pain indicates grief or discomfort.

Experiment operation
This section describes the data validations for the performed experiments.
Data validation
In addition to the metadata analysis and selection described in the Neurominer section, the experiments performed the following validations: (q) Using the OLAP module, for each developer, the most frequent terms and phrases were checked against the source emails (up to 10 emails) to verify that they really reflected something said by the individual. The concern was to ensure that the term or phrase did not come from text not written by the sender, such as quotes and phrases of others, which are common in fixed signatures. (r) All ANOVA calculations were revalidated using SPSS [61].

Results
Tables 7 and 8 (see Appendix 1) summarize our results. The column Totals represents the number of months (data points for each participant), days, and emails. For each representational system, the final weight is shown for the set of all sensory-based words found and the monthly average of this weight. The column ANOVA p value reports p values for the null hypothesis.
To facilitate the visualization of the results, we created Tables 2 and 3 based on Tables 7 and 8 in Appendix 1, containing only the primary information (participants, project output signal, monthly mean for each PRS, and ANOVA p value) needed to analyze the results.

Analysis and interpretation
For the statistical testing, we established an a priori significance level (α) of 0.05. Tables 2 and 3 show that our first hypothesis is accepted, as we obtained a p value of 0.000 for all participants but one, Developer G. The p values for the clusters and for Developers A-F and H are well below 0.05, strongly rejecting the null hypothesis. We observed that Developers B, D, E, F, and H did not have their highest value for the Kinesthetic RS. This contradicts our initial hypothesis that top committers are more Kinesthetic than Visual and Auditory. Moreover, Kinesthetic is the PRS of the general population, i.e., the PRS of the developer clusters for both OSS projects is Kinesthetic (see Cluster row in Tables 2 and 3).
With respect to the second issue defined in the Hypothesis formulation subsection, we found out that there are four visual, two kinesthetic, and one auditory top committer. Looking at their profiles, we realized that most of them are quite concerned with following procedures and documenting information, contradicting our initial stereotype of a hardcore OSS developer.
Moreover, the fact that the other developers are kinesthetic on average leads us to believe that most people who post to the list are indeed involved in practical activities in the project. This contradicts our initial belief that many posts came from newbies or from people who were simply curious about (wanted to hear about) the project.
Even where there is dominance of the Kinesthetic RS, the results show that OSS developers also have significant visual and auditory RS. This may indicate an opportunity to introduce better visualization tools and better support for cooperative work, increasing direct developer interaction, in OSS development.
According to the top committers' profiles, included in the websites of the projects Apache [62] and Postgresql [63], we found out that Developer B had a strong involvement with the project architecture and the work to hybridize Apache. This seems to support his/her Visual PRS (see Table 2 and Fig. 4).
Developer D, the most singular subject among the top committers, has an Auditory PRS and also a strong Visual RS. His/her profile indicates that he/she contributes heavily to the project documentation and that his/her predominant working language is XML. This possibly matches the mined profile, as one would expect strong listening and reading capabilities from people involved in OSS documentation.
These insights are well aligned with the results presented by Rigby and Hassan [15]. That paper reports that the measures collected for Developer D were the least associated with those of the other subjects in the study. Our study, however, went further and indicated a classification that directly matched the subject profile and project role.
Regarding the Postgresql top committers, the first thing that catches the eye (see Fig. 5) is that three of them are highly visual. Moreover, the visual score is high even for Developer G and the project cluster itself. Top committers E, F, and G are highly involved with both documentation and implementation. Top committer G, the only one who is not classified in any category (p value 0.085), also works on performance testing and tuning, which may be related to his/her relatively high kinesthetic score. He/she also works with user groups and on providing general direction for the project advocacy, which may be related to his/her relatively high auditory score. Top committer H, by far the most active top committer of them all, is visual but also has a high auditory score, even higher than his/her kinesthetic score. His/her scores may be explained by the fact that he/she is not only highly involved with development, but also does training and maintains the project FAQ and TODO list.

Threats to validity
In spite of the fact that Apache and Postgresql are mature, real world, and large projects, and our results seem to be quite consistent with the obtained top committer profiles, the PRS measures still need further investigation to assure external validity. The next section will focus on the industrial setting. The completely different setup and higher control over the study environment will help to increase the generalization power of the results.
We obtained the top committer profiles through the project sites. Better analysis would be possible with more extensive profiling data. To this end, we developed a questionnaire to characterize and assess the PRS of software engineers [64]. Moreover, a survey was performed with 209 software engineers, validating the effectiveness of the questionnaire and revealing a great diversity of PRSs in the population studied. This population served to calculate the IRT (Item Response Theory) scores of the industry programmers who answered the questionnaire in the third experiment (see next section). The questionnaire and the survey results that refined it were published in [64].
We contacted the top committers by email and asked them to fill out the questionnaire. Unfortunately, they could not find the time to do so.

Third experiment
To increase the generalization power of the results, and because the participants of the previous experiments did not respond to our questionnaire (see Threats to Validity in section 6.1), we conducted a third experiment in industry (closed-source projects) under the same conditions as the previous ones.

Goal definition
The main goal of this new study is to evaluate if industry programmers have a PRS. This goal is formalized using the GQM goal template proposed by Basili and Weiss [57] and presented by Solingen and Berghout [58]: Analyze programmers of closed-source projects with the purpose of evaluation with respect to NLP context-specific Preferred Representational Systems from the point of view of software engineering researchers in the context of development mailing lists of projects.

Planning
The experiment targets developers of closed-source projects. As part of the experiment, it is necessary to replicate, for industry programmers, the experiment with OSS programmers.
Hypothesis formulation
The issue we are trying to explore is whether industry developers have a PRS.
Formally, we test the same hypothesis as in the experiment performed with OSS developers:
Null hypothesis H0: Industry programmers have the same monthly average of the scores for the three representational systems.
H0 PRS: μ(Visual final weight) = μ(Auditory final weight) = μ(Kinesthetic final weight)
Alternative hypothesis H1: At least one of the means is different from the others.
Participant and artifact selection
Developers were selected by convenience. The authors of this article obtained the release of five programmers from a company for which they are consultants. The company produces and distributes beverages and soft drinks in two states of Brazil.
For legal reasons, we will not use the names of the participants in this study. Letters are used to identify each developer. Table 4 lists these programmers along with two measures of experience in software maintenance.
The execution was non-intrusive because the data were taken directly from the programmers' mailing lists; the developers did not know that their emails would one day be analyzed.
For each programmer, NEUROMINER was run to examine the body of the emails that he/she posted on the mailing list of a software project developed by the company. This sample involves 4604 messages posted by five programmers between 2008 and 2010. Finally, each developer was also individually interviewed about his/her job profile in the company. Table 9 (see Appendix 1) summarizes the results obtained by NEUROMINER for the formalized hypothesis. The Total column shows the number of months (data points for each participant), days, and emails mined. For each representational system, the final weight is displayed based on all sensory predicates found, as well as the monthly averages of these weights. The ANOVA column shows the p values calculated for the test of the null hypothesis.

Results
To facilitate the visualization of the results, we created Table 5 based on Table 9 in Appendix 1, containing only the primary information (participants, monthly mean for each PRS, and ANOVA p value) needed to analyze the results.
For the statistical tests, we established a significance level (α) of 0.05. Table 5 shows that the null hypothesis is rejected for 4 developers, who obtained a p value of 0.000. The only developer who was not classified was I, with a p value of 0.105, higher than 0.05. In summary, the p values for developers J, L, M, and N are well below 0.05, allowing strong rejection of the null hypothesis and confirmation that they have a PRS (highlighted in bold).
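The decision rule applied to each participant can be summarized with a short Python sketch. It is an illustration only: the function name `classify_prs` is hypothetical, and taking the representational system with the highest monthly mean as the PRS once the null hypothesis is rejected is an assumption of the sketch, since ANOVA by itself only indicates that at least one mean differs.

```python
def classify_prs(monthly_means, p_value, alpha=0.05):
    """Return the RS with the highest monthly mean, or None if H0 is not rejected."""
    if p_value >= alpha:
        # The monthly means are not statistically different: no PRS is assigned.
        return None
    return max(monthly_means, key=monthly_means.get)
```

With a p value of 0.105, as obtained for developer I, the function returns no classification, whereas a p value below 0.05 returns the system with the highest monthly mean.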

Analysis and interpretation
Regarding profiles, the results were encouraging, as in the experiments performed with the OSS projects.
In this case, we had a bonus: the questionnaire answered by the participants (discussed in the Threats to validity subsection under the First and second experiments section). This questionnaire follows an approach similar to the VARK model, a questionnaire developed and used at Lincoln University to identify the preferences of students for particular modes of information representation [29]. However, our questionnaire is strongly contextualized to Software Engineering.
Once the email mining analysis with NEUROMINER was done, the next step was to analyze the developers according to their responses to the Neurolinguistic questionnaire. Table 6 compares the classifications generated by NEUROMINER with those generated by the questionnaire. Although the classifications include the order of preference and are produced by completely different approaches, the results show good consistency between both approaches.
In addition to using NEUROMINER and the questionnaire, we interviewed all programmers.
All 5 programmers work directly and deeply with the code; however, in his interview, programmer L, classified as visual, was the only one who said he routinely used visual diagrams and entity-relationship models [65], creating, reviewing, and optimizing system models. Programmer M reported no regular documentation activity or use of diagrams. Thus, according to his VAK classification, we recommended that the partner company test his potential in tasks that require the use of visual artifacts, as well as in surveys and interviews for requirements validation. His second preference, auditory, may indicate an affinity for listening and for better capturing user needs.

Threats to validity
The difficulty in getting industry to cooperate by releasing its programmers resulted in a small sample size. It is necessary to replicate this experiment with the largest possible number of programmers.

Related works
Regarding NLP, there are some scientific articles showing evidence for its assertions. In addition, there are several publications about preferences for some specific representational systems in the cognitive and learning processes, even in computing [66]. The basis for the models and techniques presented by NLP can be found in psychological studies that involve the so-called "chameleon effect", which concerns how matching and nonmatching stimuli affect empathy in communication. Van Baaren et al. [67] did an experiment at a restaurant in the south of the Netherlands in which half of the studied waitresses used the "chameleon effect" to serve customers. Results showed that the average value of the tips almost doubled for the waitresses who used matching language and behavior. Bailenson and Yee [68] analyzed subjects who interacted with artificial intelligence-based software (an agent that simulates a subject giving an explanation). The agent that imitated the subject's movements was more convincing, receiving more positive evaluations. It was the first virtual reality study that showed the effects of a nonverbal automatic imitator in order to gain empathy.
Turan and Stemberger [14] tested the NLP hypothesis about matching processes that enhance empathy in communication. The relation between matching and empathy increase was significant. Education was also related to the empathy increase; however, even when it was controlled, the relation between matching and empathy remained significant.
Paolo et al. [66], presupposing some students' preferences for the kinesthetic processing in certain contexts, developed and tested a set of kinesthetic activities for a distributed systems course, with undergraduate and graduate students. The article presents detailed descriptions of the exercises and discusses the factors that contributed to their success and failure.
Fleming presented a questionnaire developed and used at Lincoln University to identify the preferences of students for particular modes of information representation [29]. Named the VARK model, the questionnaire is now the basis of a commercial service for educational planning (http://www.vark-learn.com/english/page.asp?p=questionnaire). The acronym originates from the questionnaire's classification of the learning styles: "V" is for visual learners, "A" is for auditory learners, "R" is for reader/writer learners (people who learn best through seeing printed words), and "K" is for tactile/kinesthetic learners.
The VARK classification differs from the classic NLP classification because it includes the readers-writers category on top of the usual visual, kinesthetic, and aural categories. According to Fleming, results show that students with preferences for R and V information use their eyes to "take in the world" but they have preferences within that sensory mode; some like text, and others like diagrammatic or iconic material (information that is symbolically displayed) [29].
Another point raised by the VARK data is that the same subject may have different profiles in different areas (martial arts, music, languages, etc.) for different time periods, i.e., a subject may be Visual (V) to learn martial arts for a period of time and become Kinesthetic (K) after that.
This evidence supports some NLP techniques and establishes an empirical basis for further studies.
Considering text mining in Software Engineering, independent from the database, linguistic analyses have been used to comprehend the development of OSS software. Witte et al. [49] considered the semantic importance of the documents written in natural language in the process of maintenance and reengineering. The result of the research consisted of creating a text mining system capable of filling a software ontology with information extracted from these documents.
Other works have already considered email-specific analysis to study OSS development process [51,69]. Pattison et al. [70] studied the relation between the several software entities mentioned in emails and the number of times these entities are included in the changes made.
Three works are closest to the research presented here. In the first, Scialdone et al. [44] used emails to evaluate the social presence in maintenance groups of OSS projects. Social presence theory classifies different communication media along a one-dimensional continuum of social presence, where the degree of social presence is equated to the degree of awareness of the other person in a communication interaction. According to the social presence theory, communication is effective if the communication medium has the appropriate social presence required for the level of interpersonal involvement required for a task. On a continuum of social presence, the face-to-face medium is considered to have the most social presence, whereas written, text-based communication, the least. It is assumed in the social presence theory that in any interaction involving two parties, both parties are concerned both with acting out certain roles and with developing or maintaining some sort of personal relationship [71,72].
Core and peripheral members were compared, and the results showed that behavior that respects another person's autonomy may contribute to the survival of the group and the continuity of the project. The work does not raise alternatives to social presence or solutions to increase empathy. It is based solely on psychological and social measures. It establishes no relation between these aspects and software engineering roles and profiles.
The second work is Rigby and Hassan's study [15], which analyzed the content of the Apache discussion list to find the developers' personality and general emotional content. Like ours, this work uses a LIWC tool (Linguistic Inquiry and Word Count) [73] to help ratings. However, the work uses a general-purpose psychological analysis tool. It was developed neither to explore emails nor to preprocess text mining and score terms.
Lastly, in the context of Collaborative Systems, Santos et al. [74,75] developed a collaborative messenger library (NeuroMessenger) that uses neurolinguistics, psychometry, and text mining to promote empathy among interlocutors, based on PRS identification and the suggestion of textual matching. The experimental evaluation showed noticeably better performance, in terms of empathy, when NeuroMessenger was used. In addition, the use of the same pattern of text between interlocutors, in Collaborative Systems, increased the empathy between them.
Previously, we presented initial results for the use of neurolinguistic ratings by mining development discussion lists [76,77,78,79]. These works motivated and guided the need for extended studies and for details about the innovations and technologies involved, which are now presented in this article.

Conclusion and future work
We presented a neurolinguistic text mining tool that is capable of extracting sensory-based words from software mailing lists. The system is novel in four important aspects: (1) it automates parts of NLP practices; (2) it combines an SE taxonomy with sensory-based words; (3) it adapts the traditional text mining process to NLP practices; and (4) it uses a specific Text Mining Data Mart in a software engineering data warehouse. The approach itself is novel in its use of NLP concepts in the software engineering area.
This work is part of a family of experiments to detect and validate PRS. Previous studies showed that developers have different PRS using this method and other approaches [76,78,79]. In this paper, for example, we combine two approaches: survey and text mining. These works are the first steps on a promising road toward understanding latent traits of software engineers through the use of psychometric techniques.
The results are encouraging. For the OSS setting, in spite of being contrary to our expectation, the PRS scores clearly differentiate the top committers from the general population (i.e., clusters) of the projects, according to the monthly means in bold (PRS score) listed in Tables 2 and 3. In the industry environment, the developers also have a PRS. Moreover, the scores are aligned with the developers' profiles, indicating that they can indeed be used to match people to software engineering tasks and, possibly, to improve communication. It is worth noting that the classifications presented in this work are not fixed, i.e., they initially represent only the greater use of one or another system within the context analyzed.
Thus, in specific contexts, a particular sensory system may take dominance: for example, (a) being primarily aware of external kinesthetic representations, such as bodily movements and sensations, while training and (b) concentrating preferentially on auditory comparisons while analyzing client requirements. Representational system preferences thus tend to be a contextual artifact, in that when an individual considers specific contexts, his/her language can reflect how he/she processes the information relating to that context. In certain cases, a person may find himself/herself with certain rigid representations and strategies which preclude behavioral choice. In such a case, one representational system may predominate and be important for enhancing empathy.
Our future work will address three key issues: (1) examine the empathy of exchanged messages to assess communication success over PRS alignment; (2) better profile PRS scores with usage of software engineering artifacts and the roles that a person plays in a project; and (3) devise new ways to measure PRS. These next steps should make the results more accurate and conclusive.
Table 9 Results for the industry developers