Table 1 Partial example of poor data quality faults

From: Testing data-centric services using poor quality data: from relational to NoSQL document databases

| Data type | Issue description | Example |
| --- | --- | --- |
| String | Replace by null | null |
| String | Replace by empty | “” |
| String | Replace a word by a misspelled word (dictionary-based) or, if no match, use a random single edit operation (insertion, deletion, substitution of a single character, or transposition of two adjacent characters) over a randomly selected word | John Locke ➔ Jon Locke |
| String | Replace with an imprecise value. Chooses a single random word and replaces it by the respective acronym or abbreviation (dictionary-based) | Doctor John ➔ Dr. John |
| String | Replace by homonym (dictionary-based, randomly selects a word from the original string) | allowed ➔ aloud |
| String | Replace by synonym (dictionary-based, randomly selects a word from the original string) | happy ➔ cheerful |
| String | Add whitespace in a leading or trailing position, or between words (random choice) | John Locke ➔ John Locke |
| String | Remove whitespace in a leading or trailing position, or between words (random choice) | John Locke ➔ JohnLocke |
| String | Add extraneous data in leading, trailing, or random position (random choice) | John Locke ➔ John Locke. |
| String | Add substring (randomly selected and inserted in the beginning, random middle, or end of the original string) | John Locke ➔ Johnn Locke |
| String | Remove substring (initial position and length are randomly chosen) | John Locke ➔ John Lock |
| Integer | Set to zero | 1904 ➔ 0 |
| Integer | Add one | 1904 ➔ 1905 |
| Integer | Subtract one | 1904 ➔ 1903 |
| Integer | Add one random numeric character | 1904 ➔ 19004 |
| Integer | Remove one random numeric character | 1904 ➔ 190 |
| Integer | Flip sign | 1904 ➔ -1904 |
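
A few of the faults listed in the table can be sketched as small mutation functions. The Python below is only an illustration and is not the authors' implementation: the function names, the one-entry `MISSPELLINGS` dictionary, the lowercase `ALPHABET`, and the handling of edge cases (empty strings, single-digit integers, negative numbers) are all assumptions made for the example. In particular, the misspelling fault is read here as "use the dictionary if any word matches, otherwise apply one random edit to a random word", which is one possible interpretation of the description.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumed character pool for edits
MISSPELLINGS = {"John": "Jon"}  # hypothetical dictionary, not from the source


def replace_by_null(_value):
    return None


def replace_by_empty(_value):
    return ""


def single_edit(word, rng):
    """One random edit: insertion, deletion, substitution, or transposition."""
    ops = ["insert"]
    if len(word) >= 1:
        ops += ["delete", "substitute"]
    if len(word) >= 2:
        ops.append("transpose")
    op = rng.choice(ops)
    if op == "insert":
        i = rng.randint(0, len(word))
        return word[:i] + rng.choice(ALPHABET) + word[i:]
    if op == "transpose":
        i = rng.randrange(len(word) - 1)
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    i = rng.randrange(len(word))
    if op == "delete":
        return word[:i] + word[i + 1:]
    # Substitute with a character that differs from the original one.
    new_char = rng.choice([c for c in ALPHABET if c != word[i]])
    return word[:i] + new_char + word[i + 1:]


def misspell(text, rng, dictionary=MISSPELLINGS):
    """Dictionary-based misspelling, falling back to a random single edit."""
    words = text.split(" ")
    known = [i for i, w in enumerate(words) if w in dictionary]
    if known:
        i = rng.choice(known)
        words[i] = dictionary[words[i]]
    else:
        i = rng.randrange(len(words))
        words[i] = single_edit(words[i], rng)
    return " ".join(words)


def remove_substring(text, rng):
    """Remove a substring whose start and length are chosen at random."""
    if not text:
        return text
    start = rng.randrange(len(text))
    length = rng.randint(1, len(text) - start)
    return text[:start] + text[start + length:]


def set_to_zero(_n):
    return 0


def add_one(n):
    return n + 1


def subtract_one(n):
    return n - 1


def flip_sign(n):
    return -n


def add_digit(n, rng):
    """Insert one random digit; a leading 0 leaves the value unchanged."""
    sign, digits = ("-" if n < 0 else ""), str(abs(n))
    i = rng.randint(0, len(digits))
    return int(sign + digits[:i] + rng.choice("0123456789") + digits[i:])


def remove_digit(n, rng):
    """Remove one random digit; single-digit input becomes 0 (assumption)."""
    sign, digits = ("-" if n < 0 else ""), str(abs(n))
    if len(digits) == 1:
        return 0
    i = rng.randrange(len(digits))
    return int(sign + digits[:i] + digits[i + 1:])


rng = random.Random(42)  # seeded so that a fault-injection run is repeatable
print(misspell("John Locke", rng))  # dictionary hit: Jon Locke
print(flip_sign(1904))              # -1904
print(remove_digit(1904, rng))      # one of 904, 104, 194, 190
```

Passing a seeded `random.Random` instance into each function, rather than using the module-level generator, is a design choice made here so that a given set of injected faults can be reproduced when a test fails.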