Table 1 Partial example of poor data quality faults

From: Testing data-centric services using poor quality data: from relational to NoSQL document databases

| Data type | Issue description | Example |
| --- | --- | --- |
| String | Replace by null | null |
| | Replace by empty | “” |
| | Replace a word by a misspelled word (dictionary-based) or, if no match, use a random single edit operation (insertion, deletion, substitution of a single character, or transposition of two adjacent characters) over a randomly selected word | John Locke ➔ Jon Locke |
| | Replace with an imprecise value. Chooses a single random word and replaces it by the respective acronym or abbreviation (dictionary-based) | Doctor John ➔ Dr. John |
| | Replace by homonym (dictionary-based, randomly selects a word from the original string) | allowed ➔ aloud |
| | Replace by synonym (dictionary-based, randomly selects a word from the original string) | happy ➔ cheerful |
| | Add whitespace in a leading or trailing position, or between words (random choice) | John Locke ➔ John Locke |
| | Remove whitespace in a leading or trailing position, or between words (random choice) | John Locke ➔ JohnLocke |
| | Add extraneous data in leading, trailing, or random position (random choice) | John Locke ➔ John Locke. |
| | Add substring (randomly selected and inserted in the beginning, random middle, or end of the original string) | John Locke ➔ Johnn Locke |
| | Remove substring (initial position and length are randomly chosen) | John Locke ➔ John Lock |
| | … | … |
| Integer | Set to zero | 1904 ➔ 0 |
| | Add one | 1904 ➔ 1905 |
| | Subtract one | 1904 ➔ 1903 |
| | Add one random numeric character | 1904 ➔ 19004 |
| | Remove one random numeric character | 1904 ➔ 190 |
| | Flip sign | 1904 ➔ -1904 |
| | … | … |
| … | … | … |
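
A few of the faults listed in the table can be sketched in Python as small mutation functions. This is only an illustrative sketch, not the authors' implementation: all function names are hypothetical, the dictionary-based variants (misspelling, abbreviation, homonym, synonym lookups) are left out, and the misspelling fault is approximated by its fallback, a random single edit over a randomly selected word. A `random.Random` instance is passed in explicitly so that runs can be reproduced.

```python
import random

# Assumption: substitutions and insertions draw from lowercase ASCII letters only.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def random_single_edit(word, rng):
    """Apply one random edit: insertion, deletion, substitution, or adjacent transposition."""
    ops = ["insert"]
    if len(word) >= 1:
        ops += ["delete", "substitute"]
    if len(word) >= 2:
        ops.append("transpose")
    op = rng.choice(ops)
    if op == "insert":
        i = rng.randrange(len(word) + 1)
        return word[:i] + rng.choice(ALPHABET) + word[i:]
    if op == "delete":
        i = rng.randrange(len(word))
        return word[:i] + word[i + 1:]
    if op == "substitute":
        i = rng.randrange(len(word))
        replacement = rng.choice([c for c in ALPHABET if c != word[i]])
        return word[:i] + replacement + word[i + 1:]
    # Transpose two adjacent characters.
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def misspell(text, rng):
    """Fallback misspelling: one random single edit over a randomly selected word."""
    words = text.split(" ")
    candidates = [i for i, w in enumerate(words) if w]
    if not candidates:
        return text
    i = rng.choice(candidates)
    words[i] = random_single_edit(words[i], rng)
    return " ".join(words)


def add_whitespace(text, rng):
    """Insert one space at the start, at the end, or at an existing gap between words."""
    positions = [0, len(text)] + [i for i, ch in enumerate(text) if ch == " "]
    i = rng.choice(positions)
    return text[:i] + " " + text[i:]


def remove_whitespace(text, rng):
    """Remove one randomly chosen whitespace character, if any exists."""
    positions = [i for i, ch in enumerate(text) if ch.isspace()]
    if not positions:
        return text
    i = rng.choice(positions)
    return text[:i] + text[i + 1:]


def set_to_zero(n):
    return 0


def add_one(n):
    return n + 1


def subtract_one(n):
    return n - 1


def flip_sign(n):
    return -n


def add_random_digit(n, rng):
    """Insert one random digit; a leading zero is avoided so the value always changes length."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    i = rng.randrange(len(digits) + 1)
    digit = rng.randrange(1 if i == 0 else 0, 10)
    return int(sign + digits[:i] + str(digit) + digits[i:])


def remove_random_digit(n, rng):
    """Remove one random digit; a single-digit value collapses to 0 (an assumption)."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) == 1:
        return 0
    i = rng.randrange(len(digits))
    return int(sign + digits[:i] + digits[i + 1:])
```

With a seeded generator such as `rng = random.Random(42)`, calls like `misspell("John Locke", rng)` or `remove_random_digit(1904, rng)` produce repeatable faulty values, which may help when a failing test case has to be replayed.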