What is Soundexing?


An old legal principle holds that if a reasonable person would pronounce two differently spelled names the same way, the names are the same. Robert C. Russell of Pittsburgh, Pennsylvania, realized that this principle could be applied to indexing: names could be indexed by their sound rather than their spelling. On April 2, 1918, Russell was issued U.S. patent number 1,261,167 for “certain new and useful Improvements in Indexes,” a method that came to be known as soundexing.

“American Soundex”

The so-called “American” Soundex system is an improvement on Russell’s invention and was used by the National Archives and Records Administration to index the 1880, 1890, 1900, 1910, and 1920 U.S. Censuses. A Soundex code consists of the first letter of the name followed by three digits taken from the list below. Vowels and the letters H, W, and Y are not coded.

    Soundex Codes
  • 1 – b f p v
  • 2 – c g j k s x z
  • 3 – d t
  • 4 – l
  • 5 – m n
  • 6 – r

There are three simple rules for creating the code:

  1. Double letters are coded as one letter:
         Williams = W452
  2. Letters of the same code not separated by other letters are coded as one letter:
         Schmidt = S530
  3. Zeroes are added to the end of the code to make up three digits:
         Lee = L000
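The three rules translate into a short loop. The sketch below is one way to write it in Python. The function name `soundex` is illustrative. It also assumes one detail the rules above leave out, taken from NARA's published coding instructions: H and W do not separate letters with the same code, so Ashcraft is coded A261 rather than A226. Treating Y as a separator, like the vowels, is a choice made for this sketch.

```python
# American Soundex letter groups, copied from the table above.
# Letters not listed (A, E, I, O, U, H, W, Y) have no digit.
GROUPS = {
    "1": "BFPV",
    "2": "CGJKSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}
CODES = {letter: digit for digit, letters in GROUPS.items() for letter in letters}


def soundex(name: str) -> str:
    """Return the American Soundex code (letter + three digits) for a name."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        raise ValueError("name must contain at least one letter")

    first = letters[0]
    digits = []
    # The first letter's own digit counts as the "previous" code, so the
    # C in Schmidt is absorbed into the S (rule 2).
    previous = CODES.get(first)

    for ch in letters[1:]:
        code = CODES.get(ch)
        if code is None:
            # Vowels (and, in this sketch, Y) separate same-coded letters;
            # H and W do not (assumed from NARA's rules).
            if ch not in "HW":
                previous = None
            continue
        if code != previous:  # rules 1 and 2: collapse repeated codes
            digits.append(code)
        previous = code

    # Rule 3: pad with zeros, then keep exactly three digits.
    return first + "".join(digits).ljust(3, "0")[:3]


for name in ["Williams", "Schmidt", "Lee", "Ashcraft"]:
    print(name, soundex(name))
```

Run as written, the loop prints W452, S530, and L000 for the three examples above, and A261 for Ashcraft.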

Daitch-Mokotoff Soundex

Although Soundex is useful, many names that sound the same are not coded the same: Carr is C600 but Kerr is K600, for example. In addition, a Soundex code records only the first three coded letters after the initial letter, so long names may be coded the same as short ones (Peters and Peterson are both P362, for example). The Daitch-Mokotoff Soundex system was designed to address these problems.

The Daitch-Mokotoff Soundex system is considerably more complex than the “American” Soundex system. First, its codes are six digits long, which distinguishes more of each name. It is based on letter clusters rather than individual letters, and it recognizes multiple phonetic possibilities for a cluster when appropriate. Each cluster consists of one or more letters and is assigned three code values, most of them single digits from 0 to 9: one for when the cluster begins the name; one for when the cluster is followed by A, E, I, J, O, U, or Y; and one for all other cases. The letters A, E, H, I, J, O, U, and Y have no “all other cases” value. Finally, a name may have more than one Daitch-Mokotoff Soundex code. The complete rules are available in “Soundexing and Genealogy” by Gary Mokotoff.
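A full implementation needs Mokotoff's complete cluster table, which is too long to reproduce here. The sketch below illustrates only the mechanism just described: matching the longest cluster, choosing one of its three values by position, and branching into several codes when a cluster has more than one coding. Its `TABLE` is a small hand-picked subset, an assumption to be checked against Mokotoff's published table. It rejects letters it does not cover and can mis-code names that need clusters it lacks. The sketch also assumes that identical codes from adjacent clusters are written once, that an uncoded cluster separates repeats, and that short codes are padded with zeros. None of these rules is stated on this page. The name `dm_codes` is hypothetical.

```python
# Hypothetical, deliberately partial Daitch-Mokotoff table: only the clusters
# needed for the example names below. Each cluster maps to a list of
# alternative codings; each coding is a triple of values for
# (start of name, before a vowel, any other position). None = not coded.
TABLE = {
    "SCH": [("4", "4", "4")],
    "RS": [("94", "94", "94"), ("4", "4", "4")],  # two codings -> branching
    "C": [("5", "5", "5"), ("4", "4", "4")],      # two codings -> branching
    "D": [("3", "3", "3")],
    "K": [("5", "5", "5")],
    "M": [("6", "6", "6")],
    "N": [("6", "6", "6")],
    "P": [("7", "7", "7")],
    "R": [("9", "9", "9")],
    "S": [("4", "4", "4")],
    "T": [("3", "3", "3")],
}
for vowel in "AEIOU":
    TABLE[vowel] = [("0", None, None)]  # coded only at the start of a name

BEFORE_VOWEL = set("AEIJOUY")  # letters that select the "before a vowel" value
MAX_LEN = max(len(cluster) for cluster in TABLE)


def dm_codes(name: str) -> set[str]:
    """Return every six-digit code the partial TABLE yields for a name."""
    s = "".join(ch for ch in name.upper() if ch.isalpha())
    if not s:
        raise ValueError("name must contain at least one letter")

    branches = [("", None)]  # (digits so far, last value written)
    i = 0
    while i < len(s):
        # Longest cluster in TABLE that starts at position i.
        cluster = next(
            (s[i:i + n] for n in range(min(MAX_LEN, len(s) - i), 0, -1)
             if s[i:i + n] in TABLE),
            None,
        )
        if cluster is None:
            raise ValueError(f"{s[i]!r} is not covered by this partial table")
        end = i + len(cluster)
        if i == 0:
            position = 0  # cluster begins the name
        elif end < len(s) and s[end] in BEFORE_VOWEL:
            position = 1  # cluster is followed by a vowel
        else:
            position = 2  # all other cases

        new_branches = []
        for digits, last in branches:
            for coding in TABLE[cluster]:
                value = coding[position]
                if value is None:
                    new_branches.append((digits, None))  # assumed separator
                elif value == last:
                    new_branches.append((digits, last))  # adjacent repeat
                else:
                    new_branches.append((digits + value, value))
        branches = new_branches
        i = end

    # Pad with zeros and keep six digits; the set drops duplicate branches.
    return {(digits + "000000")[:6] for digits, _ in branches}


for name in ["Schmidt", "Carr", "Kerr", "Peters", "Peterson"]:
    print(name, sorted(dm_codes(name)))
```

Under this partial table, Carr and Kerr share the code 590000 (Carr also branches to 490000 because C has two codings), while Peters and Peterson no longer share any code.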


Contact Information

Phone (801) 571-6122
Email info@lineages.com
Mailing Address PO BOX 1584, Draper, Utah 84020-1584
