This page is somewhat outdated, as there is now the wonderful Hunspell (2005-10), which does almost everything (and more) that was needed for complex compound word support. Thanks, Laci.

What needs to be done to get a good German spell checker

Existing spellcheck engines

There are roughly three major open-source spell check engines available:

Ispell

Ispell is the oldest of the three. It is written in C and has all the basic features of an interactive spell checker, plus a pipe mode for use from scripts. It supports affix compression, which makes it much easier to maintain dictionaries for languages that have many different suffixes and prefixes per word stem. For example, I only need to add the word

bunt/A
to the dictionary to get the words bunt, bunte, bunter, buntes and bunten. Ispell can also "decompress" a word list compressed this way, which is very helpful for getting a full list of words.
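As a rough illustration of what affix decompression does, here is a minimal Python sketch. The rule table for the flag "A" is an assumption made up to reproduce the "bunt" example; real Ispell affix files use their own syntax with conditions on the stem ending, and the function name expand is hypothetical.

```python
# Hypothetical suffix rules for the flag "A" (an assumption for this sketch;
# real Ispell affix files are richer and attach conditions to each rule).
AFFIX_RULES = {
    "A": ["e", "er", "es", "en"],
}

def expand(entry):
    """Expand a 'stem/FLAGS' dictionary entry into all of its word forms."""
    stem, _, flags = entry.partition("/")
    forms = [stem]  # the bare stem is always a valid word
    for flag in flags:
        for suffix in AFFIX_RULES.get(flag, []):
            forms.append(stem + suffix)
    return forms

print(expand("bunt/A"))  # ['bunt', 'bunte', 'bunter', 'buntes', 'bunten']
```

An entry without any flag, such as "und", simply expands to itself.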

Aspell

Aspell is a newer spell check engine, written in C++. It is a great spell checker, at least for English and some other languages. However, it lacks affix compression, which makes the dictionaries for some languages quite big. A great advantage of Aspell is its phonetic comparison, which makes the spell checker's suggestions a bit more intelligent. If you write the German word "Philosophie" as "Fillosofie" (sorry, this is an extreme example), Aspell is able to offer a good suggestion, while spell checkers without phonetic comparison would not, because the word is too heavily misspelled.
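To give an idea of why phonetic comparison helps here, the following Python sketch reduces words to a crude sound-alike key and suggests dictionary words with the same key. The three rewrite rules are toy assumptions chosen for this one example; they are not Aspell's actual phonetic algorithm, and the names phonetic_key and suggest are hypothetical.

```python
import re

def phonetic_key(word):
    """Reduce a word to a crude sound-alike key (toy rules, not Aspell's)."""
    key = word.lower()
    key = key.replace("ph", "f")         # "ph" sounds like "f"
    key = key.replace("ie", "i")         # long "i" is often written "ie"
    key = re.sub(r"(.)\1+", r"\1", key)  # collapse doubled letters
    return key

def suggest(misspelled, dictionary):
    """Return all dictionary words that share the misspelling's key."""
    target = phonetic_key(misspelled)
    return [word for word in dictionary if phonetic_key(word) == target]

print(suggest("Fillosofie", ["Philosophie", "Philologie", "Filter"]))
# ['Philosophie']
```

A plain edit-distance comparison would have to bridge several letter changes at once, whereas both spellings collapse to the same key here.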

Myspell

Myspell is "just" a library and cannot be used interactively. It has affix compression similar to Ispell's, and it has some hooks for defining 2-character-to-1-character mappings, which allow a kind of phonetic matching, though not as full-featured as Aspell's.

Why some people think the German dictionary has too few words

German is one of the more complex languages (at least compared to English). In German, words that belong together by meaning, for example "spell checker" ("Rechtschreibprüfung"), are always written as one word, without a space. In 9 out of 10 cases, the words the German dictionary does not know are compound words. Adding all the needed compound words is a Sisyphean task: the dictionary would grow to a few GiB, and even at that size the number of known compound words would not be satisfying.

Ispell has an option to allow compound words, but it does not do enough for our needs. Not every word may be used in compounds, only certain words. That means we need to introduce a flag (like the affix flags) to mark words as compoundable. Some words may only be compoundable as the first word of a compound; others may only be used at the end.
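A minimal Python sketch of such position flags could look like this. The flag letters "B" (may be a non-last part) and "E" (may be the last part) are hypothetical, and the flag assignments in the small lexicon are invented purely for illustration, not linguistic claims.

```python
# Hypothetical flags: "B" = may be a non-last part, "E" = may be the last part.
# The assignments below are invented for illustration only.
LEXICON = {
    "haus": {"B", "E"},
    "tür": {"B", "E"},
    "schlüssel": {"E"},
    "und": set(),  # not compoundable at all
}

def is_valid_compound(parts, lexicon=LEXICON):
    """Check an already split compound against the position flags."""
    if len(parts) < 2:
        return False  # a single word is not a compound
    *front, last = parts
    front_ok = all("B" in lexicon.get(part, set()) for part in front)
    return front_ok and "E" in lexicon.get(last, set())

print(is_valid_compound(["haus", "tür", "schlüssel"]))  # True
print(is_valid_compound(["schlüssel", "haus"]))         # False
print(is_valid_compound(["haus", "und"]))               # False
```

The sketch takes the parts as given; finding the split points in an unknown word is a separate step.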

How are words glued together? - Linguistics from a technical point of view

Most words can be used in compound words only with some modification at the end of the first word (or rather the non-last word, as compound words may consist of two or more words). Some need a "Binde-s" (linking s) inserted in between, as in "Verband" and "Päckchen", which become "Verbandspäckchen". Some words need no modification at all; the same example works here, as you can also say "Verbandpäckchen". For other words you cannot choose between the two spellings. Another case is that the first word is put into the plural by adding "n", as when you make "Sonnenbrand" from "Sonne" and "Brand". The last case I can think of at the moment is leaving out a letter, as in "Strafe" and "Lager", which become "Straflager".

By the way, a given word is always modified in the same way; the modification does not depend on the word that follows it in the compound.
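Because the modification depends only on the non-last word itself, it can be stored once per word. The Python sketch below assumes a simple table of compound forms filled with the examples from this page; the table layout and the function name compounds are hypothetical, not an existing dictionary format.

```python
# Compound forms of non-last words, taken from the examples on this page.
# The table layout is an assumption, not an existing dictionary format.
COMPOUND_FORMS = {
    "Verband": ["Verbands", "Verband"],  # with or without the "Binde-s"
    "Sonne": ["Sonnen"],                 # plural "n" added
    "Strafe": ["Straf"],                 # final "e" left out
}

def compounds(first, last):
    """Join two words using every compound form listed for the first word."""
    return [form + last.lower() for form in COMPOUND_FORMS.get(first, [])]

print(compounds("Verband", "Päckchen"))  # ['Verbandspäckchen', 'Verbandpäckchen']
print(compounds("Sonne", "Brand"))       # ['Sonnenbrand']
print(compounds("Strafe", "Lager"))      # ['Straflager']
```

Note that the lookup never inspects the second word, which is exactly the property described above.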

The last word of a compound should, by the way, be affix-decompressed (in linguistic terms: declined or conjugated) just as it would be as a standalone word.
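Put together, checking a compound could work roughly as in the Python sketch below: strip known compound forms from the front and accept the word if the remainder is an inflected form of a word that may stand last. The two small tables and the suffix lists are assumptions for illustration, and is_compound is a hypothetical name, not a function of any existing spell checker.

```python
# Assumed data for this sketch: compound forms of non-last words, and
# stems of last words with the suffixes their affix flags would allow.
LINK_FORMS = {"verbands", "verband", "sonnen", "straf"}
LAST_STEMS = {
    "päckchen": ["", "s"],
    "brand": ["", "es", "e"],
    "lager": ["", "s", "n"],
}

def last_forms():
    """Affix-decompress every word that may stand at the end."""
    return {stem + suffix
            for stem, suffixes in LAST_STEMS.items()
            for suffix in suffixes}

def is_compound(word):
    """Accept words made of one or more link forms plus an inflected last word."""
    endings = last_forms()

    def split(rest, parts_so_far):
        if parts_so_far >= 1 and rest in endings:
            return True
        return any(rest.startswith(form) and split(rest[len(form):], parts_so_far + 1)
                   for form in LINK_FORMS)

    return split(word.lower(), 0)

print(is_compound("Verbandspäckchens"))  # True
print(is_compound("Sonnenbrandes"))      # True
print(is_compound("Päckchen"))           # False, not a compound
```

Only the last part is looked up in its inflected forms; all parts before it must match a stored compound form exactly.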

So what is missing now?

None of the spell checkers mentioned lets me add flags that tell the spell check engine which words to compound and how. This is urgently needed to get more satisfying spell check results and to recognize a larger number of compound words.

If you have some time and are interested in enhancing one of the existing spell checkers, please contact me; I will help as much as I can. It is probably also a good idea to contact the author of the spell check engine to get some information on the status of compound word support.

What I wrote on this page is just a kind of brainstorming; comments, additions and ideas are very welcome.


At the moment I am discussing this topic with the Ispell and Myspell authors, and it looks like we will make at least some progress sooner or later.



b j o e r n [at] j 3 e . d e