Sunday 27 March 2011

Online German Language Corpus, UCREL Summer School

The University of Leipzig provides a German language corpus (Projekt Deutscher Wortschatz). The database can be queried from different programming languages, and access is also possible via a web service. Requests can ask for the co-occurrences of a word, its base form, the words that often occur to its right and to its left, its frequency, synonyms and much more. If you develop text input systems this may be a very useful resource: see the web services overview page (with links to downloads) and the list of web service requests offered, or have a look at some PHP examples.
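To make these query types a bit more concrete, here is a small Python sketch that computes word frequency and left/right neighbours on a tiny made-up sample text. It does not call the Leipzig service at all; the function names (tokenize, word_frequency, neighbours) and the sample sentences are my own assumptions for illustration, and the real corpus is of course far larger and uses proper statistics for co-occurrences.

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase, split on whitespace and strip basic punctuation."""
    stripped = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in stripped if w]

def word_frequency(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)

def neighbours(tokens, word):
    """Count the tokens directly to the left and right of `word`.

    Simplification: sentence boundaries are ignored, so a neighbour
    may come from the previous or next sentence.
    """
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != word:
            continue
        if i > 0:
            left[tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            right[tokens[i + 1]] += 1
    return left, right

# Hypothetical sample text, not taken from the corpus.
SAMPLE = (
    "Das Internet ist schnell. Das Internet ist überall. "
    "Im Internet surfen viele Menschen."
)

tokens = tokenize(SAMPLE)
freq = word_frequency(tokens)
left, right = neighbours(tokens, "internet")

print("frequency:", freq["internet"])
print("left neighbours:", left.most_common(2))
print("right neighbours:", right.most_common(2))
```

For a text input system, counts like the right neighbours are roughly what you would use to suggest the next word after the user has typed "Internet".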

You can try the service interactively at http://wortschatz.uni-leipzig.de/abfrage/. See the pictures for an example query on the term Internet. The site also features a German-English dictionary.

Ever since I shared an office at Lancaster University with Paul Rayson from UCREL (University Centre for Computer Corpus Research on Language), I have found corpus linguistics an interesting topic. By the way, UCREL runs a Summer School in Corpus Linguistics from 13 to 15 July 2011. I would love to go there...