Monolingual Document Retrieval: English versus other European Languages

Jaap Kamps, Christof Monz, Maarten de Rijke, and Börkur Sigurbjörnsson.

Proceedings of the Fourth Dutch Belgian Information Retrieval Workshop (DIR-2003). Pages: 35-39. 2003.

The vast majority of research in information retrieval is done using English collections and topics. This raises questions about the effectiveness of retrieval strategies for other languages. To examine this issue, we focus on document retrieval in nine European languages. In particular, we investigate the effectiveness of language-dependent approaches to document retrieval, such as stemming and decompounding; of language-independent approaches, such as character n-gramming; and of the combination of the two types of approaches. The experimental evidence is obtained using the 2003 test suite of the Cross-Language Evaluation Forum (CLEF).

@inproceedings{kamps03monolingual,
author = {Kamps, Jaap and Monz, Christof and de Rijke, Maarten and Sigurbj\"{o}rnsson, B\"{o}rkur},
title = {Monolingual Document Retrieval: English versus other European Languages},
year = {2003},
booktitle = {Proceedings of the Fourth Dutch Belgian Information Retrieval Workshop (DIR-2003)},
pages = {35--39},
numpages = {6},
location = {Leuven, Belgium},
series = {DIR-03}
}