Authors
Jaap Kamps, Maarten Marx, Maarten de Rijke, and Börkur Sigurbjörnsson.
book
Proceedings of the INEX 2005 Workshop on Element Retrieval Methodology. 2005.
abstract
Document-centric XML is a mixture of text and structure. With the increased availability of document-centric XML content comes a need for query facilities in which both structural constraints and constraints on the content of the documents can be expressed. This has generated considerable interest in both the IR and DB communities, and has led to the launch of evaluation efforts tailored for XML documents. One of the driving and long-standing research questions here is: How does the increased expressiveness of languages for querying XML documents help users to better, and more effectively, express their information needs? And closely related to this: How should we evaluate systems that enable users to express their information needs using both content and structural constraints?
In this paper we address these research questions. Our analysis follows two lines: What requirements can in principle be expressed in query languages for document-centric XML documents? And: How do users actually use such languages? For the former, we provide mathematical characterizations of two query languages, one for users with next to no knowledge of the document structure (ignorant users), and one for users that have some, but not complete, knowledge of the document structure (semi-ignorant users). To address the latter issue, we examine the topics formulated in the second query language as part of the 2004 edition of the INEX XML retrieval initiative. Our main findings are as follows: First, while structure is used in varying degrees of complexity, over half of the queries can be expressed in the very restrictive ignorant user language. Second, structure is used as a search hint, and not a search requirement, when judged against the underlying information need. Third, the use of structure in queries functions as a precision device. Fourth, the underlying retrieval task of content-and-structure querying is no different from the ordinary natural language query retrieval task. From those findings we derive a number of recommendations for the evaluation of systems that cater for content-and-structure queries.
bibtex
@inproceedings{kamps05understanding,
author = {Jaap Kamps and Maarten Marx and Maarten de Rijke and Börkur Sigurbjörnsson},
title = {Understanding Content-and-Structure},
year = {2005},
booktitle = {Proceedings of the INEX 2005 Workshop on Element Retrieval Methodology}
}