Processing Content-and-Structure Queries for XML Retrieval

Börkur Sigurbjörnsson, Jaap Kamps, and Maarten de Rijke.

Proceedings of the First Twente Data Management Workshop (TDM’04). Pages: 32-38. 2004.

Document-centric XML collections contain text-rich documents, marked up with XML tags. The tags add lightweight semantics to the text. Querying such collections calls for a hybrid query language: the text-rich nature of the documents suggests a content-oriented (IR) approach, while the mark-up allows users to add structural constraints to their IR queries. We propose an approach to such hybrid content-and-structure queries that decomposes a query into multiple content-only queries whose results are then combined in ways determined by the structural constraints of the original query. We report on ongoing work and present preliminary evaluation results, based on the INEX 2003 test set.
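The decompose-then-combine idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the simplified XPath-like query syntax with about() predicates, the toy term-overlap score, the rule of multiplying a target element's score by the best score of a matching ancestor, and the names Element, decompose, score and run.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    path: str  # hypothetical element address, e.g. "/article[1]/sec[2]"
    tag: str
    text: str


def decompose(query):
    """Split a two-step query into content-only (tag, terms) parts."""
    steps = re.findall(r"//(\w+)\[about\(\.,\s*([^)]*)\)\]", query)
    return [(tag, terms.lower().split()) for tag, terms in steps]


def score(element, terms):
    """Toy content-only score: fraction of query terms found in the text."""
    words = set(element.text.lower().split())
    return sum(t in words for t in terms) / len(terms)


def run(query, elements):
    """Score each content-only part, then combine along the structure."""
    (outer_tag, outer_terms), (inner_tag, inner_terms) = decompose(query)
    outer = {e.path: score(e, outer_terms)
             for e in elements if e.tag == outer_tag}
    results = []
    for e in elements:
        if e.tag != inner_tag:
            continue
        # Structural constraint: the target must lie below a matching
        # outer element. Assumed combination rule: multiply the scores.
        context = max((s for p, s in outer.items()
                       if e.path.startswith(p + "/")), default=0.0)
        combined = context * score(e, inner_terms)
        if combined > 0:
            results.append((e.path, combined))
    return sorted(results, key=lambda r: -r[1])


# Made-up toy collection and query, for illustration only.
elements = [
    Element("/article[1]", "article", "xml retrieval with structured queries"),
    Element("/article[1]/sec[1]", "sec", "query languages for xml"),
    Element("/article[1]/sec[2]", "sec", "evaluation results"),
    Element("/article[2]", "article", "cooking recipes"),
    Element("/article[2]/sec[1]", "sec", "query languages in databases"),
]
query = "//article[about(., xml retrieval)]//sec[about(., query languages)]"
print(run(query, elements))
```

In this sketch only the section inside the article about XML retrieval is returned; the section in the second article matches the content-only part for sections but is filtered out by the structural constraint.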

@inproceedings{sigurbjornsson04processing,
author = {Börkur Sigurbjörnsson and Jaap Kamps and Maarten de Rijke},
title = {Processing Content-and-Structure Queries for XML Retrieval},
year = {2004},
booktitle = {Proceedings of the First Twente Data Management Workshop (TDM'04)},
pages = {32--38},
location = {Twente, The Netherlands}
}