Best-Match Querying from Document-Centric XML

Jaap Kamps, Maarten Marx, Maarten de Rijke, and Börkur Sigurbjörnsson.

Proceedings of the Seventh International Workshop on the Web and Databases (WebDB 2004). Pages: 55-60. 2004. [acm]

On the Web, there is a pervasive use of XML to give lightweight semantics to textual collections. Such document-centric XML collections require a query language that can gracefully handle structural constraints as well as constraints on the free text of the documents. Our main contributions are three-fold. First, we outline two fragments of XPath tailored to users that have varying degrees of understanding of the XML structure used, and give both syntactic and semantic characterizations of these fragments. Second, we extend XPath with an about function having a best-match semantics based on the relevance of the document component for the expressed information need. Third, we evaluate the resulting query language using the INEX 2003 test suite, and show that best-match approaches outperform exact-match approaches for evaluating content-and-structure queries.

@inproceedings{10.1145/1017074.1017089,
author = {Kamps, Jaap and Marx, Maarten and de Rijke, Maarten and Sigurbj\"{o}rnsson, B\"{o}rkur},
title = {Best-match querying from document-centric XML},
year = {2004},
isbn = {9781450377881},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/1017074.1017089},
doi = {10.1145/1017074.1017089},
booktitle = {Proceedings of the 7th International Workshop on the Web and Databases: Colocated with ACM SIGMOD/PODS 2004},
pages = {55--60},
numpages = {6},
keywords = {XML retrieval, XPath, full-text XML querying},
location = {Paris, France},
series = {WebDB '04}
}