Authors
Börkur Sigurbjörnsson.
book
Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2005). 2005. (doctoral consortium) [ACM DL]
abstract
When a user is confronted with a list of relevant documents in response to a query, her search task is not over. She still has to explore the documents in the list to get to the relevant information. When documents are long, this can turn out to be a tedious task. It is thus desirable for the retrieval system to give the user more focused access to the relevant documents by providing more direct access to the relevant parts within them. This task is twofold. First, the system needs to identify the relevant subparts of the documents. Second, the system needs to display these sub-document results to the user.
I will address both tasks in the context of retrieving information from semi-structured (XML) documents. XML documents have the advantage that they are divided into a presumably meaningful hierarchy of retrievable elements. However, element retrieval is difficult to evaluate because the elements overlap each other. I will evaluate my element retrieval methods using the INEX and HARD test collections, where relevance is assessed at the sub-document level. Both collections leave many open questions regarding the evaluation methodology. By looking at the two collections in parallel, I hope to provide a better understanding of the nature of the search tasks being evaluated. For the result representation task, I will evaluate how element retrieval can be used to improve the result representation. The evaluation will be performed through user studies.