Efficient XML Keyword Query Refinement with Meaningful Results Generation
Huang Jing, Lu Jiaheng, and Meng Xiaofeng
(huangjingruc@ruc.edu.cn)
(School of Information, Renmin University of China, Beijing 100872)
Abstract Keyword search provides users with a friendly way to query XML data, but a user's keyword query is often an imperfect description of his or her intention. Even when the information need is well described, a search engine may not be able to return results matching the query as stated. The task of refining the user's original query to achieve better result quality is first defined as the problem of keyword query refinement in XML keyword search, and guidelines are designed to decide whether query refinement is necessary. Four refinement operations are defined, namely term deletion, merging, split, and substitution. Since there may be more than one refinement candidate, the notion of refinement cost is proposed as a measure of the semantic distance between the original query and a refined query, together with a dynamic programming solution to compute it. To find the best refined queries and generate their associated results within a single scan of the node lists, a stack-based algorithm is proposed, followed by a generalized partition-based optimization that substantially improves efficiency. Finally, extensive experiments demonstrate the efficiency and effectiveness of the query refinement approach.
Keywords XML; keyword search; query refinement; query rewriting; query suggestion; SLCA
Chinese Library Classification (CLC) number TP391
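The abstract says refinement cost is computed by dynamic programming but does not give the recurrence in this section. The following is a minimal sketch, not the paper's definition, assuming an edit-distance-style alignment of the original term sequence to a refined one, where keeping a term is free and term deletion, merging, split, and substitution each carry a hypothetical unit cost. The function name refinement_cost and the cost constants are illustrative assumptions. Substitution is unconstrained here; a practical cost model would presumably restrict it to similar terms.

```python
import math

# Hypothetical unit costs (assumption): the paper's actual cost model is not given here.
DELETE_COST = 1.0
MERGE_COST = 1.0
SPLIT_COST = 1.0
SUBSTITUTE_COST = 1.0


def refinement_cost(original, refined):
    """Minimum total cost of turning the term list `original` into `refined`
    using keep (free), deletion, merging, split, and substitution."""
    n, m = len(original), len(refined)
    INF = math.inf
    # dp[i][j]: cheapest way to turn original[:i] into refined[:j].
    # dp[0][j > 0] stays INF because inserting new terms is not an operation.
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    # Every transition increases i, so row-major order finalizes dp[i][j]
    # before it is used as a source.
    for i in range(n + 1):
        for j in range(m + 1):
            cur = dp[i][j]
            if cur == INF:
                continue
            if i < n:
                # Term deletion: drop original[i].
                dp[i + 1][j] = min(dp[i + 1][j], cur + DELETE_COST)
            if i < n and j < m:
                # Keep original[i] unchanged, or substitute it with refined[j].
                step = 0.0 if original[i] == refined[j] else SUBSTITUTE_COST
                dp[i + 1][j + 1] = min(dp[i + 1][j + 1], cur + step)
            if i + 1 < n and j < m and original[i] + original[i + 1] == refined[j]:
                # Merging: two adjacent original terms become one refined term.
                dp[i + 2][j + 1] = min(dp[i + 2][j + 1], cur + MERGE_COST)
            if i < n and j + 1 < m and refined[j] + refined[j + 1] == original[i]:
                # Split: one original term becomes two adjacent refined terms.
                dp[i + 1][j + 2] = min(dp[i + 1][j + 2], cur + SPLIT_COST)
    return dp[n][m]
```

For example, refinement_cost(["data", "base", "xml"], ["database", "xml"]) returns 1.0 through a single merge, which is cheaper than substituting "data" and deleting "base" (cost 2.0).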
0 Introduction
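Keyword search provides users with a friendly and convenient way to query data, and how to use keyword search to retrieve the desired information from XML data has recently become an active research topic in academia [1-5]. These works mainly study how to filter out irrelevant results to improve precision. This paper focuses on a different aspect: when a query returns no results or too few results, how to refine the original query so that the new query achieves good recall. This situation is common in keyword search. Because users may not express their query intentions accurately, the input query may contain spelling errors or irrelevant terms, so that some keywords have no matching nodes in the document and no results are returned.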
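To make the "no matching node, no result" effect concrete, here is a minimal sketch assuming SLCA (smallest lowest common ancestor) result semantics and Dewey-labeled nodes. It is a straightforward quadratic computation, not the paper's stack-based algorithm, and the names lca, is_ancestor, slca and the tiny example document are hypothetical. For each node v of the first keyword list it finds the deepest ancestor of v whose subtree contains a match for every keyword, then discards candidates that have another candidate below them.

```python
from typing import List, Tuple

# Dewey label: (0, 1, 2) is the path root -> child 1 -> child 2.
Dewey = Tuple[int, ...]


def lca(a: Dewey, b: Dewey) -> Dewey:
    """The lowest common ancestor is the longest common prefix of two labels."""
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return a[:k]


def is_ancestor(a: Dewey, b: Dewey) -> bool:
    """True if a is a proper ancestor of b."""
    return len(a) < len(b) and b[:len(a)] == a


def slca(keyword_lists: List[List[Dewey]]) -> List[Dewey]:
    # One keyword without any matching node means no subtree can contain
    # all keywords, so the answer is empty.
    if not keyword_lists or any(len(lst) == 0 for lst in keyword_lists):
        return []
    first, rest = keyword_lists[0], keyword_lists[1:]
    candidates = set()
    for v in first:
        x = v
        for lst in rest:
            # All lca(x, u) are ancestors of x; keep the deepest one.
            x = max((lca(x, u) for u in lst), key=len)
        candidates.add(x)
    # An SLCA has no other candidate in its subtree.
    return sorted(c for c in candidates
                  if not any(is_ancestor(c, d) for d in candidates))
```

With a hypothetical book element (0, 0) whose title matches "xml" at (0, 0, 0) and whose author matches "lu" at (0, 0, 1), slca([[(0, 0, 0)], [(0, 0, 1)]]) yields [(0, 0)]. Adding a misspelled third keyword with an empty match list yields [], which is exactly the situation that query refinement targets.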


