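Chinese Abstract
Many research efforts use search engines to obtain Web information for their studies. This thesis defines the "search-engine-based research method" and takes as its objects of study all papers published from 2001 to 2007 at seven major academic conferences, including WWW and SIGIR. It proposes eight dimensions, such as "search engine used" and "search engine access method", along which 146 related studies are classified and compared, and it offers guiding recommendations.
Keywords: Web entity instance, Web entity attribute type set, search engine, Web entity track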
English Abstract
Research on the extraction of Web entities and discovery of entity activities
Conglei Yao (Computer Science), directed by Professor Xiaoming Li
For a given entity type, it is challenging and significant to obtain all of its instances at web scale, together with the corresponding extracted entity attributes. It is also important to retrieve entity activities and organize them in chronological order to form so-called entity tracks. This dissertation explores the models and algorithms related to these two issues in order to build efficient and effective entity-based web information systems.

This thesis presents a comprehensive study of the extraction of web entities, their relations, and their tracks. The research is based on the WebDigest project, which was started by the Lab of Computer Networks of Peking University and aims to supply advanced entity-related services by extracting desired information from billions of pages. This dissertation focuses on two main problems. The first is the effective and efficient extraction of web entity instances for a given entity type, under the assumption that each desired instance already contains its determining attributes. The second is the extraction of activities for a given web entity and the organization of those activities into a proper form. Moreover, because all the models and methods in this thesis are based on search engines, this dissertation also surveys and analyzes related studies that are based on search engines.

In summary, the main contributions of this dissertation lie in the following five aspects:

(1) A novel framework for the extraction of web entity instances from large-scale web pages. A main shortcoming of current studies is that entity attribute types are mostly specified manually and their real importance is unknown. This dissertation proposes a novel framework in which entity attribute types can be extracted and evaluated automatically. The input of this framework consists of a specified entity type and the initial knowledge of that entity type provided by users.
Based on this input, the framework first creates a global web entity attribute schema, and makes sure