Federated search of text-based digital libraries in hierarchical peer-to-peer networks


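Foreign Full-Text Databases

Characteristics of Full-Text Databases

Directness: users can retrieve the original document directly, instead of first retrieving bibliographic information and then locating the full text, as with a reference database.
Comprehensiveness: a full-text database aims for complete coverage and draws on as many document sources as possible.
Search methods: in addition to ordinary searching, full-text search technology is available, so the body of a document and other related parts (such as citations) can all be searched.
Full-text formats: common formats are PDF, HTML, plain text, and images; most full-text databases use only PDF and HTML.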
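Browsing journals by subject (1)
Browsing journals by subject (2)
Articles in press: articles that have been accepted but not yet formally published
A publication can be added to the list of favorite publications
Volume/issue alerts can be created for a specific publication
Browsing journals by subject (3)
Specific articles can be searched for within the journal
Personalization feature hints; publication information, including submission information for the journal
Search interface
Advanced search
Expert search
Search history: open it to recall a search, view the list of saved searches, and build follow-up or staged search requests
Search history screen
Using the appropriate links in the search history list, you can redisplay the list of documents matching a search request, rerun the request, and edit the original search.
Search

Search language and rules
FAST search platform language: on January 21, 2007, SD was upgraded to the FAST search platform. The search language and rules changed and became more intelligent, faster, and more flexible. Three modes are offered: Quick Search, Advanced Search, and Expert Search.
Home page after login
"Recent Actions" shows links to the most recent actions
The home page can be personalized by turning new tasks on or off
Personalized home page
"Recent Actions" shows links to the most recent actions
Click to display the search results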

The Importance of Discerning Information Authenticity in the Internet Age: Sample English Essay

Title: The Importance of Discerning Information Authenticity in the Internet Age

Introduction:
The rapid development of the internet has brought unprecedented convenience in accessing information. However, it has also introduced significant challenges in distinguishing authentic information from deceptive information. This essay explores the crucial role of discerning information authenticity in the digital era and offers effective strategies for individuals navigating the vast online landscape.

Body:
1. The proliferation of misinformation:
   a. The ease of publishing and sharing information online.
   b. The lack of stringent fact-checking procedures.
   c. The spread of misinformation through social media platforms.
2. The consequences of misinformation:
   a. Influencing public opinion and perception.
   b. Undermining credibility and trust in institutions.
   c. Potentially leading to adverse actions and behaviors.
3. The importance of information verification:
   a. Preserving the accuracy and reliability of knowledge.
   b. Making informed decisions based on verified information.
   c. Safeguarding personal and collective well-being.
4. Strategies for discerning information authenticity:
   a. Assessing the credibility of sources:
      i. Verifying author credentials and expertise.
      ii. Evaluating the reputation and reliability of the publishing platform.
      iii. Cross-referencing information with multiple credible sources.
   b. Analyzing the content:
      i. Scrutinizing the use of logical fallacies or emotional manipulation.
      ii. Identifying bias and agendas.
      iii. Fact-checking claims and statistical data through reputable fact-checking organizations.
   c. Utilizing critical thinking:
      i. Questioning the veracity of the information and the motives behind it.
      ii. Considering alternative viewpoints and counterarguments.
      iii. Relying on evidence-based reasoning.
5. The role of technology and media literacy:
   a. Promoting digital literacy education:
      i. Educating individuals about information verification techniques.
      ii. Teaching critical thinking skills in the digital landscape.
      iii. Encouraging media literacy to recognize misinformation patterns.
   b. Leveraging technology for fact-checking:
      i. Utilizing automated tools for identifying false information.
      ii. Supporting initiatives for AI-driven detection of fake news.
      iii. Encouraging social media platforms to emphasize accurate information sharing.

Conclusion:
In the internet era, the ability to discern information authenticity is crucial to maintaining an informed society and avoiding the detrimental consequences of misinformation. By applying strategies such as source credibility assessment, content analysis, and critical thinking, individuals can navigate the vast online landscape more effectively. Technology and media literacy also play a vital role in promoting information verification and combating the spread of fake news. By collectively prioritizing information authenticity, we can safeguard the integrity of knowledge and make well-informed decisions in the digital age.

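Literature on Digital Identity Authentication

Digital identity authentication is the use of digital technology to verify the identity of a person or entity. This form of authentication has become an indispensable part of modern society and plays an important role in e-commerce, online banking, government services, and many other fields.

The following are some works on digital identity authentication:

1. "Digital Identity Authentication: A Comprehensive Review". This review summarizes the various methods and technologies for digital identity authentication, including those based on passwords, biometrics, smart cards, and public key infrastructure. It also discusses the security and privacy problems of current digital identity authentication systems and proposes directions for future research.

2. "Challenges and Opportunities in Digital Identity Authentication". This work focuses on the challenges and opportunities facing digital identity authentication. It analyzes the security vulnerabilities and fraud risks in current systems and recommends ways to improve and strengthen security. It also examines the prospects for applying emerging technologies such as blockchain and artificial intelligence to digital identity authentication.

3. "Legal and Ethical Issues in Digital Identity Authentication". This work addresses the legal and ethical issues involved in digital identity authentication. It discusses personal privacy rights, data protection, and identity theft, and analyzes the laws, regulations, and supervisory requirements that different countries and regions apply to digital identity authentication.

4. "Biometric-Based Digital Identity Authentication Systems". This work concentrates on digital identity authentication systems based on biometric technology. It describes how biometric techniques such as fingerprint recognition, iris scanning, and facial recognition are applied to digital identity authentication, evaluates their security and reliability, and discusses future directions.

Together, these works cover the technical, security, legal, and ethical aspects of digital identity authentication and can help you understand research and development in this field more fully.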

Cultivating Effective Information Discernment Skills in the Digital Age (English Essay)

In the digital age, we are constantly bombarded with vast amounts of information from various sources. With the rise of social media, fake news, and unreliable sources, it has become crucial for individuals to develop effective information discernment skills. In this article, we explore the importance of cultivating these skills and provide tips on how to navigate information overload in the digital world.

One of the key reasons to develop effective information discernment skills is the prevalence of fake news and misinformation online. Because information is so easy to share on social media platforms, it has become increasingly difficult to differentiate between credible and unreliable sources. This can lead to confusion and even harm if individuals act on false information.

Furthermore, in today's fast-paced society, we are constantly inundated with information from multiple sources such as news websites, social media, and online forums. It is easy to feel overwhelmed and to struggle to filter out the noise and focus on what is relevant and accurate. Developing information discernment skills can help individuals sift through the abundance of information and identify what is trustworthy and valuable.

So, how can we cultivate effective information discernment skills in the digital age? Here are some tips to help you navigate the information overload and separate fact from fiction:

1. Verify the source: Before believing or sharing information, always check the source to ensure it is credible and reliable. Look for reputable news outlets, official websites, or experts in the field.
2. Cross-check information: Don't rely on just one source. Cross-check facts and details against multiple sources to ensure accuracy and legitimacy.
3. Question everything: Be critical of the information you come across and ask questions such as: Who is the author? What is their motive? Is the information biased or objective?
4. Fact-check: Use fact-checking websites like Snopes or PolitiFact to verify claims and debunk myths. Don't spread information without verifying its accuracy.
5. Stay informed: Keep yourself updated on current events, trends, and developments in your areas of interest. The more informed you are, the better equipped you will be to discern information.
6. Develop critical thinking skills: Practice analyzing information, evaluating arguments, and making informed decisions based on evidence and logic.

By honing these skills and adopting a critical mindset, you can navigate the digital landscape with confidence and discernment. In the era of fake news and information overload, it is more important than ever to cultivate effective information discernment skills.

Database Systems (English-Language Text)

Database Systems

1. Fundamental Concepts of Database

Databases and database technology are having a major impact on the growing use of computers. It is fair to say that databases will play a critical role in almost all areas where computers are used, including business, engineering, medicine, law, education, and library science, to name a few. The word "database" is in such common use that we must begin by defining what a database is. Our initial definition is quite general.

A database is a collection of related data. By data, we mean known facts that can be recorded and that have implicit meaning. For example, consider the names, telephone numbers, and addresses of all the people you know. You may have recorded this data in an indexed address book, or you may have stored it on a diskette using a personal computer and software such as DBASE III or Lotus 1-2-3. This is a collection of related data with an implicit meaning and hence is a database.

The above definition of database is quite general; for example, we may consider the collection of words that make up this page of text to be related data and hence a database. However, the common use of the term database is usually more restricted. A database has the following implicit properties:

A database is a logically coherent collection of data with some inherent meaning. A random assortment of data cannot be referred to as a database.
A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some preconceived applications in which these users are interested.
A database represents some aspect of the real world, sometimes called the mini world. Changes to the mini world are reflected in the database.

In other words, a database has some source from which data are derived, some degree of interaction with events in the real world, and an audience that is actively interested in the contents of the database.

A database can be of any size and of varying complexity. For example, the list of names and addresses referred to earlier may have only a couple of hundred records in it, each with a simple structure. On the other hand, the card catalog of a large library may contain half a million cards stored under different categories (by primary author's last name, by subject, by book title, and the like), with each category organized in alphabetic order. A database of even greater size and complexity may be that maintained by the Internal Revenue Service to keep track of the tax forms filed by taxpayers of the United States. If we assume that there are 100 million taxpayers and each taxpayer files an average of five forms with approximately 200 characters of information per form, we would get a database of 100*(10^6)*200*5 = 10^11 characters (bytes) of information. Assuming the IRS keeps the past three returns for each taxpayer in addition to the current return, we would get a database of 4*(10^11) bytes. This huge amount of information must somehow be organized and managed so that users can search for, retrieve, and update the data as needed.

A database may be generated and maintained manually or by machine. Of course, here we are mainly interested in computerized databases. The library card catalog is an example of a database that may be manually created and maintained. A computerized database may be created and maintained either by a group of application programs written specifically for that task or by a database management system.

A database management system (DBMS) is a collection of programs that enables users to create and maintain a database. The DBMS is hence a general-purpose software system that facilitates the processes of defining, constructing, and manipulating databases for various applications. Defining a database involves specifying the types of data to be stored in the database, along with a detailed description of each type of data. Constructing the database is the process of storing the data itself on some storage medium that is controlled by the DBMS. Manipulating a database includes such functions as querying the database to retrieve specific data, updating the database to reflect changes in the mini world, and generating reports from the data.

Note that it is not necessary to use general-purpose DBMS software to implement a computerized database. We could write our own set of programs to create and maintain the database, in effect creating our own special-purpose DBMS software. In either case, whether we use a general-purpose DBMS or not, we usually have a considerable amount of software to manipulate the database in addition to the database itself. The database and software are together called a database system.

2. Data Models

One of the fundamental characteristics of the database approach is that it provides some level of data abstraction by hiding details of data storage that are not needed by most database users. A data model is the main tool for providing this abstraction. A data model is a set of concepts that can be used to describe the structure of a database. By structure of a database, we mean the data types, relationships, and constraints that should hold on the data. Most data models also include a set of operations for specifying retrievals and updates on the database.

Categories of Data Models

Many data models have been proposed. We can categorize data models based on the types of concepts they provide to describe the database structure. High-level or conceptual data models provide concepts that are close to the way many users perceive data, whereas low-level or physical data models provide concepts that describe the details of how data is stored in the computer. Concepts provided by low-level data models are generally meant for computer specialists, not for typical end users. Between these two extremes is a class of implementation data models, which provide concepts that may be understood by end users but that are not too far removed from the way data is organized within the computer. Implementation data models hide some details of data storage but can be implemented on a computer system in a direct way.

High-level data models use concepts such as entities, attributes, and relationships. An entity is an object that is represented in the database. An attribute is a property that describes some aspect of an object. Relationships among objects are easily represented in high-level data models, which are sometimes called object-based models because they mainly describe objects and their interrelationships.

Implementation data models are the ones used most frequently in current commercial DBMSs and include the three most widely used data models: relational, network, and hierarchical. They represent data using record structures and hence are sometimes called record-based data models.

Physical data models describe how data is stored in the computer by representing information such as record formats, record orderings, and access paths. An access path is a structure that makes the search for particular database records much faster.

3. Classification of Database Management Systems

The main criterion used to classify DBMSs is the data model on which the DBMS is based. The data models used most often in current commercial DBMSs are the relational, network, and hierarchical models. Some recent DBMSs are based on conceptual or object-oriented models. We will categorize DBMSs as relational, hierarchical, and others.

Another criterion used to classify DBMSs is the number of users supported by the DBMS. Single-user systems support only one user at a time and are mostly used with personal computers. Multiuser systems include the majority of DBMSs and support many users concurrently.

A third criterion is the number of sites over which the database is distributed. Most DBMSs are centralized, meaning that their data is stored at a single computer site. A centralized DBMS can support multiple users, but the DBMS and database themselves reside totally at a single computer site. A distributed DBMS (DDBMS) can have the actual database and DBMS software distributed over many sites connected by a computer network. Homogeneous DDBMSs use the same DBMS software at multiple sites. A recent trend is to develop software to access several autonomous preexisting databases stored under heterogeneous DBMSs. This leads to a federated DBMS (or multidatabase system), where the participating DBMSs are loosely coupled and have a degree of local autonomy.

We can also classify a DBMS on the basis of the types of access path options available for storing files. One well-known family of DBMSs is based on inverted file structures. Finally, a DBMS can be general purpose or special purpose. When performance is a prime consideration, a special-purpose DBMS can be designed and built for a specific application and cannot be used for other applications. Many airline reservation and telephone directory systems are special-purpose DBMSs.

Let us briefly discuss the main criterion for classifying DBMSs: the data model. The relational data model represents a database as a collection of tables, which look like files. Most relational databases have high-level query languages and support a limited form of user views.

The network model represents data as record types and also represents a limited type of 1:N relationship, called a set type. The network model, also known as the CODASYL DBTG model, has an associated record-at-a-time language that must be embedded in a host programming language.

The hierarchical model represents data as hierarchical tree structures. Each hierarchy represents a number of related records. There is no standard language for the hierarchical model, although most hierarchical DBMSs have record-at-a-time languages.

4. Client-Server Architecture

Many varieties of modern software use a client-server architecture, in which requests by one process (the client) are sent to another process (the server) for execution. Database systems are no exception. In the simplest client-server architecture, the entire DBMS is a server, except for the query interfaces that interact with the user and send queries or other commands across to the server. For example, relational systems generally use the SQL language for representing requests from the client to the server. The database server then sends the answer, in the form of a table or relation, back to the client. The relationship between client and server can become more complex, with more work placed in the client, since the server will be a bottleneck if there are many simultaneous database users.
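The three DBMS functions named in Section 1 (defining, constructing, and manipulating a database) and the SQL request/table response described in Section 4 can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The `person` table, its columns, the sample rows, and the helper function names are all hypothetical, chosen only to echo the address-book example. SQLite is an embedded engine rather than a separate server process, so the client-server exchange is only imitated here.

```python
import sqlite3


def build_address_book():
    """Define and construct a tiny in-memory address-book database."""
    conn = sqlite3.connect(":memory:")
    # Defining: specify the types of data to be stored.
    conn.execute(
        "CREATE TABLE person (name TEXT NOT NULL, phone TEXT, city TEXT)"
    )
    # Constructing: store the data itself under the DBMS's control.
    conn.executemany(
        "INSERT INTO person (name, phone, city) VALUES (?, ?, ?)",
        [
            ("Alice", "555-0100", "Boston"),
            ("Bob", "555-0101", "Chicago"),
            ("Carol", "555-0102", "Boston"),
        ],
    )
    conn.commit()
    return conn


def people_in_city(conn, city):
    """Manipulating (query): the 'client' sends SQL and gets a table back."""
    cursor = conn.execute(
        "SELECT name, phone FROM person WHERE city = ? ORDER BY name",
        (city,),
    )
    return cursor.fetchall()  # a list of rows, i.e. a relation


def move_person(conn, name, new_city):
    """Manipulating (update): reflect a change in the mini world."""
    conn.execute("UPDATE person SET city = ? WHERE name = ?", (new_city, name))
    conn.commit()


if __name__ == "__main__":
    db = build_address_book()
    print(people_in_city(db, "Boston"))
    move_person(db, "Bob", "Boston")
    print(people_in_city(db, "Boston"))
```

The query result is itself a small table (a list of rows), which matches the statement that the server answers in the form of a table or relation.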

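Comparison of Windows 7 Editions

Multilingual User Interface (MUI) language packs
Location-aware printing
Subsystem for UNIX-based Applications

Of the six editions of Windows 7, only Home Basic, Home Premium, Professional, and Ultimate appear on the retail market, and Home Basic is sold only in developing countries and regions. Starter is supplied to OEMs for preinstallation on netbooks. Enterprise is offered only to large corporate customers through the Software Assurance volume licensing program and is functionally identical to Ultimate.

Reliability features:
Starter
Home Basic
Home Premium
Professional
Enterprise and Ultimate
MPEG-2 extension installation
n/a
n/a
n/a
Media Center
Number of TV tuner cards supported
Four each, digital and analog
Four each, digital and analog
Four each, digital and analog
Windows DVD Maker
Device Stage
Sync Center

Networking features:
Starter
Home Basic
Home Premium
Professional
Enterprise and Ultimate
SMB connections
20
20
20
Windows Backup
System image
Backup to network
EFS encryption
BitLocker
BitLocker To Go
Automatic defragmentation
Previous Versions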

Library Management System (English)

Introduction:
A library is a repository of knowledge where people can access a wide range of resources such as books, journals, magazines, and digital material. To manage these resources efficiently and provide excellent service to users, it is essential to have a well-structured and organized library management system in place. Such a system supports the smooth operation of the library and ensures that the needs of patrons are met in a timely manner.

A library management system is a software solution that automates various tasks related to the administration and operation of a library. It helps with cataloging and classifying library materials, tracking borrowed items, managing user accounts, and generating reports on library activities. This system is crucial for libraries of all sizes because it streamlines processes and enhances the overall user experience.

Key Features of a Library Management System:

1. Cataloging and Classification:
One of the primary functions of a library management system is the cataloging and classification of library materials. This involves assigning a unique identification number to each item, categorizing items by subject or genre, and creating metadata records for easy retrieval. The system should support cataloging standards such as MARC (Machine Readable Cataloging), AACR2 (Anglo-American Cataloging Rules), and RDA (Resource Description and Access) to ensure consistency and accuracy in cataloging.

2. Circulation Management:
Circulation management is another important feature of a library management system. It includes tasks such as issuing and returning library materials, tracking overdue items, placing holds on items, and managing fines and fees. The system should be able to handle multiple circulation policies, generate due date reminders, and automatically update item statuses based on user actions.

3. User Management:
User management involves creating and maintaining user accounts, assigning user access rights, and managing user information. The library management system should support user authentication mechanisms such as login credentials, barcode scanning, or RFID (Radio Frequency Identification) tags to ensure secure access to library resources. It should also allow users to request materials, renew items, and update their personal details online.

4. Acquisitions and Budgeting:
The acquisitions module of the library management system helps in managing the procurement of library materials, tracking vendor orders, and managing budgets. It should support workflows for selecting items for purchase, creating purchase orders, receiving and processing shipments, and updating inventory records. The system should also provide budgeting tools to monitor expenditures, analyze spending patterns, and generate financial reports.

5. Reporting and Analytics:
The reporting and analytics feature of a library management system enables librarians to track library usage, analyze collection data, and generate reports on various library activities. It should provide pre-defined templates for standard reports such as circulation statistics, collection usage, and user demographics. The system should also offer customization options for creating ad hoc reports based on specific requirements.

6. Interlibrary Loan:
Interlibrary loan functionality allows libraries to borrow materials from other libraries on behalf of their users. The library management system should support interlibrary loan requests, track loaned items, and manage delivery logistics. It should integrate with national and international interlibrary loan networks to facilitate resource sharing among libraries.

7. Digital Resource Management:
The digital resource management feature of the library management system enables libraries to manage electronic resources such as e-books, e-journals, databases, and multimedia content. It should support authentication protocols for accessing digital materials, provide federated search capabilities, and offer tools for managing licenses and usage rights. The system should also enable users to access digital resources remotely through web portals or mobile apps.

Benefits of a Library Management System:
Implementing a library management system offers several benefits to libraries, librarians, and library users. Some of the key benefits include:

1. Improved Efficiency:
A library management system automates routine tasks such as cataloging, circulation, and reporting, reducing manual effort and saving librarians time. It streamlines workflows, eliminates redundancies, and improves the overall operational efficiency of the library.

2. Enhanced User Experience:
A library management system provides a seamless, user-friendly interface for patrons to search, reserve, and borrow library materials online. It offers self-service options, personalized recommendations, and access to digital resources, which improves the user experience and increases user satisfaction.

3. Better Resource Utilization:
A library management system helps optimize library resources by tracking circulation patterns, analyzing usage data, and identifying popular materials. It enables librarians to make informed collection development decisions, allocate budgets effectively, and improve resource utilization in the library.

4. Enhanced Security:
A library management system enhances the security of library materials and user information by implementing access controls, audit trails, and encryption mechanisms. It protects against unauthorized access, data breaches, and theft of library resources, ensuring the confidentiality and integrity of library operations.

5. Data-driven Decision Making:
A library management system provides real-time insights into library activities through comprehensive reporting and analytics tools. It enables librarians to analyze trends, monitor performance indicators, and make data-driven decisions to improve library services and optimize resource allocation.

Conclusion:
A library management system is an indispensable tool for modern libraries to streamline operations, enhance user experience, and manage resources effectively. By using its key features, such as cataloging, circulation, user management, acquisitions, reporting, and digital resource management, libraries can provide better services to users and adapt to the evolving information landscape. Implementing a robust library management system is essential for libraries to stay relevant, competitive, and responsive to the changing needs of patrons in the digital age.
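The circulation tasks listed under "Circulation Management" (issuing and returning items, tracking overdue items, and managing fines) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Loan` class, the 14-day loan period, and the fine of 25 cents per day are hypothetical values, not part of any particular library system or policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

LOAN_DAYS = 14            # hypothetical circulation policy
FINE_CENTS_PER_DAY = 25   # hypothetical fine, kept in cents to avoid float rounding


@dataclass
class Loan:
    item_id: str
    user_id: str
    checked_out: date
    returned: Optional[date] = None  # None while the item is still on loan

    @property
    def due_date(self) -> date:
        return self.checked_out + timedelta(days=LOAN_DAYS)

    def days_overdue(self, today: date) -> int:
        # Measure lateness up to the return date, or up to today if not returned.
        end = self.returned if self.returned is not None else today
        return max(0, (end - self.due_date).days)

    def fine_cents(self, today: date) -> int:
        return self.days_overdue(today) * FINE_CENTS_PER_DAY


def overdue_loans(loans: List[Loan], today: date) -> List[Loan]:
    """Return the loans that are still out and past their due date."""
    return [
        loan for loan in loans
        if loan.returned is None and loan.days_overdue(today) > 0
    ]


if __name__ == "__main__":
    today = date(2024, 1, 20)
    loans = [
        Loan("B1", "U1", date(2024, 1, 1)),
        Loan("B2", "U2", date(2024, 1, 10)),
    ]
    for loan in overdue_loans(loans, today):
        print(loan.item_id, loan.due_date, loan.fine_cents(today))
```

A real system would read the loan period and fine rate from per-collection policies rather than module-level constants, which is what "multiple circulation policies" implies.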

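4. Foreign-Language Database Searching: Review Questions

Question 1. In the EBSCO databases, BSC is short for Business Source Complete, the full-text database for business, management, and finance.
Correct answer: True

Question 2. When searching a foreign-language database, the presence of a Fulltext link means that the database can supply the original document.
Correct answer: True

Question 3. When searching a foreign-language database, truncation can improve recall but may reduce precision.
Correct answer: True

Question 4. When searching an EBSCO database, entering ne?t retrieves neat, nest, next, or net.
Correct answer: False

Question 5. In the EBSCO databases, ASC is short for Academic Search Complete, the full-text academic journal database.
Correct answer: True

Question 6. The common Boolean operators are and, or, and not; when searching with synonyms, the operator to use is and.
Correct answer: False

Question 7. SpringerLink divides all the document types it covers into journals, books, book series, reference works, laboratory protocols, and so on.
Correct answer: True

Question 8. When truncation is used to search for terms that share the same stem, a logical AND relationship is automatically implied among those terms.
Correct answer: False

Question 9. When truncation is used to search for terms that share the same stem, a logical OR relationship is automatically implied among those terms.
Correct answer: True

Question 10. Entering Comput* in a title-field search retrieves documents whose titles contain any one of Computing, Computed, Computer, and similar words.
Correct answer: True

Question 11. The SpringerLink database is published by the American publisher Springer.
Correct answer: False

Question 12. Stop words are function words that carry no substantive meaning, such as articles, prepositions, and conjunctions.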
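The wildcard and truncation questions above can be checked with a small sketch. It assumes a simplified model in which `?` stands for exactly one letter and `*` stands for zero or more letters. The function names are hypothetical, and this is not the actual search engine of EBSCO or any other vendor.

```python
import re


def pattern_to_regex(pattern):
    """Translate a search pattern into a compiled regular expression.

    '?' matches exactly one letter; '*' matches zero or more letters.
    """
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append("[A-Za-z]")
        elif ch == "*":
            parts.append("[A-Za-z]*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matching_terms(pattern, vocabulary):
    """Return the words in the vocabulary that the pattern expands to."""
    rx = pattern_to_regex(pattern)
    return [word for word in vocabulary if rx.fullmatch(word)]


def search_titles(pattern, titles):
    """A title is retrieved if ANY of its words matches the pattern.

    This is the implicit logical OR among the expanded terms.
    """
    rx = pattern_to_regex(pattern)
    return [
        title for title in titles
        if any(rx.fullmatch(word) for word in re.findall(r"[A-Za-z]+", title))
    ]


if __name__ == "__main__":
    print(matching_terms("ne?t", ["neat", "nest", "next", "net"]))
    print(search_titles("Comput*", ["Computer networks", "Compiler design"]))
```

Under this model, `ne?t` does not match `net`, because `?` requires one character in that position, which is why Question 4 is false. Because one pattern expands to many terms, more documents are retrieved (higher recall), but some of them may be irrelevant (lower precision).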

Reference Dependence Theory

SQL Server Object Explorer T-SQL Editor
Project Based Development
1st class T-SQL Support
Go To Definition
Source Code Based F5 Debugging & Testing with LocalDB Source Code Control MSBuild Headless Command Line Tools
Windows Store
Tech Company
Windows Store
Enterprise
Side-loaded
Enterprise
Consumer
Business
Connected Development
Familiar User Experience
Optimized Productivity
Difference Detection
Schema Editing Tools
PROJECT BASED DEVELOPMENT
1st class T-SQL Support
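A Behavioral Economics Analysis of the Impact of the Fukushima Nuclear Accident on China's Electric Power Stocks
Reference Dependence Theory
So what exactly is reference dependence theory? Let us first look at a conspiracy.
"The Solar System Conspiracy"
If you had bought stock in Earth, what would you think?
If you had bought stock in one of the other planets of the solar system, what would you think?
If you were an investor from outside the solar system, what would you think?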

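Choosing the Right Edition

1. Choose the right edition. Windows 7 comes in many editions, each aimed at a different audience. For example, most business users can choose Professional and do not need the more expensive Ultimate edition unless they need features such as BitLocker.

2. Don't forget the 64-bit version. Windows 7 is Microsoft's second operating system with full support for 64-bit technology, and the 64-bit desktop environment is now largely mature, with hardware and software support mostly in place.

3. Windows XP Mode. This is a Virtual PC virtual machine with a complete copy of Windows XP. It is the first time virtualization has fully reached ordinary users, letting people keep full XP compatibility while upgrading to Windows 7. The final RTM release of Windows XP Mode will be made publicly available on the 22nd, the day Windows 7 goes on sale.

4. Windows PowerShell v2. More than a simple command-line shell, it includes the Integrated Scripting Environment (ISE) that administrators have long awaited. It has powerful distributed parallel processing capabilities and can easily manage hundreds of computers through its new remoting features. Shortcut: Ctrl+Alt+I. For now it is built into Windows 7 only, but a standalone version for Vista and other older systems will be released in about six months.

5. AppLocker. The XP era already had Software Restriction Policies. AppLocker is a software execution control feature that combines blacklists and whitelists; it can strengthen or even replace security software by ensuring that only software you have approved can run.

6. Switch between Explorer and the command line. Select a folder, hold Shift, and right-click; "Open command window here" appears, and clicking it opens a command prompt located in that folder. To open an Explorer window focused on the current folder from the command line, type "start ." (including the trailing dot). (Vista can actually do this too.)

7. Record the problem. Run into trouble and want remote help, but not sure how to describe it? The Problem Steps Recorder (PSR) records the step and the screen for each click and can even add annotations. It then produces a compressed MHTML file that you can simply send.

8. Make training videos. (This is not actually part of Windows 7 itself; it means using third-party tools such as Camtasia to record short videos that help others learn the new system and its features.)

9. Don't forget Windows Server 2008 R2.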

English Essay: Digital Information Organization and Retrieval in Libraries

Libraries have undergone a profound transformation in recent years, moving from traditional repositories of physical books to dynamic hubs of digital information. This shift towards digitization has revolutionized the way libraries organize and retrieve information. In this essay, we explore the significance of digital information organization and retrieval in libraries.

Digitization has opened up many possibilities for how libraries store and manage their collections. No longer constrained by physical space, libraries can now offer access to vast digital repositories spanning various formats, including e-books, audiobooks, academic journals, and multimedia resources. This wealth of digital content presents both opportunities and challenges for organization and retrieval.

One of the key advantages of digital information organization is the ability to implement sophisticated classification systems. Traditional libraries often rely on the Dewey Decimal or Library of Congress classification systems, which categorize books by subject matter. While these systems remain relevant, digital libraries can enhance them with metadata tagging, keyword indexing, and semantic analysis. These techniques enable more precise categorization and more nuanced search capabilities.

Metadata plays a crucial role in digital information organization by providing descriptive information about each resource, such as title, author, publication date, subject keywords, and genre. By accurately tagging resources with metadata, libraries empower users to perform targeted searches and quickly locate relevant information. Metadata also enables advanced search functions such as faceted search, which allows users to filter search results by multiple criteria simultaneously.

In addition to metadata, keyword indexing enhances the discoverability of digital resources. Keyword indexing involves identifying significant terms or phrases within a resource and creating an index based on these keywords. This index enables users to search for specific terms and retrieve relevant resources efficiently. Keyword indexing can also be augmented with natural language processing techniques to improve the accuracy of search results and accommodate variations in language usage.

Semantic analysis represents a further advance in digital information organization by focusing on the meaning and context of resources. Unlike traditional keyword-based approaches, semantic analysis seeks to understand the underlying concepts and relationships within the content. By analyzing the semantic structure of resources, libraries can offer more sophisticated search capabilities, such as concept-based retrieval and semantic search. These techniques enable users to explore related concepts and discover connections between disparate resources.

While digital information organization offers numerous benefits, effective retrieval mechanisms are equally essential. Libraries must ensure that users can easily access and retrieve the information they need from digital repositories. User-friendly search interfaces play a critical role here. An intuitive search interface should offer several search options, including keyword search, advanced search, and browsing by subject category.

Libraries can further enhance information retrieval through personalized recommendations and curated collections. By analyzing user behavior and preferences, libraries can suggest relevant resources tailored to individual interests. Additionally, collections curated by subject matter experts can highlight high-quality resources on specific topics, giving users guided pathways for exploration.

In conclusion, the digitization of library collections has transformed the landscape of information organization and retrieval. Through techniques such as metadata tagging, keyword indexing, and semantic analysis, libraries can offer users unprecedented access to digital resources. By prioritizing user-friendly search interfaces and personalized recommendations, libraries can ensure that users navigate digital repositories effectively and discover the information they seek.
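The keyword indexing and faceted search described in this essay can be sketched as follows. The catalog records, field names, and function names are hypothetical examples, and the index covers titles only. A production system would also handle stemming, stop words, and ranking, which this sketch leaves out.

```python
import re
from collections import defaultdict

# Hypothetical catalog: each record carries descriptive metadata.
CATALOG = [
    {"id": 1, "title": "Introduction to Database Systems", "format": "e-book", "year": 2019},
    {"id": 2, "title": "Database Design for Libraries", "format": "journal", "year": 2021},
    {"id": 3, "title": "Digital Libraries and Metadata", "format": "e-book", "year": 2021},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Keyword index: map each title word to the ids of records containing it."""
    index = defaultdict(set)
    for record in records:
        for word in tokenize(record["title"]):
            index[word].add(record["id"])
    return index


def keyword_search(index, query):
    """Return ids of records whose titles contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


def faceted_search(records, index, query, **facets):
    """Keyword search, then filter by several metadata facets at once."""
    ids = keyword_search(index, query)
    return [
        record["id"] for record in records
        if record["id"] in ids
        and all(record.get(key) == value for key, value in facets.items())
    ]


if __name__ == "__main__":
    idx = build_index(CATALOG)
    print(keyword_search(idx, "database"))
    print(faceted_search(CATALOG, idx, "libraries", format="e-book", year=2021))
```

The index answers keyword queries without scanning every record, and the facets narrow the result by metadata fields, which is the "multiple criteria simultaneously" behavior the essay attributes to faceted search.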

English Essay: Telling True Information from False

辨别信息真伪英语作文Discerning Truth from Falsehood: The Importance of Critical Thinking in the Digital AgeIn the modern era, where information is readily available at our fingertips, the ability to discern truth from falsehood has become increasingly crucial. The proliferation of digital media and the ease with which information can be shared and disseminated have led to a landscape rife with misinformation, conspiracy theories, and fake news. As individuals, we are constantly bombarded with a barrage of information, and it is essential that we develop the critical thinking skills necessary to navigate this complex landscape effectively.One of the primary challenges in the digital age is the ease with which information can be manipulated and presented in a way that appears credible. Individuals and organizations with nefarious intentions can leverage the power of social media and online platforms to spread false narratives, sow discord, and influence public opinion. This phenomenon has become particularly prevalent in the realm of politics, where the stakes are high, and the consequences of believing misinformation can be severe.To combat this, it is essential that we cultivate a mindset of critical thinking. This involves approaching information with a healthy skepticism, questioning the sources and motives behind the content we consume, and actively seeking out reliable and authoritative sources to corroborate the information we encounter. By doing so, we can better discern truth from falsehood and make informed decisions that are grounded in facts rather than emotions or biases.One of the key skills in critical thinking is the ability to identify logical fallacies and biases. Logical fallacies are flaws in reasoning that can lead to invalid or unsound conclusions, while biases are cognitive shortcuts that can distort our perception of reality. 
By recognizing these pitfalls, we can become more adept at identifying misinformation and separating fact from fiction.Another essential aspect of critical thinking is the ability to evaluate the credibility of sources. In the digital age, anyone can publish content online, and it is crucial that we scrutinize the reliability and objectivity of the information we consume. This involves considering factors such as the author's credentials, the reputation of the publication or platform, and the presence of any potential conflicts of interest or biases.Furthermore, the ability to think critically extends beyond the evaluation of information itself. It also involves the ability tosynthesize and analyze information from multiple perspectives, to consider alternative viewpoints, and to draw nuanced and well-reasoned conclusions. By embracing this approach, we can avoid falling into the trap of confirmation bias, where we only seek out information that aligns with our preexisting beliefs.Ultimately, the ability to discern truth from falsehood is not just a valuable skill, but a necessary one in the digital age. As we navigate an increasingly complex and information-saturated world, the development of critical thinking skills becomes essential for making informed decisions, maintaining a healthy skepticism, and safeguarding our individual and societal well-being. By cultivating these abilities, we can empower ourselves to be discerning consumers of information, to think independently, and to contribute to the betterment of our communities and the world at large.。

English Essay: Cultivating Effective Information Discernment Skills in the Digital Era

Cultivating Effective Information DiscernmentSkills in the Digital EraIn the era of digitization, information has become a ubiquitous commodity, flooding our lives from every conceivable angle. The internet, social media platforms, and various digital channels have made accessing information effortless, but this ease of access has also led to a deluge of misinformation and disinformation. Navigating this information-rich but often murky landscape requires the cultivation of effective information discernment skills.Firstly, it is crucial to develop a critical mindset. This involves approaching information with skepticism, rather than accepting it at face value. Questioning the source, motive, and evidence supporting any given piece of information is paramount. A critical mindset encourages us to seek multiple perspectives, compare and contrast different viewpoints, and formulate our own opinions based on a comprehensive understanding of the facts.Moreover, the ability to filter and prioritize information is essential. With the vast amount of dataavailable, it is impractical to process everything. Instead, we must learn to identify relevant information that aligns with our interests, needs, and goals. This requires setting clear objectives and using tools like search engines, RSS feeds, or social media filters to streamline ourinformation intake.Furthermore, verifying information from reliablesources is vital. In the digital age, anyone can publish content online, making it challenging to discern the truth. It is imperative to rely on established media outlets, academic institutions, or government agencies that have a reputation for accuracy and credibility. Cross-referencing information from multiple sources can also help verify its authenticity.Additionally, understanding the nuances of digital communication is key. The language, tone, and format of online content can often reveal its biases or hidden agendas. 
Learning to read between the lines and interpret digital cues can help us identify potential misinformation or propaganda.

Lastly, ongoing education and awareness are crucial. The digital landscape is constantly evolving, and new forms of misinformation and disinformation are emerging. Staying up-to-date with the latest trends and developments in the field of digital information can help us stay ahead of the curve.

In conclusion, cultivating effective information discernment skills in the digital era is essential for navigating the maze of information. A critical mindset, the ability to filter and prioritize information, verification from reliable sources, understanding digital communication nuances, and ongoing education are key components of this skillset. By honing these skills, we can empower ourselves to make informed decisions, avoid falling prey to misinformation, and contribute to a more informed and discerning society.

**Cultivating Effective Information Discernment Skills in the Digital Era**

In the era of digitization, information has become a ubiquitous commodity that floods our lives from every conceivable angle.

English Essay: Digital Storage and Retrieval Technology in Archives

英语作文-档案馆的数字化存储与检索技术In the realm of archival science, the digitization of records and the technology for their retrieval represent a significant leap forward from traditional methods of preservation and access. The transition from physical to digital archives has not only expanded the capacity for storage but also revolutionized the way we retrieve information.The process of digitizing records involves converting physical documents into digital formats. This is typically done through scanning or photographing the documents, which are then stored as image files. However, to make these files searchable and retrievable, they must be processed further. Optical Character Recognition (OCR) technology is often employed to convert images of text into machine-encoded text. This allows for the content within the documents to be searched and indexed.Once digitized, the records are organized into databases. These databases are designed with sophisticated search algorithms that enable users to locate information quickly and efficiently. The metadata attached to each record, which includes details such as the date of creation, author, and subject matter, plays a crucial role in this process. It allows for the implementation of advanced search functions, such as keyword searches, full-text searches, and thematic searches.The benefits of digital storage and retrieval technologies are manifold. They provide enhanced preservation of documents, as digital files do not degrade over time in the same way that physical documents do. They also facilitate greater access to information, as digital archives can be made available online, allowing people from all over the world to access them without the need to physically visit the archive.Moreover, digital archives can support a variety of file types, including text, audio, video, and images, thus preserving a richer historical record. 
They also offer the potential for interactivity, such as the ability to annotate documents or link related records, which can enrich the user's experience and understanding of the material.However, the digitization of archives also presents challenges. Ensuring the long-term preservation of digital files requires careful planning, as technology changes rapidly and file formats can become obsolete. There is also the issue of digital divide; not everyone has the same level of access to digital technologies, which can limit the accessibility of digital archives.In conclusion, the digitization of archival storage and retrieval technologies has transformed the field of archival science. It has made vast amounts of information more accessible than ever before, while also presenting new opportunities and challenges. As technology continues to evolve, so too will the methods by which we preserve and access our collective history. The future of archival science is digital, and it holds great promise for the democratization of knowledge and the preservation of our cultural heritage.。

High School English Essay: "Information Filtering in the Internet Era"

High School English Essay: "Information Filtering in the Internet Era" (Practical Chinese-English Edition)

In the Internet era, information sorting has become an essential skill for individuals to navigate the vast sea of data. With the advent of the internet, an unprecedented amount of information is at our fingertips, making it easier to access knowledge and stay connected globally. However, this influx of information also brings challenges in terms of accuracy, reliability, and relevance. Therefore, the ability to sort through and select valuable information becomes increasingly crucial.

To begin with, the internet is a double-edged sword when it comes to the accuracy and reliability of information. On one hand, we can easily find a wealth of knowledge on various topics with a few clicks. On the other hand, the spread of misinformation and fake news is also a significant concern. Therefore, it is vital to develop critical thinking skills and the ability to discern the credibility of sources. We should question the validity of information, check the reputation of the sources, and cross-reference facts to ensure that we are relying on accurate and trustworthy information.

Moreover, with the vast amount of information available, it is essential to prioritize and focus on what is most relevant to our needs. Sorting through this overflow of data requires us to be discerning and purposeful in our search. We should define our information needs clearly and use search engines and filters effectively to find the most relevant information. Additionally, we can use various tools and applications that help us organize and manage the information we gather, making it easier to access and utilize when needed.

Furthermore, the internet allows us to connect with people from diverse backgrounds and cultures, providing us with a broader perspective on various issues. However, it is crucial to be mindful of the potential biases and viewpoints that may be presented. We should strive to consume information from a variety of sources and perspectives, including those that may challenge our preconceived notions or beliefs. This helps us develop a more well-rounded understanding of the world and fosters open-mindedness and empathy.

In conclusion, the internet era has brought about a wealth of information that is accessible at our fingertips. However, it also requires us to develop critical skills to sort through and select relevant and accurate information. By questioning the credibility of sources, being discerning in our search, and embracing diverse perspectives, we can navigate the digital landscape effectively and make informed decisions. In an era where information overload is a reality, the ability to sort and select information becomes a valuable skill that empowers us to thrive in the digital age.

How to Find Domestic and Foreign Master's and Doctoral Theses

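Advanced Search (Search-advanced)
Special Search Functions
1. Field search (Keywords+Field)
2. Search history (Search History)
3. Subject index (Subject Tree)
4. School index (School Index)
Browse Functions
How to Obtain Print Originals of Domestic Master's and Doctoral Theses

Consult them in person at each university library
Obtain them through interlibrary cooperation
National Science and Technology Library

Wanfang Data
III. Finding Foreign Master's and Doctoral Theses
1. PQDD  2. ETD Digital Library
3.
ProQuest Digital Dissertations (PQDD)

Search results
– Sorting – Marking – Detailed bibliographic record – Free preview of the first 24 pages – PDF format – Online ordering
Search Results List (Search Results)
1. Sorting: results can be sorted by degree date, author, or title.
2. Marking: ticking the box adds the record to the marked list.
3. Order marking: ticking the box adds the record to the shopping cart.
4. Order now: clicking the button opens the shopping cart screen, where you can begin ordering the thesis that corresponds to the record.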
2.
3.
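Established in 1997
Members: 169 members, comprising 148 universities (including 4 consortia) and 21 research institutions. 24 institutions provide digitized theses online.
4. Click "Federated Search Demonstration" and find "search these sites" to start searching. 5. 6. 7. 8. "show me all ETD sites" → lists 24 institutions. "Find cataloged sites about _______ search". Dissertation com provides a 25-page preview. "Thumbnail" refers to thumbnail pages.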

Usage of "federated"

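1. Federated Learning

Federated Learning is an approach to machine learning that lets different devices and systems share a model rather than data. This helps protect user privacy and save bandwidth. In Federated Learning, each device trains the model locally and then sends its updated model parameters to a central server for aggregation. In this way, user data never leaves the local device, which protects privacy.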

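2. Federated Identity Management

Federated Identity Management is an authentication and authorization mechanism that lets a user authenticate to multiple different systems with a single set of credentials. For example, when you log in to a website with your Google account, the site can rely on federated identity management with Google to verify your identity through that account, without asking you for additional credentials.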

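3. Federated Database

A Federated Database is a technology that allows data to be shared across multiple databases. It integrates data distributed across different databases so that users can access and query it through one centralized interface, without knowing the details of the underlying databases. This improves data availability and scalability while simplifying data management and query operations.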

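4. Federated Search

Federated Search is a search technique that lets a user search multiple search engines or databases at once. By sending the search request to multiple sources and merging the results, federated search can provide more comprehensive and accurate results. It is typically used by information aggregation sites and for cross-domain search, and it helps users find the information they need faster.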

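5. Federated Authentication

Federated Authentication is an authentication mechanism that lets users access other systems with credentials that have been verified in one system. For example, when you log in to a website with a social media account, the site can use federated authentication with the social media provider to verify your identity and authorize you to access its specific features.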
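
The federated search pattern described above (send one request to several sources, then merge the results) can be sketched roughly as follows. This is only an illustration under stated assumptions: the two toy sources, the names `federated_search` and `normalize`, and the min-max score normalization are hypothetical choices made for this example, not the method of any particular product.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Assumption: a "source" is any callable that takes a query string and
# returns (document id, source-specific score) pairs.
Hit = Tuple[str, float]
Source = Callable[[str], List[Hit]]


def normalize(hits: List[Hit]) -> List[Hit]:
    """Min-max normalize one source's scores into [0, 1] so that scores
    produced by different engines become roughly comparable."""
    if not hits:
        return []
    scores = [score for _, score in hits]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(doc, 1.0) for doc, _ in hits]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in hits]


def federated_search(query: str, sources: Dict[str, Source], top_k: int = 5):
    """Send the query to every source in parallel and merge the results
    into one ranked list of (source name, document id, score)."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        merged = []
        for name, future in futures.items():
            for doc, score in normalize(future.result()):
                merged.append((score, name, doc))
    # Highest score first; ties are broken by source name, then document id.
    merged.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(name, doc, score) for score, name, doc in merged[:top_k]]


# Toy sources with incompatible score scales (hypothetical data).
def library_a(query: str) -> List[Hit]:
    return [("a1", 12.0), ("a2", 6.0), ("a3", 3.0)]


def library_b(query: str) -> List[Hit]:
    return [("b1", 0.9), ("b2", 0.5)]


results = federated_search(
    "digital library", {"library_a": library_a, "library_b": library_b}, top_k=3
)
```

Min-max normalization is the simplest way to put scores from different engines on one scale; it ignores how good each source is overall, which is one reason research systems use more elaborate merging methods.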

Introduction to the TRS Full-Text Database System Cluster and Case Studies

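TRS Full-Text Database System Cluster and Case Studies

Contents
TRS Full-Text Database System Cluster
Case type 1: TRS Database+Oracle
Case type 2: TRS Database+Oracle+TRS Radar
Case type 3: TRS Database+TRS CKM

The TRS full-text database system cluster, TRS Database Cluster for short, is a distributed management system built on top of multiple physical TRS full-text database servers. It supports two basic distribution modes, data distribution and load balancing, as well as combinations of the two.

In the structural diagram of the TRS full-text database cluster system, the database servers within a "TRS full-text database server group" operate in load-balancing mode: the cluster server schedules them centrally, and each service request is sent to only one database server in the group.

Of course, a "TRS full-text database server group" may contain only one database server (and must contain at least one).

Across "TRS full-text database server groups", the cluster operates in distributed retrieval mode: a retrieval request is sent to some or all of the server groups, depending on how the target objects it involves are distributed.

Using the TRS full-text database cluster server achieves the following goals:
● Unlimited scaling for massive data volumes.
● High-performance access for large numbers of concurrent users.
● Highly reliable retrieval service (no single point of failure).
● Local management (Manage Locally) and federated search (Federated Search).

General method for estimating data volume: for terabyte-scale data, a system that is to deliver retrieval within seconds must be built as a distributed retrieval system in order to search that much data; and for hundreds or thousands of people to query the system at the same time, it must be built as a load-balanced cluster.

Based on practical experience, the search engine community accepts 4 to 6 million web pages as the volume a single machine can search; in our experience the maximum can reach 10 million pages. Each page's HTML is 10 KB or more (13 KB according to Tianwang statistics), so a single machine indexes and searches about 130 GB of HTML (equivalent to no more than 30 GB of plain text).

Thus 1 TB of HTML requires 8 PC servers to build a distributed retrieval cluster, and 1 TB of text correspondingly requires more machines.

Based on practical experience, the search engine community accepts 10 to 20 concurrent retrieval requests per machine; taking into account the large number of repeated searches during momentary peaks, a single machine can support more than 50 concurrent retrieval requests.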
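
The sizing arithmetic above can be reproduced with a short sketch. The per-machine figures (10 million pages, 13 KB per page, 30 GB of text, 50 concurrent requests) are the ones quoted in the text; decimal units (1 TB = 1000 GB), the function names, and the rule that total machines equals data groups times load-balanced servers per group are assumptions made here for illustration.

```python
import math

# Figures quoted in the text above.
PAGES_PER_MACHINE = 10_000_000   # upper bound from the authors' experience
KB_PER_PAGE = 13                 # average HTML page size (Tianwang statistics)
TEXT_GB_PER_MACHINE = 30         # plain-text equivalent of one machine's HTML
REQUESTS_PER_MACHINE = 50        # concurrent requests one machine can serve

# Assumption: decimal units, so 1 GB = 1,000,000 KB and 1 TB = 1000 GB.
HTML_GB_PER_MACHINE = PAGES_PER_MACHINE * KB_PER_PAGE / 1_000_000  # 130.0


def groups_for_data(total_gb: float, gb_per_machine: float) -> int:
    """Number of server groups needed so that each holds one slice of the data."""
    return math.ceil(total_gb / gb_per_machine)


def servers_per_group(concurrent_requests: int,
                      per_machine: int = REQUESTS_PER_MACHINE) -> int:
    """Number of load-balanced servers each group needs for the query load."""
    return math.ceil(concurrent_requests / per_machine)


def total_machines(total_gb: float, gb_per_machine: float,
                   concurrent_requests: int) -> int:
    """Assumed rule: every group needs its own set of load-balanced servers."""
    return (groups_for_data(total_gb, gb_per_machine)
            * servers_per_group(concurrent_requests))


html_groups = groups_for_data(1000, HTML_GB_PER_MACHINE)   # 1 TB of HTML
text_groups = groups_for_data(1000, TEXT_GB_PER_MACHINE)   # 1 TB of plain text
cluster_size = total_machines(1000, HTML_GB_PER_MACHINE, 1000)
```

With these inputs, 1 TB of HTML needs 8 groups, which matches the figure in the text, while 1 TB of plain text needs 34. Serving 1000 simultaneous requests over 1 TB of HTML would take 8 × 20 = 160 machines under the assumed rule.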


Federated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks

Jie Lu
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
jielu@

Jamie Callan
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
callan@

ABSTRACT
Peer-to-peer architectures are a potentially powerful model for developing large-scale networks of text-based digital libraries, but peer-to-peer networks have so far provided very limited support for text-based federated search of digital libraries using relevance-based ranking. This paper addresses the problems of resource representation, resource ranking and selection, and result merging for federated search of text-based digital libraries in hierarchical peer-to-peer networks. Existing approaches to text-based federated search are adapted and two new methods are developed for resource representation and resource selection according to the unique characteristics of hierarchical peer-to-peer networks. Experimental results demonstrate that the proposed approaches are both more accurate and more efficient than more common alternatives for text-based federated search in peer-to-peer networks.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Retrieval models, Search process, Selection process

General Terms
Algorithms, Design, Experimentation, Performance

Keywords
Peer-to-peer, Hierarchical, Federated Search, Text-Based, Retrieval, Digital Library

1. INTRODUCTION
Peer-to-peer (P2P) networks are an appealing approach to federated search over large networks of digital libraries. The activities involved for search in peer-to-peer networks include issuing requests (“queries”), routing requests (“query routing”), and responding to requests (“retrieval”). The nodes in peer-to-peer networks can participate as clients and/or servers.
Client nodes issue queries to initiate search in peer-to-peer networks; server nodes provide information contents, respond to queries with documents that are likely to satisfy the requests, and/or route queries to other servers.The first peer-to-peer networks were based on sharing popular music, videos, and software. These types of digital objects have relatively obvious or well-known naming conventions and descriptions, making it possible to represent them with just a few words from a name, title, or manual annotation. From a Library Science or Information Retrieval perspective, these systems were designed for known-item searches, in which the goal is to find a single instance of a known object (e.g., a particular song by a particular artist). In a known item search, the user is familiar with the object being requested, and any copy is as good as any other. Known-item search of popular music, video, and software file-sharing systems is a task for which simple solutions suffice. If P2P systems are to scale to more varied content and larger digital libraries, they must adopt more sophisticated solutions.A very large number of text-based digital libraries were developed during the last decade. Nearly all of them use some form of relevance ranking, in which term frequency information is used to rank documents by how well they satisfy an unstructured text query. Many of them allow free search access to their contents via the Internet, but do not provide complete copies of their contents, or even complete title lists for their contents, upon request. Many do not allow their contents to be crawled by Web search engines. They do not cooperate by conforming to a single method of text representation, query processing, or document retrieval; they don’t even provide information about how these operations are done. 
We would argue that most of the recent research on peer-to-peer networks offers little useful guidance for providing federated search of current text-based digital libraries.This paper addresses the problem of using peer-to-peer networks as a federated search layer for text-based digital libraries. We study federated search in two different types of environments: cooperative environments where each digital library provides accurate resource description of its content upon request, and uncooperative environments where resource descriptions must be obtained indirectly. We start by assuming the current state of the art; that is, we assume that each digital library is a text database running a reasonably good conventional search engine, that it provides search access to its holdings, and that it provides individual documents in response to full text queries. We present in this paper how resource descriptions of digital libraries are obtained and used for efficient query routing, and how results from different digital libraries are merged into a single, integrated ranked list in peer-to-peer networks.In the following section we give an overview of the prior research on federated search of text-based digital libraries and peer-to-peer networks. Section 3 describes our approaches to federated search of text-based digital libraries in peer-to-peer networks. Sections 4 and 5 discuss our data resources and evaluation methodologies. Experimental settings and results are presented in Section 6. Section 7 concludes.2.OVERVIEWAccurate and efficient federated search in peer-to-peer networks of text-based digital libraries requires both the appropriate peer-to-peer architecture and the effective search methods developed for the chosen architecture. 
In this section we present an overview of the prior research on federated search of text-based digital libraries, peer-to-peer network architectures, and text-based search in peer-to-peer networks in order to set the stage for the descriptions of our approaches to text-based federated search in peer-to-peer networks.

2.1 Federated Search of Text-Based Digital Libraries
Prior research on federated search of text-based digital libraries (also called “distributed information retrieval” in the research literature) identifies three problems that must be addressed:
• Resource representation: Discovering the contents or content areas covered by each resource (“resource description”);
• Resource ranking and selection: Deciding which resources are most appropriate for an information need based on their resource descriptions; and
• Result-merging: Merging ranked retrieval results from a set of selected resources.

A directory service is responsible for acquiring resource descriptions of the digital libraries it serves, selecting the appropriate resources (digital libraries) given the query, and merging the retrieval results from selected resources into a single, integrated ranked list. Solutions to all these three problems for the case of a single directory service have been developed in distributed information retrieval. We briefly review them below.

2.1.1 Resource Representation
Different techniques for acquiring resource descriptions require different degrees of cooperation from digital libraries. STARTS is a cooperative protocol that requires every digital library to provide an accurate resource description to the directory service upon request [6]. STARTS is a good solution in environments where cooperation can be guaranteed.
However, in some environments where digital libraries may not cooperate or may have an incentive to cheat, STARTS cannot be used to acquire accurate resource descriptions.

Query-based sampling is an alternative approach to acquiring resource descriptions without requiring explicit cooperation from digital libraries. The resource description of a digital library is constructed by sampling its documents via the normal process of submitting queries and retrieving documents. Query-based sampling has been shown to acquire fairly accurate resource descriptions using a small number of queries and documents in distributed information retrieval environments [1].

The total number of documents of a digital library is one of the most important corpus statistics required by many resource selection algorithms. Capture-Recapture [12] and Sample-Resample [20] are two methods of estimating the total number of documents of an uncooperative digital library. Experimental results show that in most scenarios, Sample-Resample is more accurate and has lower communication costs than the Capture-Recapture method.

2.1.2 Resource Ranking and Selection
Resource selection aims at selecting a small set of resources that contain a lot of documents relevant to the information request. Resources are ranked by their likelihood to return relevant documents and top-ranked resources are selected to process the information request.

Resource selection algorithms such as CORI [1], gGlOSS [7], and Kullback-Leibler (K-L) divergence-based algorithms [24] use techniques adapted from document retrieval for resource ranking. The resource description of a digital library used by these algorithms includes a list of terms with corresponding collection term frequencies, and corpus statistics such as the total number of terms and documents in the collection.
These algorithms have been shown to work well with resource descriptions provided by cooperative digital libraries or acquired using query-based sampling.

Other resource selection algorithms including ReDDE [20] and DTF (the decision-theoretic framework for resource selection) [16] rank resources by directly estimating the number of relevant documents from each resource for a given query. ReDDE relies on sampled documents obtained using query-based sampling for such estimation. DTF has three variants: DTF-rp, DTF-sample, and DTF-normal. DTF-rp estimates the number of relevant documents from a resource by assuming a linearly decreasing recall-precision function and calculating the expected precision and recall from the resource. DTF-sample uses sampled documents to estimate how relevant documents are distributed among the available resources. DTF-normal models the distribution of document scores from a resource with a normal distribution and maps document scores to probability of relevance using a function learned with user relevance feedback.

Deciding how many top-ranked resources to select (“thresholding”) is a problem that is usually simplified. Most resource selection algorithms use heuristic values such as 10 and 20 for the number of selected resources.

2.1.3 Result Merging
Many result-merging algorithms have been proposed in distributed information retrieval. Various approaches can be divided into two categories: approaches based on normalizing resource-specific document scores into resource-independent document scores, and approaches based on recalculating document scores at the directory service.

The CORI merging algorithm uses a heuristic linear combination of digital library scores and document scores to normalize the scores of the documents from different digital libraries. The intuition is to favor documents from digital libraries with high scores and also to enable high-scoring documents from low-scoring digital libraries to be ranked highly.
It is effective when used together with the CORI resource selection and INQUERY document retrieval algorithms in federated search using a single directory service [1].

There has been some work on using logistic regression to learn merging models to normalize document scores but relevance judgments are required for training [2].

The Semi-Supervised Learning result-merging algorithm uses the documents obtained by query-based sampling as training data to learn score normalizing functions on a query-by-query basis. It is shown to work well with a variety of resource selection and document retrieval algorithms and is the current state-of-the-art for result merging in distributed information retrieval [19].

Document scores can be recalculated at the directory service by downloading all the documents in the retrieval results from selected resources, indexing them, and re-ranking them using a document retrieval algorithm.

Downloading documents is not necessary if all the statistics required for score recalculation can be obtained alternatively. Kirsch’s algorithm [10] requires each resource to provide summary statistics for each of the retrieved documents. It allows very accurate normalized document scores to be determined without the high communication cost of downloading.

The corpus statistics required for recalculating document scores could also be substituted by a reference statistics database containing all the relevant statistics for some set of documents. This method is explored in [3] for federated search using a single directory service and shown to be effective compared with using the corpus statistics provided by cooperative digital libraries.

2.2 P2P Network Architectures
As mentioned in Section 1, the activities involved for search in peer-to-peer networks include issuing queries, query routing, and retrieval. Query routing is essentially a problem of resource selection and location.
Resource location in first generation peer-to-peer networks is characterized by Napster, which used a single logical directory service, and Gnutella 0.4, which used undirected message flooding and a search horizon. The former proved easy to attack, and the latter didn’t scale; both systems demonstrated the importance of robust and reliable methods of locating information in peer-to-peer networks. They also explored very different solutions: Napster was centralized and required cooperation (sharing of accurate information); Gnutella 0.4 was decentralized and required little cooperation.Recent research provides a variety of solutions to the flaws of the Napster and Gnutella 0.4 architectures, but perhaps the most influential are hierarchical and structured P2P architectures. Structured P2P architecture associates each data item with a key and distributes keys among directory services using a Distributed Hash Table (DHT) [17, 18, 21, 22, 28]. Hierarchical P2P architecture [9, 11, 23] uses top-layer directory services to serve regions of bottom-layer digital libraries and directory services work collectively to cover the whole network. The common characteristic of both approaches is the construction of an overlay network to organize the nodes that provide directory services (also called “look up services” by DHT-based approaches) for efficient query routing. An important distinction is that structured P2P networks require the ability to map (via a distributed hash table) from an information need to the identity of the directory service that satisfies the need, whereas hierarchical P2P networks rely on message-passing to locate directory services. Structured P2P networks require digital libraries to cooperatively share descriptions of data items in order to generate keys and construct distributed hash tables. 
In contrast, hierarchical P2P networks enable directory services to automatically discover the contents of (possibly uncooperative) digital libraries, which is well-matched to networks that are dynamic, heterogeneous, or protective of intellectual property.

2.3 Text-Based Search in P2P Networks
Most of the prior research on search in peer-to-peer networks supports only simple keyword-based search. Matches between query terms and keywords of documents are used to determine how to route queries and which documents to retrieve. There has been some recent work on developing systems that adopt more sophisticated retrieval models to support text-based search (also called “content-based retrieval”) in peer-to-peer networks. Examples are PlanetP using a completely decentralized P2P architecture [5], pSearch using a structured P2P architecture [22], and content-based retrieval in hierarchical P2P networks [13].

In PlanetP [5], a node uses a TF.IDF algorithm to decide which nodes to contact for information requests based on the compact summaries it collects about all other nodes’ inverted indexes. Because no special resources are dedicated to support directory services in completely decentralized P2P architectures, it is somewhat inefficient for each node to collect and store information about the contents of all other nodes, especially in dynamic P2P networks.

pSearch [22] uses the semantic vector (generated by Latent Semantic Indexing) of each document as the key to distribute the document index in a structured P2P network so that documents close in distance have similar contents. The relevance of a document to a query is determined by the similarity between their semantic vectors. To compute semantic vectors for documents and queries, global statistics such as the inverse document frequency and the basis of the semantic space need to be disseminated to each node in the network.
Because global statistics can only be obtained in completely cooperative environments where each digital library shares its document and corpus statistics, this approach cannot be easily extended to uncooperative and heterogeneous environments.

There has been some prior research on content-based resource selection and document retrieval in hierarchical P2P networks of digital libraries [13]. Viewing peer-to-peer networks as a particular type of distributed information retrieval environment, content-based resource selection is extended to the case of multiple directory services in peer-to-peer environments where digital libraries cooperatively provide resource descriptions to connecting directory services. Experimental results demonstrate that content-based resource selection and document retrieval can provide more accurate and more efficient solutions to federated search in peer-to-peer networks of text-based digital libraries compared with the flooding and keyword-based approaches.

The problem of result merging in hierarchical P2P networks of uncooperative and barely-cooperative text-based digital libraries has also been studied in [15]. The Semi-Supervised Learning (SSL) result-merging algorithm is modified and an algorithm Score Estimation with Sample Statistics (SESS) which extends Kirsch’s approach to result merging is proposed. Experimental results show that modified SSL has satisfactory precision for top-ranked merged documents, and SESS is able to provide near optimal performance with a small amount of cooperation from digital libraries.

3. TEXT-BASED FEDERATED SEARCH IN HIERARCHICAL P2P NETWORKS
The research described in this paper adopts a hierarchical P2P architecture because it provides a more flexible framework to incorporate various solutions to resource selection and result merging in both cooperative and uncooperative environments. Following the terminology of prior research, we refer to text-based digital libraries as “leaf” nodes, and directory services as “hub” nodes.
Each leaf node is a text database that processes full-text queries by running a document retrieval algorithm over the index of its local document collection and generating responses. Each hub acquires and maintains the necessary information about its neighboring hubs and leaf nodes and uses it to provide resource selection and result merging services to the peer-to-peer network. In addition to leaf nodes and hubs, there are nodes representing users with information requests; they are referred to as “client” nodes. In a hierarchical P2P network, leaf nodes and client nodes connect only to hubs, and hubs connect with each other.

Search in peer-to-peer networks relies on message passing between nodes. A request message (“query”) is generated by a client node and routed from the client node to a hub, from one hub to another, or from a hub to a leaf node. A response message (“queryhit”) is generated by a leaf node and routed back along the query path in the reverse direction. Each message has a time-to-live (TTL) field that determines the maximum number of times it can be relayed in the network. The TTL is decreased by 1 each time the message is routed to a node. When the TTL reaches 0, the message is no longer routed.

When a client node has an information request, it sends a query message to each of its connecting hubs. A hub that receives the query message uses its resource selection algorithm to rank and select one or more neighboring leaf nodes and hubs, and routes the query to them if the message’s TTL has not reached 0. A leaf node that receives the query message uses its document retrieval algorithm to generate a relevance ranking of its documents and responds with a queryhit message that includes a list of top-ranked documents.
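
The TTL rule and the reverse-path delivery of queryhit messages described above can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the `Node` class, the `route_query` function and the five-node topology are hypothetical, and the resource selection step is replaced by the simplifying assumption that a hub forwards to every neighbor not already on the query path.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str  # "hub" or "leaf" (hypothetical labels for this sketch)
    neighbors: list = field(default_factory=list)


def route_query(node, ttl, path=()):
    """Deliver a query to `node` and return the reverse paths of all queryhits."""
    ttl -= 1  # the TTL drops by 1 each time the message is routed to a node
    path = path + (node.name,)
    if node.kind == "leaf":
        # A leaf answers with a queryhit that travels back along the query path.
        return [path[::-1]]
    hits = []
    if ttl > 0:  # a message whose TTL has reached 0 is no longer routed
        for neighbor in node.neighbors:
            # Assumption: resource selection is replaced by "forward to every
            # neighbor that is not already on the query path".
            if neighbor.name not in path:
                hits.extend(route_query(neighbor, ttl, path))
    return hits


# Hub H1 serves leaves D1 and D2; hub H2 serves leaf D3; the hubs are connected.
d1, d2, d3 = Node("D1", "leaf"), Node("D2", "leaf"), Node("D3", "leaf")
h1, h2 = Node("H1", "hub"), Node("H2", "hub")
h1.neighbors = [d1, d2, h2]
h2.neighbors = [d3, h1]

for ttl in (1, 2, 3):
    print(ttl, route_query(h1, ttl))
```

In this toy network a query that arrives at H1 with a TTL of 1 goes no further, a TTL of 2 reaches only the leaves of H1, and a TTL of 3 also reaches D3 through H2.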
Each top-level hub (a hub that connects directly to the client node issuing the request) collects the queryhit messages and uses its result merging algorithm to merge the documents retrieved from multiple leaf nodes into a single, integrated ranked list, which it returns to the client node. If the client node issues the request to more than one hub, it also needs to merge the results returned by multiple top-level hubs.

Figure 3.1 illustrates federated search of text-based digital libraries in hierarchical P2P networks. The C (white) node is the client node that issues the information request, the H (black) nodes are hubs, and the D (gray) nodes are leaf nodes (digital libraries). The edges between nodes represent connections. The arrows with solid lines indicate the directions in which query messages are sent, and the arrows with dashed lines indicate the directions in which queryhit messages are sent.

In the following subsections, we present in more detail the solutions to the problems of resource representation, resource ranking and selection, and result merging in both cooperative and uncooperative peer-to-peer environments.

3.1 Resource Representation

The description of a resource is a very compact summary of its content. Compared with a copy of the complete index of a document collection, a resource description incurs much lower communication and storage costs but still provides useful information for resource selection algorithms to determine which resources are more likely to contain documents relevant to the query. As mentioned in Section 2.1.2, the resource description used by most resource selection algorithms includes a list of terms with corresponding term frequencies (the collection language model), and corpus statistics such as the total number of terms and documents provided or covered by the resource. The resource here could be a single leaf node, a hub that covers multiple neighboring leaf nodes, or a “neighborhood” that includes all the nodes reachable from a hub.
Although resource descriptions for different types of resources have the same format, different methods are required to acquire them, which we introduce below.

3.1.1 Resource Descriptions of Leaf Nodes

Resource descriptions of leaf nodes are used by hubs for query routing (“resource selection”) among connecting leaf nodes. In cooperative environments, each leaf node provides an accurate resource description to its connecting hubs upon request. In uncooperative environments, each hub conducts query-based sampling independently to obtain sampled documents from its connecting leaf nodes. Sampled documents from a leaf node are used to generate its collection language model. They are also used by the Sample-Resample method to estimate the total number of documents in this leaf node’s collection.

3.1.2 Resource Descriptions of Hubs

The resource description of a hub is the aggregation of the resource descriptions of its connecting leaf nodes. Since hubs work collaboratively in hierarchical P2P networks, neighboring hubs can exchange their aggregate resource descriptions with each other. However, the aggregate resource descriptions of hubs only contain information about nodes within 1 hop. If a hub used them directly to decide which neighboring hubs to route query messages to, the routing would not be effective when the nodes with relevant documents sit beyond this “horizon”. Thus, for effective hub selection, a hub must have information about what contents can be reached if the query message it routes to a neighboring hub travels further, over multiple hops. This kind of information is referred to as the resource description of a neighborhood and is introduced in the following subsection.

3.1.3 Resource Descriptions of Neighborhoods

A neighborhood of a hub $H_i$ in the direction of its neighboring hub $H_j$ is the set of hubs that can be reached by following the path from $H_i$ to $H_j$. Figure 3.2 illustrates the concept of neighborhood.
Hub $H_1$ has three neighboring hubs $H_2$, $H_3$ and $H_4$. Thus it has three neighborhoods, marked $N_{1,2}$, $N_{1,3}$ and $N_{1,4}$. The resource description of a neighborhood provides information about the contents covered by all the hubs in this neighborhood. A hub uses resource descriptions of neighborhoods to select and route queries to its neighboring hubs.

Resource descriptions of neighborhoods provide functionality similar to routing indices [4]. An entry in a routing index records the number of documents that may be found along a path for a set of topics. The key difference between resource descriptions of neighborhoods and routing indices is that resource descriptions of neighborhoods represent contents with unigram language models (terms with their frequencies). Thus, with resource descriptions of neighborhoods, hubs and leaf nodes need not cluster their documents into a set of topics, and queries need not be restricted to topic keywords.

Figure 3.1 Federated search in hierarchical P2P networks.

Similar to exponentially aggregated routing indices [4], a hub calculates the resource description of a neighborhood by aggregating the resource descriptions of all the hubs in the neighborhood, decayed exponentially according to the number of hops. For example, in the resource description of a neighborhood $N_{i,j}$ (the neighborhood of $H_i$ in the direction of $H_j$), a term $t$’s exponentially aggregated term frequency is calculated as:

$$\sum_{H_k \in N_{i,j}} \left\{ \frac{tf(t, H_k)}{F^{[numhops(H_i, H_k) - 1]}} \right\} \tag{1}$$

where $tf(t, H_k)$ is $t$’s term frequency in the resource description of hub $H_k$, and $F$ is the average number of hub neighbors each hub has in the network.

The exponentially aggregated total number of documents in a neighborhood is calculated as:

$$\sum_{H_k \in N_{i,j}} \left\{ \frac{numdocs(H_k)}{F^{[numhops(H_i, H_k) - 1]}} \right\} \tag{2}$$

The creation of resource descriptions of neighborhoods requires several iterations at each hub, and different hubs can run the creation process asynchronously.
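
Equations (1) and (2) can be checked on a small example. The following Python sketch evaluates them directly on a known topology rather than through the iterative exchange between hubs. The function name, the three-hub chain and the value F = 2 are illustrative assumptions. Hop counts are taken as shortest-path distances, so the sketch sidesteps the overcounting that cycles can cause.

```python
from collections import Counter, deque


def neighborhood_description(links, hub_tf, hub_docs, i, j, fanout):
    """Exponentially aggregated tf and numdocs for N_{i,j} (Equations 1 and 2)."""
    # Breadth-first search from H_j that never crosses back through H_i;
    # hops[h] is numhops(H_i, h), so H_j itself is 1 hop away.
    hops = {j: 1}
    queue = deque([j])
    while queue:
        hub = queue.popleft()
        for neighbor in links[hub]:
            if neighbor != i and neighbor not in hops:
                hops[neighbor] = hops[hub] + 1
                queue.append(neighbor)
    tf = Counter()
    docs = 0.0
    for hub, n in hops.items():
        decay = fanout ** (n - 1)  # F^[numhops(H_i, H_k) - 1]
        for term, freq in hub_tf[hub].items():
            tf[term] += freq / decay
        docs += hub_docs[hub] / decay
    return dict(tf), docs


# Hypothetical chain H1 - H2 - H3 with made-up hub descriptions.
links = {"H1": ["H2"], "H2": ["H1", "H3"], "H3": ["H2"]}
hub_tf = {"H2": {"p2p": 8}, "H3": {"p2p": 4, "search": 6}}
hub_docs = {"H2": 100, "H3": 60}

# F is supplied by the caller; 2 is an illustrative value, not a measured average.
print(neighborhood_description(links, hub_tf, hub_docs, "H1", "H2", fanout=2))
```

Here H2 is one hop from H1 and contributes at full weight, while H3 is two hops away and its counts are divided by F.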
In each iteration of this creation process, a hub $H_i$ calculates and sends to its hub neighbor $H_j$ the resource description of neighborhood $N_{j,i}$, denoted $ND_{j,i}$, by aggregating its own hub description $HD_i$ and the most recent resource descriptions of neighborhoods it has received from all of its neighboring hubs excluding $H_j$. $ND_{j,i}$ is calculated as:

$$ND_{j,i} = HD_i + \sum_{H_k \in directneighbors(H_i) \setminus H_j} \left\{ \frac{ND_{i,k}}{F} \right\} \tag{3}$$

The stopping condition could be either the number of iterations reaching a predefined limit or the difference in resource descriptions between adjacent iterations becoming small enough. The process of maintaining and updating resource descriptions of neighborhoods is identical to the process used for creating them. The resource descriptions of neighborhoods could be updated when the difference between the old and new values is significant, periodically, or when a node disconnects from the network.

For networks that have cycles, the frequencies of some terms and the number of documents may be overcounted, which affects the accuracy of resource descriptions. How to deal with cycles in peer-to-peer networks using routing indices is discussed in detail in [4]. We could use the solutions described in [4] for cycle avoidance or cycle detection and recovery. For simplicity, in this paper we take the “no-op” solution, which completely ignores cycles. Experimental results show that resource selection using resource descriptions of neighborhoods generated in networks with cycles is still quite efficient and accurate.

3.2 Resource Ranking and Selection

The goal of query routing is to direct the information request to those nodes that are most likely to contain relevant documents, using the minimum number of query messages. The flooding technique is guaranteed to reach nodes with relevant information contents but requires an exponential number of query messages.
Randomly forwarding the request to a small subset of neighbors can significantly reduce the number of query messages, but the reached nodes may not be relevant at all. To achieve both efficiency and accuracy, each hub needs to rank its neighboring leaf nodes by their likelihood of satisfying the information request, rank its neighboring hubs by their likelihood of reaching nodes with relevant information contents, and forward the request only to top-ranked neighbors. Because the resource descriptions of leaf nodes and those of neighborhoods are not of the same magnitude, a hub handles the ranking and selection of its neighboring leaf nodes and hubs separately.

3.2.1 Leaf Node Ranking

Adapting language modeling approaches for ad-hoc information retrieval, we use the Kullback-Leibler (K-L) divergence-based method [24] for leaf node ranking. In the language modeling framework, the K-L divergence resource selection algorithm calculates $P(L_i \mid Q)$, the conditional probability of predicting the collection of leaf node $L_i$ given the query $Q$, and uses it to rank different leaf nodes. $P(L_i \mid Q)$ is calculated as follows:

$$P(L_i \mid Q) = \frac{P(Q \mid L_i) \times P(L_i)}{P(Q)} \propto P(Q \mid L_i) \tag{4}$$

with uniform prior probability for leaf nodes;

$$P(Q \mid L_i) = \prod_{q \in Q} \frac{tf(q, L_i) + \mu \times P(q \mid G)}{numterms(L_i) + \mu} \tag{5}$$

where $tf(q, L_i)$ is the term frequency of query term $q$ in leaf node $L_i$’s resource description (collection language model), $numterms(L_i)$ is the total number of terms in that resource description, $P(q \mid G)$ is the background language model used for smoothing, and $\mu$ is the smoothing parameter in Dirichlet smoothing.

3.2.2 Leaf Node Selection with Unsupervised Threshold Learning

After leaf nodes are ranked based on their $P(L_i \mid Q)$ values, the usual approach is to select the top-ranked leaf nodes up to a predetermined number. In hierarchical P2P networks, the number of leaf nodes served by individual hubs may be quite different, and different hubs may cover different content areas.
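
As a rough sketch of the ranking and fixed-cutoff selection described so far, the following Python computes Equation (5) in log space and keeps the top-ranked leaf nodes. Working with logarithms is a choice made here for numerical stability and does not change the ranking. The function names, the sample leaf descriptions, the background probabilities and the value µ = 1000 are all illustrative assumptions rather than settings taken from this paper.

```python
import math


def log_query_likelihood(query, leaf_tf, leaf_numterms, background, mu):
    """log P(Q | L_i) under Equation (5), with Dirichlet smoothing."""
    total = 0.0
    for q in query:
        # Assumption: the background model gives every query term a
        # probability greater than zero, so the logarithm is defined.
        smoothed = (leaf_tf.get(q, 0) + mu * background[q]) / (leaf_numterms + mu)
        total += math.log(smoothed)
    return total


def rank_leaves(query, leaves, background, mu=1000.0, top_k=None):
    """Rank leaf names by P(Q | L_i), highest first; keep top_k if given."""
    ranked = sorted(
        leaves,
        key=lambda name: log_query_likelihood(
            query, leaves[name]["tf"], leaves[name]["numterms"], background, mu
        ),
        reverse=True,
    )
    return ranked if top_k is None else ranked[:top_k]


# Made-up resource descriptions for three leaf nodes of one hub.
leaves = {
    "L1": {"tf": {"peer": 50, "search": 20}, "numterms": 1000},
    "L2": {"tf": {"peer": 1}, "numterms": 1000},
    "L3": {"tf": {"search": 40}, "numterms": 4000},
}
background = {"peer": 0.001, "search": 0.002}
query = ["peer", "search"]

print(rank_leaves(query, leaves, background))
print(rank_leaves(query, leaves, background, top_k=1))
```

The `top_k` argument plays the role of the predetermined number of leaf nodes to select.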
Because of these differences, it is not appropriate to use a static, query-independent and hub-independent number as the threshold for a hub to decide how many leaf nodes to select for a given query. It is desirable that hubs have the ability to learn hub-specific and query-type-specific thresholds automatically.

The problem of learning a threshold to convert relevance ranking scores into a binary decision has mostly been studied in information filtering [25, 26, 27]. However, the user relevance

Figure 3.2 Neighborhoods in hierarchical P2P networks.
