Design and Implementation of File System Management Tool Based on Metadata

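Classification level:

Master's Thesis

Design and Implementation of File System Management Tool Based on Metadata

Author: Deqing Chen

Supervisor: Shi Jingyan, Associate Research Fellow

Institute of High Energy Physics, Chinese Academy of Sciences

Degree category: Master of Engineering

Discipline: Computer Technology

Degree-granting unit: Institute of High Energy Physics, Chinese Academy of Sciences

April 2014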

Design and Implementation of File System Management Tool Based on Metadata

Deqing Chen

A Dissertation Submitted to

University of Chinese Academy of Sciences

In partial fulfillment of the requirement

For the degree of

Master of Computer Technology

Institute of High Energy Physics,

Chinese Academy of Sciences

April, 2014

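Graduate Thesis Declaration

I solemnly declare that the thesis submitted here is the result of research I carried out independently under the guidance of my supervisor. Except for content that is cited and marked as such in the text, the research results in this thesis contain no content in which others hold copyright. Other individuals and groups who contributed to the research covered by this thesis are clearly identified in the text.

Signature: _____________ Date: _____________

Statement on Authorization for Use of the Thesis

I fully understand the document "Regulations on the Right to Use Graduate Theses and Research Results of the Institute of High Energy Physics, Chinese Academy of Sciences", (2001) 高发研生字 No. 315, namely that the Institute of High Energy Physics holds the right to use the thesis within the scope permitted by copyright law, including:

(1) Graduate students who have been awarded a degree must submit their thesis as required, and the Institute of High Energy Physics may preserve the submitted thesis by photocopying, reduced-size printing, or other means of reproduction.

(2) For teaching and research purposes, the Institute of High Energy Physics may make public theses available as reference material for researchers to read in the library, reference rooms, and similar places, or to browse in part on the institute's internal website.

(3) In accordance with the "Interim Measures for the Implementation of the Regulations of the People's Republic of China on Academic Degrees", theses that may be made public are submitted to the National Library and other relevant bodies.

Signature: _____________ Date: _____________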

Abstract (translation of the Chinese abstract)

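Cluster systems for data-intensive computing often hold massive numbers of user data files. The sheer number of files makes it hard for administrators to obtain a full picture of the file system efficiently with traditional Linux system commands, so file management, system monitoring, and data backup cannot proceed smoothly.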

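Some existing tools can collect file system metadata, but their collection efficiency is often low for massive numbers of files, and the information they collect cannot be shared because it is poorly organized. In addition, these tools usually present file system metadata as character-based output, which is not intuitive.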

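This thesis improves the open-source software Robinhood to implement an efficient metadata-based file system management tool. The tool traverses the file system directory tree in a distributed, parallel manner to quickly obtain the metadata of massive numbers of files, and saves that metadata to a database. The MySQL database holding the file system metadata provides fast information retrieval to external consumers through the standard interfaces implemented by the improved Robinhood. Following these standard interfaces, this thesis develops a set of application systems: a management system, a monitoring system, and a backup system. The management system calls the standard interfaces to retrieve the file system information stored in the database and combines it with user-defined policies to manage the file system at fine granularity. The monitoring system uses web technology to display file system status as charts, giving users a convenient and intuitive monitoring interface. The backup system is based on the open-source software Amanda and obtains the list of files to back up from the database, enabling fast, real-time backup of the file system.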
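The scan-and-store idea described above can be illustrated with a minimal Python sketch. It is not the thesis implementation and does not use Robinhood's code or interfaces: a thread pool on a single host stands in for distributed workers, the standard-library sqlite3 module stands in for MySQL, and the `entries` table and the function names `scan_subtree` and `scan_to_db` are hypothetical.

```python
import os
import sqlite3
import stat
from concurrent.futures import ThreadPoolExecutor


def scan_subtree(root):
    """Walk one subtree and return (path, size, mtime, uid) rows for regular files."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, and other special entries
            rows.append((path, st.st_size, int(st.st_mtime), st.st_uid))
    return rows


def scan_to_db(top, conn, workers=4):
    """Split `top` into its immediate subdirectories, scan them in parallel,
    and store one row per regular file in the hypothetical `entries` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "path TEXT PRIMARY KEY, size INTEGER, mtime INTEGER, uid INTEGER)"
    )
    subdirs, rows = [], []
    with os.scandir(top) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)  # one unit of parallel work
            elif entry.is_file(follow_symlinks=False):
                st = entry.stat(follow_symlinks=False)
                rows.append((entry.path, st.st_size, int(st.st_mtime), st.st_uid))
    # Each worker scans one subtree; results are merged in the calling thread,
    # so only one thread ever touches the database connection.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(scan_subtree, subdirs):
            rows.extend(part)
    conn.executemany("INSERT OR REPLACE INTO entries VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Calling `scan_to_db("/some/dir", sqlite3.connect(":memory:"))` returns the number of files recorded. Splitting the work at the first directory level is the simplest partitioning; it balances poorly when one subtree dominates, which is one reason a real distributed scanner would need a finer-grained work division.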

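Experimental tests show that traversing a massive file system with distributed parallel processing is considerably more efficient than traditional traversal and greatly shortens scan time. The database-backed application systems also give users more convenient functions and improve the efficiency of file system management.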

Keywords: metadata, Robinhood, distributed computing, Torque job management system, MySQL database

Abstract

Data-intensive computing clusters store vast numbers of user data files. It is difficult for administrators to obtain comprehensive information about the file system efficiently by running traditional Linux system commands. Functions such as file management, system monitoring, and data backup cannot work smoothly for lack of complete file information.

Although some tools can collect file system metadata, their collection efficiency drops when they deal with a large file system that stores massive numbers of files. Moreover, the information these tools obtain cannot be shared with other tools because it is poorly organized. In addition, the tools do not present metadata to managers and users in an intuitive way.

This thesis implements a highly efficient tool, based on the open-source software Robinhood, for collecting file system metadata. The metadata of massive numbers of files is obtained by distributed traversal of the file tree, and the collected information is stored in a database. The tool provides a set of standard interfaces for fast access to the database. On top of these interfaces, a set of functions is developed: file management, monitoring, and data backup. The file management function calls the standard APIs to get metadata from the database and, combined with policies defined by users, provides fine-grained file management. The monitoring function presents charts and reports on the file system through web pages. Supported by the tool, the backup system, which is built on another open-source package, Amanda, obtains the complete list of files to back up and backs up data in a timely manner.
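As a rough illustration of how a metadata database can drive both policy-based management and backup list generation, the following Python sketch queries a small metadata table. It is only a sketch under stated assumptions: sqlite3 stands in for MySQL, the `entries` schema is hypothetical, and the functions `changed_since` and `policy_matches` are illustrative names rather than the tool's actual interfaces.

```python
import sqlite3

# Hypothetical metadata table: one row per file.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS entries ("
    "path TEXT PRIMARY KEY, size INTEGER, mtime INTEGER, uid INTEGER)"
)


def changed_since(conn, last_backup_time):
    """Backup list: paths modified after the previous backup, oldest change first."""
    cur = conn.execute(
        "SELECT path FROM entries WHERE mtime > ? ORDER BY mtime, path",
        (last_backup_time,),
    )
    return [row[0] for row in cur]


def policy_matches(conn, min_size, older_than):
    """Example user policy: files of at least `min_size` bytes whose last
    modification is earlier than the `older_than` timestamp."""
    cur = conn.execute(
        "SELECT path FROM entries WHERE size >= ? AND mtime < ? ORDER BY path",
        (min_size, older_than),
    )
    return [row[0] for row in cur]
```

Both functions answer their question with a single indexed query rather than a fresh walk of the directory tree, which is the practical benefit of keeping the metadata in a database.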

The test results show that the tool scans massive file systems significantly more efficiently than traditional methods and greatly reduces scan time. The functions built on the tool also give users a more convenient and friendly interface, which improves the efficiency of file system management.

Keywords: Metadata, Robinhood, Distributed Computing, Torque Job Management System, MySQL Database