Supervisor: Associate Research Fellow Shi Jingyan (石京燕)


Degree category: Master of Engineering

Major: Computer Technology

Institution: Institute of High Energy Physics, Chinese Academy of Sciences


Design and Implementation of File System Management Tool Based on Metadata


Deqing Chen

A Dissertation Submitted to

University of Chinese Academy of Sciences

In partial fulfillment of the requirement

For the degree of

Master of Computer Technology

Institute of High Energy Physics,

Chinese Academy of Sciences

April, 2014



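By improving the open-source software Robinhood, this thesis implements an efficient metadata-based file system management tool. The tool traverses the file system directory tree in a distributed, parallel fashion to quickly obtain the metadata of massive numbers of files, and saves that information to a database. The MySQL database holding the file system metadata provides fast access to this information through standard interfaces implemented by the improved Robinhood. Following these standard interfaces, this thesis develops a series of application systems: a management system, a monitoring system, and a backup system. The management system calls the standard interfaces implemented by the improved Robinhood to obtain the file system information stored in the database, and combines it with user-defined policies to manage the file system at a fine granularity. The monitoring system uses web technology to display file system status as charts, giving users a convenient and intuitive monitoring interface. The backup system, based on the open-source software Amanda, obtains the list of files to be backed up from the database, enabling fast, real-time backup of the file system.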


Keywords: Metadata, Robinhood, Distributed Computing, Torque Job Management System, MySQL Database


A data-intensive computing cluster stores vast amounts of user data files. It is difficult for administrators to obtain comprehensive information about the file system efficiently using traditional Linux system commands. Functions such as file management, system monitoring, and data backup cannot work smoothly without complete file information.

Although some existing tools collect file system metadata, their collection efficiency drops when they deal with large file systems that store massive amounts of file data. Besides, the information these tools obtain from the file system cannot be shared with other tools because it is organized inefficiently. In addition, these tools cannot present metadata intuitively to managers and users.

This thesis implements a highly efficient tool, based on the open-source software Robinhood, to collect file system metadata. The metadata of massive numbers of files is obtained by traversing the file tree in a distributed manner, and the collected information is stored in a database. The tool provides a series of standard interfaces for fast database access. On top of these interfaces, a series of functions is developed: file management, monitoring, and data backup. The file management function calls the standard APIs to get metadata from the database and, combined with dedicated policies defined by users, provides fine-grained file management. The monitoring function presents rich tables and reports on the file system via web pages. Supported by the tool, the backup system, derived from another open-source package, Amanda, obtains a complete list of files to back up and backs up the data in a timely manner.

The test results show that the tool scans massive file systems significantly more efficiently than traditional methods, greatly reducing scan time. Meanwhile, the functions provided by the tool offer a more convenient and friendly user interface, which improves the efficiency of file system management.

Keywords: Metadata, Robinhood, Distributed Computing, Torque Job Management System, MySQL Database
