Disk-Based Container Objects

Tom Nelson

A container that's very large, or that must persist between programs, really needs to live on disk.

C++ container class libraries have become a staple item in many programmers' toolkits. The introduction of templates has made these libraries notably robust and simple to use. However, their transient (memory-based) nature still imposes certain restrictions on their use. First, they cannot grow to arbitrary size if needed; second, they lack persistence, hence they disappear when the program shuts down.

The second restriction is usually easier to cure than the first. To make a transient container "hold water," the container class could use a persistent streams facility to write the essential data members of each contained object to a disk file before shutting down. The next program invocation need only initialize another transient container, then sequentially extract each object from the persistent store and add it to the new container.
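The save-and-reload approach just described can be sketched as follows. This is only an illustration and not code from the article: the Task record and the saveContainer and loadContainer helpers are hypothetical names, and the raw byte I/O assumes the contained type is trivially copyable (no pointers or virtual functions), which a real persistent streams facility would not require.

```cpp
#include <cassert>
#include <fstream>
#include <list>
#include <string>
#include <type_traits>

// Hypothetical record type used only for this illustration.
struct Task {
    int  priority;
    char name[16];
};

// Write each contained object to a binary file before shutting down.
template <typename T>
bool saveContainer(const std::list<T>& items, const std::string& path)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte I/O is only safe for trivially copyable types");
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    for (const T& item : items)
        out.write(reinterpret_cast<const char*>(&item), sizeof(T));
    return static_cast<bool>(out);
}

// On the next run, extract each object in sequence and add it to a
// fresh transient container.
template <typename T>
bool loadContainer(std::list<T>& items, const std::string& path)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte I/O is only safe for trivially copyable types");
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    T item{};
    while (in.read(reinterpret_cast<char*>(&item), sizeof(T)))
        items.push_back(item);
    // Success means we stopped at a clean end of file, not mid-record.
    return in.eof() && in.gcount() == 0;
}
```

Note that every object still has to fit in memory at once after loading, which is exactly the limitation the next paragraph addresses.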

In some cases, though, you can't guarantee that your run-time storage requirements won't overflow available memory. A priority or scheduling queue, for example, might need to process and store an unanticipated quantity of incoming data. You could write some objects to disk to free space in the container. However, to effectively process additions and deletions, a transient container normally requires the presence of all contained objects in memory at one time. When you write contained objects to a file, the logical structure of the container (pointers, state info, etc.) is lost. One solution consists of moving the entire container out to a disk file so that it becomes, in effect, a database, or "Containerized Database."

This article will demonstrate techniques for building and using such disk-based container classes. They let you maintain container objects of virtually any size indefinitely, regardless of memory constraints. Even when persistence is not necessary, the technique still permits arbitrary growth of the containers by storing overflow in temporary files, restricted only by available disk space. I will concentrate here on developing disk-based implementations for three fundamental structure types (lists, vectors, and binary trees). I also discuss a few abstract types derived from them and provide an example of a disk-based binary tree sort.

Design and Performance Considerations

The public interface and behavior of member functions in a disk-based container class are nearly identical to those of a comparable transient container, making its use almost transparent to the programmer. There are, of course, a few instances where you need to be aware of its disk-based nature, such as supplying a container filename to a class constructor (if you need persistence, that is). Other considerations important for transient containers (for instance, specifying an upper limit for the size of a vector) usually drop out of the picture entirely. Disk-based containers are essentially open ended.

I used templates to implement most aspects of the container classes provided here. While templates introduce added complexity for the uninitiated, their use greatly simplifies coding for direct container types. Disk-based containers are implemented as direct containers, since you will need to store actual objects. Using indirect containers, which maintain (void *) pointers to other objects in memory, would introduce unnecessary complexity. Templates also bring costs of their own, such as increased code size and longer compilation times: each template instance has its own private copy of nearly identical code that operates on a different data type.

Due to their disk-based nature, these containers also impose a variable degree of speed penalty. Instead of memory addresses, disk-based containers use file offsets, most commonly a long or unsigned long integer value. To process additions and deletions to the container file (primarily for list containers), one or more nodes must be read into memory, the pointers adjusted, and the updated nodes written out again.
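To make the read, adjust, and write-back cycle concrete, here is a minimal sketch of a singly linked list whose links are file offsets. It is not the article's list class: DiskNode, DiskList, NIL, and the member names are hypothetical, the node holds a single long value, and error handling is omitted for brevity.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical on-disk node: 'next' is a file offset, not a pointer.
struct DiskNode {
    long value;
    long next;   // offset of the next node, or NIL at the end of the list
};

const long NIL = -1L;

class DiskList {
public:
    // Creates (or truncates) the container file.
    explicit DiskList(const std::string& path)
        : file_(path, std::ios::in | std::ios::out |
                      std::ios::binary | std::ios::trunc) {}

    // Append a node record at the end of the file and return its offset.
    long append(long value, long next) {
        file_.clear();
        file_.seekp(0, std::ios::end);
        long pos = static_cast<long>(file_.tellp());
        DiskNode n = { value, next };
        file_.write(reinterpret_cast<const char*>(&n), sizeof n);
        return pos;
    }

    // Read the node stored at offset 'pos' into memory.
    DiskNode read(long pos) {
        DiskNode n = { 0, NIL };
        file_.clear();
        file_.seekg(pos);
        file_.read(reinterpret_cast<char*>(&n), sizeof n);
        return n;
    }

    // Write an updated node back to offset 'pos'.
    void write(long pos, const DiskNode& n) {
        file_.clear();
        file_.seekp(pos);
        file_.write(reinterpret_cast<const char*>(&n), sizeof n);
    }

    // Insert after the node at 'pos': read the predecessor, store the
    // new node, adjust the predecessor's link, and write it out again.
    long insertAfter(long pos, long value) {
        DiskNode pred = read(pos);
        long fresh = append(value, pred.next);
        pred.next = fresh;
        write(pos, pred);
        return fresh;
    }

private:
    std::fstream file_;
};
```

A single insertion here costs one read and two writes, which is the speed penalty the paragraph above refers to. The physical order of records in the file need not match the logical order of the list.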

Container File Access

A few factors make this procedure less cumbersome and disk-intensive than it first appears. Contemporary operating systems have a system-wide disk cache that stores recently used disk sectors in memory. The operating system inspects cache memory before initiating a physical disk access. A disk-based container can also employ a private cache that stores recently used objects in memory. You may find this important when working in a multitasking environment, where the system cache may at times encounter heavy use from other programs currently running.
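As a rough illustration of such a private cache, the sketch below keeps a fixed number of recently used objects in memory, keyed by file offset, and discards the least recently used one when full. This is an assumption-laden example and not the CachedFile implementation: the ObjectCache name, its interface, and the eviction policy are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

// Hypothetical private cache of recently used objects, keyed by offset.
template <typename T>
class ObjectCache {
public:
    explicit ObjectCache(std::size_t capacity) : capacity_(capacity) {}

    // Return the cached object stored for 'offset', or nullptr on a miss.
    // A hit moves the entry to the front (most recently used position).
    const T* find(long offset) {
        auto it = index_.find(offset);
        if (it == index_.end()) return nullptr;
        entries_.splice(entries_.begin(), entries_, it->second);
        return &it->second->second;
    }

    // Store or refresh an object, evicting the least recently used
    // entry when the cache is full.
    void put(long offset, const T& object) {
        auto it = index_.find(offset);
        if (it != index_.end()) {
            it->second->second = object;
            entries_.splice(entries_.begin(), entries_, it->second);
            return;
        }
        if (capacity_ == 0) return;
        if (entries_.size() == capacity_) {
            index_.erase(entries_.back().first);
            entries_.pop_back();
        }
        entries_.emplace_front(offset, object);
        index_[offset] = entries_.begin();
    }

    std::size_t size() const { return entries_.size(); }

private:
    typedef std::list<std::pair<long, T> > EntryList;
    std::size_t capacity_;
    EntryList   entries_;   // front = most recently used
    std::unordered_map<long, typename EntryList::iterator> index_;
};
```

A container would consult find() before reading a node from the file and call put() after each read or write, so that repeated access to the same few nodes never reaches the disk.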

Class DirectFile (in dfile.h, available online) defines a direct file management scheme. Member functions access container file contents directly from the disk via the system cache. You will probably find that most disk-based containers can make effective use of this simpler method of access with little performance penalty. For other situations, such as those noted earlier, disk-based containers can also use class CachedFile (in cfile.h and cfile.cpp). CachedFile implements a most-recently-used object cache independent of the system cache. (For more info on disk caching, see my article "Memory Caching for Disk-Based Objects," CUJ, October 1996, p. 59.) The public interfaces for both DirectFile and CachedFile are identical, which makes access to objects of either class transparent to higher-level classes. I also padded the argument list for constructor DirectFile to