Julep: A Framework for Reliable Distributed Computing in Java
Lawrence R. Klos Golden G. Richard III Zhidong Xu
Department of Computer Science
University of New Orleans
New Orleans, LA 70148
Abstract
Julep is an object-oriented testbed designed for the analysis and comparison of temporal diversity fault tolerance mechanisms. It is written in Java and runs as a layer underneath a distributed application. Julep can run on any standard COTS platform with a JVM, in homogeneous or heterogeneous environments. Julep is designed to incorporate new process recovery mechanisms quickly and easily, allowing accurate comparison between mechanisms for specific applications on specific hardware platforms. Julep’s central Manager acts as a task location and status lookup service. A novel aspect of Julep is the implementation of ‘unbreakable’ communication channels. Julep is flexible regarding its use of fault tolerance mechanisms. It can be used as a testbed to compare the performance of implemented recovery mechanisms, as a framework within which new recovery mechanisms can be implemented and tested, or as an infrastructure to add fault tolerance to an existing distributed application.
1.1 Introduction
Distributed systems are being implemented and relied upon to an ever greater degree in present-day computing. Critical distributed applications such as air traffic control systems, hospital patient monitoring systems and missile control systems now reach into many aspects of modern life. The recent popularity of the Internet has further fueled the growth of distributed systems, with consumer demand driving the creation of new applications such as on-line banking and stock trading.
As distributed applications become more pervasive, the need for such systems to be fault tolerant grows. Fault tolerance for distributed systems has been under study for over twenty years. Recent research [14] has categorized software fault tolerance (SWFT) strategies into four areas, one of which is known as temporal diversity. Temporal diversity encompasses a range of strategies, all designed to save the state of a process so that the process state can be recovered later. Many mechanisms have been created to provide for process recovery in distributed systems. Survey papers have classified and theoretically compared these techniques, but a system that compares them at runtime has been lacking.
Julep is a testbed for the experimental evaluation of distributed process recovery mechanisms. Because Julep is written in Java, it can run on a variety of standard COTS hardware platforms. The goal of Julep is to allow actual runtime comparison of process recovery techniques. Julep’s message passing component is made robust through the implementation of ‘unbreakable’ channels for object streams. If a process receiving a message fails and restarts on a new machine, the channel is dynamically rerouted to the receiver’s new address/port, allowing the message transfer to complete successfully. These channels are implemented using the UDP protocol, with an added protocol layer to allow any desired size of object to be transferred, and to guarantee reliable object transfer. This paper will briefly review the issues involved in process recovery techniques, outline Julep’s current implementation relative to those issues, and describe future areas of research for the project.
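The idea of layering object transfer on top of UDP can be illustrated with a minimal sketch. This is not Julep's actual wire format: the class name `ObjectFragmenter`, the 12-byte header (message id, sequence number, fragment count), and the 1024-byte payload limit are all assumptions made for illustration. The sketch only shows how a serialized object of arbitrary size might be split into datagram-sized fragments and reassembled despite reordering and duplication; sockets, acknowledgements, retransmission, and channel rerouting are omitted.

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.util.*;

/** Hypothetical sketch of a fragmentation layer; not Julep's real protocol. */
public class ObjectFragmenter {
    static final int HEADER_BYTES = 12;   // assumed header: messageId, sequence, total (3 ints)
    static final int MAX_PAYLOAD = 1024;  // assumed per-datagram payload limit

    /** Serialize an object and split it into datagram-sized fragments. */
    static List<byte[]> fragment(int messageId, Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        byte[] data = bytes.toByteArray();
        int total = (data.length + MAX_PAYLOAD - 1) / MAX_PAYLOAD;
        List<byte[]> fragments = new ArrayList<>();
        for (int seq = 0; seq < total; seq++) {
            int from = seq * MAX_PAYLOAD;
            int len = Math.min(MAX_PAYLOAD, data.length - from);
            ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + len);
            buf.putInt(messageId).putInt(seq).putInt(total).put(data, from, len);
            fragments.add(buf.array());
        }
        return fragments;
    }

    /** Collects the fragments of one message, tolerating duplicates and reordering. */
    static class Reassembler {
        private final Map<Integer, byte[]> received = new TreeMap<>();
        private int total = -1;

        void accept(byte[] fragment) {
            ByteBuffer buf = ByteBuffer.wrap(fragment);
            buf.getInt();                      // messageId, unused: one message per Reassembler here
            int seq = buf.getInt();
            total = buf.getInt();
            byte[] payload = new byte[buf.remaining()];
            buf.get(payload);
            received.putIfAbsent(seq, payload); // a duplicate fragment is ignored
        }

        /** Sequence numbers not yet seen; a receiver could ask the sender to resend these. */
        List<Integer> missing() {
            List<Integer> gaps = new ArrayList<>();
            for (int seq = 0; seq < total; seq++) {
                if (!received.containsKey(seq)) gaps.add(seq);
            }
            return gaps;
        }

        boolean complete() {
            return total > 0 && received.size() == total;
        }

        /** Concatenate fragments in sequence order and deserialize the object. */
        Object reassemble() throws IOException, ClassNotFoundException {
            if (!complete()) throw new IllegalStateException("missing fragments: " + missing());
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            for (byte[] part : received.values()) bytes.write(part);
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] big = new int[2000];             // larger than one assumed datagram payload
        big[1999] = 42;
        List<byte[]> fragments = fragment(7, big);
        Collections.reverse(fragments);        // simulate out-of-order arrival
        Reassembler r = new Reassembler();
        for (byte[] f : fragments) r.accept(f);
        int[] copy = (int[]) r.reassemble();
        System.out.println("fragments: " + fragments.size() + ", last element: " + copy[1999]);
    }
}
```

In a real channel each fragment would travel in its own `DatagramPacket`, and the `missing()` list is where a retransmission request would hook in. Rerouting to a restarted receiver would additionally require looking up the receiver's new address/port before resending, which this sketch does not attempt.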
1.2 Distributed Systems
A distributed system is composed of a set of autonomous computing systems, usually linked together by a network, that cooperate and coordinate their actions in order to perform a task. The distributed system model Julep was designed for has no physical shared memory and no shared global clock. The individual processes that compose the system have access only to imperfectly synchronized local clocks. Communication between the processes consists of messages passed over the network. We further assume