Fault-Tolerant Storage and Quorum Systems for Dynamic Environments M.Sc. Thesis
Method and system for fault-tolerant transfer of f
专利名称:Method and system for fault-toleranttransfer of files across a network发明人:Michael Frederick Kenrich,Hal S.Hildebrand,Senthilvasan Supramaniam申请号:US10642041申请日:20030815公开号:US07555558B1公开日:20090630专利内容由知识产权出版社提供专利附图:摘要:Improved techniques for transferring files through a multi-tier computing environment are disclosed. The transfer of files across the multiple tiers of thecomputing environment can use staging at intermediate tiers to facilitate the file transfer. Each tier can include at least one computing machine that includes a file transfer manager. The file transfer managers at the computing machines in each of the multiple tiers serve to effectuate the file transfer through the multi-tier computing environment. In one embodiment, the multi-tier computing environment is a multi-tier file security system and the files being transferred are audit files.申请人:Michael Frederick Kenrich,Hal S. Hildebrand,Senthilvasan Supramaniam地址:335 Beach Dr. Aptos CA 95003 US,655 Sierra St. Moss Beach CA 94038 US,1487 Ormsby Dr. Sunnyvale CA 94087 US国籍:US,US,US代理机构:Sterne, Kessler, Goldstein & Fox PLLC更多信息请下载全文后查看。
国家石油和天然气有限公司(National Oilwell Varco)产品说明书
• Centralized datacenter environment hosts data, monitoring, analysis, support and report applications • Virtualization technology allows expandable software and extendable hardware independently • High availability, redundancy and disaster recovery of critical operations • Data provider for efficient capture, storage and retrieval of information • Software lifecycle management• Extensive system monitoring and administration •System hardening for emerging cybersecurity threatsRISE armrest design includes new screens, configurable buttons, encoder wheels, and an improved joystick that all together optimize the human-machine interaction during the different drilling processes. Additionally, the armrests include retractable cup holders.Our new joystick design includes tactile feedback to enhance the operator’s situational awareness and ultimately improve efficiency and safety.Our RISE workstation supports storing and restoring users’ ergonomic profiles, allowing the workstation to self-adjust to specific operator’s body measurements.The RISE workstation is the premium plaform for CYBERBASE systems.An operator’s physical well-being during drillingoperations significantly impacts rig crew performance. Hazards of sitting for extended periods of time includeincreased discomfort and increased risk of back injuries. In order to improve the driller’s well-beingduring operation, RISE provides a posture change ranging from any position betweensitting and standing.CYBERBASE ™ v14: The flexible, upgradable and reliable drilling control systemRISE ™: elevating the industry standardCYBERBASE™ is the original integrated drilling control system built for seamless integration of machine control, process management and information sharing. True to the original design, the CYBERBASE system works with any machine and complimentary system while maintaining optimal process safety and cybersecurity.CYBERBASE v14 bridges the challenge of aging technology through applying purposeful innovation:Drilling system data centerThe first in the industry data center solution to fully decouple software from hardware creating a truly future oriented system. A data center is inherently fault-tolerant by dynamic load sharing and allowing hot-swapping of hardware without interfering with the drilling operations. As a bi-product, the data center allows more computing and data storage capacity hence giving the end user faster response time, better information and cheaper upgrades. Solutions such as the drilling data, analysis and report functions, system health monitoring, eHawk™ remote support, digital CCTV solutions, Webdriller, NOV Alarm Statistics, Sdi.Connect etc. are co-hosted in data center as virtual machines.***********The CYBERBASE front-end stations use high reliability fanless smart clients. Networks are upgraded to improve capacity, reliability, scalability and maintainability.Capturing, storing and providing dataThe CYBERBASE system has many stakeholders consuming information such as drilling and machine data, manuals, configuration files and software management history.CYBERBASE solves the challenges associated with explosive growth in data and increasingly complex storage environments.CYBERBASE GUIsSoftware lifecycle managementTool controllers 1, 2 and 3Tool controllers 4, 5 and 6Drilling Control NetworkDrilling Control Network Prepared for 3rd PartyInterface BOPCement System Vessel ManagementOther SystemsDrilling Control InterfaceRISE Operator Station (A)RISE Operator Station (n)CCTVstreaming & soundService TerminalOffice WorkstationsRouterData CenterPrepared for Virtual MachineseHawkHuman Machine Interface CYBERBASE Data Provider Drilling Flight Recorder Alarm Statistics WebdrillerCYBERBASE System Monitoring RIG Sentry (Condition Monitoring)Torque Turn Drilllink Sdi ConnectKick Monitoring System Daily Drilling ReportPrepared for eHawk and onshore supportPrepared for Drilling Flight Recorder remote workstationCybersecurity hardeningCybersecurity is important and remains a high-focus for CYBERBASE by implementing defense-in-depth strategy.• Strict remote-access control through a DMZ and antivirus software protection• Managed USB ports, segregated network and firewalls help maintaining the air-gap • Application whitelisting prevents unintended applications from running•Extended system surveillance including deep packet inspection and network intrusion detectionNOV introduces RISE, the only driller’s workstation designed from the ground up to address ergonomic concerns, minimize fatigue, and keep your crew in top form with its industry-leading customizable operating positions. The development of RISE grew from years of ergonomic research and theory to offer your cabin crews the opportunity to work with better posture, less stress, and more focus on the job at hand. Our RISE workstation brings unprecedented chair adjustment options to custom-fit the operator’s body and provide optimal comfort.Key Features:• Sitting and standing operation• Unprecedented adjustments for optimal ergonomics • User storable multi-angle configurable positions •New 2-button joystick with operator feedbackRISE supports the following operator adjustments to maximize operator’s comfort:• Backrest incline/recline • Lumbar support • Knee support • Grip length • Elbow support• Armrest pillow slide • HMI slide & tilt • Swivel。
Fault-Tolerant Systems 11.1. What is a Fault
11Fault-Tolerant Systems11.1. What is a Fault ?Any action that does not conform to the given specification of a system is viewed as a fault. Historically, models of failures have been linked with the users level of interaction with a system. A VLSI designer may focus on stuck-at-0 and stuck-at-1 faults only, where the output of a gate is permanently stuck to either a 0 or a 1 regardless of input variations. A system level hardware designer, on the other hand, may be ready to view a failure as any arbitrary or erroneous behavior of a module as a whole. A drop in the power supply voltage, or radio interferences due to a lightning, or a cosmic shower, often causes transient failures that may temporarily perturb the system state without causing any permanent damage to the system. Finally, even if hardware does not fail, software may fail due to improper or unexpected changes in the specifications of the system. Before any discussion about how to tolerate such faults, it is important to present a proper characterization of the various kinds of faults that can occur in a system.11.2. Classification of FaultsOur view of a distributed system is essentially a process-level view, so we begin with the description of some important types of failures that are visible at the process level. Note that each type of failure at any level may be caused by a failure at some lower level of abstraction. Thus, a process may cease to produce an output when a wire in the circuit breaks. A complete characterization of the relationship between faults at different levels is beyond the scope of our discussion. Our classification of failures is as follows:Halting Failure. When a process that is expected to produce one or more messages, or change the values of some process variables, ceases to do so on a permanent basis, a halting failure occurs. Note that this is an irreversible change. Halting failures are also known as crash failures.In a variation of this model, halting failures are treated as reversible, i.e. a process may play dead for a finite period of time, and then resume operation. This includes thecase in which the faulty process is repaired and restarted after some time. Such failures are called napping failuresByzantine Failure. Byzantine failures correspond to completely arbitrary failure patterns, and is the weakest of all the failure models. As an example, let N.i denote the set of neighbors of a process i. Assume that i is expected to send a value x to every process in N.i. If process i does not send the intended value x to each of its neighbors, then the failure is called a byzantine failure. The following are some examples of inconsistent behaviors possibly caused by byzantine failure:¥ Two distinct neighbors j and k receive values x and y, where x ≠ y.¥ Every neighbor receives a value z where z ≠ x.¥ One or more neighbors do not receive any value from process i.Some of the possible causes of byzantine failures are¥ The total or partial breakdown of a link joining i with one of its neighbors ¥ Software problems in process i¥ Hardware synchronization problems - assume that every neighbor is connected to the same bus, and trying to read the same copy sent out by i, but since the clocksare not synchronized, they may not read the value x exactly at the same time. If x is a time-sensitive variable, then different neighbors of i may receive differentvalues from process i.¥ Malicious action by process i.Transient Failure. Certain types of fault actions have temporary effects on the global state of a system. Such failures perturb the global state in an arbitrary way, but the effect of the agent inducing this failure is not perceived thereafter. This is called a transient failure.A special kind of transient failure applicable to message-passing models only is the omission failure. Consider a transmitter process sending a sequence of messages to a receiver process. If the receiver does not receive some of the messages sent by the transmitter, then it is an omission failure. In real life, this can be caused either by transmitter malfunction, or by the loss of messages in transit.Software Failure. Software does not fade or erode with time. Assume that a system running under the control of a program S is producing intended results. If the system suddenly fails to do so even if there is no hardware failure, there is a problem withspecifications. If {P}S{Q} is a triple in programming logic, and the precondition P is inadvertently weakened, then there is no guarantee that the postcondition Q will always hold!The situation can be explained using the example of a pop machine. This pop machine delivers a can of pop, when a user inserts 50cents into the machine. The machine is designed to accept quarters and dimes only. If an uninformed user tries to buy a can of pop with ten nickels, then the machine will fail to deliver the pop, since the machine is not designed to accept nickels! This malfunction may be viewed as a software failure.Temporal Failure. Real time systems require actions to be completed within a specific amount of time. When this deadline is not met, a temporal failure occurs.11.3. Specification of FaultsWe present here a general model for specifying an arbitrary type of failure. This model was proposed by Arora and Gouda in [AG93]. A system description consists of (i) a set of specified actions S representing the fault-free system, and (ii) a set of fault actions F. The actions of F can be expressed using notations similar to those used in S.The faulty system consists of the union of all the actions in both S and F, and will be denoted by S F. An example of such a specification follows.Assume that a system, in absence of any fault, sends out the message "a" infinitely often (i.e. the output is an infinite sequence aaaaa...). However, a failure "occasionally" causes a message to change from "a" to "b". This description can be translated to the following specification1 :define x : booleana , b: messageinitially x = true;S::do x → send a odF::do true → send b odWith a weakly fair scheduler, the difference between the behaviors of S and S F becomes perceptible to the outside world.1 This specification is not unique. Many other specifications are possible.A halting failure of S can be represented using the following specification for F:F :: do true →x := false odAfter this fault action is executed, the system ceases to produce an output -- a condition that cannot be reversed using the actions in S or F.Now, consider a system that receives a message msg, and forwards it to each of its N neighbors {0, 1, 2, ..., N-1}. This can be specified byS::initially j = 0, flag = falsedo¬flag ∧ msg = a→x := a; flag := true(j < N) ∧ flag→send x to j; j := j+1j = N→j := 0; flag := falseodThe following fault action on the above system specifies a form of byzantine failure, since it can cause the process to send a to some neighbors, and send b to some others.F::do flag→x := b od{b ≠ a}Under the broad class of byzantine failures, specific fault behaviors can be modeled using appropriate specifications. Consider the following example. Here, the fault-free system executes a non-terminating program that sends out the integer sequence 0,1,2,0,1,2 .... Once the fault actions are executed, the faulty system changes every third integer from 2 to 9.define k : integer; {k is the body of a message}x : boolean;initially k = 0; x = true;S::do k < 2 →send k; k := k+1x ∧ (k = 2)→send k; k := k+1k ≥ 3 →k := 0;odF::do x→¬ x¬ x ∧ (k = 2) →send 9; k:= k+1odNo separate specification is necessary for software failures -- it is adequate to explicitly write down the preconditions and the postconditions of the fault-free system, and observe that these hold for the application at hand.Finally, temporal failures are detected using a special predicate "timeout", which becomes true when an event does not take place within a predefined deadline. An example is given below: Consider a process i broadcasting a message every 60 seconds to all of its neighbors. Assume that the message propagation delay is negligibly small. If process j does not receive any message from process i within 62 seconds (i.e., it keeps a small allowance before passing a verdict), it permanently sets a boolean flag f.i. indicating that process i has undergone a termporal failure. This can be specified asdefine f.i: booleaninitially f.i = falseS::¬ f.i ∧ message received from process i → skipF::timeout (i,j)→ f.i := trueHere, the truth of the predicate timeout (i,j) implies that the deadline has elapsed on the arrival of the message from j to i. It is quite possible that process i did not undergo a halting failure, but slowed down due to unknown reasons. The exact mechanism for asserting the predicate timeout (i,j) is as follows: Process j has a local variable called timer that is initialized to the value of the deadline T. After this, timer is decremented with every tick of the local clock. If (timer = 0), then the predicate timeout is asserted, otherwise, after the event occurs, timer is reset to T and the next countdown begins.Note that the correct use of timeout is based on the existence of synchronized clocks. If the local clocks of the processes i and j are not synchronized (at least approximately), then they can drift arbitrarily -- as a result, there will no correlation between i's measure of 60 seconds and j's measure of 70 seconds. In an extreme case, j's 62 seconds can be less than i's 60 seconds, so that even if process i sends a message every 60 seconds, process j will timeout and set f.i.11.4. Fault-Tolerant SystemsThe first step in designing a fault-tolerant system is to understand what is meant by tolerating a fault. There are three different concepts in the area of fault-tolerance:¥ Fault Masking¥ Fault Recovery¥ Graceful DegradationFault MaskingIn this case, the occurrence of faults does not have any visible effect in the eyes of an external observer. Let {P}S{Q} be a computation, and F represent the fault-actions.Then the system masks the actions of F iff wp(S F , Q) = P', and P ⇒ P'.Fault RecoveryEvery fault-tolerant system cannot mask failures. In such a cse, the faulty behavior will be visible in the eyes of an external observer. An important issue in such cases is the duration of the fault actions and the faulty behavior. Let S be a computation that satisfies the triple {P}S{Q}, and F denote the fault actions. Also, let R be a predicate representing the "weakest postcondition" in presence of failures. This implies {P}S F {R}, and intuitively R is the "worst-case result" produced by the faulty system. If R ⊆ Q , then the system is able to mask the actions of F . Otherwise, the fault is not of the masking type. However, if the failure is transient and the actions of F are no longer enabled following the corruption of the global state, then in some cases the system eventually recovers, and satisfies the postcondition Q . This is possible when R ⇒wp(S,Q). Fig. 11.1. illustrates the situation.timecompleteshere completeshereFig. 11.1. An illustration of fault recoverySystems that (i) guarantee recovery when started from an arbitrary initial state (i.e. P = true) and (ii) maintain the desired postcondition, are known as self-stabilizing systems.Graceful DegradationMany systems can neither mask, nor fully recover from the effect of failures. However, some of them exhibit a degraded behavior that falls short of the normal behavior, but is still "acceptable." The notion of acceptability is highly subjective, and is entirely dependent on the user running the application. Some examples of degraded behavior are as follows:1. Consider a taxi booth where customers call to order a taxi. Under normal conditions,(i) each customer ordering a taxi must eventually get it, and (ii) these requests must be serviced in the order in which they are received at the booth. In case of a failure, a degraded behavior which may be acceptable corresponds to the case when only condition(i) is satisfied.2. While routing a message between two points in a network, a program computes the shortest path. In the presence of a failure, if this program returns another path which is not the shortest path but one that is "marginally longer" than the shortest one, then this may be considered acceptable.3. A pop machine returns a can of soda when a customer inserts 50cents in quarters, dimes, or nickels. After a failure, if the machine refuses to accept dimes and nickels, but returns a can of soda only if the customer deposits two quarters, then it may be considered acceptable.Detection of FailuresThe implementation of fault-tolerance of any type requires a mechanism for detecting failures. This in turn depends on specific assumptions about the degree of synchronization like the existence of synchronized clocks, lower bound on the processor speed, or upper bound on message propagation delays, as described in Section 2.1.3. The transition from a fully synchronous to a fully asynchronous system is a gradual one, and it is possible to deal with a system that is only partially synchronous. The implementation of fault-tolerance depends on the degree of synchronization.As the first example, let us consider whether a halting failure can be detected in a message passing system. Without any assumption about synchronized clocks, or an upper bound of message propagation delays, or a lower bound of process execution speeds, it is impossible to detect a halting failure, because it is not feasible to distinguish between a crashed process, and a healthy process which is executing actions "very slowly." However, in a fully synchronous system, timeout can be used to detect halting failures.As another example, consider how omission failures can be detected. The problem is as hard as the detection of halting failures, unless the channels are FIFO. With a FIFO channel, a sender process i can attach a sequence number seq with every message m as described below:do true →send <seq, m>;seq := seq + 1odWhenever a receiver process j receives two consecutive messages whose sequence numbers are m and n, and n ≠ m+1, it suspects an omission failure.The detection of byzantine failures also requires a fully synchronous system. Several protocols for masking inconsistencies caused by byzantine failures are available in the published literature -- these are known as byzantine agreement protocols.In the following sections, we will present specific examples of implementing fault-tolerance with halting and omission failures. Byzantine agreement and self-stabilizing systems will be presented in subsequent chapters.11.5. Halting FailuresNext we present examples of two widely used methods for masking the effect of halting failures.Triple Modular redundancyIn synchronous systems, a widely used method of masking the effects of halting failures is the use of triple modular redundancy. Consider a process B that receives an input value x from a process A, computes the function y = f(x), and sends it to a third process C. If B fails by stopping, then C does not receive any value of y.Fig 11.2. Implementing fault-tolerance using Triple Modular RedundancyTo mask the effect of B's failure, process B is replaced by three processes B0, B1, and B2 as shown in Fig. 10.1. Even if one of these three processes undergoes a halting failure, process C still receives the correct value of f(x), as long as it computes the majority of the three incoming values. Note that when the majority is computed, the output of the faulty process can be substituted by an arbitrary default value. A generalization of this approach leads to n-modular redundancy that helps mask the halting failure of m or fewer processes, where n ≥ 2m+1.Atomic transactionsIn a distributed database system, let A be a composite object with components A.0, A.1, ..., A.n. Consider a transaction that assigns a new value x.i to each component A.i. We will represent this operation by A := x. The transaction A := x is called atomic, when "either all or none" of the assignments A.i := x.i are completed. Note that a halting failure can allow a fraction of these assignments to be completed, and violate the atomicity property.To make such a transaction look atomic in the face of crash failures, Lampson proposed the idea of stable storage. A stable storage maintains two copies of A (i.e two copies of every component A.i), and allows two operations update and inspect to access the components. Let us designate the two copies of A by A0 and A1 (Fig 11.3). Process P, which performs the update operation, updates each copy, stamps these updates with (i) the timestamp T, and (ii) a unique signature S called checksum, which is a function of x and T.Fig. 11.3 The model of a stable storage. P performs the update operationand Q performs the inspect operation{procedure update}1A0 := x;{copy 0 updated -- operation not necessarily atomic} 2T0 := time;{timestamp assigned to copy 0}3S0 := checksum (x, T0){signature assigned to copy 0}4A1 := x;{copy 1 updated -- operation not necessarily atomic} 5T1 := time;{timestamp assigned to copy 1}6S1 := checksum (x, T1){signature assigned to copy 1}Process Q, which performs the inspect operation, checks for both the copies, and based on the times of updates as well as the values of the checksums, chooses the correct version of A that satisfies the criterion of atomic update.{procedure inspect}A if S0 = checksum (A0, T0) ∧S1 = checksum (A1, T1) ∧T0 > T1→accept A0B S0 = checksum (A0, T0) ∧S1 = checksum (A1, T1) ∧T0 < T1→accept A0 or A1C S0 = checksum (A0, T0) ∧S1 ≠ checksum (A1, T1) →accept A0D S0 ≠ checksum (A0, T0) ∧S1 = checksum (A1, T1) →accept A1f iCase A corresponds to a failure between steps 3 and 4 of an update. Case B represents no failure -- so any one of the copies is acceptable. Case C indicates a failure between steps 1-3, and case D indicates a failure between steps 4-6. It is important to note that as long as A0 and A1 are properly initialized, and P fails by stopping at any point during steps 1-6 of the update operation, one of the guards A-D must be true for process Q.Stable storage is widely used in client-server models to survive crashes. The two copies A0 and A1 are maintained on two disks on two separate drives. Data can also be recovered when instead of a halting failure by process P, one of the two disks crashes.11.6 Omission FailuresOmission failures are usually caused by transient malfunctions of the channel. In the OSI model of a computer network, omission failures are typically handled either in the data-link layer or in the transport layer. The principle behind handling omission failures is to detect the disappearance of an expected message or an acknowledgement (using sequence numbers and timeouts), and then arrange for a retransmission. Due to the transient nature of this failure, it is assumed that if a message is sent "a large number of times", then it will eventually be received.A widely used transport layer protocol for handling omission failures is the sliding window protocol. This protocol works on a channel where message propagation delays have upper bounds. In the sliding window protocol, there are two processes, a sender process S and a receiver process R, connected by a pair of channels as shown in Fig.11.4. The channel is not a FIFO channel. Process S sends out a sequence of messages m[0], m[1], m[2], ...from an infinite tape, and process R, after receiving each message m[i], decides whether to accept it and forward it to the upper layers of protocol, and then sends an acknowledgement back to S. Both messages and acknowledgements may occasionally be lost or delayed, that is determined by timeouts and retransmissions. The goal of the sliding window protocol is to satisfy the following three crieria:W1.Every message sent out by S should eventually be received by R.W2. No message should be accepted and forwarded to the upper protocol layers more than once.W3. The receiving process R should always forward messages m[i] before m[i+1].Thus, if S sends out the sequence a b c d e ..., then from the persprective of process R, W1 rules out accepting this sequence as a c d e ..., W2 rules out the possibility of accepting it as a b b c c c d e ..., and W3 prevents it from being accepted as a c b d e ....An obvious solution is the so called stop-and-wait protocol. Process S will send one message m[i], and wait for its acknowledgement from R. If the message or the acknowledgement is lost, then m[i]is retransmitted. Otherwise, the next message m[i+1] is sent by S.Fig 11.4. Sliding window protocolThe approach is logically correct, but this restricted version suffers from poor throughput in as much as the sender has to wait for at least two round-trip delays before transmitting a new message. An obvious generalization is the sliding window protocol, that is based on the following scheme:¥The sender is allowed to continue the send action without receiving the acknowledgements of at most w messages (w > 0), where w is called thewindow size". If no acknowledgement to the previous w messages is receivedwithin an expected period of time, then the sender resends those w messages.¥ The receiver anticipates the sequence number j of the next incoming message. If the anticipated message is received, then R accepts it, sends the correspondingacknowledgement back to S, and increments j. Otherwise, R sends out anacknowledgement corresponding to the sequence number j-1 of the previousmessage accepted by it.It is important to note that both the messages and the acknowledgements have to include the sequence number which is an integral part of the decision making process. Thus the i th message will be sent out by S as (m[i], i), and the corresponding acknowledgement will be returned by R as (ack, i). The program is described below.program window;{program for process S}define next, last, w : integer;initially next = 0, last = -1, w > 0;and both channels are emptydo last < next ≤ last + w→send (m[next], next);next := next + 1(ack, j) is received→if j > last→ last := jj ≤ last→ skipf itimeout (r,s)→∀i: last < i ≤ last + w :: send (m[i]. i)od{program for process R}define j : integer;initially j = 0;do(m[next], next) is received→if j = next→accept the message;send (ack, j);j:= j+1j ≠ next→send (ack, j-1)fi;odTo demonstrate that this protocol satisfies the three criteria W1 - W3, we outline an inductive proof.Basis. Message m[0] is eventually accepted by R. To show this, note that if m[0]is lost in transit, then either the guard (j ≠next)is enabled for R, or no acknowledgement is returned, and (last = -1)holds. Messages are sent by S until (next = w-1), and the acknowledgements returned (if any) by R have no impact on the state of S. After this, the guard timeout is enabled for process S, and it sends m.0 through m[w-1] for a second round. In a finite number of rounds of retransmission, m.0 must reach R, the guard (j=next) is enabled, and m[0] is accepted. Furthermore, since j can only increase after a message is accepted, the condition (j = 0) cannot be asserted for a second time. So, m.0 is accepted exactly once.Inductive Step. Assume that R has accepted every message from m[0]through m[k] (k > 0), j = k+1, m[k+1] has already been transmitted by S, so the condition last < k+1 ≤ next holds. We need to show that eventually m[k+1] is accepted by R.If m[k+1] is lost in transit, then no guard is enabled for R. When the remaining messages in the current window are sent out by S, the acknowledgements (ack, k) returned by R do not cause S to increment the value of last beyond k.Eventually the guard timeout is enabled for process S, and it retransmits messages m[last+1] through m[last + w] -- this includes m[k+1]. In a finite number of rounds of retransmission, m[k+1] is received and accepted by R.Finally, for process R, the value of j never decreases. Therefore m[i] is always accepted before m[i+1]The Alternating Bit ProtocolThe alternating bit protocol is a special version of the sliding window protocol, for which w=1.A major hurdle in implementing the sliding window protocol described earlier is that, an arbitrarily large sequence number must accompany the body of the message. Accordingly, neither messages nor acknowledgements can be represented in bounded space. The alternating bit protocol avoids this problem by appending only a 1-bit sequence number to the body of the message. However, the scheme works only when the channels are FIFO. Accordingly, the alternating bit protocol is suitable for application in the data-link layer. The protocol is described below:program ABP;{program for process S}define sent, b : 0 or 1;next : integer;initially next = 0, sent = 1, b = 0and both channels are emptydo sent ≠b→send (m[next], b);next := next +1; sent := b(ack, j) is received →if j = b → b := 1-bj ≠ b →skipf itimeout (r,s)→send (m[next-1], b)od{program for process R}define j : 0 or 1;initially j = 0;do(m[next], b) is received →if j = b → accept the message;send (ack, j);j:= 1 - jj ≠ b → send (ack, 1-j)odWithout going through a formal proof, we demonstrate why the FIFO property of the channel is considered essential for the alternating bit protocol.Fig 11.5. A global state of the alternating bit protocol.Consider the global state of Fig 11.5 that was reached in the following way. m[0] was transmitted by S and accepted by R, but its achnowledgement (ack, 0) was delayed -- so S transmitted m[0]once more. When (ack, 0)finally reached S,it sent out m[1].Since the channel is not FIFO, assume that m[1] reached R before m[0]. This was accepted by R, and (ack,1) was sent back to S. On receipt of this (ack,1) S sent out m[2].Now m[0] with a tag 0 reaches R, and R accepts it, as it mistakes it for m[2] since both m[0] and m[2] have the same tag 0! Clearly this possibility is ruled out when the channels are FIFO.11.7. Concluding RemarksThe examples presented in this chapter illustrate methods of implementing primarily the masking type of fault-tolerance. If the specification of the possible type of fault F is known, then in an F-tolerant system, no occurrence of F should be perceptible to the user of the protocol.Many variations of sliding-window protocol are used in real aplications. Unlike the one discussed in this chapter, most handle two-way communications, where each node can act both as a sender and Another generalization that reduces the number of retransmissions involve the use of a receive buffer. Good messages that are received by R after the loss or the corruption of an earlier message are not discarded, but saved in R's local buffer, and acknowledged. Eventually, the sender learns about it through the timeout mechanism, and selectively transmits the lost or the corrupted messages.An important class of fault-tolerant systems deals with reaching consensus, where a number of processes communicate with one another in a faulty environment to reach an agreement. Distributed consensus is an extensively studied area of research, and some。
机械外文翻译中英文 机床 模具 机械 材料
附录英文原文Industrial Robot and its system’s componentsThere are a variety of definitions of the term robot. Depending on the definition used, the number of robot installations worldwide varies widely. Numerous single purpose muchines are used in manufacturing plants that might appear to be robots. These machines are hardwired to perform a single function and can not be reprogrammed to perform a different function. Such single-purpose machines do not fit the definition for industrial robots that is becoming widely accepted. This definition was developed by the robot Institute ofAmerica:A robot is a reprogrammable muhifunctional manipulator designed to move material, parts, tools, or specialized devices through variable progranmled motions for the perfommnce of a variety of tasks.Note that this definition cxmtalns the words reprograrnmable and multifunctional. It is these two characteristics that separate the true industrial robot from the various single-purpose machines used in modern manufacturing firms. The term"reprogrammable"implies two things:The robot operates aec~)rding to a written program, and this program can be rewritten to acconlmodatc a variety of manufacturing tasks.The term"multifunctional"means that the robot can, through reprogramming and the use of different cnd-effectors, perform a number of different manufacturing tasks. Definitions written around these two critical characteristics are becoming the accepted definitions among manufacturing professionals.The fimt articulated arm came about in 1951 and was used by the U.S. Atomic Energy Commission. In 1954, the first programmable robot was designed by George Devol. It was based on two important technologies:(1) Numerical control (NC) technology.(2) Remote manipulation technology.Numerical control technology provided a foma of machine control ideally suited to robots.It allowed for the control of motion by stored programs. These programs contain data points to which the robot sequentially moves, timing signals to initiate action and to stop movement, and logic statements to allow for decision rfmking.Remote manipulation technology allowed a machine to be more than just another NC machine. It allowed such machines to become robots that can perfoml a variety of manufactuing tasks in both inaccessible an unsafe environmonts. By mering these two technologies, Devol developed the first industrial robot, an unsophistieated programmable materials handling machine.The first conunercially produced robot was developed in 1959. In 1962, the first industrial robot to be used oil a production llne was installed by General Motors Corporation. This robot was produced by Unimation. A major step forward in robot control occurred in 1973 with the development of the T-3 industrial robot by Cincinnati Milaeron. The T-3 robot was the first commercially producedindustrial robot controlled by a minicomputer.Numerical control and remote manipulation technology prompted the wide scale development and use of industrial robots. But major technological developments do not take place simply because of such new capabilities. Something must provide the impetus for taking advantage of these capabilities. In the case of industrial robots, the impetus was economies.The rapid inflation of wages experienced in the 1970s tremendously increased the personnel costs of manufacturing firms. At the same time, foreign competition became a serious problem for U. S. manufacturers. Foreign manufacturers who had under taken automation on a wide scale basis, such as those in Japan, began to gain an increasingly large share of the U.S. and world market for manufactured goods, particularly automobiles.Through a variety of automation techniques, includicg robots, Japanese manufacturers, beginning in the 1970s, were able to produce better automobiles more cheaply than nonautomated U.S. manufacturers. Consequently, in order to survive, U.S. manufacturers were forced to consider any technological developments that could help improve productivity.It became imperative to produce better produets at lower costs in order to be competitive with foreign manufacturers. Other factors such as the need to find better ways of performing dangerous marmfacturing tasks contributed to the development of industrial robots. However, the principal rationale has always been,and is still, improved productivity.One of the principal advantages of robots is that they can be used in settings that are dangerous to humans. Welding and parting are examples of applications where rotmts can be dangerous to humans. Even though robots are closely asmciated with safety in the workplace, they can, in themselves, be dangerous.Robots and robot cells must be carefully designed and configured so that they do not endanger human workers and other machines. Robot work envelops should be accurately calculated and a danger zone surroundting the envelop clearly marked off. Red flooring strips and barriers can be userd to keep human workers out of a robot’s work envelope.Eren with such precautions it is still a good idea to have an automatic shutdown system in situations where robots are used. Such a system should should have the capacity to sense the need for an automatic shutdown of operations. Fault-tolerant computers and redunant systems can be installed to ensure proper shutdown of robotics systems to ensure a safe environment.Industrial robots is the science of designing, building, and applying industrial robcts. What are robots In the late 1970s the Robotic Industries Association defined a robot as〞 a manipulator, designed to move material, parts,tools or specialized devices through variable programmed motions for the performance of a variety of tasks". Although this definition does not directly include pick and place arms as robots, teleoperamrs and remotely controlled devicesare often referred to also as robots. The International Standards Organization(ISO) has a more lengthy definition of an industrial robot:A machine formed by a mechanism including several degrees of freedom, often having tile appearanoa of one or several arms ending in a wrist capable of holding a tool or a workpiece or an inspection device. In particular, its control unit must use a memorizing device and .sometimes it can use sensing or adaptation appliances taking into account environment and circumstances. These multipurtpose machines are generally designed to carry out a repetitive function and can be adapted to other functions.The RIA and ISO definitions both stress the muLtifunctional and programmable capabilities and, therefore, exclude special-purpose "hard automation" tools and equipment typically found in high volume production. Also excluded are manual remote manipulators, which are extensions of human hands for use in, for example, sterile, hot, or radioactive environments.In Japan, the Japanese Industrial Robot Association (JIRA) classifies industrial robots by the method of input informatkm and the method of teaching:1. Manual Manipulators. Manipulators directly activated by the operator.2. Fixed-sequence Robot. Robot that once programmed for a given sequence of operations is not easily changed.3. Variable-sequence Robot. Robot that can be programmed for a given sequence of operations and can easily be changed or reprogrammed.4. Playback Robot. Robot that "memorizes" work sequences taught by a human being who physically leads the device through the intended work pattern; the robot can then create this sequence repetitively from memory.5. Numerically Controlled (NC) Robot. Robot that operatas from and is controlled by digital data, as in the form of punched tape, cards, or digital switches; operates like a NC machine.6. Intelligent Robot. Robot that uses sensory perception to evaluate its environment andcmake decisions and proceeds to operate accordingly.The first-generation of robot systems was defined for the various robots with limited computer power. Their main intelligant functions include programming by showing a sequence of manipulation steps by a human operator using a teach box. Without any sensors, these robots require a prearranged and relatively fixed factory environment and, therefore, have limited use.The second-generation of robot systems was enhanced by the addition of a computer processor. A major step in industrial robotics development was the integration of a computer with the industrial robot mechanism. This has provided real-time calculation of trajectory to smooth the motions of the end effector and integration of mine simple force and proximity sensors to obtain external signals.The main applications of second generation robots include spot and arc welding, spray painting, and some assembly.Third-generation robot systems incorporate multiple eomputer processors and multiple arms that can operate asynchronously to perform .several functions. Distributed hierarchical mmputer organization is preferred, because it can coordinate motions and interface with external sensors, other machines, and other robots and can communicate with other computers. These robots can already exhibitintelligent behavior, including knowledge-based control and learning abilities.Japan ranks as the world's top robot-producing and robot-using country, with more than 40% of the world's industrial robot installations. The reasons for this penetration are sociological-and technological factors that are unique to Japan: industrial robots brought productivity and quality gains in Japanese industry, coupled with improvements of the work enviromnent. These have perpetuated the social-demand for more robots as well as increased the expectation from this technology.Current and emerging robot applications in industry can be categorized on the complexity and requirements of the job. They range from simple, low technolngy pick-and place operations through medium technology painting, some assembly and welding operations to high technology precision assembly and inspection operations.Pick-and-place Operations The earliest applications of robots were in machine loading unloading, pick-and-place, and material transfer operations. Such robots typically were not servo controlled and worked with pneumatic or hydraulic power. The Icxad-carrying requirements were high, working in dirty or hazardous factory environments. Replacing unskilled human labor often in hazardous jobs, these robots had to be robust and low in initial and maintenance costs.Painting and Welding Operations The next level in the sophistimtion of industrial robot applications was in spray painting, and spot and arc welding. These applications complemented or replaced certain skilled human labor. Often the justification was by eliminating dangerous environmental exposures. These applications often require tracking complex trajectories such as painting surface mntours, hence mrvo controlled "articulated or spherical robot structures were used.Lead-through teaching modes became commom, and sometimes sophisticated sensors are employed to maintain process consistency. Experience has shown that when properly selected and implemented, thase robotic applications usually lead to reduced overall manufacturing costs and improved product quality compared with manual method.Assembly Operations The most advanced level of technology employing third-generation industrial robots is found in assembly. System repeatability is of utmost importance. End-of-arm tooling must be compliant, i.e., have both force and displacement control to adjust part insertions, which require that the robot actually "feel" its way along. This technology usually requires a measure of artificial intelligence. Assembly robots generally are electronically driven and operate in clean enviromnents. Assembly robots are expected to exceed further technology applications.Other Applications Other typical applications of robots include inspection, quality control, and repair; processing such as laser and water jet cutting and drilling, riveting, and clean room operations; and applications in the wood, paper, and food-processing industries. As industrial robot technology and robot intelligence improve even further, additional applications may be justified effectively.The components of a robot system could be discussed either from a physical point of view or from a systems point of view. Physically, we would divide the system into the robot, power system, and controller (computer). Likewise, the robot itself could be partitioned anthropomorphically into base, shoulder, elbow, wrist, gripper, and tool. Most of these terms require little explanation.Consequently, we will describe the components of a robot system from the point of view of information transfer. That is, what infomtation or signal enters the component; what logical or arithmetic operation does the component perform; and what information or signal does the component produce It is important to note that the same physical component may perform many different information proees.sirkg operations (e. g. , a central computerperforms many different calculations on different data). Likewise, two physically separate components may perform identical information operations ( e. g., the shoulder and elbow actuators both convert signals to motion in very similar ways).Actuator Asmciated with each joint on the robot is an actuator which causes that joint to move. Typical actuators are electric motors and hydraulic cylinders. Typically, a robot system will contain six actuators, since six are required for full control of position and orientation. Many robot applications do not require this full flexibility, and consequently, robots are often built with five or fewer actuators.Sensor To control an actuator, the computer must have infommtion regarding the position and possibly the velocity of the actuator. In this context, the term position refers to a displacement from some arbitrary zero reference point for that actuator. For example, in the case of a rotary actuator , "position" would really the angular position and be measured in radians.Many types of sensors can provide indications of position and velocity. The various types of sensors require different mechanisms for interfacing to the computer. In addition, the industrial use of the manipulator requires that the interface be protected from the harsh electrical environment of the factory. Sources of electrical noise such as arc welders and large motors can easily makena digital system useless unless care is taken in design and construction of the interface.Computation We could easily have labeled the computation module "computer,"as most of the Functions to be described are typically perfommd by digital computers. However, many of the Functions may be performed in dedicated custom hardware or networks of computers We will, thus, discuss the commputational component as if it were a simple computer, recognizing that tile need for real-time control may require special equipment and that some of this equipment may even be analog, although the current trend is toward fully digital systems.One further note: We will tend to avoid the use of the term microprocessor in this book and simply say computer, although many current robot manufacturers use one or more microprocessors in their systems.The computation component performs the following operations: Servo Given the current position and/or velocity of an actuator, determine the appropriate drivesignal to move that actuator toward its desired position. "This operation must be performed for eaeh actuator. Kinematics Given the current statel of the actuators (position and velocity ),determine the current state of the gripper. Conversely, given a desired state of the hand, determine the desired state for each actuator.Dynamics Given knowledge of the loads on the arm (inertia, friction, gravity, acceleration), use this information to adjust the sorvo operation to achieve better performance.Workplace Sensor Analysis Given knowledge of the task to be performed, determine appropriate robot motion commands. This nmy include analyzing a TV picture of the workplace or measuring and compensating for forces applied at the hand.In addition to these easily identified co. mponents, there are also supervisory operations such as path planning and operator interaction.中文翻译工业机器人及其系统组成有许多关于机器人这个术语的定义。
StarWind Virtual SAN Free 入门指南说明书
#1 HyperConverged Appliance for SMB and ROBOStarWind Virtual SAN®Free Getting StartedJUNE 2015TECHNICAL PAPERTrademarks“StarWind”, “StarWind Software” and the StarWind and the StarWind Software logos are registered trademarks of StarWind Software. “StarWind LSFS” is a trademark of StarWind Software which may be registered in some jurisdictions. All other trademarks are owned by their respective owners. ChangesThe material in this document is for information only and is subject to change without notice. While reasonable efforts have been made in the preparation of this document to assure its accuracy, StarWind Software assumes no liability resulting from errors or omissions in this document, or from the use of the information contained herein. StarWind Software reserves the right to make changes in the product design without reservation and without notification to its users.Technical Support and ServicesIf you have questions about installing or using this software, check this and other documents first - you will find answers to most of your questions on the Technical Papers webpage or in StarWind Forum. If you need further assistance, please contact us.Copyright ©2009-2015 StarWind Software Inc.No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written consent of StarWind Software.In 2016, Gartner named StarWind “Cool Vendor for Compute Platforms”.Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.About StarWindStarWind is a pioneer in virtualization and a company that participated in the development of this technology from its earliest days. Now the company is among the leading vendors of software and hardware hyper-converged solutions. The company’s core product is the years-proven StarWind Virtual SAN, which allows SMB and ROBO to benefit from cost-efficient hyperconverged IT infrastructure. Having earned a reputation of reliability, StarWind created a hardware product line and is actively tapping into hyperconverged and storage appliances market. In 2016, Gartner named StarWind “Cool Vendor for Compute Platforms” following the success and popularity of StarWind HyperConverged Appliance. StarWind partners with world-known companies: Microsoft, VMware, Veeam, Intel, Dell, Mellanox, Citrix, Western Digital, etc.ContentsIntroduction (4)Implementation (5)Contacts (7)IntroductionStarWind Virtual SAN® is entirely software-based, hypervisor-centric virtual machine storage. It creates a fully fault-tolerant and high-performing storage pool that is built for the virtualization workload “from scratch”. StarWind Virtual SAN basically “mirrors” inexpensive i nternal storage between hosts. Virtual SAN completely eliminates any need for an expensive SAN or NAS or other physical shared storage. It seamlessly integrates into the hypervisor for unbeatable performance and exceptional simplicity of useStarWind comes with the different set of options and deployment scenarios. It allows implementation of:•Hyper-Converged architecture, which assumes running StarWind on the same physical host where the client is running•Compute and Storage Separated architecture, where StarWind is running on the dedicated physical box.Free version comes with basic set of features (and is targeting just one deployment scenario which is Compute and Storage Separated, while paid version can also do Hyper-Converged and various combined ones. StarWind Virtual SAN Free takes two servers with some internal storage, brand new or decommissioned from other project and turns them into a “shared nothing” fault-tolerant SAN and NAS. HA iSCSI SAN protocol is kept for “internal housekeeping” and no t available to external initiator servers. Highly available shared storage is available to the client servers through Continuously Available SMB 3.02 and Failover NFSv4.1. This dramatically simplifies the installation, support, and management of the storage solution. Performance is not throttled and still stands in line with Enterprise-grade storage arrays.This guide is intended to highlight the specific implementation that StarWind Virtual SAN Free allows, its pros and cons.A full set of up-to-date technical documentation can always be found here, or by pressing the Help button in the StarWind Management Console.For any technical inquiries, please visit our online community, Frequently Asked Questions page, or use the support form to contact our technical support department.ImplementationStarWind Virtual SAN Free takes a pair of new or decommissioned commodity servers and turns them into a DIY dual-controller “shared nothing” fault-tolerant SAN and NAS by “mirroring” their internal storage between them. Resulting solution exposes continuously available SMB 3.02 shares and failover NFS v4.1 mount points and targets such a use cases:•Shared storage for Microsoft Hyper-V VMs and SQL Server DBs (CA SMB3)•Shared storage for VMware vSphere & ESXi, Citrix XenServer and various Xen VMs (NFS) •Failover file server (common data, VDI profiles, backups and so on) (SMB3 & NFS)Though a hyper-converged scenario is an industry trend now, the differentiation of compute and storage layers makes sense if there’s need to grow by capacity or compute separately from each other. Typical use cases are shared storage for huge clustered SQL Server and Oracle deployments and an inexpensive block back-end for Scale-Out File Servers, NFS shared file servers, etc.This option assumes manual installation and configuration of the StarWind Virtual SAN on the Windows Server that is running on the separate physical box. As mentioned previously it really sensitive hardware utilization control leverages, such deployment usually considered for the bigdeployments, where underprovisioning may result in significant waste of budget on the hardware that wouldn`t be actually used, as in some case of hyper-converged architecture usage.While running StarWind in Compute and Storage architecture, it is possible to scale compute and storage resources independently, with different leverages regardless from each other. As a result, the system better fits the task while CapEx and OpEx go through the floor,since there is no need to purchase hardware that will be essentially wasted. Thus, the system can be created specifically for a particular task.In order to configure the configurations that StarWind Virtual SAN Free allows please refer to the one of the step-by-step manuals that you can find by using the links below:•/configuring-ha-file-server-for-smb-nas•/configuring-ha-file-server-for-nfs-nas•/sw-providing-ha-shared-storage-for-scale-out-file-serversContacts1-617-449-77171-617-507-5845 +44 20 3769 1857 (UK) +49 302 1788 849 (Germany)+33 097 7197 857 (France)+7 495 975 94 39(Russian Federation and CIS)1-866-790-2646Customer Support Portal:Support Forum:Sales:General Information:https:///support https:///forums ***********************************StarWind Software, Inc. 35 Village Rd., Suite 100, Middleton, MA 01949 USA ©2015, StarWind Software Inc. All rights reserved.。
fault-tolerant的中文 -回复
fault-tolerant的中文-回复【faulttolerant的中文】故障容错技术在计算机领域的应用【引言】在计算机技术不断发展的今天,系统可靠性和容错性成为人们越来越关注的重要问题。
故障容错技术(faulttolerant)以其在系统设计和实现中的重要地位,成为计算机领域研究的热门话题。
故障容错技术的主要目标是提供系统的可靠性,能够在出现故障时自动恢复,保证系统的稳定运行。
本文将深入探讨故障容错技术的中文含义、原理和在计算机领域的应用。
【正文】一、故障容错技术的中文含义故障容错技术(faulttolerant)中的"fault"指的是系统或设备出现故障,而"tolerant"则代表了处理故障能力强、具备自我修复能力的特性。
故障容错技术是通过系统设计和实现,保证在出现硬件或软件故障情况下,系统仍然能够保持稳定运行,不会导致整个系统崩溃。
二、故障容错技术的原理故障容错技术的原理主要包括以下几个方面:1.冗余技术:通过将系统的关键组件或数据进行冗余设计,当其中一个组件或数据出现故障时,可以快速切换到备用组件或数据,确保系统的持续可用性。
2.自检和自修复:系统能够定期检测自身状态,及时发现并修复故障。
这可以通过使用故障检测算法、自动备份等方式实现。
3.错误检测和纠正:故障容错技术的核心是识别系统中存在的错误并加以纠正。
这可以通过利用校验码、差错控制码等方法实现。
三、故障容错技术在计算机领域的应用故障容错技术在计算机领域的应用非常广泛,涉及到操作系统、数据库、网络等多个方面。
1.操作系统操作系统是计算机系统的核心,它负责管理和控制硬件资源,保证各个应用程序的正常运行。
故障容错技术在操作系统中的应用主要体现在以下几个方面:- 容错文件系统:通过数据冗余、错误检测和自动修复机制保证文件系统的可靠性和可用性。
- 容错虚拟内存:在内存管理中,通过实现容错机制,可以避免出现内存故障导致的系统崩溃。
fault-tolerant的中文 -回复
fault-tolerant的中文-回复题目:faulttolerant的中文意思及其应用领域的探讨引言:现如今,随着信息技术的高速发展,各行各业对于系统的可靠性要求越来越高。
而faulttolerant作为一种重要的技术手段,在信息领域扮演着重要的角色。
本文将着重探讨faulttolerant的中文意思以及其在不同领域的应用。
第一部分:faulttolerant的中文意思faulttolerant一词源于英文,fault意为“故障”,tolerant则是“容忍”的意思。
结合起来,faulttolerant可以翻译为“容错”或者“故障容忍”。
它指的是一种系统或设备在发生故障时,仍然保持运行,并且不会对整体系统的正常工作产生影响。
第二部分:faulttolerant的应用领域1.计算机科学领域:在计算机科学领域,faulttolerant技术被广泛应用于操作系统、数据库系统、网络通信等方面。
一些关键性的任务,比如银行交易、航空航天系统和核能系统等都需要高度的容错性,以确保任何故障都不会导致系统瘫痪或数据损失。
2.云计算与大数据领域:随着云计算和大数据应用的迅速发展,对于系统的可靠性要求也越来越高。
在这些领域中,faulttolerant被广泛使用以确保系统的高可用性。
例如,分布式存储系统和分布式计算框架通常采用冗余数据和备份策略,以应对节点故障和数据丢失的情况。
3.网络和通信领域:在网络和通信领域,faulttolerant技术用于保证数据的可靠传输。
例如,通过使用冗余的网络链路或路由协议,可以避免单个链路或节点的故障对整个网络的影响。
此外,还有许多基于容错技术的通信协议被应用于提高通信的可靠性。
4.工业自动化领域:在工业自动化领域,faulttolerant技术可应用于保证生产过程的持续运行。
通过使用冗余的传感器、执行器和数据采集设备,可以在故障发生时快速切换到备用设备,避免生产中断和设备损坏。
POST Error Messages(阵列卡错误代码)
Slot x drive array – redundant controller error
Replace controller or server motherboard.
1762
Redundant controller operation is not supported in this firmware version. Please remove redundant controller or upgrade controller firmware. (Controller is disabled until this problem is resolved.)
1726
Slot x Drive Array - Array Accelerator Memory Size Change Detected – Array Accelerator configuration has automatically been updated (RESUME = F1 KEY)
Details of these messages and corrective actions are provided in the following table.
Not all the messages in the table will apply to every system or controller.
This POST message should never occur unless the write cache is somehow enabled on a controller that does not have batteries.
1729
Slot 1 Drive Array - disk performance optimization scan in progress – RAID 4/5 performance may be higher after completion.
HP认证题库
HP HP0-794Implementing Windows Server 2003 on HP ProLiant Cluster SolutionsC. 2D. 3Answer: C4. What enables restoring Exchange databases to an online production server?A. Shadow Backup CopyB. Recovery Storage GroupC. Mailbox Recovery CenterD. Mobile Exchange Solution PackageAnswer: B5. What is the leading cause of downtime?A. hardware failuresB. operational activitiesC. software failuresD. infrastructure problemsAnswer: D6. What is the maximum number of drive letters per cluster for shared drives with Microsoft Windows Server 2003?A. 18B. 19C. 20D. 23Answer: D7. When creating a clustered volume mount point on a cluster node, which requirement must be met? The partition to be mounted must be ______.A. created on the Quorum diskB. mounted to a shared disk in a different groupC. created on a non-clustered diskD. formatted using NTFSAnswer: D8. Which description accurately depicts an HP Multi-Server Card?A. It supports up to 64 network devices within a single UPS group and provides unattended, sequential shutdown of every device in that group, regardless of the operating system.B. It enables each load segment to function as a separate UPS with individual shutdown schedules and load segment configurations.C. It detects and warns users of power failures and manages a safe system shutdown before the backup power supply is exhausted.D. It allows for the safe installation of new batteries through the front panel, without powering down the connected equipment.Answer: B9. Which component of Insight Manager 7 enables you to browse cluster and component status in a tree hierarchy?A. Systems ManagerB. Cluster MonitorC. Cluster ManagerD. Management ServicesAnswer: B10. What is least likely to fail in a Windows Server 2003 network environment?A. hardwareB. applications。
时钟同步服务方案英语
时钟同步服务方案英语Clock Synchronization Service PlanIntroduction:Clock synchronization is vital for various applications and systems that rely on accurate timing information. It ensures that different devices or systems have a consistent and precise notion of time. In this service plan, we propose a clock synchronization solution that aims to provide reliable and efficient synchronization for a wide range of applications.Objectives:1. To provide highly accurate time synchronization for devices and systems.2. To ensure synchronization reliability even in the presence of network delays or failures.3. To minimize the overhead and resources required for synchronization.4. To support scalability and allow synchronization for a large number of devices or systems.5. To provide flexibility in terms of synchronization protocols and algorithms.Technical Approach:1. Clock Synchronization Protocol:We will implement a robust clock synchronization protocol that can handle network delays, packet loss, and clock drift. The protocol will leverage the Network Time Protocol (NTP) as the foundation but enhance it with additional mechanisms for increased accuracy and robustness.2. Multiple Time Sources:To enhance reliability and accuracy, our system will support multiple time sources. These time sources can include atomic clocks, GPS receivers, or other synchronized systems. By incorporating multiple time sources, we can minimize the impact of individual failures or inaccuracies.3. Dynamic Time Skew Adjustment:Our system will continuously monitor the time skew between the local clock and the synchronized time. It will dynamically adjust the local clock to reduce the skew. This approach ensures that the synchronized time is always accurate, even in the presence of clock drift.4. Fault-Tolerant Design:To ensure reliability, our system will have a fault-tolerant design. It will include redundancy and backup mechanisms to handle network failures or time sourcefailures. In case of a failure, the system will automatically switch to an alternative time source or re-establish synchronization when the connectivity is restored.5. Scalability and Flexibility:Our solution will be designed for scalability, allowing synchronization for a large number of devices or systems. It will support both centralized and distributed architectures, depending on the specific requirements of the application. Additionally, the system will be flexible enough to support different synchronization protocols and algorithms, allowing customization based on the specific needs of the application.Implementation Plan:1. Design and Development:We will start by designing the clock synchronization protocol and the overall system architecture. This phase will involve determining the requirements, defining protocols and algorithms, and planning for scalability and fault tolerance. Once the design is finalized, we will proceed with the development of the system components.2. Testing and Optimization:During this phase, we will extensively test the system for reliability, accuracy, and performance. We will simulate various network conditions and time source failures to ensure that the system can handle them gracefully. We will also optimize the system for efficiency and resource utilization.3. Deployment and Integration:Once the system is thoroughly tested and optimized, we will proceed with its deployment and integration into the target application or system. This phase will involve setting up the necessary infrastructure, configuring the system parameters, and integrating it with the existing components.4. Monitoring and Maintenance:After deployment, we will continuously monitor the system to ensure its proper functioning. We will establish proactive monitoring mechanisms to detect any potential issues or failures. Regular maintenance and updates will be performed to keep the system up to date with the latest technologies and protocols.Conclusion:Our clock synchronization service plan aims to provide reliable and accurate time synchronization for a wide range of applications. By leveraging a robustsynchronization protocol, multiple time sources, and dynamic time skew adjustment, we ensure that the synchronized time is always accurate and reliable. The fault-tolerant design and scalability features make our solution suitable for various scenarios.。
Fault-tolerant systems
we have applied several improvement: we
ideas. First, we use an algorithmic rent number of particles.
recombination of most particles. To study the kinetics of these large numbers of interacting particles, a full Monte Carlo simulation of their movements is needed. One of the main points of interest is computing the number of ions that escape from the high-energy electron tracks. To be able to simulate the movements of the positive ions and the thermalized initial spatial distribution obtain information by comparing electrons, an of these particles must be and
b Department of Physics and Astronomy, Vrije Universiteit, De Boelelaan 1081, 1081 HV Amsterdam, Netherlands
Abstract
Ion recombination in nonpolar liquids is an important problem in radiation chemistry. We have designed and implemented a parallel Monte Carlo simulation for this computationally intensive task on a network of workstations. The main problem with parallelizing this application is that the amount of work performed by each process decreases during execution, resulting in high communication overhead and load imbalances. We address this problem by dynamically adjusting the number of processors that are used. We have evaluated the performance of the parallel program on two systems, one using Ethernet and the other using Myrinet. On Ethernet, the program suffers from a large communication overhead. Using the Myrinet
东盟自贸区英语词汇
中国外交部Ministry of Foreign Affairs, P.R.China ()文莱外交与贸易部Ministry of Foreign Affairs & Trade,Brunei Darussalam (.bn/) 柬埔寨外交与国际合作部Ministry of Foreign Affairs & International Cooperation,Cambodia (.kh/)Indonesia印度尼西亚Lao PDR老挝Myanmar缅甸Malaysia 马来西亚Philippines 菲律宾Singapore 新加坡Thailand 泰国Viet Nam 越南ASEAN secretariat 东盟秘书处中国东盟自由贸易区China-ASEAN Free Trade Aera(CAFTA)大湄公河次区域Greater MeKong Sub-region(GMS)东南亚国家联盟Association of Southeast Asian Nations(ASEAN)CAFTA Tariff Reduction Program 自贸区降税计划Trade in Goods 货物贸易Trade in Services 服务贸易Dispute Settlement Mechanism 争端解决机制The 10 priority cooperation fields between China and ASEAN 中国-东盟十大优先合作领域Human resources development 人力资源开发Two-way investment 双向投资MeKong River Basin development 湄公河流域开发Information and communications technology(ICT) 信息产业Transport 交通energy 能源cultrue 文化tourism 旅游public health公共卫生火龙果fire-dragon fruit 榴莲durian 山竹mangosteen 红毛丹rambutan Briefing推介会amount to 高达China and ASEAN are close neighbors connected by mountains and rivers中国和东盟山水相连、人脉相依。
jraft状态机实现原理
jraft状态机实现原理英文回答:Understanding the JRaft State Machine Implementation.JRaft is a Java-based, consensus-oriented distributed consensus framework that provides a highly available,fault-tolerant, and scalable solution for managing distributed storage systems. At the core of JRaft lies its state machine implementation, which plays a crucial role in managing and updating the system's state.The state machine in JRaft is responsible for storing and manipulating the system's data. It receives commands from clients and applies them to the system's state. Once a command is applied, the state machine replicates the updated state to followers in the distributed system to ensure data consistency.The JRaft state machine is designed to be highlyreliable and fault-tolerant. It employs various mechanisms to prevent data corruption and ensure the system's integrity, even in the face of network failures or server crashes.Key Features of the JRaft State Machine:Linearizable Semantics: The state machine provides linearizable semantics, which guarantees that all client requests are applied to the system's state in the same order they were received.High Availability: The state machine is designed to be highly available, ensuring that the system remains operational even if some servers fail.Fault Tolerance: The state machine employs mechanisms like replication and leader election to tolerate server failures and maintain data integrity.Scalability: The state machine can be scaled horizontally to handle increased load and improveperformance.Under the Hood: How the JRaft State Machine Works.The JRaft state machine consists of three main components:1. Log: The log stores all the commands that have been applied to the system's state. It provides a persistent record of the system's history.2. State: The state represents the current state of the system. It is updated by applying commands from the log.3. Snapshot: A snapshot is a persistent representation of the system's state at a specific point in time. Snapshots are used for faster recovery after server failures.When a client sends a command to the JRaft system, the command is appended to the log. The leader server then replicates the log to all the followers in the system. Oncethe majority of followers have acknowledged the log entry, the command is considered committed and is applied to the system's state.The state machine maintains a consistent view of the system's state across all servers. This consistency is achieved through the use of consensus algorithms, such as the Raft consensus algorithm.中文回答:JRaft 状态机实现原理。
BIOS出错英文提示详解
BIOS出错英文提示详解Drive A error 驱动器A错误System halt 系统挂起Keyboard controller error 键盘控制器错误Keyboard error or no keyboard present 键盘错误或者键盘不存在BIOS ROM checksum error BIOS ROM校验错误Single hardisk cable fail 当硬盘使用Cable选项时硬盘安装位置不正确FDD Controller Failure BIOS 软盘控制器错误HDD Controller Failure BIOS 硬盘控制器错误Driver Error 驱动器错误Cache Memory Bad, Do not Enable Cache 高速缓存Cache损坏,不能使用Error: Unable to control A20 line 错误提示:不能使用A20地址控制线Memory write/Read failure 内存读写失败Memory allocation error 内存定位错误CMOS Battery state Low CMOS没电了Keyboard interface error 键盘接口错误Hard disk drive failure 加载硬盘失败Hard disk not present 硬盘不存在Floppy disk(s) fail (40) 软盘驱动器加载失败,一般是数据线插反,电源线没有插接,CMOS 内部软驱设置错误CMOS checksum error-efaults loaded. CMOS校验错误,装入缺省(默认)设置Detecting floppy drive A media... 检测软驱A的格式Drive media is : 1.44Mb1.2Mb 720Kb 360K 驱动器格式是1.44Mb、12Mb、720kb、360kb的一种DISK BOOT FAILURE, INSERT SYSTEM DISK AND PRESS ENTER 磁盘引导失败,插入系统盘后按任意键继续Invalid partition table 无效的分区表Error loading operating system 不能装入引导系统Missing operating system 系统引导文件丢失Invalid system disk 无效的系统盘Disk I/O error, Replace the disk and press any key. 磁盘I/O错误,替换磁盘后按任意键,当C 盘系统文件丢失或被破坏时出现。
02_Fault_Tolerance
Power Failure: Phone Lines
Soft Hard
2000 hr >.1 hr 4000 hr 100,000hr
Hardware Modules: Software:
(many are transient)
1 Bug/1000 Lines Of Code (after vendor-user testing) vendor=> Thousands of bugs in System! Most software failures are transient: dump & restart system. Useful fact: 8,760 hrs/year ~ 10k hr/year
hardware environment operations
software total
0 1985 1987 1989
1985 SOFTWARE 2 HARDWARE 29 MAINTENANCE 45 OPERATIONS 99 ENVIRONMENT 142
1987 1990 53 33 91 310 162 409 171 136 214 346
12%
25%
1 1 .2
Environment
Application Software
9 .3 % Operations
Vendor (hardware and software) 5 Months Application software 9 Months Communications lines 1.5 Years Operations 2 Years Environment 2 Years 10 Weeks
容错服务器在高速公路收费系统中的应用
Value Engineering 0引言随着经济的不断发展,计算机技术、网络技术等的不断更新和扩大,不仅加快了高速公路建设的步伐,而且对高速公路机电系统的要求也越来越高。
实施高精度、高可靠性的交通机电系统,已经是一个非常流行的话题;其中,收费系统是高速公路机电三大系统之一,是目前国内高速公路机电系统应用的核心,也是效益发挥最大,日趋最为成熟的系统。
随着高速公路机电系统不断发展和成熟,整个收费系统的可靠性越来越受到重视,各种升级后的计算机设备、路由设备、交换设备、操作系统、数据库系统等也都会有不同程度的应用,再加上环境、资源共享、数据通信、计算机病毒以及网络管理等方面的不安全因素,收费数据的可靠性、安全性和有效性会变得更加重要。
以往,在收费系统中为提高数据库服务器的可靠性,通常采用双机热备方案,尽管采用该方案对系统的可靠性上有所提高,但是增加了整体的软硬件投资费用,增加了系统的运营和维护工作,而且系统的可靠性并不一定能得到足够的保障。
而容错服务器由于采用了部件级全硬件冗余技术,整个硬件系统从主板、CPU 、内存以及I/O 模块等都采用双硬件模式,并且由于该系统采用了先进的同步(Lock Step )技术,冗余部件同一时刻执行同一指令,整个系统的连续可用性能达到99.999%以上,这样不但可避免以前双机热备系统中单硬件系统故障所造成的系统停机以及系统切换时间上的经济损失,同时还可降低系统维护过程中的维护量及后续维护费用。
1容错方案设计思想容错服务器对很多用户来说,早已不再陌生。
建立在冗余技术基础之上的容错服务器,在解决硬件单点故障、缩短故障恢复时间、降低人为错误、减少部件和软件版本不兼容等方面相对于双机服务器都显示出了其强大的优势。
容错服务器由于采用了部件级全硬件冗余技术,整个硬件系统从主板、CPU 、内存以及I/O 模块都采用双硬件模式,其冗余硬件部件同一时刻执行同一指令,整个系统的连续可用性高于99.999%。
Fault-tolerant systems
Chapter1Hume:aFunctionally-Inspired Language for Safety-Critical SystemsKevin Hammond1and Greg Michaelson2Abstract:Hume is a novel functionally-based language that targets safety-critical applications.The language is designed to support rigorous cost and space anal-yses,whilst providing a high level of abstraction including polymorphic type inference,automatic memory management,higher-order functions,exception-handling and a good range of primitive types.1.1INTRODUCTIONThis paper describes the Hume programming language.Hume(Higher-order Unified Meta-Environment)is a strongly typed,mostly-functional language with an integrated tool set for developing,proving and assessing concurrent,safety-critical systems.Hume aims to extend the frontiers of safety-critical language design,introducing new levels of abstraction and provability.A full description of Hume including a formal dynamic semantics and a partially completed static semantics can be found at /hume. We anticipate publishing this material as a formal language definition in the near future.1.1.1BackgroundHume is named for the Scottish Enlightenment sceptical philosopher David Hume (1711-1776),who counseled that:To begin with clear and self-evident principles,to advance by timorous and sure steps,to review frequently our conclusions,and examine accurately all their consequences;though by these means we shall make both a slow and a short progress in our systems;are the only methods,by which we can ever hope to reach truth,and attain a proper stability and certainty in our determinations.D.Hume,An Enquiry Concerning Human Understanding,1748These sentiments epitomise the philosophy of programming language design that we have followed in constructing Hume.1.2MOTIV ATION AND OBJECTIVESSince the focus of the Hume design is on safety critical applications,it is paramount that Hume programs have predictable and,preferably,provable properties.How-ever,the strong properties of program equivalence,termination and time and space use are undecidable for Turing complete languages.Conversely,languages in which such properties are decidable(i.e.finite state machines)lack expressive-ness.The goal of the Hume language design is to support a high level of expres-sive power,whilst providing strong guarantees of dynamic behavioural properties such as execution time and space usage.Hume reflects these considerations in: the separation of the expression and coordination aspects of the language;and the provision of an integrated tool set,spanning both static and dynamic pro-gram analysis and manipulation.1.2.1Important Design CharacteristicsAny system that is aimed at safety critical applications must meet a number of stringent criteria if it is to be used with confidence.Hume has been designed to support Leveson’s guidelines for Safeware—software intended for safety critical applications[11],as well as similar guidelines provided by other authors[2].In general,safety critical systems must meet both strong correctness criteria and strict performance criteria.The latter are most easily attained by working at a low level,whereas the former are most easily attained by working at a high level.A primary objective of the Hume design is to allow both types of criteria to be met while working at a high level of abstraction.The Hume language has been designed to allow relatively simple formal cost models to be developed,capable of costing both space and time usage.This re-quires some restrictions on the expression language in cases which are cost or2space critical.Thisfirst version of the language design is deliberately rather spar-tan,allowing experimentation with essential features but omitting some desirable syntax or other language features,such as type classes.Future versions of the language should address these omissions.The language definition does support a wide range of(particularly numeric)basic types.This is because issues of type coercion and type safety are fundamental to ensuring both correctness and safety.Both system level and process level exceptions are supported,including the ability to set timeouts for expression computations.Exceptions may be raised from within the expression language but can only be handled by the process lan-guage.This reduces the cost of handling exceptions and maintains a pure expres-sion language,as well as simplifying the expression cost calculus.A radical design decision for safety critical systems is the use of automatic memory management techniques.Automatic memory management has the ad-vantage of reducing errors due to poor manual management of memory.The disadvantage lies in terms of excessive time or space usage.Hume implementa-tions will use static analysis tools to limit space usage,and will incorporate recent developments in bounded-time memory management techniques[3].1.3THE HUME LANGUAGEThis section introduces the Hume language informally.A full description,from which this section is abstracted,can be found in the Hume language definition[13].1.3.1Structure of the LanguageIn common with other coordination language approaches such as Linda[4],Hume takes a layered approach.The outermost layer is a static declaration language that provides definitions of types,streams etc.to be used in the dynamic parts of the language.The innermost layer is a fairly conventional expression language which is used to define values and(potentially higher-order)functions using purely func-tional expressions.Finally,the middle layer is a coordination language that links functions into possibly concurrent processes.Exceptions transcend the layers.They are declared by exception<id> <type>within declarations,raised by raise<id><expr>within expres-sions and handled by boxes.1.3.2The Hume Expression LanguageThe Hume expression language is a purely functional language with a strict se-mantics.It is intended to be used for the description of single,one-shot,non-reentrant processes.It is deterministic,and has statically bounded time and space behaviour.In order to achieve this,expressions that are not the target of dynamic timeouts must be restricted to statically-checkable primitive recursive forms.This allows the construction of verifiable static cost analyses.Dynamically bounded time and space expressions(within-expressions)specify that evaluation of the3associated expression must complete within the specified constant time or space. Failure to do so causes the Timeout or HeapOverflow exception to be raised.Note that the expression language has no concept of external,imperative state. Such state considerations are encapsulated entirely within the coordination lan-guage.1.3.3The Hume Coordination LanguageThe Hume coordination language is afinite state language for the description of multiple,interacting,re-entrant processes built from the purely functional ex-pression layer.The coordination language is designed to have statically provable properties that include both process equivalence and safety properties such as the absence of deadlock,livelock or resource starvation.The coordination language also inherits properties from the expression language that is embedded within it.The basic unit of coordination is the box,an abstract notion of a process that specifies the links between its input and output channels in terms of functional expressions,and which provides exception handling facilities including timeouts and system exceptions The coordination language is responsible for interaction with external,imperative state through streams and ports that are ultimately con-nected to external devices.1.3.4The Hume Declaration LanguageThe declaration language introduces types and values that scope over either or both the coordination and expression languages.The coordination language is embedded in the declaration language through box and wiring declarations while the expression language is embedded through function and value declarations. While it is possible to define recursive and mutually recursive functions,it is not possible to do the same for simple values.1.4EXAMPLE PROGRAMS1.4.1Parity CheckingOurfirst example is a simple even parity checker whose input is wired to a com-munications port(comm1),and whose output is wired to its second input.The initial parity value is defined to be true.box even_parityin(b::word1,p::bool)out(p’::bool)match(0,true)->true|(1,true)->false|(0,false)->false4|(1,false)->true;wire even_parity(comm1,even_parity.p’initially true)(even_parity.p);1.4.2ChecksumThe second example calculates the checksum of256input numbers.It returns a string which is either"OK"or an error message.This example illustrates(some-what artificially)the use of exceptions and handlers.type short=int16;type long=int32;box checksumin(n::short,sum::long,count::short)out(sum’::long,count’::short,message::string)handles CheckSumErrormatch(n,sum,256)->(0,0,if n=sum then"OK"else raise CheckSumError())|(n,sum,count)->(sum+n,count+1,"")handleCheckSumError_->(0,0,"Checksum error:sum="+tostring(sum)+"checksum="+tostring(n));wire checksum(comm1,checksum.sum’initially0,checksum.count’initially0)(checksum.sum,checksum.count,stdout); 1.5COSTING HUME PROGRAMSFigures1.1–1.6give rules that can be used to statically derive a bounded time cost forfirst-order Hume programs(we are independently working on time sys-tems for higher-order programs[12]).The rules are a simple big-step operational semantics,with extensions to timeouts and exceptions.We have deliberately used a very simple cost semantics here rather than the more general form that we are developing as part of our work in parallel computation[12].The latter is intended for arbitrary recursion and requires the use of a constraint solving engine,whereas Hume is restricted to primitive recursive forms.It would be straightforward to al-ter these rules to cover space rather than time cost,as required.The cost rules specified here derive notation from the Hume static and dynamic semantics,which is in turn derived from that used for Haskell and Standard ML.5E decls E Costni1i n EE’j decl i E i c ij1E var exp cE var∞matches c cE body CostE matches c c E handlers cE matches time handles min t c c cFIGURE1.2.Cost axioms for boxesOnly two forms of declaration are interesting.Variable declarations are evaluatedat the start of a declaration block.They therefore have afixed dynamic cost.The cost of a function(of the form var matches)is defined as the cost of matching allpatterns in the set of matches,plus the cost of the most expensive expression on the right-hand-sides of the matches.This therefore computes an upper bound on the cost of evaluating a function body.Recursive definitions are assigned infinitecost.The cost is bound to the function name in the environment so that the cost of each and every function call is correctly calculated.The result of a declaration sequence is the union of the environment recording function costsFigure1.2gives the cost of a single box iteration.This is defined similarly to the cost of a function definition,but including the upper bound of the exceptionhandling cost.There are two cases.If no timeout clause is given,then the matches must befinite.Otherwise the cost is the lesser of the timeout and the actual execution cost plus the handler cost.Figures1.3–1.4give cost rules for expressions.The cost of a function ap-plication is the cost of evaluating the body of the function plus the cost of each argument.The cost of building a new constructor such as a list or tuple is thecost of evaluating the arguments to that constructor plus one for each argument (representing the cost of building each heap cell).Note that the cost of a non-nulllist exp1exp n includes the cost of thefinal,and is therefore greater than the equivalently sized tuple exp1exp n or vector exp1exp n.The cost of a coercing an expression to a given type using an as-expression dependson the original type of the expression and the one that it is coerced to.There is,of course,no cost for a simple type cast.The cost of raising an exception is the cost of the enclosed expression plus one(representing the cost of actually throwing that exception).Finally the cost of an expression enclosed within a timeout is the minimum of the expression cost and the specified timeout.Figure1.5gives costs for pattern matches.Two values are returned from a match sequence.Thefirst is the total cost of matching all the patterns in the match sequence.This places an upper bound on the match cost.The second is the cost of evaluating the most expensive right-hand-side.This places an upper7E exp Costn0E var exp1exp n∑n i1c i cE con exp1exp n∑n i1c i1E exp1exp n∑n i1c i n1E exp1exp n∑n i1c i nE exp1exp n∑n i1c i nFIGURE 1.3.Cost axioms for expressions(variables,applications and construc-tors)bound on the expression cost.The cost of matching multiple patterns in a single match is the sum of matching each pattern.Wildcard patterns(E exp Cost E exp c E match cE if exp1then exp2else exp3c1max c2c3E decls E’c E E’exp cE exnid exp c1value exp2t E exp1cE Matches Cost CostE match c c E matches c cE pat1pat n exp∑n i1c i cE pat CostE con pat1pat n∑n i1c i1i1i n E pat i c iE0i1i n E pat i c iE0i1i n E pat i c iE1FIGURE1.5.Cost axioms for pattern matchesapproach separatesfinite data structures such as tuples from potentially infinite structures such as streams.This allows the definition of functions that are guar-anteed to be primitive recursive.In contrast with Hume,it is necessary to identify functions that may be more generally recursive.Ada is widely used for embedded systems,and many tools have been con-structed to assist the understanding of space and time behaviour[2].Compared with ANSI standard Ada,Hume provides much higher level of abstraction with a far more rigorously defined semantics,which is specifically designed to support cost semantics.Finally,there has been recent interest in using variants of Java as the ba-sis for embedded systems,though to our knowledge there is as yet no specif-ically safety-critical design.Two interesting variants are Embedded Java[17]10E Handlers CostE pat c E exp cE handler handlers max c cFIGURE1.6.Cost axioms for exception handlersand RTJava[9],for soft real-time applications.Like Hume,both languages sup-port dynamic memory allocation with automatic garbage collection and provide strong exception handling mechanism.The primary differences from Hume are the incorporation of arbitrary recursion in all cases,an absence of formal design principles,the use of a single-layered approach in which coordination is merged with computation,and of course the use of an object-oriented expression language rather than one that is purely functional.We believe that the design choices made here are more suitable for applications where safety is paramount.For example, the use of purely functional rather than dynamically-linked object-oriented design allows straightforward static reasoning about the meaning of programs,at the cost of convenience in modifying a running system.1.7CONCLUSIONThis paper has introduced the Hume language,a novel functionally-inspired lan-guage that is geared towards safety critical applications.A key feature of the design is the incorporation of a purely functional expression language that pro-vides strong guarantees of functional correctness with equally strong guarantees of time and space behaviour.To our knowledge,this is thefirst fully-featured functional language that supports such guarantees.To illustrate the key features of our language design,we have produced a sim-ple boundedfirst-order time cost analysis.The analysis makes a number of sim-plifying assumptions,such as costing all heap allocations,exception raises and variable bindings identically(as1in this set of definitions).It would,however, be straightforward to parameterise the analysis on more realistic costs,or even to modify it so as to calculate space rather than time costs.Clearly,Hume is still at an early stage of development.None of our language tools has yet been constructed and important design details are still to be resolved. We feel,however,that the properties designed into the language should yield long-term benefits in the safety-critical arena.11REFERENCES[1]W.Ackermann,Solvable Cases of the Decision Problem,North Holland,1954.[2]J.G.P.Barnes,High Integrity Ada:the Spark Approach,Addison-Wesley,1997.[3]C.Clack,“The Demense Model for Bounded Automatic Memory Management”,Un-published Report,University College,London,2000.[4]D.Gelernter,and N.Carriero,“Coordination Languages and Their Significance”,CACM,32(2),February,1992,pp.97–107.[5]A.D.Gordon,S.Marlow and S.L.Peyton Jones,“A Semantics for Imprecise Excep-tions”,Proc.Symposium on Principles of Programming Languages(POPL2000), January2000.[6]M.G.Hinchey and J.P.Bowen,Applications of Formal Methods,Prentice-Hall,1995.[7]International Organisation for Standardisation,“ISO:Information Processing Sys-tems—Open Systems Interconnection—LOTOS,A Formal Description Technique based on the Temporal Ordering of Observational Behaviour”,ISO8807,Geneva, August1988.[8]International Organisation for Standardisation,“ISO:Information Processing Sys-tems—Open Systems Interconnection—Estelle,A Formal Description Technique based on an Extended State Transition Model”,ISO9074,Geneva,May1989. [9]D.Jensen,“‘Real-time Java’.What could and what should it mean?”URL:/carnahan/real-time/vocab/ rt-java。
Beginner's Guide to Fault Tolerant Power Supplies
Hot-swap
Some equipment is operated 24 hours a day, 7 days a week, allowing no time to bring the system down for maintenance. In this case the failed power supply must be “swapped” out and a new one inserted without disruption to equipment operation.
If you take Lambda’s recommendation to invest a little extra money up front to make your power system more secure, test your system to make sure you have it right!
Current share
Some power systems employ a method of balancing the current between the power supplies to increase field life. This can be an electronic signal wire that links the power supplies together or a switch* on the power supply that initiates a slight drop in the output voltage as more current is drawn. (*Common on high power DIN rail units)
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
3 Fault-Tolerant Adaptive Storage System 22 3.1 Small Load Implies Some Resiliency Against a Non-adaptive Adversary . . . 23 3.2 A Generic Scheme for a Fault-Tolerant Storage System With a Constant Blowup Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.3 Adaptive Storage System Based on the ‘And-Or’ System . . . . . . . . . . . 25 3.3.1 Adaptive Read Strategy for Finding a Complete And Set. . . . . . . 25 3.3.2 Adaptive Read Strategy in the Presence of Faults. . . . . . . . . . . . 28 3.3.3 Dynamically Adjusting the Retrieval Scheme to the Number of Faults. 30 4 Dynamic Implementation of the And-Or Storage System 4.1 Some Properties of the Distance Halving Dynamic Network . . . . . . . . . . 4.2 Dynamic fault-tolerant Scheme . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 Network Implementation of Storage and Retrieval . . . . . . . . . . . 32 34 35 36
Acknowledgements
I would like to thank my advisor Moni Naor for many useful conversations and for inspiring me with ideas. I am also very grateful for Moni’s help in turning my research results into a formal paper. I would like to thank Gera Weiss for many useful discussions and comments. Last but not least, I would like to thank Ilan Gronau for his extensive comments on a draft of this thesis, which led to significant improvements in the presentation and for many useful discussions. Finally I would like to thank all the colleagues and friends that made my time at the Weizmann Institute a great experience.
1
Contents
1 Introduction 1.1 Motivation . . . . . . . . . . . . . . . . . . . . 1.2 Definitions and Preliminaries . . . . . . . . . . 1.3 A Simple Example of a Fault-Tolerant Storage 1.4 Related Work . . . . . . . . . . . . . . . . . . 1.5 Results and Thesis Organization . . . . . . . . 1.5.1 Organization . . . . . . . . . . . . . . 5 5 6 8 8 10 11 13 13 17 20
2
5 The And-Or Quorum System In a Random Faults Model 5.1 Motivation and Definitions . . . . . . . . . . . . . . . . . . . . . 5.2 Algorithmic Probe Complexity of the And-Or Quorum System. 5.2.1 A Non-adaptive Algorithm. . . . . . . . . . . . . . . . . 5.2.2 An Adaptive Algorithm . . . . . . . . . . . . . . . . . . 5.3 The And-Or Quorum System on a Balanced Binary Tree . . . .
. . . . . .
. . . . . .
. . . . . .
2 Storage Systems with Non-Adaptive Retrieval 2.1 The Non-Adaptive Retrieval Model . . . . . . . . . . . . . . . . . . . . . . . 2.2 Read-Write Tradeoff of a Storage System . . . . . . . . . . . . . . . . . . . . 2.3 Bounding the Blowup-Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . .
Abstract We deal with storage systems and quorum systems for a dynamic environment where servers may join and leave the system. We suggest a file storage system construction based on the √ √ ‘And-Or’ quorum system, that has a O( n) write complexity, O( n log n) read complexity and a constant data blowup-ratio, where n represents the number of processors in the network. Our construction is fault-tolerant against an adversary that can crash θ(n) processors of her choice while having slightly less adaptive queries than the reader. When both the legitimate reader and the adversary are nonadaptive we derive lower bounds on the read complexity, write complexity and data blowup ratio. We show these bounds are tight using a simple storage system construction, based on an -intersecting quorum system. In the random-fault model we show that the And-Or quorum system possesses optimal algorithmic probe complexity for both non-adaptive and adaptive readers. The non-adaptive √ algorithm for finding a live quorum achieves a O( n log n) algorithmic probe complexity which matches a lower bound of Naor and Wieder [20]. The adaptive algorithm finds a live √ quorum with a probe complexity which is linear in the size of a quorum (O( n)) and requires at most O(log log n) rounds. To the best of our knowledge, this is the smallest number of √ rounds in which a live quorum can be found, using only O( n) probes, for a system with an optimal load. Last we present an adaptation of the above storage system and a quorum system for a dynamic environment. Both are based on the And-Or tree and a dynamic overlay network that emulates the De-Bruijn network. These adaptations maintain the good properties of the above constructions (e.g., fault-tolerance, load and availability). The algorithms suggested for the maintenance of these dynamic data structures are strongly coupled with the routing scheme of the network. This fact enables the use of gossip protocols which saves in message complexity and keeps the protocols simple and local.