WS-ReplicationResource: Replicated Resources in Grid Environments




Manuel Salvadores¹, Pilar Herrero², María S. Pérez², Alberto Sanchez²

¹ IMCS, Imbert Management Consulting Solutions, C/ Fray Juan Gil 7, 28002 Madrid, Spain
mso@imcs.es
² Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28.660 Boadilla del Monte, Madrid, Spain
{pherrero, mperez, ascampos}@fi.upm.es

Abstract. This paper presents the √N + ROWA model, which is being developed at the Universidad Politécnica de Madrid with the aim of replicating information in Grid environments while optimizing the number of messages exchanged in the process. Our approach is one of the keystones of a new Grid service (WS-ReplicationResource) that, in the near future, will provide Grid systems with a high level of transactionality for the actions carried out inside the environment.

1. Introduction

The latest trends in the WSRF (Web Services Resource Framework) specifications, oriented towards the development of Grid computing systems, and their implementation in Globus Toolkit 4 (GT4) [13] will mark, in the near future, the main lines of development for middleware supporting Grid applications based on the OGSA standard [1].

OGSA enumerates the characteristics that Grid systems must possess. High availability¹ plays an important role among these characteristics.
The replication concept is closely related to the availability concept, replication being one of the most widely used techniques for failure recovery.

The WS-ResourceProperties specification [2], part of the WSRF specifications, defines a standard way of exchanging messages that allows a client to query or update the values of the properties associated with a specific resource.

A resource can be defined as a Web Service (WS) that has a set of properties, defined through WS-ResourceProperties, whose state is the combination of the values of all those properties at a given moment, and that can maintain this state through WS-Addressing [14].

¹ OGSA specification, point 2.10.

Having the information of each of these resources replicated is useful not only to provide high availability in the system, but also to design new collaborative models and to introduce complex negotiation models and mechanisms based on agents.

On the other hand, this synchronisation model could be extended to large-scale transactional systems as a component integrated inside a framework such as DCP-Grid [11, 12].

In this paper we present our specification, which we have called WS-ReplicationResource. It extends the functionality of WS-ResourceProperties to allow the properties of a WS to be replicated across the nodes connected to a Grid infrastructure.

The paper is organized as follows. The next section presents the motivation of this work. The paper continues with a brief discussion of related work in the area, our approach, and the scalability of the system under our approach. It closes with the conclusions and the ongoing and future work.

2. Motivation

The WS-ResourceProperties specification defines four operations to access a resource's properties:

1. GetResourceProperty: obtains the value of a property.
2. GetMultipleResourceProperties: obtains the values of several properties in a single operation.
3. SetResourceProperties: creates or updates properties of a resource in a single operation.
4. QueryResourceProperties: runs queries over the properties of a specific resource through XPath [15].

Taking these operations into account, our motivation is to propose a decentralised scenario in which any node can receive queries or updates for a specific replicated resource, knows which of the resource's properties can be accessed at a given moment, and does all this autonomously.

Figure 1 presents the initial scenario to be solved, with i nodes N = {N_1, N_2, N_3, …, N_i}. Each of these nodes can receive read requests as well as write requests. In order to ensure the causal order (fairness) [10] of the actions to be carried out, we represent each action as a tuple (a, t), where 'a' is the action to be carried out at moment 't'. Thus, if A is a sequence of 4 actions (N = 4), A can be represented as:

$$A = \{(a_1, t_1), (a_2, t_2), (a_3, t_3), (a_4, t_4)\} \tag{1}$$

Causal order then introduces constraints such as:

$$\forall i = 1, 2, 3, \ldots, N: \quad t_i < t_{i+1} \rightarrow A_i \text{ before } A_{i+1} \tag{2}$$

Figure 1 shows the problematic situation caused by action 3 because of its counter value, which should be 10. The causal order in the execution of read and write operations can be enforced through controlled, mutually exclusive access to the resources.

Fig. 1. Scenario: four actions are performed on a replicated resource; WS-ReplicationResource must ensure the causal order

3. Related Work

One of the first algorithms used for mutually exclusive access is the Ricart and Agrawala algorithm [3]. Ricart and Agrawala's algorithm solves the synchronization problem in distributed systems.
This algorithm ensures that only one process is allowed in a critical region at a time. It works by using a system of messages and acknowledgements. The sending of a message is assumed to be reliable; that is, every message is acknowledged. The algorithm works as follows:

When a process wants to enter a critical region, it builds a message containing the name of the critical region it wants to enter, its process number and the current time. It then sends the message to all the other processes, including itself. When a process receives a request message from another process, the action it takes depends on its state with respect to the critical region named in the message. There are three possible states:

• If the receiver is not in the critical region and does not want to enter it, it sends back an OK message to the sender (shown as the Ready state in the workbench).
• If the receiver is already in the critical region, it does not reply. Instead, it queues the request (shown as the In CS state in the workbench).
• If the receiver wants to enter the critical region but has not yet done so, it compares the timestamp in the incoming message with the one contained in the message that it has sent to everyone. The lowest one wins. If the incoming message has the lower timestamp, the receiver sends back an OK message. If its own message has the lower timestamp, the receiver queues the incoming message and sends nothing (shown as the Waiting state in the workbench).

After sending out requests asking for permission to enter a critical region, a process sits back and waits until everyone else has given permission. As soon as all the permissions are in, it may enter the critical region. When it exits the critical region, it sends OK messages to all the processes on its queue and deletes them all from the queue.
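The steps above can be sketched as a small single-threaded simulation. This is only an illustration under simplifying assumptions, not the implementation used in this work: messages are delivered through direct method calls, a logical clock stands in for the "current time", a process does not send the request to itself, and the names Network, Process and demo are hypothetical.

```python
from dataclasses import dataclass, field

RELEASED, WANTED, HELD = "released", "wanted", "held"


@dataclass
class Network:
    """Hypothetical helper that only counts the messages exchanged."""
    messages: int = 0


@dataclass
class Process:
    pid: int
    net: Network
    state: str = RELEASED
    clock: int = 0              # logical clock used as the "current time"
    request_ts: tuple = ()      # (time, pid) of the pending request
    needed: int = 0             # number of OK messages still missing
    deferred: list = field(default_factory=list)

    def request(self, peers):
        """Ask every other process for permission to enter the critical region."""
        self.clock += 1
        self.state = WANTED
        self.request_ts = (self.clock, self.pid)
        self.needed = len(peers)
        if not peers:
            self.state = HELD
        for peer in peers:
            self.net.messages += 1          # one request message per peer
            peer.on_request(self.request_ts, self)

    def on_request(self, ts, sender):
        self.clock = max(self.clock, ts[0]) + 1
        own_wins = self.state == WANTED and self.request_ts < ts
        if self.state == HELD or own_wins:
            self.deferred.append(sender)    # queue the request, send nothing
        else:
            sender.on_ok()                  # reply OK immediately

    def on_ok(self):
        self.net.messages += 1              # one OK message
        self.needed -= 1
        if self.needed == 0:
            self.state = HELD               # all the permissions are in

    def release(self):
        """Leave the critical region and answer every queued request."""
        self.state = RELEASED
        queued, self.deferred = self.deferred, []
        for waiting in queued:
            waiting.on_ok()


def demo():
    net = Network()
    p0, p1, p2 = (Process(pid, net) for pid in range(3))
    p0.request([p1, p2])    # nobody competes, so p0 enters at once
    p1.request([p0, p2])    # p0 is inside, so its OK is deferred
    before = (p0.state, p1.state)
    p0.release()            # p0 leaves and answers the queued request
    after = (p0.state, p1.state)
    return before, after, net.messages
```

In this sketch each entry into the critical region costs one request and one OK per peer, so the two entries performed by demo() with three processes exchange eight messages in total.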
The cost of the Ricart and Agrawala algorithm grows in proportion to the number of nodes: reaching an agreement on the critical section requires 2*(N-1) messages, N being the number of nodes, namely (N-1) messages to let the rest of the nodes know that the process wants to access the critical section, and (N-1) answers from the rest of the nodes to give the final approval.

Algorithms based on quorums [8] are an optimised option for accessing critical sections in distributed systems, where a quorum can be defined as:

"Let S = {S1, S2, …} be a set of sites. A quorum system Q is a set of subsets of S with pair-wise non-null intersection. Each element of Q is called a quorum".

For example, suppose we have four sites S1, S2, S3 and S4. A possible quorum system then consists of these three quorums: {S1, S2, S3}, {S2, S3, S4} and {S1, S4}, although there are many other possible quorum systems for these four sites.

In quorum-based systems a process can access the critical section if and only if it obtains the permission of all the elements of its own quorum. This considerably reduces the number of messages.

Quorums can be combined with the ROWA (Read One-Write All) technique [5]. This technique distinguishes between write and read operations as follows:

• Read operations: read from any site. If a site is down, try another site.
• Write operations: write to all sites. If any site rejects the write, abort the transaction.

However, the ROWA technique does not work if one of the replicas fails, and therefore its combination with quorums is one of the most useful techniques.

As for the combination of quorums + ROWA [6], it is important to highlight that in this case there are two different kinds of quorums: writing (wq) and reading (rq).
Both of them have the following constraints in the Read-One-Write-All (ROWA) approach:

• A logical read on data item x is converted to a physical read on any of its copies ("read one").
• A logical write of x is translated to a physical write on all of the copies of x.

On the other hand, the different topologies and negotiation policies inside the quorums make it possible to distinguish numerous types of quorums, some of which are:

• Majority or Consensus [8]: Uses voting to reach consensus. Each site has an assigned weight (number of votes), and quorums are defined so that the number of votes needed exceeds half of the total (majority). If n is the sum of all the assigned weights, the read (rq) and write (wq) quorums must fulfil these constraints:
  2 * |wq| > n and |rq| + |wq| > n
  and the minimum quorum sizes that work are:
  |wq| = n/2 + 1, |rq| = n/2
• Tree: A generalization of the Majority quorum. The main idea is to organize the sites into a hierarchy. This hierarchy is represented as a complete tree where the physical sites appear at the leaves of the tree. At each level of the tree (starting at the root level), a majority of tree nodes must be chosen. For each node chosen at level i, a majority of nodes at level i+1 must be chosen. A write quorum consists of the root of the tree, a majority of its children, a majority of the children of each of those children, and so on. A read quorum consists of the root of the tree. If the root is unavailable, the read quorum consists of a majority of its children, and so on recursively [9].
• Grid: There are different kinds of Grid quorum, such as the rectangular or the triangular. In the rectangular one, a read quorum consists of one element of each column (|rq| = c). A write quorum requires an entire column and one element from each of the remaining columns (|wq| = r + c - 1). If the grid is a square, |rq| = √n and |wq| = 2 * √n - 1 [7].

Finally, the √N algorithm, N being the number of nodes, is based on grouping the nodes into N subsets of minimum size such that no two of them have a null intersection [4]:

$$\forall i, j,\; 1 \le i, j \le N: \quad S_i \cap S_j \neq \emptyset \tag{3}$$

Taking into account the different methods and algorithms presented here, in the next section we introduce our approach, which is based on the combination of two of them: the √N algorithm [4] and the ROWA technique [5].

4. The √N + ROWA Model

Initially we identify two components to be introduced inside each and every node so that it is aware of the nodes that are in the system and are part of the S_i quorum. Once a node has this information, it can subscribe, through WS-Notification [17], to one of the properties contained in the WS-ReplicationResource specification. This property works as a traffic light (mutex) that controls the access to the resource through the model described in this section (see Figure 2).

Fig. 2. The √N + ROWA model architecture

Our approach, which we call the √N + ROWA model, takes the √N protocol [4] and applies the ROWA protocol to control the access according to the type of operation (writing or reading). It also distinguishes between the reading and the writing quorums (rq and wq respectively).
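The pairwise intersection required of the S_i quorums (equation 3) can be checked mechanically. The following minimal sketch does so for the four-site example from the Related Work section and for a well-known 7-node arrangement with quorums of three nodes in which node i belongs to its own quorum S_i. The function name and the 7-node data are assumptions introduced for illustration and are not part of the WS-ReplicationResource specification.

```python
from itertools import combinations


def pairwise_intersecting(quorums):
    """True if every two quorums share at least one node (equation 3)."""
    return all(a & b for a, b in combinations(quorums, 2))


# Quorum system for the four sites of the Related Work example.
example = [{"S1", "S2", "S3"}, {"S2", "S3", "S4"}, {"S1", "S4"}]

# Illustrative 7-node arrangement: quorums of 3 nodes, node i is in S_i,
# and any two quorums share exactly one node.
seven_nodes = {
    1: {1, 2, 3}, 2: {2, 4, 6}, 3: {3, 5, 6}, 4: {1, 4, 5},
    5: {2, 5, 7}, 6: {1, 6, 7}, 7: {3, 4, 7},
}

# Two disjoint subsets do not form a quorum system.
broken = [{1, 2}, {3, 4}]

if __name__ == "__main__":
    print(pairwise_intersecting(example))
    print(pairwise_intersecting(seven_nodes.values()))
    print(pairwise_intersecting(broken))
```

Because any two quorums share a node, that shared node can act as the arbiter that prevents two requesters from obtaining all the votes of their respective quorums at the same time.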
On the other hand, in order to replicate the data across the nodes, our approach considers two key factors:

a) The impact that the "lazy propagation" technique has on the model
b) The scalability of the system

In our model, when a node N_i wishes to carry out a write operation, it requires the votes of its quorum S_i, and the written information is replicated only in the elements of that quorum. In order to carry out a write/read operation over S_j, with i ≠ j and S_i ∩ S_j = N_z, the node N_z has to send S_j the updates made to the synchronised element before giving S_j its vote.

Figure 3 illustrates the "lazy propagation" effect: operation 1 (a write request on node 2), which requires obtaining wq, replicates the write only to the rest of the S_2 nodes (operation 2). At this moment t_counter is increased in the S_2 nodes. When node 4 receives a read request (operation 3), and while it is negotiating this request, nodes 4 and 2 detect that t_counter_2 > t_counter_4, and therefore the values need to be updated (operation 4). Something similar would happen if node 1 received a write request (operation 6).

Fig. 3. √N + ROWA model interaction based on the S1...S4 quorums and 4 nodes

Taking all these possible situations into account, we define a system as replicably stable if:

$$\forall i, j,\; 1 \le i, j \le N \;\rightarrow\; t\_counter_i = t\_counter_j \tag{4}$$

Although system stability is an important issue in this kind of system, another important factor is keeping this stability as the scale of the system increases. In the next section we present a study that models not only the read/write operations carried out in the system, but also the cost for the replication model to become stable in the sense defined by formulas 3 and 4.

5. Scalability

Our first working hypothesis in this study is that no collisions occur in the system; the case with collisions is left as future research work. The second is that the time to process one operation is very small relative to the rate at which transactions arrive (number of transactions per unit of time). That is:

$$t_{proc} \lll n\_pet / sg \;\rightarrow\; p(collision) \approx 0 \tag{5}$$

This implies that the system has enough time to recover from a lock on a resource before receiving another request.

The average number of messages to be sent is the sum of (k-1) for the operation request, (k-1) answers, (k-1) for the replication (only in a write request) and (k-1) only if the last operation was carried out in another quorum (see equation 6). For the derivation of equation 6, see [4].

$$m = (k-1) + (k-1) + p(w)\,(k-1) + p(change\_q)\,p(w)\,(k-1) \tag{6}$$

where p(change_q) is the probability that two consecutive operations are executed in different quorums, p(w) is the probability that the operation is a write, and k is the quorum length.

Moreover, if N is the number of nodes and k is the size of the subsets with pairwise non-null intersection, equation 7 complements the previous one:

$$N = k^2 - k + 1 \quad [4] \tag{7}$$

On the other hand, if we want the computation to take advantage of load balancing, the likelihood of a quorum change p(change_q) should follow equation 8:

$$p(change\_q) = \frac{N-1}{N} \tag{8}$$

Equations 6, 7 and 8 allow us to obtain the average message exchange as a function of the quorum length. If the likelihood of a write request is p(w) = 0.2, this function can be represented as shown in Figure 4. The average message exchange for different quorum lengths is shown in Table 1.

Fig. 4. Average message exchange applying lazy propagation for quorums whose length is 0..100
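As a rough illustration of how equations 6, 7 and 8 combine, the following sketch evaluates them for a given quorum length. It assumes the reading of equation 6 in which the last term is p(change_q) p(w) (k-1); the function names are hypothetical, and the values it prints are only as reliable as that reading, so they are not meant to replace the figures reported in the paper.

```python
def nodes_for_quorum(k):
    """Number of nodes N covered by quorums of length k (equation 7)."""
    return k * k - k + 1


def change_probability(n):
    """Probability of a quorum change under balanced load (equation 8)."""
    return (n - 1) / n


def average_messages(k, p_w, p_change_q):
    """Average number of messages per operation (equation 6, as read above)."""
    request = k - 1                        # operation request
    answers = k - 1                        # answers from the quorum
    replication = p_w * (k - 1)            # only for write requests
    catch_up = p_change_q * p_w * (k - 1)  # last operation ran in another quorum
    return request + answers + replication + catch_up


if __name__ == "__main__":
    for k in (20, 40, 60, 80):
        n = nodes_for_quorum(k)
        m = average_messages(k, p_w=0.2, p_change_q=change_probability(n))
        print(f"N={n:5d}  k={k:3d}  m={m:.1f}")
```

Every term is a multiple of (k-1), so the cost grows linearly with the quorum length, while equation 7 makes the quorum length grow roughly with the square root of the number of nodes.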
Number of nodes   Quorum length (K)   Messages exchanged
381               20                  42
1561              40                  86
3541              60                  130
6321              80                  174

Table 1. Average message exchange for different quorum lengths.

Table 1 shows the scalability of the system. When the number of nodes is close to 381, the average number of messages exchanged to access the mutual exclusion area is close to 42, which is quite acceptable. If the number of nodes increases to 1561 (about four times as many), the number of messages exchanged only doubles. As Table 1 and Figure 4 show, the number of messages exchanged is proportional to the quorum length (K), while the quorum length is approximately the square root of the number of nodes (see equation 7).

6. Conclusions and Future Work

This paper describes a model that is being developed at the Universidad Politécnica de Madrid with the aim of replicating information while optimizing the number of messages exchanged, as well as its use to build Grid environments based on the WSRF specifications. The approach presented in this paper will be one of the pillars of a new Grid service: WS-ReplicationResource.

As ongoing work, we are currently working on its implementation, and in the near future we plan to deploy it in a large-scale Grid infrastructure, providing a high level of transactionality for the actions carried out inside the environment.

References

1. Foster, I. et al.: The Open Grid Services Architecture, Version 1.0. /documents/GWD-I-E/GFD-I.030.pdf Consulted in June 2005.
2. Graham S., Treadwell J.: Web Services Resource Properties 1.2 (WS-ResourceProperties), Working Draft 04, 10 June 2004. http://docs.oasis-/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-04.pdf
3. Ricart G., Agrawala A.K.: An optimal algorithm for mutual exclusion in computer networks. Commun. ACM 24, 627-628, Jan. 1981.
4. M. Maekawa: "A sqrt(N) algorithm for mutual exclusion in decentralized systems", ACM Transactions on Computer Systems, Vol. 3, No. 2, pp. 145-159, 1985.
5. R. Jiménez-Peris, M. Patiño-Martínez, G. Alonso, B. Kemme: How to Select a Replication Protocol According to Scalability, Availability, and Communication Overhead. 20th IEEE Int. Conf. on Reliable Distributed Systems, SRDS'01, pp. 24-33, New Orleans, Oct. 2001.
6. P. A. Bernstein, V. Hadzilacos, N. Goodman: Concurrency Control and Recovery in Database Systems. Addison Wesley, Reading, MA, 1987.
7. Shun Yan Cheung, Mostafa H. Ammar, Mustaque Ahamad: The Grid Protocol: A High Performance Scheme for Maintaining Replicated Data. IEEE Trans. Knowl. Data Eng. 4(6): 582-592, 1992.
8. R. H. Thomas: A Majority Consensus Approach to Concurrency Control for Multiple Copy Databases. ACM Transactions on Database Systems, 4(9):180-209, June 1979.
9. D. Agrawal, A. E. Abbadi: The Tree Quorum Protocol: An Efficient Approach for Managing Replicated Data. In Proc. of the 16th VLDB Conf., Brisbane, Australia, 1990.
10. K. Birman: Building Secure and Reliable Network Applications, ch. 18, Manning, 1996.
11. Manuel Salvadores, Pilar Herrero, María S. Pérez, Víctor Robles: DCP-Grid, A Framework for Concurrent Distributed Transactions on Grid Environment. The First International Workshop on "Knowledge and Data Mining Grid" (KDMG'05), at the 3rd Atlantic Web Intelligence Conference 2005 (AWIC'05), LNAI 3528- 0498, Lodz, Poland, 2005.
12. Manuel Salvadores, Pilar Herrero, María S. Pérez, Víctor Robles: DCP-Grid, A Framework for Conversational Distributed Transactions on Grid Environment. International Workshop on Grid Computing Security and Resource Management (GSRM'05), in conjunction with the International Conference on Computational Science 2005 (ICCS 2005), LNCS 3516, Emory University, Atlanta, USA, May 2005.
13. Globus Alliance, Globus Toolkit: /toolkit/ Consulted in June 2005.
14. World Wide Web Consortium, WS-Addressing: /Submission/ws-addressing/ Consulted in June 2005.
15. World Wide Web Consortium, XPath: /TR/xpath Consulted in June 2005.
16. Birman, K.: "The process group approach to reliable distributed computing". Communications of the ACM, December 1993.
17. WS-Notification, IBM developerWorks: http://www-/developerworks/library/specification/ws-notification/ Consulted in June 2005.
