Algorithmic randomness of closed sets
A Peer-to-Peer Spatial Cloaking Algorithm for Anonymous Location-based Services∗

Chi-Yin Chow, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, cchow@
Mohamed F. Mokbel, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, mokbel@
Xuan Liu, IBM Thomas J. Watson Research Center, Hawthorne, NY, xuanliu@

ABSTRACT
This paper tackles a major privacy threat in current location-based services where users have to report their exact locations to the database server in order to obtain their desired services. For example, a mobile user asking about her nearest restaurant has to report her exact location. With untrusted service providers, reporting private location information may lead to several privacy threats. In this paper, we present a peer-to-peer (P2P) spatial cloaking algorithm in which mobile and stationary users can entertain location-based services without revealing their exact location information. The main idea is that before requesting any location-based service, the mobile user will form a group from her peers via single-hop communication and/or multi-hop routing. Then, the spatial cloaked area is computed as the region that covers the entire group of peers. Two modes of operations are supported within the proposed P2P spatial cloaking algorithm, namely, the on-demand mode and the proactive mode. Experimental results show that the P2P spatial cloaking algorithm operated in the on-demand mode has lower communication cost and better quality of services than the proactive mode, but the on-demand incurs longer response time.

Categories and Subject Descriptors: H.2.8 [Database Applications]: Spatial databases and GIS
General Terms: Algorithms and Experimentation.
Keywords: Mobile computing, location-based services, location privacy and spatial cloaking.

∗This work is supported in part by the Grants-in-Aid of Research, Artistry, and Scholarship, University of Minnesota.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ACM-GIS'06, November 10-11, 2006, Arlington, Virginia, USA. Copyright 2006 ACM 1-59593-529-0/06/0011...$5.00.

1. INTRODUCTION
The emergence of state-of-the-art location-detection devices, e.g., cellular phones, global positioning system (GPS) devices, and radio-frequency identification (RFID) chips results in a location-dependent information access paradigm, known as location-based services (LBS) [30]. In LBS, mobile users have the ability to issue location-based queries to the location-based database server. Examples of such queries include "where is my nearest gas station", "what are the restaurants within one mile of my location", and "what is the traffic condition within ten minutes of my route". To get the precise answer of these queries, the user has to provide her exact location information to the database server.
With untrustworthy servers, adversaries may access sensitive information about specific individuals based on their location information and issued queries. For example, an adversary may check a user's habit and interest by knowing the places she visits and the time of each visit, or someone can track the locations of his ex-friends. In fact, in many cases, GPS devices have been used in stalking personal locations [12, 39]. To tackle this major privacy concern, three centralized privacy-preserving frameworks are proposed for LBS [13, 14, 31], in which a trusted third party is used as a middleware to blur user locations into spatial regions to achieve k-anonymity, i.e., a user is indistinguishable among other k−1 users. The centralized privacy-preserving framework possesses the following shortcomings: 1) The centralized trusted third party could be the system bottleneck or single point of failure. 2) Since the centralized third party has the complete knowledge of the location information and queries of all users, it may pose a serious privacy threat when the third party is attacked by adversaries.

In this paper, we propose a peer-to-peer (P2P) spatial cloaking algorithm. Mobile users adopting the P2P spatial cloaking algorithm can protect their privacy without seeking help from any centralized third party. Other than the shortcomings of the centralized approach, our work is also motivated by the following facts: 1) The computation power and storage capacity of most mobile devices have been improving at a fast pace. 2) P2P communication technologies, such as IEEE 802.11 and Bluetooth, have been widely deployed. 3) Many new applications based on P2P information sharing have rapidly taken shape, e.g., cooperative information access [9, 32] and P2P spatio-temporal query processing [20, 24].

Figure 1 gives an illustrative example of P2P spatial cloaking. The mobile user A wants to find her nearest gas station while being five-anonymous, i.e., the user is indistinguishable among five users. Thus, the mobile user A has
to look around and find four other peers to collaborate as a group. In this example, the four peers are B, C, D, and E. Then, the mobile user A cloaks her exact location into a spatial region that covers the entire group of mobile users A, B, C, D, and E. The mobile user A randomly selects one of the mobile users within the group as an agent. In the example given in Figure 1, the mobile user D is selected as an agent. Then, the mobile user A sends her query (i.e., what is the nearest gas station) along with her cloaked spatial region to the agent. The agent forwards the query to the location-based database server through a base station. Since the location-based database server processes the query based on the cloaked spatial region, it can only give a list of candidate answers that includes the actual answers and some false positives. After the agent receives the candidate answers, it forwards the candidate answers to the mobile user A. Finally, the mobile user A gets the actual answer by filtering out all the false positives.

[Figure 1: labels A, B, C, D, E, Base Station]

The proposed P2P spatial cloaking algorithm can operate in two modes: on-demand and proactive. In the on-demand mode, mobile clients execute the cloaking algorithm when they need to access information from the location-based database server. On the other side, in the proactive mode, mobile clients periodically look around to find the desired number of peers. Thus, they can cloak their exact locations into spatial regions whenever they want to retrieve information from the location-based database server. In general, the contributions of this paper can be summarized as follows:
1. We introduce a distributed system architecture for providing anonymous location-based services (LBS) for mobile users.
2. We propose the first P2P spatial cloaking algorithm for mobile users to entertain high quality location-based services without compromising their privacy.
3. We provide experimental evidence that our proposed algorithm is efficient in terms of the response
time, is scalable to large numbers of mobile clients, and is effective as it provides high-quality services for mobile clients without the need of exact location information.

The rest of this paper is organized as follows. Section 2 highlights the related work. The system model of the P2P spatial cloaking algorithm is presented in Section 3. The P2P spatial cloaking algorithm is described in Section 4. Section 5 discusses the integration of the P2P spatial cloaking algorithm with privacy-aware location-based database servers. Section 6 depicts the experimental evaluation of the P2P spatial cloaking algorithm. Finally, Section 7 concludes this paper.

2. RELATED WORK
The k-anonymity model [37, 38] has been widely used in maintaining privacy in databases [5, 26, 27, 28]. The main idea is to have each tuple in the table as k-anonymous, i.e., indistinguishable among other k−1 tuples. Although we aim for the similar k-anonymity model for the P2P spatial cloaking algorithm, none of these techniques can be applied to protect user privacy for LBS, mainly for the following four reasons: 1) These techniques preserve the privacy of the stored data. In our model, we aim not to store the data at all. Instead, we store perturbed versions of the data. Thus, data privacy is managed before storing the data. 2) These approaches protect the data not the queries. In anonymous LBS, we aim to protect the user who issues the query to the location-based database server. For example, a mobile user who wants to ask about her nearest gas station needs to protect her location while the location information of the gas station is not protected. 3) These approaches guarantee the k-anonymity for a snapshot of the database. In LBS, the user location is continuously changing. Such dynamic behavior calls for continuous maintenance of the k-anonymity model. 4) These approaches assume a unified k-anonymity requirement for all the stored records. In our P2P spatial cloaking algorithm, k-anonymity is a user-specified privacy requirement which
may have a different value for each user.

Motivated by the privacy threats of location-detection devices [1, 4, 6, 40], several research efforts are dedicated to protect the locations of mobile users (e.g., false dummies [23], landmark objects [18], and location perturbation [10, 13, 14]). The closest approaches to ours are two centralized spatial cloaking algorithms, namely, the spatio-temporal cloaking [14] and the CliqueCloak algorithm [13], and one decentralized privacy-preserving algorithm [23]. The spatio-temporal cloaking algorithm [14] assumes that all users have the same k-anonymity requirements. Furthermore, it lacks scalability because it deals with each single request of each user individually. The CliqueCloak algorithm [13] assumes a different k-anonymity requirement for each user. However, since it has large computation overhead, it is limited to a small k-anonymity requirement, i.e., k is from 5 to 10. A decentralized privacy-preserving algorithm is proposed for LBS [23]. The main idea is that the mobile client sends a set of false locations, called dummies, along with its true location to the location-based database server. However, the disadvantages of using dummies are threefold. First, the user has to generate realistic dummies to prevent the adversary from guessing its true location. Second, the location-based database server wastes a lot of resources to process the dummies. Finally, the adversary may estimate the user location by using cellular positioning techniques [34], e.g., the time-of-arrival (TOA), the time difference of arrival (TDOA) and the direction of arrival (DOA).

Although several existing distributed group formation algorithms can be used to find peers in a mobile environment, they are not designed for privacy preserving in LBS. Some algorithms are limited to only finding the neighboring peers, e.g., lowest-ID [11], largest-connectivity (degree) [33] and mobility-based clustering algorithms [2, 25]. When a mobile user has a strict privacy requirement, i.e., the
value of k−1 is larger than the number of neighboring peers, it has to enlist other peers for help via multi-hop routing. Other algorithms do not have this limitation, but they are designed for grouping stable mobile clients together to facilitate efficient data replica allocation, e.g., dynamic connectivity based group algorithm [16] and mobility-based clustering algorithm, called DRAM [19]. Our work is different from these approaches in that we propose a P2P spatial cloaking algorithm that is dedicated for mobile users to discover other k−1 peers via single-hop communication and/or via multi-hop routing, in order to preserve user privacy in LBS.

[Figure 2: The system architecture. Label: Location-based Database Server]

3. SYSTEM MODEL
Figure 2 depicts the system architecture for the proposed P2P spatial cloaking algorithm which contains two main components: mobile clients and location-based database server. Each mobile client has its own privacy profile that specifies its desired level of privacy. A privacy profile includes two parameters, k and A_min: k indicates that the user wants to be k-anonymous, i.e., indistinguishable among k users, while A_min specifies the minimum resolution of the cloaked spatial region. The larger the value of k and A_min, the more strict privacy requirements a user needs. Mobile users have the ability to change their privacy profile at any time. Our employed privacy profile matches the privacy requirements of mobile users as depicted by several social science studies (e.g., see [4, 15, 17, 22, 29]).

In this architecture, each mobile user is equipped with two wireless network interface cards; one of them is dedicated to communicate with the location-based database server through the base station, while the other one is devoted to the communication with other peers. A similar multi-interface technique has been used to implement IP multi-homing for stream control transmission protocol (SCTP), in which a machine is installed with multiple
network interface cards, and each assigned a different IP address [36]. Similarly, in mobile P2P cooperation environment, mobile users have a network connection to access information from the server, e.g., through a wireless modem or a base station, and the mobile users also have the ability to communicate with other peers via a wireless LAN, e.g., IEEE 802.11 or Bluetooth [9, 24, 32]. Furthermore, each mobile client is equipped with a positioning device, e.g., GPS or sensor-based local positioning systems, to determine its current location information.

4. P2P SPATIAL CLOAKING
In this section, we present the data structure and the P2P spatial cloaking algorithm. Then, we describe two operation modes of the algorithm: on-demand and proactive.

4.1 Data Structure
The entire system area is divided into a grid. Mobile clients communicate with each other to discover other k−1 peers, in order to achieve the k-anonymity requirement. The mobile client can thus blur its exact location into a cloaked spatial region that is the minimum grid area covering the k−1 peers and itself, and satisfies A_min as well. The grid area is represented by the ID of the left-bottom and right-top cells, i.e., (l, b) and (r, t). In addition, each mobile client maintains a parameter h that is the required hop distance of the last peer searching. The initial value of h is equal to one.

Algorithm 1 P2P Spatial Cloaking: Request Originator m
 1: Function P2PCloaking-Originator(h, k)
 2: // Phase 1: Peer searching phase
 3: The hop distance h is set to h
 4: The set of discovered peers T' is set to {∅}, and the number of discovered peers k' = |T'| = 0
 5: while k' < k−1 do
 6:     Broadcast a FORM GROUP request with the parameter h (Algorithm 2 gives the response of each peer p that receives this request)
 7:     T is the set of peers that respond back to m by executing Algorithm 2
 8:     k' = |T|;
 9:     if k' < k−1 then
10:         if T = T' then
11:             Suspend the request
12:         end if
13:         h ← h + 1;
14:         T' ← T;
15:     end if
16: end while
17: // Phase 2: Location adjustment phase
18: for all T_i ∈ T do
19:     |mT_i.p'| ← the greatest possible distance between m and T_i.p by considering the timestamp of T_i.p's reply and maximum speed
20: end for
21: // Phase 3: Spatial cloaking phase
22: Form a group with k−1 peers having the smallest |mp'|
23: h ← the largest hop distance h_p of the selected k−1 peers
24: Determine a grid area A that covers the entire group
25: if A < A_min then
26:     Extend the area of A till it covers A_min
27: end if
28: Randomly select a mobile client of the group as an agent
29: Forward the query and A to the agent

4.2 Algorithm
Figure 3 gives a running example for the P2P spatial cloaking algorithm. There are 15 mobile clients, m1 to m15, represented as solid circles. m8 is the request originator; other black circles represent the mobile clients that received the request from m8. The dotted circles represent the communication range of the mobile client, and the arrow represents the movement direction. Algorithms 1 and 2 give the pseudo code for the request originator (denoted as m) and the request receivers (denoted as p), respectively. In general, the algorithm consists of the following three phases:

[Figure 3: P2P spatial cloaking algorithm.]

Phase 1: Peer searching phase. The request originator m wants to retrieve information from the location-based database server. m first sets h to h, a set of discovered peers T' to {∅} and the number of discovered peers k' to zero, i.e., |T'| (Lines 3 to 4 in Algorithm 1). Then, m broadcasts a FORM GROUP request along with a message sequence ID and the hop distance h to its neighboring peers (Line 6 in Algorithm 1). m listens to the network and waits for the reply from its neighboring peers.

Algorithm 2 P2P Spatial Cloaking: Request Receiver p
 1: Function P2PCloaking-Receiver(h)
 2: // Let r be the request forwarder
 3: if the request is duplicate then
 4:     Reply r with an ACK message
 5:     return;
 6: end if
 7: h_p ← 1;
 8: if h = 1 then
 9:     Send the tuple T = <p, (x_p, y_p), v_p^max, t_p, h_p> to r
10: else
11:     h ← h − 1;
12:     Broadcast a FORM GROUP request with the parameter h
13:     T_p is the set of peers that respond back to p
14:     for all T_i ∈ T_p do
15:         T_i.h_p ← T_i.h_p + 1;
16:     end for
17:     T_p ← T_p ∪ {<p, (x_p, y_p), v_p^max, t_p, h_p>};
18:     Send T_p back to r
19: end if

Algorithm 2 describes how a peer p responds to the FORM GROUP request along with a hop distance h and a message sequence ID from another peer (denoted as r) that is either the request originator or the forwarder of the request. First, p checks if it is a duplicate request based on the message sequence ID. If it is a duplicate request, it simply replies r with an ACK message without processing the request. Otherwise, p processes the request based on the value of h:

Case 1: h = 1. p turns in a tuple that contains its ID, current location, maximum movement speed, a timestamp and a hop distance (it is set to one), i.e., <p, (x_p, y_p), v_p^max, t_p, h_p>, to r (Line 9 in Algorithm 2).

Case 2: h > 1. p decrements h and broadcasts the FORM GROUP request with the updated h and the original message sequence ID to its neighboring peers. p keeps listening to the network, until it collects the replies from all its neighboring peers. After that, p increments the h_p of each collected tuple, and then it appends its own tuple to the collected tuples T_p. Finally, it sends T_p back to r (Lines 11 to 18 in Algorithm 2).

After m collects the tuples T from its neighboring peers, if m cannot find other k−1 peers with a hop distance of h, it increments h and re-broadcasts the FORM GROUP request along with a new message sequence ID and h. m repeatedly increments h till it finds other k−1 peers (Lines 6 to 14 in Algorithm 1). However, if m finds the same set of peers in two consecutive broadcasts, i.e., with hop distances h and h+1, there are not enough connected peers for m. Thus, m has to relax its privacy profile, i.e., use a smaller value of k, or to be suspended for a period of time (Line 11 in Algorithm 1).

Figures 3(a) and 3(b) depict single-hop and multi-hop peer searching in our running example, respectively. In Figure 3(a), the request originator, m8, (e.g., k = 5) can find k−1 peers via single-hop communication, so
m8 sets h = 1. Since h = 1, its neighboring peers, m5, m6, m7, m9, m10, and m11, will not further broadcast the FORM GROUP request. On the other hand, in Figure 3(b), m8 does not connect to k−1 peers directly, so it has to set h > 1. Thus, its neighboring peers, m7, m10, and m11, will broadcast the FORM GROUP request along with a decremented hop distance, i.e., h = h−1, and the original message sequence ID to their neighboring peers.

Phase 2: Location adjustment phase. Since the peer keeps moving, we have to capture the movement between the time when the peer sends its tuple and the current time. For each received tuple from a peer p, the request originator, m, determines the greatest possible distance between them by an equation, $|mp'| = |mp| + (t_c - t_p) \times v^{max}_p$, where $|mp|$ is the Euclidean distance between m and p at time $t_p$, i.e., $|mp| = \sqrt{(x_m - x_p)^2 + (y_m - y_p)^2}$, $t_c$ is the current time, $t_p$ is the timestamp of the tuple and $v^{max}_p$ is the maximum speed of p (Lines 18 to 20 in Algorithm 1). In this paper, a conservative approach is used to determine the distance, because we assume that the peer will move with the maximum speed in any direction. If p gives its movement direction, m has the ability to determine a more precise distance between them.

Figure 3(c) illustrates that, for each discovered peer, the circle represents the largest region where the peer can be located at time $t_c$. The greatest possible distance between the request originator m8 and its discovered peer, m5, m6, m7, m9, m10, or m11 is represented by a dotted line. For example, the length of the line m8m'11 is the greatest possible distance between m8 and m11 at time $t_c$, i.e., |m8m'11|.
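The conservative bound of Phase 2 can be illustrated with a minimal sketch. It assumes that each reply carries the fields of the tuple in Algorithm 2; the class `PeerTuple`, its field names, and the function name are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class PeerTuple:
    """Reply tuple <p, (x_p, y_p), v_p^max, t_p, h_p>; field names are illustrative."""
    peer_id: str
    x: float
    y: float
    v_max: float      # maximum movement speed of the peer
    timestamp: float  # time t_p at which the peer sent its tuple
    hops: int         # hop distance h_p


def greatest_possible_distance(mx, my, peer, t_now):
    """Return |mp'| = |mp| + (t_c - t_p) * v_p^max.

    The bound assumes the peer may have moved at full speed directly
    away from the originator m since it replied.
    """
    euclidean = math.hypot(mx - peer.x, my - peer.y)
    return euclidean + (t_now - peer.timestamp) * peer.v_max
```

Sorting the received tuples by this value and keeping the k−1 smallest would correspond to Line 22 of Algorithm 1.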
Phase 3: Spatial cloaking phase. In this phase, the request originator, m, forms a virtual group with the k−1 nearest peers, based on the greatest possible distance between them (Line 22 in Algorithm 1). To adapt to the dynamic network topology and k-anonymity requirement, m sets h to the largest value of h_p of the selected k−1 peers (Line 23 in Algorithm 1). Then, m determines the minimum grid area A covering the entire group (Line 24 in Algorithm 1). If the area of A is less than A_min, m extends A, until it satisfies A_min (Lines 25 to 27 in Algorithm 1). Figure 3(c) gives the k−1 nearest peers, m6, m7, m10, and m11 to the request originator, m8. For example, the privacy profile of m8 is (k = 5, A_min = 20 cells), and the required cloaked spatial region of m8 is represented by a bold rectangle, as depicted in Figure 3(d).

To issue the query to the location-based database server anonymously, m randomly selects a mobile client in the group as an agent (Line 28 in Algorithm 1). Then, m sends the query along with the cloaked spatial region, i.e., A, to the agent (Line 29 in Algorithm 1). The agent forwards the query to the location-based database server. After the server processes the query with respect to the cloaked spatial region, it sends a list of candidate answers back to the agent. The agent forwards the candidate answers to m, and then m filters out the false positives from the candidate answers.
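A rough sketch of Lines 24 to 27 of Algorithm 1 follows. The paper only states that A is extended until it satisfies A_min; the round-robin growth order used here, the function name `cloak_region`, and the parameter `grid_size` are assumptions made for illustration.

```python
def cloak_region(cells, a_min, grid_size):
    """Return ((l, b), (r, t)): the grid rectangle covering the group.

    cells     -- (column, row) cell IDs of the originator and its k-1 peers
    a_min     -- minimum area of the cloaked region, in cells
    grid_size -- number of cells per side of the square grid (assumed)
    """
    cols = [col for col, _ in cells]
    rows = [row for _, row in cells]
    l, r, b, t = min(cols), max(cols), min(rows), max(rows)

    step = 0
    while (r - l + 1) * (t - b + 1) < a_min:
        if (r - l + 1) == grid_size and (t - b + 1) == grid_size:
            break  # the whole grid is already covered
        # Assumed policy: grow one side per step, cycling left, right, bottom, top.
        side = step % 4
        if side == 0 and l > 0:
            l -= 1
        elif side == 1 and r < grid_size - 1:
            r += 1
        elif side == 2 and b > 0:
            b -= 1
        elif side == 3 and t < grid_size - 1:
            t += 1
        step += 1
    return (l, b), (r, t)
```

Other growth policies, for example extending symmetrically around the group's centroid, would satisfy the same description; the choice affects only which extra cells are included.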
4.3 Modes of Operations
The P2P spatial cloaking algorithm can operate in two modes, on-demand and proactive.

The on-demand mode: The mobile client only executes the algorithm when it needs to retrieve information from the location-based database server. The algorithm operated in the on-demand mode generally incurs less communication overhead than the proactive mode, because the mobile client only executes the algorithm when necessary. However, it suffers from a longer response time than the algorithm operated in the proactive mode.

The proactive mode: The mobile client adopting the proactive mode periodically executes the algorithm in background. The mobile client can cloak its location into a spatial region immediately, once it wants to communicate with the location-based database server. The proactive mode provides a better response time than the on-demand mode, but it generally incurs higher communication overhead and gives lower quality of service than the on-demand mode.

5. ANONYMOUS LOCATION-BASED SERVICES
Having the spatial cloaked region as an output from Algorithm 1, the mobile user m sends her request to the location-based server through an agent p that is randomly selected. Existing location-based database servers can support only exact point locations rather than cloaked regions. In order to be able to work with a spatial region, location-based servers need to be equipped with a privacy-aware query processor (e.g., see [29, 31]). The main idea of the privacy-aware query processor is to return a list of candidate answers rather than the exact query answer. Then, the mobile user m will filter the candidate list to eliminate its false positives and find its exact answer. The tighter the spatial cloaked region, the smaller the candidate answer list, and hence the better the performance of the privacy-aware query processor. However, tight cloaked regions may represent relaxed privacy constraints. Thus, a trade-off between the user privacy and the quality of service can be achieved [31].
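For a nearest-neighbor query, the client-side filtering step can be sketched as below. This is only an illustration under the assumption that the candidate list arrives as a mapping from target IDs to coordinates; the function name and the data layout are hypothetical, and the server-side processor of [31] is not modeled.

```python
import math


def filter_candidates(candidates, true_location):
    """Pick the actual nearest target from the server's candidate list.

    candidates    -- dict mapping a target ID to its (x, y) position
    true_location -- the client's exact (x, y), known only to the client
    """
    x, y = true_location
    return min(
        candidates,
        key=lambda tid: math.hypot(candidates[tid][0] - x, candidates[tid][1] - y),
    )
```

The exact location is used only on the client, so the server never sees anything more precise than the cloaked region.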
Figure 4(a) depicts such a scenario by showing the data stored at the server side. There are 32 target objects, i.e., gas stations, T1 to T32, represented as black circles; the shaded area represents the spatial cloaked area of the mobile client who issued the query. For clarification, the actual mobile client location is plotted in Figure 4(a) as a black square inside the cloaked area. However, such information is neither stored at the server side nor revealed to the server. The privacy-aware query processor determines a range that includes all target objects that are possibly contributing to the answer given that the actual location of the mobile client could be anywhere within the shaded area. The range is represented as a bold rectangle, as depicted in Figure 4(b). The server sends a list of candidate answers, i.e., T8, T12, T13, T16, T17, T21, and T22, back to the agent. The agent next forwards the candidate answers to the requesting mobile client either through single-hop communication or through multi-hop routing. Finally, the mobile client can get the actual answer, i.e., T13, by filtering out the false positives from the candidate answers.

[Figure 4: Anonymous location-based services. (a) Server Side, (b) Client Side]

The algorithmic details of the privacy-aware query processor are beyond the scope of this paper. Interested readers are referred to [31] for more details.

6. EXPERIMENTAL RESULTS
In this section, we evaluate and compare the scalability and efficiency of the P2P spatial cloaking algorithm in both the on-demand and proactive modes with respect to the average response time per query, the average number of messages per query, and the size of the returned candidate answers from the location-based database server. The query response time in the on-demand mode is defined as the time elapsed between a mobile client starting to search k−1 peers and receiving the candidate answers from the agent. On the other hand, the query response time in the proactive mode is defined as the time elapsed between a mobile client
starting to forward its query along with the cloaked spatial region to the agent and receiving the candidate answers from the agent. The simulation model is implemented in C++ using CSIM [35].

In all the experiments in this section, we consider an individual random walk model that is based on the "random waypoint" model [7, 8]. At the beginning, the mobile clients are randomly distributed in a spatial space of 1,000×1,000 square meters, in which a uniform grid structure of 100×100 cells is constructed. Each mobile client randomly chooses its own destination in the space with a randomly determined speed s from a uniform distribution U(v_min, v_max). When the mobile client reaches the destination, it comes to a standstill for one second to determine its next destination. After that, the mobile client moves towards its new destination with another speed. All the mobile clients repeat this movement behavior during the simulation. The time interval between two consecutive queries generated by a mobile client follows an exponential distribution with a mean of ten seconds. All the experiments consider one half-duplex wireless channel for a mobile client to communicate with its peers with a total bandwidth of 2 Mbps and a transmission range of 250 meters. When a mobile client wants to communicate with other peers or the location-based database server, it has to wait if the requested channel is busy. In the simulated mobile environment, there is a centralized location-based database server, and one wireless communication channel between the location-based database server and the mobile
The authors wish to thank Jack Lutz and Joe Miller for helpful discussions, and the referee for comments that greatly improved the paper. Much of the contents of this paper was discussed during the AIM workshop on Effective Randomness in August, 2006. A preliminary version of this paper appeared in the Proceedings of CIE 2006 [2]. Research partially supported by National Science Foundation grants DMS 0532644, 0554841 and 0652732.
Keywords: Computability, Randomness, $\Pi^0_1$ Classes

The basic problem is to quantify the randomness of a single real number; here we will extend this problem to the randomness of the set of paths through a finitely-branching tree.

Early in the last century, von Mises [30] suggested that a random real should obey reasonable statistical tests, such as having a roughly equal number of zeroes and ones of the first $n$ bits, in the limit. Thus a random real would be stochastic in modern parlance. If one considers only computable tests, then there are countably many and one can construct a real satisfying all tests.

An early approach to randomness was through betting. Effective betting on a random sequence should not allow one's capital to grow unboundedly. The betting strategies used are constructive martingales, introduced by Ville [29] and implicit in the work of Levy [21], which represent fair double-or-nothing gambling.

Martin-Löf [23] observed that stochastic properties could be viewed as special kinds of measure zero sets and defined a random real as one which avoids certain effectively presented measure 0 sets. That is, a real $x \in 2^{\mathbb{N}}$ is Martin-Löf random if for every effective sequence $S_1, S_2, \ldots$ of sets with $\mu(S_n) \le 2^{-n}$, $x \notin \bigcap_n S_n$. It is easy to see that this is equivalent to the condition that we get if we replace $2^{-n}$ above with $q_n$ for a computable sequence $(q_i)$ of rationals such that $\lim_i q_i = 0$.

At the same time Kolmogorov [17] defined a notion of randomness for finite strings based on the concept of incompressibility. The stronger notion of prefix-free complexity was
developed by Levin [20], Gács [16] and Chaitin [9] and extended to infinite words. Schnorr later proved [26] that the notions of constructive martingale randomness, Martin-Löf randomness, and prefix-free randomness are equivalent.

In this paper we want to consider algorithmic randomness on the space $\mathcal{C}$ of nonempty closed subsets $P$ of $2^{\mathbb{N}}$. Some definitions are needed. Fix a finite alphabet $A = \{0, 1, \ldots, k-1\} = k$; we will make use of the alphabets $\{0,1\}$ and $\{0,1,2\}$. For a finite string $\sigma \in A^n$, let $|\sigma| = n$. Let $\lambda$ denote the empty string, which has length 0. A word $(a)$ of length 1 may be identified with the symbol $a$. For two strings $\sigma, \tau$, say that $\tau$ extends $\sigma$ and write $\sigma \preceq \tau$ if $|\sigma| \le |\tau|$ and $\sigma(i) = \tau(i)$ for $i < |\sigma|$. Similarly $\sigma \prec x$ for $x \in 2^{\mathbb{N}}$ means that $\sigma(i) = x(i)$ for $i < |\sigma|$. Let $\sigma^\frown\tau$ denote the concatenation of $\sigma$ and $\tau$. Let $x \upharpoonright n = (x(0), \ldots, x(n-1))$.

Now a nonempty closed set $P$ may be identified with a tree $T_P \subseteq A^*$ as follows. For a finite string $\sigma$, let $I(\sigma)$ denote $\{x \in 2^{\mathbb{N}} : \sigma \subset x\}$. Then $T_P = \{\sigma : P \cap I(\sigma) \neq \emptyset\}$. Note that $T_P$ has no dead ends, that is if $\sigma \in T_P$ then either $\sigma^\frown 0 \in T_P$ or $\sigma^\frown 1 \in T_P$.

For an arbitrary tree $T \subseteq A^*$, let $[T]$ denote the set of infinite paths through $T$, that is,
$x \in [T] \iff (\forall n)\; x \upharpoonright n \in T.$

It is well-known that $P \subseteq 2^{\mathbb{N}}$ is a closed set if and only if $P = [T]$ for some tree $T$. $P$ is a $\Pi^0_1$ class, or effectively closed set, if $P = [T]$ for some computable tree $T$. Note that if $P$ is a $\Pi^0_1$ class, then $T_P$ is a $\Pi^0_1$ set, but not in general computable. $P$ is said to be a decidable $\Pi^0_1$ class if $T_P$ is computable. $P$ is said to be a strong $\Pi^0_2$ class, if $T_P$ is a $\Pi^0_2$ set, or equivalently if $P = [T]$ for some $\Delta^0_2$ tree; $P$ is said to be a strong $\Delta^0_2$ class if $T_P$ is $\Delta^0_2$. Thus any $\Pi^0_1$ class is also a strong $\Delta^0_2$ class. Any decidable $\Pi^0_1$ class contains a computable element (in particular the leftmost and rightmost paths) and similarly any strong $\Delta^0_2$ class contains a $\Delta^0_2$ element. On the other hand, there exist $\Pi^0_1$ classes with no computable elements and strong $\Pi^0_2$ classes with no $\Delta^0_2$ elements. The complement of a $\Pi^0_1$ class is sometimes called a set. There is a natural effective
enumeration $P_0, P_1, \ldots$ of the $\Pi^0_1$ classes and thus an enumeration of the sets. Thus we can say that a sequence $S_0, S_1, \ldots$ of sets is effective if there is a computable function, $f$, such that $S_n = 2^{\mathbb{N}} - P_{f(n)}$ for all $n$. For a detailed development of $\Pi^0_1$ classes, see [7] or [8]. For background and terminology on computable functions and computably enumerable sets, see [27].

The betting approach to randomness is formalized as follows:

Definition 1.1 (Ville [29]).
(i) A martingale is a function $m : k^{<\omega} \to [0, \infty)$ such that for all $\sigma \in k^{<\omega}$,
$m(\sigma) = \frac{1}{k} \sum_{i=0}^{k-1} m(\sigma^\frown i).$
(ii) A martingale $m$ succeeds on $X \in k^{\mathbb{N}}$ if
$\limsup_{n \to \infty} m(X \upharpoonright n) = \infty.$
That is, the betting strategy results in an unbounded amount of money made on the $k$-ary infinite sequence $X$.
(iii) The success set of $m$ is the set $S^\infty[m]$ of all sequences on which $m$ succeeds.

That is, a martingale on $2^{<\omega}$ is the capital function of a fair double-or-nothing betting strategy. When working on $3^{<\omega}$ the strategy is triple-or-nothing.

Definition 1.2. A martingale $m$ is constructive (effective, c.e.) if it is lower semi-computable; that is, if there is a computable function $\hat{m} : k^{<\omega} \times \mathbb{N} \to \mathbb{Q}$ such that
(i) for all $\sigma$ and $t$, $\hat{m}(\sigma, t) \le \hat{m}(\sigma, t+1) < m(\sigma)$, and
(ii) for all $\sigma$, $\lim_{t \to \infty} \hat{m}(\sigma, t) = m(\sigma)$.
In other words, $m(w)$ is approximated from below by rationals uniformly in $w$. A sequence in $k^{\mathbb{N}}$ is constructive martingale random if no constructive martingale succeeds on it.

Some flexibility may be gained by also considering nonmonotonic martingales; i.e., martingales which bet on the bits of a sequence out of order. While for a monotonic martingale only the amount of the next bet is determined from the bits seen previously, for a nonmonotonic martingale both the amount and the location of the next bet are determined from the bits seen previously (the next bit may precede them, follow them, or lie in the middle). These martingales must obey two rules: the standard fair-betting rule that monotonic martingales obey, and the rule that they never bet on the same bit twice. We refer the reader to Downey and Hirschfeldt [11] for the formal definition.

Although a
priori allowing nonmonotonic martingales strengthens the notion of randomness, since more strategies must be defeated, in fact they are equivalent. Muchnik, Semenov, and Uspensky [24] (Theorem 8.9) show that ML-random sequences defeat all computable nonmonotonic martingales (in fact they show this with respect to general measures, not just the coin-toss measure). The proof does not depend on the computability of the martingale, however; the martingale is used to define a Martin-Löf test which may be enumerated equally well alongside the enumeration of the martingale. Therefore, as defeating all c.e. nonmonotonic martingales is clearly sufficient to be ML-random, the two are equivalent.

Prefix-free randomness for reals is defined as follows. A Turing machine $M$ which takes inputs from $A^*$, where $A$ is a finite alphabet, is called prefix-free if it has prefix-free domain $\mathrm{dom}(M)$; that is, if $\sigma \preceq \tau$ are strings in $\mathrm{dom}(M)$, then $\sigma$ must equal $\tau$. For any finite string $\tau$, the prefix-free complexity of $\tau$ with respect to $M$ is
\[ K_M(\tau) = \min\bigl(\{|\sigma| : M(\sigma) = \tau\} \cup \{\infty\}\bigr). \]
There is a universal prefix-free function $U$ such that, for any prefix-free $M$, there is a constant $c$ such that for all $\tau$,
\[ K_U(\tau) \le K_M(\tau) + c. \]
We let $K(\tau) = K_U(\tau)$ and call it the prefix-free complexity of $\tau$. Then $x$ is called prefix-free random if there is a constant $c$ such that $K(x \upharpoonright n) \ge n - c$ for all $n$. This means that the initial segments of $x$ are not compressible.

The equivalence of these three notions of randomness (via tests, betting or incompressibility) is a result of Schnorr [26] and is a fundamental result in the theory of algorithmic randomness. While these definitions and results are usually given for binary strings and sequences, they carry over to $k$-ary strings and sequences as well. See for example Calude [5, 6].

The following lemma will be needed.

Lemma 1.3. If $P$ is a $\Pi^0_1$ class of measure 0, then $P$ has no random elements.

Proof. Let $T$ be a computable tree such that $P = [T]$, and for each $n$, let $P_n = \bigcup \{I(\sigma) : \sigma \in T \cap \{0,1\}^n\}$. Then $\{P_n\}_{n \in \mathbb{N}}$ is an effective
sequence of clopen sets with $P = \bigcap_n P_n$ and $\lim_n \mu(P_n) = \mu(P) = 0$. Furthermore, $\mu(P_n) = 2^{-n} |T \cap \{0,1\}^n|$ and is therefore a computable sequence. Thus $\{P_n\}_{n \in \mathbb{N}}$ is a Martin-Löf test, showing that $P$ has no random elements.

We will want to use the following result from the literature [30].

Theorem 1.4 (Von-Mises–Church–Wald Computable Selection Theorem). For any random sequence $x$ and any computable 1-1 function $g$, the sequence $z(n) = x(g(n))$ is random.

2 Martin-Löf Randomness of Closed Sets

In this section, we define a measure on the space $\mathcal{C}$ of nonempty closed subsets of $2^{\mathbb{N}}$ and use this to define the notion of randomness for closed sets. We then obtain several properties of random closed sets.

An effective one-to-one correspondence between the space $\mathcal{C}$ and the space $3^{\mathbb{N}}$ is defined as follows. Let a closed set $Q$ be given and let $T = T_Q$ be the tree without dead ends such that $Q = [T]$. Define the code $x = x_Q \in \{0,1,2\}^{\mathbb{N}}$ for $Q$ as follows. Let $\lambda = \sigma_0, \sigma_1, \sigma_2, \ldots$ enumerate the elements of $T$ in order, first by length and then lexicographically. We now define $x = x_Q = x_T$ by recursion as follows. For each $n$,
$x(n) = 2$ if $\sigma_n^\frown 0$ and $\sigma_n^\frown 1$ are both in $T$,
$x(n) = 1$ if $\sigma_n^\frown 0 \notin T$ and $\sigma_n^\frown 1 \in T$, and
$x(n) = 0$ if $\sigma_n^\frown 0 \in T$ and $\sigma_n^\frown 1 \notin T$.
For example, if $Q = \{0,1\}^{\mathbb{N}}$, then $x_Q = (2, 2, \ldots)$, and if $Q = \{y\}$, then $x_Q = y$. Let $Q_x$ denote the unique closed set $Q$ such that $x_Q = x$.

Now define the measure $\mu^*$ on $\mathcal{C}$ by
\[ \mu^*(X) = \mu(\{x_Q : Q \in X\}). \]
Informally this means that given $\sigma \in T_Q$, there is probability $\frac{1}{3}$ that both $\sigma^\frown 0 \in T_Q$ and $\sigma^\frown 1 \in T_Q$ and, for $i = 0, 1$, there is probability $\frac{1}{3}$ that only $\sigma^\frown i \in T_Q$. In particular, this means that $Q \cap I(\sigma) \neq \emptyset$ implies that for $i = 0, 1$, $Q \cap I(\sigma^\frown i) \neq \emptyset$ with probability $\frac{2}{3}$.

Let us comment briefly on why some other natural representations were rejected. Suppose first that we simply enumerate all strings in $\{0,1\}^*$ as $\sigma_0, \sigma_1, \ldots$
and then represent $T$ by its characteristic function so that $x_T(n) = 1 \iff \sigma_n \in T$. Then in general a code $x$ might not represent a tree. That is, once we have $(01) \notin T$ we cannot later decide that $(011) \in T$.

Suppose then that we allow the empty closed set by using codes $x \in \{0,1,2,3\}^{\mathbb{N}}$ and modify our original definition as follows. Let $x(n) = i$ have the same definition as above for $i \le 2$, but let $x(n) = 3$ mean that neither $\sigma_n^\frown 0$ nor $\sigma_n^\frown 1$ is in $T$. Informally, this would mean that for $i = 0, 1$, $\sigma \in T$ implies that $\sigma^\frown i \in T$ with probability $\frac{1}{2}$. The advantage here is that we can now represent all trees. But this is also a disadvantage, since for a given closed set $P$, there are many different trees $T$ with $P = [T]$. The second problem with this approach is that we would have $[T] = \emptyset$ with positive probability. We briefly return to this subject in Section 6.

Now we will say that a closed set $Q$ is (Martin-Löf) random if the code $x_Q$ is Martin-Löf random. This definition clearly relativizes to any oracle in accordance with the definitions of relative randomness in the Cantor space.
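The coding of a closed set by a ternary sequence can be tried out on finite approximations. The following is a minimal sketch, assuming the closed set is presented by a finite initial part of its tree $T_Q$, given as a prefix-closed set of binary strings. The names `canonical_code` and `decode` are illustrative and do not come from the paper.

```python
def canonical_code(tree, depth):
    """Return the code digits for all nodes of `tree` of length < depth.

    `tree` is a set of binary strings ("" is the root) that is closed under
    prefixes and has no dead ends below `depth`.
    """
    code = []
    for length in range(depth):
        # Nodes of one length, in lexicographic order.
        for node in sorted(s for s in tree if len(s) == length):
            left = (node + "0") in tree
            right = (node + "1") in tree
            if left and right:
                code.append(2)      # both extensions are in the tree
            elif right:
                code.append(1)      # only node^1 is in the tree
            elif left:
                code.append(0)      # only node^0 is in the tree
            else:
                raise ValueError(f"dead end at node {node!r}")
    return code


def decode(code):
    """Rebuild the finite part of the tree described by a ternary code prefix."""
    tree, level, digits = {""}, [""], iter(code)
    while True:
        next_level = []
        for node in level:          # `level` is kept in lexicographic order
            digit = next(digits, None)
            if digit is None:       # code prefix exhausted
                return tree
            for bit in {0: "0", 1: "1", 2: "01"}[digit]:
                tree.add(node + bit)
                next_level.append(node + bit)
        level = next_level
```

For the single path through `"011"`, `canonical_code({"", "0", "01", "011"}, 3)` returns `[0, 1, 1]`, matching the remark that $x_Q = y$ when $Q = \{y\}$. Because every digit in $\{0,1,2\}$ adds at least one child, every ternary prefix decodes to a tree without dead ends, which is the property the rejected representations lack.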
Since random reals exist, it follows that random closed sets exist. Furthermore, there are $\Delta^0_2$ random reals, so we have the following.

Theorem 2.1. There exists a random closed set $Q$ such that $T_Q$ is $\Delta^0_2$.

Note that if $T_Q$ is $\Delta^0_2$, then $Q$ must contain $\Delta^0_2$ elements (in particular the leftmost path). Since there exist strong $\Pi^0_2$ classes with no $\Delta^0_2$ elements, there are strong $\Pi^0_2$ classes $Q$ such that $T_Q$ is not $\Delta^0_2$.

The following lemma will be needed throughout.

Lemma 2.2. For any $Q \subseteq 2^{\mathbb{N}}$ which is either closed or open, $\mu^*(\{P : P \subseteq Q\}) \le \mu(Q)$.

Proof. Let $\mathcal{P}_C(Q)$ denote $\{P : P \subseteq Q\}$. We first prove the result for nonempty clopen sets $U$ in place of $Q$ by the following induction. Suppose $U = \bigcup_{\sigma \in S} I(\sigma)$, where $S \subseteq \{0,1\}^n$. For $n = 1$, either $\mu(U) = 1 = \mu^*(\mathcal{P}_C(U))$ or $\mu(U) = \frac{1}{2}$ and $\mu^*(\mathcal{P}_C(U)) = \frac{1}{3}$. For the induction step, let $S_i = \{\sigma : i^\frown\sigma \in S\}$, let $U_i = \bigcup_{\sigma \in S_i} I(\sigma)$, let $u_i = \mu(U_i)$ and let $v_i = \mu^*(\mathcal{P}_C(U_i))$, for $i = 0, 1$. Then considering the three cases in which $S$ includes both initial branches or just one, we calculate that
\[ \mu^*(\mathcal{P}_C(U)) = \tfrac{1}{3}(v_0 + v_1 + v_0 v_1). \]
Thus by induction we have
\[ \mu^*(\mathcal{P}_C(U)) \le \tfrac{1}{3}(u_0 + u_1 + u_0 u_1). \]
Now $2 u_0 u_1 \le u_0^2 + u_1^2 \le u_0 + u_1$, and therefore
\[ \mu^*(\mathcal{P}_C(U)) \le \tfrac{1}{3}(u_0 + u_1 + u_0 u_1) \le \tfrac{1}{2}(u_0 + u_1) = \mu(U). \]
For a closed set $Q$, let $Q = \bigcap_n U_n$, where $U_n$ is clopen and $U_{n+1} \subseteq U_n$ for all $n$. Then $P \subseteq Q$ if and only if $P \subseteq U_n$ for all $n$. Thus
\[ \mathcal{P}_C(Q) = \bigcap_n \mathcal{P}_C(U_n), \]
so that
\[ \mu^*(\mathcal{P}_C(Q)) = \lim_{n \to \infty} \mu^*(\mathcal{P}_C(U_n)) \le \lim_{n \to \infty} \mu(U_n) = \mu(Q). \]
Finally, for an open set $Q$, let $Q = \bigcup_n U_n$ be the union of an increasing sequence of clopen sets $U_n$. Then, by compactness, $\mathcal{P}_C(Q) = \bigcup_n \mathcal{P}_C(U_n)$, so that
\[ \mu^*(\mathcal{P}_C(Q)) = \lim_{n \to \infty} \mu^*(\mathcal{P}_C(U_n)) \le \lim_{n \to \infty} \mu(U_n) = \mu(Q). \]
This completes the proof of the lemma.

Next we will consider the intersection of a random closed set with an interval $I(\sigma)$ and the disjoint union of random closed sets. First recall van Lambalgen's theorem.

Theorem 2.3 (van Lambalgen [28]). The following are equivalent.
1. $A \oplus B$ is $n$-random.
2. $A$ is $n$-random and $B$ is $n$-$A$-random.
3. $B$ is $n$-random and $A$ is $n$-$B$-random.
4. $A$ is $n$-$B$-random and $B$ is $n$-$A$-random.

Let us call the coding of a
closed set $Q$ by the nodes of its representative tree with no dead ends the canonical code of $Q$. We wish now to introduce a second method of coding, the ghost code. A ghost code of $Q$ is an infinite ternary string whose terms correspond to all nodes of $2^{<\omega}$ in lexicographical order. The terms corresponding to the nodes of $Q$'s tree (the "canonical nodes") agree with the corresponding terms in the canonical code; the remaining "ghost nodes" may hold any values. Ghost codes are non-unique, and every closed set has a non-random ghost code (if the closed set itself is random take the code with ghost nodes all equal to zero, say). This method of coding is more convenient for some purposes; for example, we will use it to show that if $Q_0, Q_1$ are closed sets and $Q = \{0^\frown x : x \in Q_0\} \cup \{1^\frown x : x \in Q_1\}$, then $Q$ is random if and only if the $Q_i$ are random relative to each other.

The utility of the ghost codes rests on the following correspondence.

Theorem 2.4. The canonical code of a closed set $Q \subseteq 2^{\mathbb{N}}$ is random if and only if $Q$ has some random ghost code. Furthermore, for any $y$, the canonical code $r$ is $y$-random if and only if $Q$ has a ghost code which is $y$-random.

Proof. ($\Leftarrow$) Suppose the canonical code of $Q$ is nonrandom. Then there is a c.e. martingale $m$ that succeeds on it. From any initial segment $\sigma$ of a ghost code $g$ for $Q$, the subsequence $\hat{\sigma}$ of exactly the canonical nodes of $\sigma$ is computable. Therefore it is computable whether the bit of $g$ after $\sigma$ is canonical or ghost.
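The computability claim just made can be illustrated concretely. The sketch below assumes that "lexicographical order" on all nodes means length first and then lexicographic, so that the node at position $p$ has its two children at positions $2p+1$ and $2p+2$; this indexing and the names `is_canonical` and `canonical_subsequence` are assumptions for illustration, not notation from the paper.

```python
def is_canonical(ghost_prefix, n):
    """Decide whether position n of a ghost code holds a canonical node.

    Only the digits ghost_prefix[0], ..., ghost_prefix[n-1] are consulted,
    so the answer depends on the initial segment of length n alone.
    """
    if n == 0:
        return True                       # the root always belongs to the tree
    parent = (n - 1) // 2
    if not is_canonical(ghost_prefix, parent):
        return False                      # every descendant of a ghost node is a ghost
    last_bit = 1 - (n % 2)                # odd positions are left children (bit 0)
    digit = ghost_prefix[parent]
    return digit == 2 or digit == last_bit


def canonical_subsequence(ghost_prefix):
    """Extract the digits of a ghost code prefix that sit on canonical nodes."""
    return [d for n, d in enumerate(ghost_prefix)
            if is_canonical(ghost_prefix, n)]
```

For the prefix `[2, 0, 1, 2, 1, 0, 2]`, the canonical positions are 0, 1, 2, 3 and 6 (the nodes $\lambda$, 0, 1, 00 and 11), so `canonical_subsequence` returns `[2, 0, 1, 2, 2]`, and position 7 (the node 000) is already known to be canonical before its digit is seen.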
From $m$, define the martingale $m'$ which bets as follows:
\[ m'(\sigma^\frown i) = \begin{cases} m(\hat{\sigma}^\frown i) & \text{if the next bit is a canonical node,} \\ m'(\sigma) & \text{if the next bit is a ghost node.} \end{cases} \]
That is, $m'$ holds its money on ghost nodes and bets identically to $m$ on canonical nodes. It is clear that $m'$ succeeds on the ghost code $g$ and thus $g$ is nonrandom.

($\Rightarrow$) Now suppose the canonical code $r$ for $Q$ is random, and let $q$ be an infinite ternary string that is random relative to $r$ (and so by Theorem 2.3 $r \oplus q$ is random). We claim the ghost code $g$ obtained by using the bits of $r$ as the canonical nodes and the bits of $q$ in their original order as the ghost nodes is random. It is clear that $g$ is a ghost code for $Q$.

Suppose $m$ is a c.e. martingale that bets on $g$. From $m$ it is straightforward to define a nonmonotonic martingale $m'$ which mimics $m$'s bets exactly but performs them on $r \oplus q$, succeeding whenever $m$ succeeds. As $r$ and $q$ were chosen to be relatively random, this will show $g$ is random. As discussed previously, from $g \upharpoonright n$ it is computable whether $g(n)$ will be a ghost node or a canonical node, and which position in $q$ or $r$ it occupies in either case. Therefore, assuming the bits seen so far may be assembled into an initial segment $\sigma$ of $g$, $m'$ takes the values $m(\sigma^\frown i)$, $i < 3$, as its bets on the corresponding bit of $r$ or $q$, whichever is appropriate. Having seen that bit, then, it can assemble a $(|\sigma|+1)$-length initial segment of $g$ and repeat the process. As $m'$ makes identical bets to $m$ and has identical outcomes, since it cannot succeed on $r \oplus q$, $m$ cannot succeed on $g$ and $g$ is random.

To relativize ($\Rightarrow$), suppose that $r$ is $y$-random, so that $r \oplus y$ is random by van Lambalgen's Theorem 2.3. Then in the proof simply choose $q$ to be random relative to $r \oplus y$, and then $g$ will be random relative to $y$. The other direction relativizes in a straightforward way.

The primary purpose of the ghost codes is to remove the dependence on the particular closed set under discussion when interpreting bits of the code as nodes of the tree. This is especially useful when subdividing the tree, as in the following
definition.

Definition 2.5. The tree join of closed sets $P_0$ and $P_1$ is the closed set
\[ Q = \{0^\frown x : x \in P_0\} \cup \{1^\frown x : x \in P_1\}. \]
Given ghost codes $r_0, r_1$ for the $P_i$, their tree join is the code for $Q$ with the corresponding ghost node values.

The standard recursion-theoretic join is defined by
\[ r_0 \oplus r_1 = (r_0(0), r_1(0), r_0(1), r_1(1), \ldots). \]
We wish to relate the recursion-theoretic join and the tree join.

Lemma 2.6. Given two ghost codes $r_0, r_1$, the tree join of $r_0$ and $r_1$ is random if and only if the recursion-theoretic join $r_0 \oplus r_1$ is random.

Proof. It is clear that there is a computable permutation $\pi$ which uniformly maps the tree join of any $r_0, r_1$ to the recursion-theoretic join $r_0 \oplus r_1$. That is, in $r_0 \oplus r_1$, the entries of $r_0$ and $r_1$ alternate, whereas the tree join starts with a 2, followed by blocks from $r_0$ and $r_1$, as follows. First $r_0(0), r_1(0)$, then $r_0(1), r_0(2), r_1(1), r_1(2)$, and continuing with pairs of blocks of size 4, 8 and so on. The result now follows from the Von-Mises–Church–Wald Computable Selection Theorem 1.4.

We now obtain the following corollary of Theorems 2.3 and 2.4 and Lemma 2.6.

Corollary 2.7. Suppose $P_i$, $i = 0, 1$, are closed sets with canonical codes $r_i$ and let $P$ be the tree join of $P_0, P_1$. Then $P$ is random if and only if $r_0 \oplus r_1$ is random.

Proof. ($\Leftarrow$) Suppose that $r_0 \oplus r_1$ is random. Then by Theorem 2.3, $r_0$ and $r_1$ are mutually relatively random. By Theorem 2.4, $P_0$ has a ghost code $g_0$ which is random relative to $r_1$, and so also vice versa, and then $P_1$ has a ghost code $g_1$ which is random relative to $g_0$. Again by Theorem 2.3, the recursion-theoretic join $g_0 \oplus g_1$ is random, so by Lemma 2.6 the tree join of $g_0$ and $g_1$ is also random, and hence $P$ possesses a random ghost code and is random.

($\Rightarrow$) Suppose now that $P$ is random, and therefore possesses a random ghost code $g$. The code $g$ may be thought of as the tree join of ghost codes $g_0, g_1$, which is therefore random, and so by Lemma 2.6, $g_0 \oplus g_1$ is random. By Theorem 2.3, the individual codes $g_0, g_1$ are therefore mutually relatively random. Now by the relativized version of Theorem 2.4, $r_0$ is random relative to $g_1$. But $r_1$ is computable from $g_1$ and hence $r_0$ is random relative to $r_1$ as well. Similarly, $r_1$ is
$r_0$-random and thus, again by Theorem 2.3, $r_0 \oplus r_1$ is random.

3 Members of Random Closed Sets

For any finite string $\sigma$ of length $n$, the probability that a closed set $Q$ meets $I(\sigma)$ is $(\frac{2}{3})^n$. For a computable real $y$, the sequence $\{Q : Q \cap I(y \upharpoonright n) \neq \emptyset\}$ thus forms a Martin-Löf test in the space $\mathcal{C}$ of closed sets, which shows that $y$ does not belong to any Martin-Löf random closed set. That is, for each $n$, $\{x : Q_x \cap I(y \upharpoonright n) \neq \emptyset\}$ is a clopen set and has measure $(\frac{2}{3})^n$ in $\{0,1,2\}^{\mathbb{N}}$, where $Q_x$ is the closed set with code $x$. We omit the details, since we will now prove a stronger result.

For any computable, non-decreasing function $f$, we say that a real $\beta \in \{0,1\}^{\mathbb{N}}$ is $f$-c.e. if there exists a computable approximating function $\varphi$ such that, for all $i \in \mathbb{N}$,
(i) $\varphi(i, 0) = 0$;
(ii) $\lim_s \varphi(i, s) = \beta(i)$;
(iii) $\{s : \varphi(i, s+1) \neq \varphi(i, s)\}$ has cardinality $\le f(i)$.
The reals which are $f$-c.e. for some computable function $f$ are part of the well-known Ershov hierarchy [14, 27].

Theorem 3.1. Suppose that $f$ is computable and bounded by a polynomial. Then no random closed set has any $f$-c.e. paths.

Proof. Let $f$ be as above, $\beta$ an $f$-c.e. real and $P$ a closed set containing $\beta$. Let $\varphi$ be the $f$-approximating function for $\beta$. Also let $M_n \subseteq \{0,1\}^n$ be the set of different $\varphi$-approximations to $\beta \upharpoonright n$ during the stages.

A priori, $|M_n|$ is exponential. However, for a fixed $n$, $\beta \upharpoonright n$ can change at most $\sum_{i<n} f(i)$ times, so $|M_n|$ is also bounded by a polynomial, i.e., there is $k \in \mathbb{N}$ such that for almost all $n$, $|M_n| < n^k$. Now let
\[ S_n = \bigcup_{\sigma \in M_n} \{P \mid P \in \mathcal{C} \ \&\ P \cap I(\sigma) \neq \emptyset\}. \tag{1} \]
Then $(S_n)$ is a uniformly c.e. sequence of open sets in the space $\mathcal{C}$ of closed sets of $2^{\mathbb{N}}$ and for all $n$, $P \in S_n$. Also for almost all $n$,
\[ \mu^*(S_n) \le \sum_{\sigma \in M_n} \mu^*(\{P \mid P \in \mathcal{C} \ \&\ P \cap I(\sigma) \neq \emptyset\}) = |M_n| \cdot \left(\tfrac{2}{3}\right)^n \le n^k \cdot \left(\tfrac{2}{3}\right)^n. \]
Since $\lim_n [n^k \cdot (\frac{2}{3})^n] = 0$, there is a computable subsequence of $(S_n)$ which is a Martin-Löf test, and so $P$ is not random.

For any $K$-trivial real $A$ and any unbounded nondecreasing computable function $h$, $A$ is $h$-c.e. (Nies [25]). Thus it follows from Theorem 3.1 that a random closed set can have no $K$-trivial paths.

We observe that Theorem 3.1 cannot be extended to $\omega$-c.e. reals in general, because there
are left-c.e. (and hence $\omega$-c.e.) random reals, and by Theorem 3.9 each of these belongs to a random closed set.

The following theorem uses a method which was used in [18] to show that every random real is effectively bi-immune.

Theorem 3.2. If $Q$ is a random closed set, then $Q$ has no isolated elements.

Proof. Let $Q = [T]$ and suppose by way of contradiction that $Q$ contains an isolated path $x$. Then there is some node $\sigma \in T$ such that $Q \cap I(\sigma) = \{x\}$. For each $n$, let
\[ S_n = \{P \in \mathcal{C} : |\{\tau \in \{0,1\}^n : P \cap I(\sigma^\frown\tau) \neq \emptyset\}| = 1\}. \]
That is, $P \in S_n$ if and only if the tree $T_P$ has exactly one extension of $\sigma$ of length $n + |\sigma|$. It follows that
\[ |P \cap I(\sigma)| = 1 \iff (\forall n)\, P \in S_n. \]
Now for each $n$, $S_n$ is a clopen set in $\mathcal{C}$ and again by induction, $S_n$ has measure $(\frac{2}{3})^n$. Thus the sequence $S_0, S_1, \ldots$ is a Martin-Löf test. It follows that for some $n$, $Q \notin S_n$. Thus there are at least two extensions in $T_Q$ of $\sigma$ of length $n + |\sigma|$, contradicting the assumption that $x$ was the unique element of $Q \cap I(\sigma)$.

Corollary 3.3. If $Q$ is a random closed set, then $Q$ is perfect and hence has continuum many elements.

Theorem 3.4. Every random closed set contains a random element.

Proof. Suppose that a closed set $Q$ has no random element and consider the following Martin-Löf test on the space $\mathcal{C}$:
\[ U_i = \{P \mid P \in \mathcal{C} \text{ and } P \subseteq V_i\}, \]
where $(V_i)$ is a universal Martin-Löf test on the Cantor space. By Lemma 2.2, $\mu^*(U_i) \le \mu(V_i) \le 2^{-i}$, so that $(U_i)$ is a Martin-Löf test on $\mathcal{C}$. But $Q \in \bigcap_i U_i$, so $Q$ is not random.

The previous results might suggest that every element of a random closed set is a random real. However, it turns out that every random closed set contains a non-random real.

We need the following classic result of Chernoff [10] (a version of Bernoulli's Weak Law of Large Numbers) here and also for another theorem to follow. See [22] for an exposition.

Lemma 3.5 (Chernoff). Let $E$ be an event which we will refer to as 'success'.
If $E$ occurs with probability $p$, then for any natural number $n$ and any $\varepsilon$ with $0 \le \varepsilon \le 1$, the probability that out of $n$ mutually independent trials, the number of successes differs from $pn$ by more than $\varepsilon p n$ is at most $2e^{-\varepsilon^2 p n / 3}$.

Theorem 3.6. Not every element of a random closed set is random; in particular, the leftmost and rightmost paths in a random closed set are not random reals.

Proof. We will show that, for a random closed set $Q$, the leftmost path is not stochastically random; indeed, its asymptotic frequency of 0's is $\frac{2}{3}$. Since an effectively random real in $2^{\mathbb{N}}$ must have asymptotic frequency $\frac{1}{2}$ for 0's and 1's, this will suffice to prove that the leftmost path is not random.

We define a Martin-Löf test as follows. Fix a rational $\varepsilon$ such that $0 < \varepsilon < 1$. For each $n$, let $S_n$ be the family of closed sets (that is, codes for closed sets) such that the first $n$ bits of the leftmost path have either $< \frac{2}{3}(1-\varepsilon)n$ or $> \frac{2}{3}(1+\varepsilon)n$ occurrences of 0. By the definition of our probability measure, we have
\[ \mu^*(S_n) = \sum_{|m - \frac{2}{3}n| > \frac{2}{3}\varepsilon n} \binom{n}{m} \left(\tfrac{2}{3}\right)^m \left(\tfrac{1}{3}\right)^{n-m}. \]
It now follows from Chernoff's Lemma 3.5 (with $p = \frac{2}{3}$) that
\[ \mu^*(S_n) \le 2e^{-2\varepsilon^2 n / 9}. \]
Thus the measures of the test sets $S_n$ have effective limit zero. It is easy to see that the sequence $\{S_n\}$ is computably enumerable. For each $n$, $S_n$ is a clopen set and in fact the union of the finite family of intervals $I(\sigma)$ in $\mathcal{C}$ such that $\sigma$ codes a tree up to level $n$ in which the leftmost path has either $< \frac{2}{3}(1-\varepsilon)n$ or $> \frac{2}{3}(1+\varepsilon)n$ occurrences of 0. Furthermore, the sequence of sets $\bigcup_{p \ge n} S_p$ is also a Martin-Löf test. It follows that for any random closed set $Q$ and any $\varepsilon > 0$, there is an $n$ such that for all $m \ge n$, the frequency of 0's in the first $m$ bits of the leftmost path is always within $\varepsilon$ of $\frac{2}{3}$. Thus the leftmost path is not effectively random.

Recall that the leftmost and rightmost elements of any strong $\Delta^0_2$ closed set are $\Delta^0_2$. Given Theorems 3.4 and 3.6, we ask: Does a $\Delta^0_2$ random closed set contain a $\Delta^0_2$ random path?

Theorem 3.7. Every random strong $\Delta^0_2$ closed set contains a random $\Delta^0_2$ real.
Proof. Let $Q$ be a random strong $\Delta^0_2$ class. By Theorem 3.4, $Q$ contains a random real $x$. Let $P$ be a $\Pi^0_1$ class in the Cantor space which contains only randoms and contains $x$ (this exists since the class of random reals is an effective union of $\Pi^0_1$ classes). Note that $P \cap Q$ is a non-empty strong $\Delta^0_2$ class and it follows that the leftmost path of $P \cap Q$ is a $\Delta^0_2$ real which must be random since it belongs to $P$.

Note that the above theorem does not combine with the low basis theorem to establish the existence of a low random real in any random strong $\Delta^0_2$ class. Thus we pose the question of whether for any random closed set $Q$, if $T_Q$ is low, then $Q$ has a low random element.

Next we want to find a random closed set which does not contain a $\Delta^0_2$ path. Now it is easy [7, 8] to construct a strong $\Pi^0_2$ class $P$ of positive measure which contains no $\Delta^0_2$ elements; of course $P$ must contain a random real since it has positive measure. The difficult problem is to construct a random strong $\Pi^0_2$ class with no $\Delta^0_2$ elements. We have the following result in this direction, which yields a random strong $\Delta^0_3$ closed set with no $\Delta^0_2$ elements.

Theorem 3.8. For any set $A$ there is an $A$-random closed set $Q$ such that $T_Q \le_T A''$ but $Q$ has no elements $\le_T A'$.

Proof. It is enough if we prove the claim for $A = \emptyset$ because the argument relativizes to any oracle $A$ in a straightforward way. For $A = \emptyset$ we use a finite injury construction over $\emptyset'$ to construct $Q$ with the above properties. In the construction we will $\emptyset'$-approximate the canonical code of a tree $T$ which has no $\Delta^0_2$ paths. To make sure that the tree $T$ is random we fix a $\Pi^0_1$ class $P$ of positive measure in the space $3^{\mathbb{N}}$ (where the code for $T$ lies) which contains only randoms, and we make sure that at every stage our approximation (as a finite ternary string) to $T$'s canonical code can be extended to a path in $P$. Then by compactness the canonical code of our tree will be in $P$ and so the tree will be random. The changes in the approximations are motivated by the requirements:

$R_e$: if $\Phi_e^{\emptyset'}$ is total then the real it defines is not in $[T]$.

Let $\alpha_s$ be a finite
string approximation of the canonical code $\alpha$ we are building. We will have $|\alpha_s| = s$. Strategy $R_e$ will come into power after stage $e$ and will restrain $\alpha$ up to some $r_e \ge e$ (the default value is $r_e[0] = e$). Also it might request some changes in $\alpha$ after the $e$-th bit. We start with $\alpha_0 = \emptyset$ and at stage $s+1$, assuming inductively that $\alpha_s{\downarrow}$ and $[\alpha_s] \cap P \neq \emptyset$, we ask for the least $i < s$ such that $R_i$ requires attention. This happens if

(i) The longest defined initial segment $\tau$ of $\Phi^{\emptyset'}_{i,s+1}$ is larger than ever before;