SLPA: Uncovering Overlapping Communities in Social Networks via A Speaker-listener Interaction Dynamic Process




Identification of overlapping community structure in complex networks using fuzzy c-means clustering


Physica A 374 (2007) 483–490

Shihua Zhang (a,*), Rui-Sheng Wang (b), Xiang-Sun Zhang (a)
(a) Academy of Mathematics & Systems Science, Chinese Academy of Science, Beijing 100080, China
(b) School of Information, Renmin University of China, Beijing 100872, China
(*) Corresponding author. E-mail addresses: zsh@ (S. Zhang), wrs@ (R.-S. Wang), zxs@ (X.-S. Zhang).

Received 28 June 2006. Available online 7 August 2006. doi:10.1016/j.physa.2006.07.023

Abstract

Identification of (overlapping) communities/clusters in a complex network is a general problem in data mining of network data sets. In this paper, we devise a novel algorithm to identify overlapping communities in complex networks by combining a new modularity function that generalizes NG's Q function, an approximate mapping of network nodes into Euclidean space, and fuzzy c-means clustering. Experimental results indicate that the new algorithm is efficient at detecting both good clusterings and the appropriate number of clusters. © 2006 Elsevier B.V. All rights reserved.

Keywords: Overlapping community structure; Modular function; Spectral mapping; Fuzzy c-means clustering; Complex network

1. Introduction

Large complex networks representing relationships among sets of entities have been a focus of interest for scientists in many fields in recent years. Examples include social networks, the worldwide web, telecommunication networks and biological networks. One of the key problems in the field is how to describe and explain community structure. Generally, a community in a network is a subgraph whose nodes are densely connected among themselves but sparsely connected with the rest of the network. Many studies have verified the community/modularity structure of various complex networks, such as protein-protein interaction networks, the worldwide web and co-author networks. Clearly, the ability to detect community structure in a network has important practical applications and can help us understand the network system.

Although the notion of community structure is straightforward, constructing an efficient algorithm for identifying the community structure in a complex network is highly nontrivial. A number of algorithms for detecting communities have been developed in various fields (for a recent review see Ref. [1] and for a recent comparison paper see Ref. [2]). There are two main difficulties in detecting community structure. The first is that we do not know how many communities there are in a given network. The usual drawback of many algorithms is that they cannot give a valid criterion for measuring the community structure. The second is that some nodes in a network commonly belong to more than one community, which gives rise to overlapping community structure. Overlapping nodes may play a special role in a complex network system. Most known algorithms, such as divisive algorithms [3–5], cannot detect them. Only a few community-detecting methods [6,7] can uncover overlapping community structure.

To address the first difficulty, Newman and Girvan [8] developed a new approach. They introduced a modularity function Q for measuring community structure. To state it in our context, we use a similar formulation from Ref. [5]. Given an undirected graph/network $G(V, E, W)$ consisting of the node set $V$, the edge set $E$ and a symmetric weight matrix $W = [w_{ij}]_{n \times n}$, where $w_{ij} \ge 0$ and $n$ is the size of the network, the modularity function Q is defined as

$$Q(P_k) = \sum_{c=1}^{k} \left[ \frac{L(V_c, V_c)}{L(V, V)} - \left( \frac{L(V_c, V)}{L(V, V)} \right)^2 \right], \qquad (1)$$

where $P_k$ is a partition of the nodes into $k$ groups and $L(V', V'') = \sum_{i \in V',\, j \in V''} w(i, j)$. The Q function measures the quality of a given community structure of a network and can be used to automatically select the optimal number of communities $k$ according to the maximum Q value [8,5]. The measure has been used for developing new detection algorithms such as those in Refs. [5,9,4]. White and Smyth [5] showed that optimizing the Q function can be reformulated as a spectral relaxation problem and proposed two spectral clustering algorithms that seek to maximize Q.

In this study, we develop an algorithm for detecting overlapping community structure. The algorithm combines the modularity function Q [8], spectral relaxation [5] and the fuzzy c-means clustering method [10], which is inspired by the general concept of fuzzy geometric clustering. Fuzzy clustering methods do not employ hard assignment; they only assign a membership degree $u_{ij}$ to every node $v_i$ with respect to the cluster $C_j$.

2. Method

Simulations across a wide variety of simulated and real-world networks showed that large Q values are correlated with better network clusterings [8]. Maximizing the Q function can therefore yield a final 'optimal' community structure. However, in many complex networks some nodes may belong to more than one community, and divisive algorithms based on maximizing the Q function fail to detect such cases. Fig. 1 shows an example of a simple network that visually suggests three clusters; classifying node 5 (or node 9) into two clusters at the same time may be more appropriate intuitively. We therefore introduce the concept of fuzzy membership degree to the network clustering problem in the following subsection.

Fig. 1. An example network showing Q and $\tilde{Q}$ values for different numbers $k$ of clusters using the same spectral mapping but different clustering methods, i.e. k-means and fuzzy c-means, respectively. For the latter, it shows every node's soft assignment and the membership of the final clusters with $\lambda = 0.15$.

2.1. A new modular function

If there are $k$ communities in total, we define a corresponding $n \times k$ 'soft assignment' matrix $U_k = [u_1, \ldots, u_k]$ with $0 \le u_{ic} \le 1$ for each $c = 1, \ldots, k$ and $\sum_{c=1}^{k} u_{ic} = 1$ for each $i = 1, \ldots, n$. With this we define the membership of each community as $\bar{V}_c = \{ i \mid u_{ic} > \lambda,\ i \in V \}$, where $\lambda$ is a threshold that converts a soft assignment into a final clustering. We define a new modularity function $\tilde{Q}$ as

$$\tilde{Q}(U_k) = \sum_{c=1}^{k} \left[ \frac{A(\bar{V}_c, \bar{V}_c)}{A(V, V)} - \left( \frac{A(\bar{V}_c, V)}{A(V, V)} \right)^2 \right], \qquad (2)$$

where $U_k$ is a fuzzy partition of the vertices into $k$ groups and

$$A(\bar{V}_c, \bar{V}_c) = \sum_{i \in \bar{V}_c,\, j \in \bar{V}_c} \frac{u_{ic} + u_{jc}}{2}\, w(i, j),$$

$$A(\bar{V}_c, V) = A(\bar{V}_c, \bar{V}_c) + \sum_{i \in \bar{V}_c,\, j \in V \setminus \bar{V}_c} \frac{u_{ic} + (1 - u_{jc})}{2}\, w(i, j),$$

$$A(V, V) = \sum_{i \in V,\, j \in V} w(i, j).$$

This can be regarded as a generalization of Newman's Q function. Our objective is to compute a soft assignment matrix by maximizing the new $\tilde{Q}$ function with an appropriate $k$. The following subsections describe how.

2.2. Spectral mapping

White and Smyth [5] showed that the problem of maximizing the modularity function Q can be reformulated as an eigenvector problem and devised two spectral clustering algorithms. Their algorithms are similar in spirit to a class of spectral clustering methods that map data points into Euclidean space by eigendecomposing a related matrix and then group them by general clustering methods such as k-means and hierarchical clustering [5,9]. Given a network with adjacency matrix $A = (a_{ij})_{n \times n}$ and a diagonal matrix $D = (d_{ii})$, $d_{ii} = \sum_k a_{ik}$, the two matrices $D^{-1/2} A D^{-1/2}$ and $D^{-1} A$ are often used. A recent modification [11] uses the top $K$ eigenvectors of the generalized eigensystem $Ax = tDx$ instead of the $K$ eigenvectors of the two matrices mentioned above to form a matrix whose rows correspond to the original data points. The authors show that, after normalizing the rows using the Euclidean norm, the eigenvectors are mathematically identical, and they emphasize that this is a numerically more stable method. Although their result is designed for clustering real-valued points [11,12], it is also appropriate for network clustering. In this study, we therefore compute the top $k - 1$ eigenvectors of the eigensystem to form a $(k-1)$-dimensional embedding of the graph into Euclidean space and use 'soft-assignment' geometric clustering on this embedding to generate a clustering $U_k$ ($k$ is the expected number of clusters).

2.3. Fuzzy c-means

To realize our 'soft assignment', we use the fuzzy c-means (FCM) clustering method [10,13] to cluster these points and maximize the $\tilde{Q}$ function. Fuzzy c-means is a clustering method that allows one piece of data to belong to two or more clusters. This method (developed by Dunn in 1973 [10] and improved by Bezdek in 1981 [13]) is frequently used in pattern recognition. It is based on minimization of the objective function

$$J_m = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^{m}\, \lVert x_i - c_j \rVert^2, \qquad (3)$$

over the variables $u_{ij}$ and $c_j$ with $\sum_j u_{ij} = 1$. Here $m \in [1, \infty)$ is a weight exponent controlling the degree of fuzzification, $u_{ij}$ is the membership degree of $x_i$ in cluster $j$, $x_i$ is the $i$th $d$-dimensional measured data point, $c_j$ is the $d$-dimensional center of cluster $j$, and $\lVert \cdot \rVert$ is any norm expressing the similarity between a measured data point and the center. Fuzzy partitioning is carried out through an iterative optimization of the objective function above, updating the membership degrees $u_{ij}$ and the cluster centers $c_j$. This procedure converges to a local minimum or a saddle point of $J_m$.

2.4. The flow of the algorithm

We are given an upper bound $K$ on the number of clusters and the adjacency matrix $A = (a_{ij})_{n \times n}$ of a network. For a given $\lambda$, the algorithm is as follows:

Spectral mapping:
(i) Compute the diagonal matrix $D = (d_{ii})$, where $d_{ii} = \sum_k a_{ik}$.
(ii) Form the eigenvector matrix $E_K = [e_1, e_2, \ldots, e_K]$ by computing the top $K$ eigenvectors of the generalized eigensystem $Ax = tDx$.

Fuzzy c-means: for each value of $k$, $2 \le k \le K$:
(i) Form the matrix $E_k = [e_2, e_3, \ldots, e_k]$ from the matrix $E_K$.
(ii) Normalize the rows of $E_k$ to unit length using the Euclidean distance norm.
(iii) Cluster the row vectors of $E_k$ using fuzzy c-means or any other fuzzy clustering method to obtain a soft assignment matrix $U_k$.

Maximizing the modular function: Pick the $k$ and the corresponding fuzzy partition $U_k$ that maximize $\tilde{Q}(U_k)$.

In the algorithm above, we initialize FCM so that the starting centroids are as orthogonal as possible, as suggested for the k-means clustering method in Ref. [12]. This initialization does not change the time complexity and can improve the quality of the clusterings, which in turn reduces the need for restarting the random initialization process.

The framework of our algorithm is similar to several spectral clustering methods in previous studies [5,9,12,11]. We also map data points (network nodes in our study) into Euclidean space by computing the top $K$ eigenvectors of a generalized eigensystem and then cluster the embedding, using a fuzzy clustering method where others use a geometric clustering algorithm or a general hierarchical clustering algorithm. Here, we emphasize two key points that differ from those earlier studies:

We introduce a generalized modular function $\tilde{Q}$ employing the fuzzy concept, which is devised for evaluating the goodness of overlapping community structure.
In combination with the novel $\tilde{Q}$ function, we introduce a fuzzy clustering method into network clustering in place of general hard clustering methods. This means that our algorithm can uncover overlapping clusters, whereas the general framework "objective function such as the Q function or the normalized cut function + spectral mapping + general geometric clustering/hierarchical clustering" cannot.

3. Experimental results

We implemented the proposed algorithm in Matlab and used the fuzzy clustering toolbox [14] for our analysis. For an intuitive comparison, we also computed the hard clustering based on the original Q function, the same spectral mapping, and k-means clustering.

We illustrate the fuzzy concept and the difference between our method and traditional divisive algorithms with the simple example shown in Fig. 1. As mentioned above, the network visually suggests three clusters, but classifying node 5 (or node 9) into two clusters simultaneously may be more reasonable. Fig. 1 shows that our method did uncover the overlapping communities for this simple network, while the traditional method can only assign each node to a single cluster.

We also present the analysis of two real networks, Zachary's karate club network and the American college football team network, to better illustrate the differences between our method and traditional methods.

3.1. Zachary's karate club

The famous karate club network analyzed by Zachary [15] is widely used as a test example for methods of detecting communities in complex networks [1,8,16,3,4,17,9,18,19]. The network consists of 34 members of a karate club as nodes and 78 edges representing friendships between members of the club, observed over a period of two years. Due to a disagreement between the club's administrator and the club's instructor, the club split into two smaller ones. The question is whether we can uncover the potential behavior of the network, detect the two communities or multiple groups, and in particular identify which community a node belongs to. The network is presented in Fig. 2, where the squares and the circles label the members of the two groups.

The results of k-means and of our analysis are illustrated in Fig. 3. k-means combined with the Q function divides the network into three parts (see Fig. 3A), but some nodes in one cluster are also densely connected with another cluster: nodes 9 and 31 in cluster 1 connect densely with cluster 2, and node 1 in cluster 2 connects densely with cluster 3. Fig. 3B shows the results of our method, in which nodes 1, 9, 10 and 31 belong to two clusters at the same time. These nodes link evenly with two clusters. Note also that both methods uncover three communities rather than two. There is a small community within the instructor's faction, since the set of nodes 5, 6, 7, 11, 17 connects only with node 1 in the instructor's faction. Our method also classifies node 1 into this small community, while k-means does not.

3.2. Network of American college football teams

The second network we investigated is the college football network, which represents the game schedule of the 2000 season of Division I of the US college football league. The nodes in the network represent the 115 teams, while the links represent the 613 games played in the course of the year. The teams are divided into conferences of 8–12 teams each, and games are generally more frequent between members of the same conference than between teams of different conferences. The natural community structure of this network makes it a commonly used benchmark for testing community-detecting algorithms [3,5,7].

Fig. 4 shows how the modularity Q and $\tilde{Q}$ vary with $k$ for k-means and our method, respectively. The peak for k-means is at $k = 12$, $Q = 0.5398$, while for our algorithm it is at $k = 10$, $\tilde{Q} = 0.4673$ with $\lambda = 0.10$.
Both methods identify ten communities which contain ten conferences almost exactly. Only the teams labeled as Sunbelt are not recognized as belonging to the same community by either method. The same holds for this group in the results of Refs. [3,19]. This happens because the Sunbelt teams played nearly as many games against Western Athletic teams as they played in their own conference, and they also played quite a number of games against Mid-American teams. Our method identified 11 nodes (teams) which belong to at least two communities (see Fig. 5, 11 red nodes). These nodes generally connect evenly with more than one community, so we cannot classify them into one specific community correctly. They represent 'fuzzy' points which cannot be classified correctly using the current link information. Such points may play a 'bridge' role between two or more communities in complex networks of other types.

Fig. 2. Zachary's karate club network. Square nodes and circle nodes represent the instructor's faction and the administrator's faction, respectively. This figure is from Newman and Girvan [8].

Fig. 3. The results of both k-means and our method applied to the karate club network. A: The different colors represent three different communities obtained by k-means, and the table on the right shows values of NG's Q versus different $k$. B: Four red nodes represent the overlap of two adjacent communities obtained by our method, and the table on the right shows values of the new $\tilde{Q}$ versus different $k$ with $\lambda = 0.25$.

Fig. 4. Q and $\tilde{Q}$ values versus $k$ for the k-means and fuzzy c-means clustering methods on the network of American college football teams.

Fig. 5. Fuzzy communities of the American college football team network ($k = 10$ and $\tilde{Q} = 0.4673$) with given $\lambda = 0.10$ (best viewed in color).

4. Conclusion and discussion

In this paper, we present a new method to identify the community structure in complex networks with a fuzzy concept. The method combines a generalized modularity function, spectral mapping, and a fuzzy clustering technique. The nodes of the network are projected into a $d$-dimensional Euclidean space obtained by computing the top $d$ nontrivial eigenvectors of the generalized eigensystem $Ax = tDx$. The fuzzy c-means clustering method is then applied in this $d$-dimensional space, based on the general Euclidean distance, to cluster the data points. By maximizing the generalized modular function $\tilde{Q}(U_d)$ for varying $d$, we obtain the appropriate number of clusters. The final soft assignment matrix determines the membership of the final clusters with a designated threshold $\lambda$.

Although spectral mapping has been widely used before to detect communities in complex networks (and in clustering real-valued points), we believe that our method represents a step forward in this field. A fuzzy method is introduced naturally with the generalized modular function and the fuzzy c-means clustering technique. As our tests suggest, it is very natural that some nodes should belong to more than one community. These nodes may play a special role in a complex network system. For example, in a biological network such as a protein interaction network, one node (protein or gene) belonging to two functional modules may act as a bridge between them that transfers biological information, or may act as multiple functional units [6].

It should be noted that when this method is applied to large complex networks, computational complexity is a key problem. Fortunately, fast techniques for solving eigensystems have been developed [20], and several methods of FCM acceleration can also be found in the literature [21]. For instance, if we adopt the implicitly restarted Lanczos method (IRLM) [20] to compute the $K - 1$ eigenvectors and the efficient implementation of the FCM algorithm in Ref. [21], we obtain worst-case complexities of $O(mKh + nK^2 h + K^3 h)$ and $O(nK^2)$, respectively, where $m$ is the number of edges in the network and $h$ is the number of iterations required until convergence. For large sparse networks where $m \sim n$ and $K \ll n$, the algorithms scale roughly linearly as a function of the number of nodes $n$. Nonetheless, the eigenvector computation is still the most computationally expensive step of the method.

We expect that this new method will be employed with promising results in the detection of communities in complex networks.

Acknowledgments

This work is partly supported by the Important Research Direction Project of CAS ''Some Important Problem in Bioinformatics'' and the National Natural Science Foundation of China under Grant No. 10471141. The authors thank Professor M.E.J. Newman for providing the data of the karate club network and the college football team network.

References

[1] M.E.J. Newman, Detecting community structure in networks, Eur. Phys. J. B 38 (2004) 321–330.
[2] L. Danon, J. Duch, A. Diaz-Guilera, A. Arenas, Comparing community structure identification, J. Stat. Mech. P09008 (2005).
[3] M. Girvan, M.E.J. Newman, Community structure in social and biological networks, Proc. Natl. Acad. Sci. USA 99 (12) (2002) 7821–7826.
[4] J. Duch, A. Arenas, Community detection in complex networks using extremal optimization, Phys. Rev. E 72 (2005) 027104.
[5] S. White, P. Smyth, A spectral clustering approach to finding communities in graphs, SIAM International Conference on Data Mining, 2005.
[6] G. Palla, I. Derenyi, I. Farkas, T. Vicsek, Uncovering the overlapping community structure of complex networks in nature and society, Nature 435 (2005) 814–818.
[7] J. Reichardt, S. Bornholdt, Detecting fuzzy community structures in complex networks with a Potts model, Phys. Rev. Lett. 93 (2004) 218701.
[8] M.E.J. Newman, M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E 69 (2004) 026113.
[9] L. Donetti, M.A. Muñoz, Detecting network communities: a new systematic and efficient algorithm, J. Stat. Mech. P10012 (2004).
[10] J.C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybernet. 3 (1973) 32–57.
[11] D. Verma, M. Meila, A comparison of spectral clustering algorithms, UW CSE Technical Report 03-05-01, 2003.
[12] A. Ng, M. Jordan, Y. Weiss, On spectral clustering: analysis and an algorithm, Adv. Neural Inf. Process. Systems 14 (2002) 849–856.
[13] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
[14] Fuzzy Clustering Toolbox.
[15] W.W. Zachary, An information flow model for conflict and fission in small groups, J. Anthropol. Res. 33 (1977) 452–473.
[16] M.E.J. Newman, Fast algorithm for detecting community structure in networks, Phys. Rev. E 69 (2004) 066133.
[17] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, D. Parisi, Defining and identifying communities in networks, Proc. Natl. Acad. Sci. USA 101 (9) (2004) 2658–2663.
[18] F. Wu, B.A. Huberman, Finding communities in linear time: a physics approach, Eur. Phys. J. B 38 (2004) 331–338.
[19] S. Fortunato, V. Latora, M. Marchiori, A method to find community structures based on information centrality, Phys. Rev. E 70 (2004) 056104.
[20] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, H. van der Vorst (Eds.), Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, SIAM, Philadelphia, PA, 2000.
[21] J.F. Kolen, T. Hutcheson, Reducing the time complexity of the fuzzy c-means algorithm, IEEE Trans. Fuzzy Systems 10 (2) (2002) 263–267.
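The generalized modularity of Eq. (2) in the Physica A paper above can be checked numerically with a few lines of code. The following is a minimal sketch in plain Python, not the authors' Matlab implementation; the function name `fuzzy_modularity`, the list-of-lists matrix layout and the example graph are assumptions made for illustration. For a hard (0/1) assignment the value should coincide with the Q function of Eq. (1).

```python
def fuzzy_modularity(W, U, lam):
    """Generalized modularity of Eq. (2).

    W   : symmetric n x n weight matrix (list of lists), w_ij >= 0
    U   : n x k soft assignment matrix, each row sums to 1
    lam : threshold; node i is a member of community c if U[i][c] > lam
    (Names and data layout are illustrative assumptions.)
    """
    n, k = len(W), len(U[0])
    total = sum(W[i][j] for i in range(n) for j in range(n))  # A(V, V)
    q = 0.0
    for c in range(k):
        members = [i for i in range(n) if U[i][c] > lam]
        member_set = set(members)
        outside = [j for j in range(n) if j not in member_set]
        # A(V_c, V_c): member pairs weighted by their average membership.
        a_in = sum((U[i][c] + U[j][c]) / 2 * W[i][j]
                   for i in members for j in members)
        # Second term of A(V_c, V): links from members to non-members.
        a_out = sum((U[i][c] + (1 - U[j][c])) / 2 * W[i][j]
                    for i in members for j in outside)
        q += a_in / total - ((a_in + a_out) / total) ** 2
    return q


# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = [[0] * 6 for _ in range(6)]
for a, b in edges:
    W[a][b] = W[b][a] = 1

hard_U = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
soft_U = [[1, 0], [1, 0], [0.6, 0.4], [0.4, 0.6], [0, 1], [0, 1]]
print(fuzzy_modularity(W, hard_U, 0.15))  # equals Newman's Q for this partition
print(fuzzy_modularity(W, soft_U, 0.15))  # nodes 2 and 3 belong to both groups
```

With the soft assignment, nodes 2 and 3 exceed the threshold in both columns and are therefore counted as members of both communities, which is the overlap mechanism the paper describes.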












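The pressure-based solver evolved from the original segregated solver. It solves, in sequence, the momentum equations, the pressure-correction equation, the energy equation, the species equations and other scalar equations such as the turbulence equations. Unlike its predecessor, the pressure-based solver also adds a coupled algorithm, so the user can switch freely between segregated and coupled solution. Note that several physical models available in the pressure-based solver are not available in the density-based solver.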






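The density-based solver converges quickly, but it requires more memory and computation than the pressure-based solver. Models available in the pressure-based solver but not in the density-based solver: (1) cavitation model; (2) VOF model; (3) Mixture multiphase model; (4) Eulerian multiphase model; (5) non-premixed combustion model; (6) premixed combustion model; (7) partially premixed combustion model; (8) composition PDF transport model. In Fluent, the density-based solver (coupled solver) solves the continuity, momentum, energy and species transport equations simultaneously as a coupled system, and then solves the turbulence scalar equations one by one. Because the governing equations are nonlinear and coupled to each other, several iterations are needed before a converged solution is obtained: 1) Update all flow variables based on the current solution.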





SLPA: Uncovering Overlapping Communities in Social Networks via A Speaker-listener Interaction Dynamic Process

Jierui Xie and Boleslaw K. Szymanski
Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York 12180
Email: {xiej2,szymansk}@

Xiaoming Liu
Department of Computer Science, University of the Western Cape, Belville, South Africa
Email: andyliu5738@

2011 11th IEEE International Conference on Data Mining Workshops

Abstract: Overlap is one of the characteristics of social networks, in which a person may belong to more than one social group. For this reason, discovering overlapping structures is necessary for realistic social analysis. In this paper, we present a novel, general framework to detect and analyze both individual overlapping nodes and entire communities. In this framework, nodes exchange labels according to dynamic interaction rules. A specific implementation called Speaker-listener Label Propagation Algorithm (SLPA; see footnote 1) demonstrates an excellent performance in identifying both overlapping nodes and overlapping communities with different degrees of diversity.

Keywords: social network; overlapping community detection; label propagation; dynamic interaction; algorithm

Footnote 1: SLPA 1.0: https:///site/communitydetectionslpa/

I. INTRODUCTION

Modular structure is considered to be the building block of real-world networks, as it often accounts for the functionality of the system. It is well understood that people in a social network are naturally characterized by multiple community memberships. For example, a person usually has connections to several social groups like family, friends and colleagues; a researcher may be active in several areas; on the Internet, a person can simultaneously subscribe to an arbitrary number of groups.

For this reason, overlapping community detection algorithms have been investigated. These algorithms aim to discover a cover [1], which is defined as a set of clusters in which each node belongs to at least one cluster. In this paper, we propose an efficient algorithm to identify both individual overlapping nodes and entire overlapping communities using the underlying network structure alone.

II. RELATED WORK

Detection of overlapping communities was previously proposed by Palla [2] with the clique percolation algorithm (CPM). CPM is based on the assumption that a community consists of fully connected subgraphs, and it detects overlapping communities by searching, for each such subgraph, for adjacent cliques that share at least a certain number of nodes with it. CPM is suitable for networks with densely connected parts.

Another line of research is based on maximizing a local benefit function. Baumes [3] proposed the iterative scan algorithm (IS). IS expands seeded small cluster cores by adding or removing nodes until the local density function cannot be improved. The quality of the discovered communities depends on the quality of the seeds. LFM [4] expands a community from a random seed node until the fitness function is locally maximal. LFM depends significantly on a parameter of the fitness function that controls the size of the communities.

CONGA [5] extends Girvan and Newman's divisive clustering algorithm by allowing a node to split into multiple copies. Both the splitting betweenness, defined based on the number of shortest paths on the imaginary edge, and the usual edge betweenness are considered. In the refined version, CONGO [6], local betweenness is used to optimize the speed.

Copra [7] is an extension of the label propagation algorithm [8] for overlapping community detection. Each node updates its belonging coefficients by averaging the coefficients over all its neighbors. Copra produces a number of small communities in some networks.

EAGLE [9] uses the agglomerative framework to produce a dendrogram. All maximal cliques, which serve as initial communities, are computed first. Then the pair of communities with maximum similarity is merged iteratively. Expensive computation is one drawback of this algorithm.

Fuzzy clustering has also been extended to overlapping community detection. Zhang [10] used the spectral method to embed the graph into (k-1)-dimensional Euclidean space. Nodes are then clustered by the fuzzy c-means algorithm. Nepusz [11] modeled overlapping community detection as a nonlinear constrained optimization problem. Psorakis et al. [12] proposed a model based on Bayesian nonnegative matrix factorization (NMF).

The idea of partitioning links instead of nodes to discover community structure has also been explored [13], [14]. As a result, the node partition of a line (or link) graph leads to an edge partition of the original graph.

III. SLPA: SPEAKER-LISTENER LABEL PROPAGATION ALGORITHM

The algorithm proposed in this paper is an extension of the Label Propagation Algorithm (LPA) [8]. In LPA, each node holds only a single label that is iteratively updated by adopting the majority label in the neighborhood. Disjoint communities are discovered when the algorithm converges. One way to account for overlap is to allow each node to possess multiple labels, as proposed in [15]. Our algorithm follows this idea but applies different dynamics with more general features.

In the dynamic process, we need to determine 1) how to spread nodes' information to others, and 2) how to process the information received from others. The critical issue related to both questions is how information should be maintained. We propose a speaker-listener based information propagation process (SLPA) to mimic human communication behavior.
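For reference, the single-label LPA baseline that SLPA extends can be sketched as below. This is a minimal illustration under stated assumptions rather than the implementation of [8]: the adjacency-dict input, the function name `label_propagation`, the rule that a node keeps its current label when that label is tied for the majority, and the fixed random seed are all choices made here.

```python
import random
from collections import Counter


def label_propagation(adj, max_sweeps=100, seed=0):
    """Asynchronous LPA sketch: each node holds exactly one label.

    adj : dict mapping node -> list of neighbour nodes (undirected graph).
    Returns a dict mapping node -> final label.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}          # every node starts with a unique label
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)                # visit nodes in random order
        changed = False
        for v in nodes:
            if not adj[v]:
                continue                  # isolated nodes keep their own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, cnt in counts.items() if cnt == best]
            if labels[v] in candidates:
                continue                  # assumption: keep current label on ties
            labels[v] = rng.choice(candidates)
            changed = True
        if not changed:                   # every label is a neighbourhood majority
            break
    return labels


# Hypothetical example: two disconnected triangles form two disjoint communities.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(label_propagation(graph))
```

Because each node overwrites its single label, the result is always a partition; the memory mechanism described next is what lets SLPA keep several labels per node.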
In SLPA, each node can be a listener or a speaker. The roles are switched depending on whether a node serves as an information provider or an information consumer. Typically, a node can hold as many labels as it likes, depending on what it has experienced in the stochastic processes driven by the underlying network structure. A node accumulates knowledge of repeatedly observed labels instead of erasing all but one of them. Moreover, the more often a node observes a label, the more likely it is to spread this label to other nodes (mimicking people's preference for spreading the most frequently discussed opinions).

In a nutshell, SLPA consists of the following three stages (see Algorithm 1 for pseudo-code):

1) First, the memory of each node is initialized with this node's id (i.e., with a unique label).
2) Then, the following steps are repeated until the stop criterion is satisfied:
   a. One node is selected as a listener.
   b. Each neighbor of the selected node sends out a single label following a certain speaking rule, such as selecting a random label from its memory with probability proportional to the occurrence frequency of this label in the memory.
   c. The listener accepts one label from the collection of labels received from neighbors following a certain listening rule, such as selecting the most popular label from what it observed in the current step.
3) Finally, post-processing based on the labels in the memories of nodes is applied to output the communities.

Algorithm 1: SLPA(T, r)
    [n, Nodes] = loadnetwork();
    Stage 1: initialization
    for i = 1:n do
        Nodes(i).Mem = i;
    Stage 2: evolution
    for t = 1:T do
        Nodes.ShuffleOrder();
        for i = 1:n do
            Listener = Nodes(i);
            Speakers = Nodes(i).getNbs();
            for j = 1:Speakers.len do
                LabelList(j) = Speakers(j).speakerRule();
            w = Listener.listenerRule(LabelList);
            Listener.Mem.add(w);
    Stage 3: post-processing
    for i = 1:n do
        remove Nodes(i) labels seen with probability < r;

SLPA utilizes an asynchronous update scheme, i.e., when updating a listener's memory at time t, some already updated neighbors have memories of size t and some other neighbors still have memories of size t−1. SLPA reduces to LPA when the size of memory is limited to one and the stop criterion is convergence of all labels.

It is worth noticing that each node in our system has a memory and takes into account information that has been observed in the past to make its current decision. This feature is typically absent in other label propagation algorithms such as [15], [16], where a node updates its label while completely forgetting the old knowledge. This feature allows us to combine the accuracy of the asynchronous update with the stability of the synchronous update [8]. As a result, the fragmentation issue of producing a number of small communities, observed in Copra in some networks, is avoided.

A. Stop Criterion

The original LPA stop criterion of having every node assigned the most popular label in its neighborhood does not apply to the multiple-label case. Since neither the case where the algorithm reaches a single community (i.e., a special convergence state) nor oscillation (e.g., on a bipartite network) would affect the stability of SLPA, we can stop at any time as long as we have collected sufficient information for post-processing. In the current implementation, SLPA simply stops when the predefined maximum number of iterations T is reached. In general, SLPA produces relatively stable outputs, independent of network size or structure, when T is greater than 20.

B. Post-processing and Community Detection

SLPA collects only label information that reflects the underlying network structure during the evolution. The detection of communities is performed when the stored information is post-processed. Given the memory of a node, SLPA converts it into a probability distribution of labels.

Figure 1. The convergence behavior of the parameter r in LFR benchmarks with n=5000. The y-axis is the ratio of the number of detected communities to the number of true communities.

Figure 2. The execution time in seconds of SLPA in LFR benchmarks with k=20. n ranges from 1000 to 50000.
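The first two stages of Algorithm 1 (initialization and evolution) can be sketched in Python as follows. This is a minimal illustration, not the authors' SLPA 1.0 code: the adjacency-dict input, the function name `slpa_evolve`, random tie-breaking in the listening rule, and the fixed seed are assumptions. The speaking rule samples a label with probability proportional to its frequency in the speaker's memory, and the listening rule takes the most popular received label, as described above.

```python
import random
from collections import Counter


def slpa_evolve(adj, T=100, seed=0):
    """Stages 1 and 2 of SLPA: return each node's label memory as a Counter.

    adj : dict mapping node -> list of neighbour nodes (undirected graph).
    T   : maximum number of iterations (the stop criterion).
    """
    rng = random.Random(seed)
    # Stage 1: each memory starts with the node's own id as its only label.
    memory = {v: Counter({v: 1}) for v in adj}
    nodes = list(adj)
    # Stage 2: asynchronous evolution; memories are updated in place.
    for _ in range(T):
        rng.shuffle(nodes)
        for listener in nodes:
            if not adj[listener]:
                continue                      # no speakers, nothing to hear
            received = Counter()
            for speaker in adj[listener]:
                # Speaking rule: sample proportionally to label frequency.
                labs = list(memory[speaker])
                weights = [memory[speaker][lab] for lab in labs]
                received[rng.choices(labs, weights=weights)[0]] += 1
            # Listening rule: most popular label; ties broken at random
            # (the tie-breaking choice is an assumption of this sketch).
            best = max(received.values())
            winners = [lab for lab, cnt in received.items() if cnt == best]
            memory[listener][rng.choice(winners)] += 1
    return memory


# Hypothetical example: two triangles joined through the bridge edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
for node, mem in sorted(slpa_evolve(graph, T=50).items()):
    print(node, dict(mem))
```

Each memory grows by exactly one label per iteration, so after T iterations the label counts of a node can be normalized directly into the probability distribution used in post-processing.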
This distribution defines the strength of association to the communities to which the node belongs and can be used for fuzzy community detection [17]. More often than not, one would like to produce crisp communities in which the membership of a node in a given community is binary, i.e., a node is either in a community or not. To this end, a simple thresholding procedure is performed. If the probability of seeing a particular label during the whole process is less than a given threshold r ∈ [0,1], this label is deleted from the node’s memory. After thresholding, connected nodes having a particular label are grouped together and form a community. If a node contains multiple labels, it belongs to more than one community and is therefore called an overlapping node. In SLPA, we remove nested communities, so the final communities are maximal.

As shown in Fig. 1, SLPA converges (i.e., produces similar output) quickly as the parameter r varies. The effective range is typically narrow. Note that the threshold is used only in the post-processing. This means that the dynamics of SLPA are completely determined by the network structure and the interaction rules. The number of memberships is constrained only by the node degree. In contrast, Copra uses a parameter to control the maximum number of memberships granted during the iterations.

C. Complexity

The initialization of labels requires O(n), where n is the total number of nodes. The outer loop is controlled by the user-defined maximum number of iterations T, which is a small constant (in our experiments, T was set to 100). The inner loop is controlled by n. Each operation of the inner loop executes one speaking rule and one listening rule. For the speaking rule, selecting a label from the memory proportionally to the frequencies is, in principle, equivalent to randomly selecting an element from the array, which is an O(1) operation. For the listening rule, since the listener needs to check all the labels from its neighbors, it takes O(K) on average, where K is the average
degree. The complexity of the dynamic evolution (i.e., stages 1 and 2) for the asynchronous update is O(Tm) on an arbitrary network and O(Tn) on a sparse network, where m is the total number of edges. In the post-processing, the thresholding operation requires O(Tn) operations since each node has a memory of size T. Therefore, the time complexity of the entire algorithm is O(Tn) in sparse networks. For a naive implementation, the execution time on synthetic networks (see Section IV) grows slightly faster than linearly with n, as shown in Fig. 2.

IV. EXPERIMENTS AND RESULTS

A. Benchmark Networks

To study the behavior of SLPA for overlapping community detection, we conducted extensive experiments on both synthetic and real-world networks. Table I lists the classical social networks for our tests and their statistics³.

For synthetic networks, we adopted the LFR benchmark [18], which is a special case of the planted l-partition model, but characterized by heterogeneous distributions of node degrees and community sizes. In our experiments, we used networks with size n=5000.
The average degree is kept at k=10, which is of the same order as most of the real-world networks we tested. The rest of the parameters are as follows: node degrees and community sizes are governed by power laws, with exponents 2 and 1; the maximum degree is 50; the community size varies between 20 and 100; the mixing parameter µ, which is the expected fraction of links of a node connecting it to other communities, varies from 0.1 to 0.3.

³ Data are available at /∼mejn/netdata/ and ∼aarenas/data/welcome.htm

Figure 3. F-score for networks with n=5000, k=10, µ=0.1.

Figure 4. F-score for networks with n=5000, k=10, µ=0.3.

Figure 5. NMI for networks with n=5000, k=10, µ=0.1.

Figure 6. Ratio of the detected to the known numbers of memberships for networks with n=5000, k=10, µ=0.1. Values over 1 are possible when more memberships are detected than there are known to exist.

Figure 7. Ratio of the detected to the known numbers of communities for networks with n=5000, k=10, µ=0.1. Values over 1 are possible when more communities are detected than there are known to exist.

Figure 8. NMI for networks with n=5000, k=10, µ=0.3.

Figure 9. Ratio of the detected to the known numbers of memberships for networks with n=5000, k=10, µ=0.3. Values over 1 are possible when more memberships are detected than there are known to exist.

Figure 10. Ratio of the detected to the known numbers of communities for networks with n=5000, k=10, µ=0.3. Values over 1 are possible when more communities are detected than there are known to exist.

The degree of overlapping is determined by parameters O_n (i.e., the number of overlapping nodes) and O_m (i.e., the number of communities to which each overlapping node belongs). We fixed the former to be 10% of the total number of nodes. The latter, the most important parameter for our test, varies from 2 to 8, indicating the diversity of overlapping nodes. By increasing the value of O_m, we create harder detection tasks.

We compared SLPA with three well-known algorithms, including CFinder (the implementation of
clique percolation algorithm [2]), Copra [7] (another label propagation algorithm), and LFM [4] (an algorithm expanding communities based on a fitness function). Parameters for those algorithms were set as follows. For CFinder, k varied from 3 to 10; for Copra, v varied from 1 to 10; for LFM, α was set to 1.0, which was reported to give good results. For SLPA, the maximum number of iterations T was set to 100 and r varied from 0.01 to 0.1 to determine its optimal value. The average performances over ten repetitions are reported for SLPA and Copra.

Table I
THE Q_ov’S OF DIFFERENT ALGORITHMS ON REAL-WORLD SOCIAL NETWORKS.

Network      n      k     SLPA  std   r     Copra  std   v   LFM   CFinder  k
karate       34     4.5   0.65  0.21  0.33  0.44   0.18  3   0.42  0.52     3
dolphins     62     5.1   0.76  0.03  0.45  0.70   0.04  4   0.28  0.66     3
lesmis       77     6.6   0.78  0.03  0.45  0.72   0.05  2   0.72  0.63     4
polbooks     105    8.4   0.83  0.01  0.45  0.82   0.05  2   0.74  0.79     3
football     115    10.6  0.70  0.01  0.45  0.69   0.03  2   -     0.64     4
jazz         198    27.7  0.70  0.09  0.45  0.71   0.05  1   -     0.55     7
netscience   379    4.8   0.85  0.01  0.45  0.82   0.02  6   0.46  0.61     3
celegans     453    8.9   0.31  0.22  0.35  0.21   0.14  1   0.23  0.26     4
email        1133   9.6   0.64  0.03  0.45  0.51   0.22  2   0.25  0.46     3
CA-GrQc      4730   5.6   0.76  0.00  0.45  0.71   0.01  1   0.45  0.51     3
PGP          10680  4.5   0.82  0.01  0.45  0.78   0.02  9   0.44  0.57     3

B. Identifying Overlapping Nodes in Synthetic Networks

Allowing overlapping nodes is the key feature of overlapping communities. For this reason, the ability to identify overlapping nodes is an essential component for quantifying the quality of a detection algorithm. However, node-level evaluation is often neglected in previous work. Note that the number of overlapping nodes alone is not sufficient to quantify the detection performance. To provide a more precise analysis, we define the identification of overlapping nodes as a binary classification problem. We use the F-score as a measure of accuracy, which is the harmonic mean of precision (i.e., the number of overlapping nodes detected correctly divided by the total number of detected overlapping nodes) and recall (i.e., the number of overlapping nodes discovered correctly divided by the expected value of overlapping
nodes, 500 here). Fig. 3 and 4 show the F-score as a function of the number of memberships. SLPA achieves the largest F-score in networks with different levels of mixing, as defined by µ. CFinder and Copra perform similarly in this test. Interestingly, the F-score of SLPA correlates positively with O_m, while other algorithms typically demonstrate a negative correlation. This is due to the high precision of SLPA when each node may belong to many groups.

C. Identifying Overlapping Communities in Synthetic Networks

Most measures for quantifying the quality of a partition are not suitable for a cover produced by overlapping detection algorithms. We adopted the extended normalized mutual information (NMI) proposed by Lancichinetti [1]. NMI yields values between 0 and 1, with 1 corresponding to a perfect matching. The best performances in terms of NMI are shown in Fig. 5 and Fig. 8 for all algorithms with optimal parameters.

The higher NMI of SLPA clearly shows that it outperforms other algorithms over different network structures (i.e., with different µ’s). Comparing the number of detected communities and the average number of detected memberships with the ground truth in the benchmark helps understand the results. As shown in Fig. 6, 7, 9 and 10, both quantities reported by SLPA are closer to the ground truth than those reported by other algorithms. This is even the case for µ=0.3 and large O_m. The decrease in NMI is also relatively slow, indicating that SLPA is less sensitive to the diversity O_m. In contrast, Copra drops fastest with the growth of O_m, even though it is better than CFinder and LFM on average.

D. Identifying Overlapping Communities in Real-world Social Networks

To evaluate the performance of overlapping community detection in real-world networks, we used an overlapping measure, Q_ov, proposed by Nicosia [19]. It is an extension of Newman’s modularity [20]. For the function f required by Q_ov, we adopted the one used in [7], f(x)=60x−30. Q_ov values vary between 0 and 1. The larger the value is, the better the performance is. In
this test, SLPA uses r in the range from 0.02 to 0.45. Other algorithms use the same parameters as before. In Table I, r, v and k are the parameters of the corresponding algorithms. LFM used α=1.0. For SLPA and Copra, the algorithms were run 100 times and the average and standard deviation (std) of Q_ov were recorded.

As shown in Table I, SLPA achieves the highest Q_ov in almost all the test networks, except the jazz network, for which SLPA’s result is marginally smaller (by 0.01) than that of Copra. SLPA outperforms Copra significantly (by >0.1) on the karate, celegans and email networks. On average, LFM and CFinder perform worse than either SLPA or Copra.

To better understand the output of the detection algorithms, we show (in Table II) the statistics, including the number of detected communities (denoted as Com#), the number of detected overlapping nodes (i.e., O_n^d) and the average number of detected memberships (i.e., O_m^d). Due to space limitations, we present only results from SLPA (Columns 2 to 4) and CFinder (Columns 5 to 7).

It is interesting that all algorithms confirm that the diversity of overlapping nodes in the tested social networks is small (close to 2), although the number of overlapping nodes differs from algorithm to algorithm. SLPA seems to have a stricter concept of overlap and returns a smaller number of overlapping nodes than CFinder. We observed that the numbers of communities detected by SLPA are in good agreement with the results from other non-overlapping community detection algorithms. This is consistent with the fact that the overlapping degree is relatively low in these networks.

V. CONCLUSIONS

In this paper, we present a dynamic interaction process and one of its implementations, SLPA, to allow efficient and effective overlapping community detection. This process can be easily modified to accommodate different rules (i.e.,
speaker rule, listening rule, memory update, stop criterion and post-processing) and different types of networks (e.g., k-partite graphs). Interesting future research directions include fuzzy hierarchy detection and temporal community detection.

Table II
THE STATISTICS OF THE OUTPUT FROM SLPA AND CFINDER.

SLPA CFinder
Network Com# O_n^d O_m^d Com# O_n^d O_m^d
karate 2.12 1.80 2.00 32 2.00
dolphins 3.44 1.24 2.00 46 2.00
lesmis 5.01 1.13 2.00 43 2.33
polbooks 3.40 1.30 2.00 49 2.00
football 10.30 1.47 2.00 136 2.00
jazz 2.71 2.00 2.00 639 2.05
netscience 37.84 2.25 2.00 6548 2.33
celegans 5.68 14.42 2.00 6192 2.70
email 27.96 5.57 2.00 4183 2.12
CA-GrQc 499.94 0.02 2.00 605548 2.40
PGP 105141 2.00 734422 2.22

ACKNOWLEDGMENT

This work was supported in part by the Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053 and by the Office of Naval Research Grant No. N00014-09-1-0607. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies either expressed or implied of the Army Research Laboratory, the Office of Naval Research, or the U.S. Government.

REFERENCES

[1] A. Lancichinetti, S. Fortunato, and J. Kertész, “Detecting the overlapping and hierarchical community structure of complex networks,” New Journal of Physics, vol. 11, p. 033015, 2009.
[2] G. Palla, I. Derényi, I. Farkas, and T. Vicsek, “Uncovering the overlapping community structure of complex networks in nature and society,” Nature, vol. 435, pp. 814–818, 2005.
[3] J. Baumes, M. Goldberg, M. Krishnamoorthy, M. Magdon-Ismail, and N. Preston, “Finding communities by clustering a graph into overlapping subgraphs,” in IADIS, 2005.
[4] A. Lancichinetti, S. Fortunato, and J. Kertész, “Detecting the overlapping and hierarchical community structure in complex networks,” New J. Phys., vol. 11, p. 033015, 2009.
[5] S. Gregory, “An algorithm to find overlapping community structure in networks,” Lect. Notes Comput. Sci., 2007.
[6] S. Gregory, “A fast algorithm to find overlapping communities in networks,” Lect. Notes Comput. Sci., vol. 5211, p. 408, 2008.
[7] S. Gregory, “Finding overlapping communities in networks by label propagation,” New J. Phys., vol. 12, p. 103018, 2010.
[8] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” Phys. Rev. E, vol. 76, p. 036106, 2007.
[9] H. Shen, X. Cheng, K. Cai, and M.-B. Hu, “Detect overlapping and hierarchical community structure,” Physica A, vol. 388, p. 1706, 2009.
[10] S. Zhang, R.-S. Wang, and X.-S. Zhang, “Identification of overlapping community structure in complex networks using fuzzy c-means clustering,” Physica A, vol. 374, pp. 483–490, 2007.
[11] T. Nepusz, A. Petróczi, L. Négyessy, and F. Bazsó, “Fuzzy communities and the concept of bridgeness in complex networks,” Phys. Rev. E, vol. 77, p. 016107, 2008.
[12] I. Psorakis, S. Roberts, and B. Sheldon, “Efficient bayesian community detection using non-negative matrix factorisation,” arXiv:1009.2646v5, 2010.
[13] Y.-Y. Ahn, J. P. Bagrow, and S. Lehmann, “Link communities reveal multiscale complexity in networks,” Nature, vol. 466, pp. 761–764, 2010.
[14] T. Evans and R. Lambiotte, “Line graphs of weighted networks for overlapping communities,” Eur. Phys. J. B, vol. 77, p. 265, 2010.
[15] S. Gregory, “Finding overlapping communities in networks by label propagation,” New J. Phys., vol. 12, p. 103018, 2010.
[16] J. Xie and B. K. Szymanski, “Community detection using a neighborhood strength driven label propagation algorithm,” in IEEE NSW 2011, 2011, pp. 188–195.
[17] S. Gregory, “Fuzzy overlapping communities in networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2011, no. 02, p. P02017, 2011.
[18] A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmark graphs for testing community detection algorithms,” Phys. Rev. E, vol. 78, p. 046110, 2008.
[19] V. Nicosia, G. Mangioni, V. Carchiolo, and M. Malgeri, “Extending the definition of modularity to directed graphs with overlapping communities,” J. Stat. Mech., p. P03024, 2009.
[20] M. E. J. Newman, “Fast algorithm for detecting community structure in
networks,” Phys. Rev. E, vol. 69, p. 066133, 2004.
