基于SVM的P2P流量识别

合集下载

基于多属性P2P流量识别法的SVM(IJEM-V2-N4-1)

I.J. Engineering and Manufacturing, 2012,4, 1-8Published Online August 2012 in MECS ()DOI: 10.5815/ijem.2012.04.01Available online at /ijemSVM Based P2P Traffic Identification Method With MultiplePropertiesYao Zhao a, Zhixin Wei b, Hua Zou cState Key Laboratory of Networking and Switching Technology, Beijing University of Posts andTelecommunications Beijing, ChinaAbstractWith the rapid development of the Internet, P2P has become the main network application in the Internet, which consumes most of the network resources. Accurately identifying and making control of the P2P traffic is of great significance. As a mature classification theory, support vector machine (SVM) algorithm is suitable for P2P traffic identification. This paper proposes a SVM based P2P flow identification method, adopting multidimensional flow properties as the input vector, which can improve the P2P flow classification accuracy. Analysis shows this method has many advantages over the other methods.Index Terms: traffic identification; P2P; SVM.© 2012 Published by MECS Publisher. Selection and/or peer review under responsibility of the Research Association of Modern Education and Computer Science.1.IntroductionInvestigations show that the P2P application enlarges the resource share across the whole network as well as consumes most of the network resources, which leads to congestion and other traffic problems. In order to identify the P2P traffic and to make control of it, the P2P identification has become a hot research subject.2.Related Work2.1.Traditional P2P Flow identification technologyThe traditional traffic identification mainly uses port-based and deep packet inspection method. The P2P traffic can be identified according to the specific traffic port and special application tags of packets of the P2P traffic data[1][2]. With the rapid development of P2P, more and more P2P applications tend to use dynamical or anonymous ports as well as encrypted P2P traffic data. Under such circumstances, the above method is no longer applicable.* Corresponding author.E-mail address: a zhaoyao@; b whithin@; c zouhua@2SVM Based P2P Traffic Identification Method With Multiple Properties Some traffic identification methods are property-based[3][4]. These methods use P2P traffic properties as a basis for judging, such as TCP and UDP traffic exist between the P2P nodes, connections P2P nodes accept and initiate at the same time, balanced upload and download traffic on P2P nodes. These methods can detect traffic with encrypted data packets and dynamic ports. However, It is only applicable to the known P2P traffic. For the unknown or new P2P traffic, it cannot work appropriately. To solve this problem, experts and scholars apply classification algorithms of the machine learning filed to the P2P traffic identification and achieve a better recognition effect.More and more statistical decision-making, clustering pattern classification methods of the machine learning and data mining field have been applied to P2P traffic identification. Such as Bayesian classification algorithm, support vector machine (SVM), BP neural network and so on, among which SVM classification becomes a research hotspot with its superiority and significant effect in the P2P traffic classification. This paper introducesa P2P traffic identification method based on SVM classification.2.2.Introduction to SVMSVM is a machine learning method based on statistic theories[5]. It uses a pre-selected nonlinear transform, mapping unclassified problem of the low dimensional space to high dimensional feature space, and creates the optimal classification hyper plane in the space, which will classify the problems into two types. Therefore, it is suitable for P2P and non-P2P scene.When using the SVM as the classification method, each sample consists of a vector and a mark. The vector is composed of one or several properties, while the mark indicates the category of the sample. In this paper, P2P flow is tagged with +1, others tagged with -1. As follows: = (,) , is the feature vector, is the mark.Suppose there are n d-dimension samples, the sample type mark is + 1 or - 1, can be expressed as: ()*+If there exists a hyper plane , which makes:() (1) () (2) At this point, to make the classification interval maximum, is equivalent to make or minimal.Thus, the hyper plane that meets the conditions above and make minimal is called optimal classification plane. The training sample points that are parallel to and nearest to the hyper plane are called a support vector.Hyper plane is able to correctly identify the two types of samples and expresses optimization problems with the maximum interval as:,()- (3) The minimum value can be obtained under the restriction of (3)()〈〉 (4) This problem can be transformed to its dual problem:∑ (5)SVM Based P2P Traffic Identification Method With Multiple Properties 3 The maximum value can be obtained under the constraint of (5):()∑∑( ) (6) After computing the optimal solution of , and , the optimal classification function is obtained as follows:()*()+*∑()+ (7) The above function is appropriate for the liner classification problems. By turning the problem into its dual problem, the optimization objective function or the classification function only involves the inner product of the high dimensional space of training samples. For the non-liner problems, it can be solved by choosing the appropriate kernel function ( ) which can map the low dimension space to high dimension space to accumulate the inner product result. In this way, the non-liner classification can be transformed into liner classification problem. Then, the optimal objective function turns to be:()∑∑ ( ) (8) And the related classification function turns to be:()*∑()+ (9) 2.3.SVM based network traffic identification reviewGenerally speaking, the traditional statistical classification method is based on mathematical induction which get common abstraction from known data, then predict the unknown ones. This method works well only when the sample amount reaches a certain quantity. However, SVM classification method is different, it does not need to summarize the universal truth from the known, but through studying the known types, directly predict the unknown ones. At present, the development of the SVM theory is mature. Its superiority has been proved by many experiments, which shows that the application of SVM applied on P2P flow identification is worth studying.Much research work has been done on SVM based traffic identification and made great progress. Gabriel Gómez Sena[6]use the size of the first packets on both directions of a flow as a statistical fingerprint, by comparing the centroid clustering method with the SVM clustering method prove that the SVM based method has much higher accuracy in traffic identification. Rui Wang[7]and his colleagues concern that the peer of the P2P application connect with more different address than normal nodes, based on which he proposed a sensitive feature extraction algorithm and transformed the flow data to 3-dimension feature data as the input vector of the SVM algorithm. And experiment shows this method can identify the known as well as unknown P2P traffic with an acceptable accuracy.Both of the referenced SVM based P2P traffic identification methods work excellently on the P2P traffic identification, but they take advantage of only one or two flow statistical properties. Comparing with the single flow property as the input vector, the multiple flow properties have much better performance [8]. This paper proposes a new SVM based P2P traffic identification method which uses multiple statistical properties as the input vector of the SVM predict model4SVM Based P2P Traffic Identification Method With Multiple Properties3.SVM based P2P traffic identification method with multiple propertiesSVM classification method has been brought into the P2P flow identification field. Different from the existing single property input vector method, this paper adopts the multiple flow properties as the input vector of the SVM classification model. The SVM classification model is generated by the precisely pre-classified flow sample. Each time choose different properties combination as the input vector of the SVM classification model, record the classification accuracy. The flow properties combination that has highest accuracy will be finally selected as the input vector.The key points of this method are obtaining the precisely pre-classified P2P flow samples and the selection of best combination of the flow properties. Therefore, this paper will describe both of the aspects in detail.3.1.procedure descriptionLike most of the machine learning methods, the SVM based P2P traffic identification method includes two main phases[9], the SVM training phase and the SVM testing phase. In the SVM training phrase, the pre-classified flow samples are used to produce a SVM predict model, while in the SVM testing phase, the SVM predict model will be fully tested to see its accuracy. Before the above two phase, the priory work is to get the flow samples as the input data of the classification model. The whole procedure of the method can be described in Fig 1:Figure 1 SVM based P2P identification procedureFirstly, network packets are captured by capture tools. Then, flow samples are generated by a C++ program developed for this experiment. After that, flow Samples are pre-classified by marking with non-P2P traffic or P2P traffic tag. Then, the pre-classified are used by the SVM tool, combinations of different flow properties are selected as the input vectors for SVM training to generate the classification model. Select the one with the best accuracy as the final classification model, output the results of flow identification. Detailed contents will be introduced in the following content.3.2.Packets CaptureWire Shark is used to capture the network packets. The P2P peers use UDP to find each other and use TCP to transport the application data. To ease the analysis, before any capture, the unrelated packets like ARP, broadcast and multicast layer2 packets were all filtered, only the TCP and UDP packets were saved.SVM Based P2P Traffic Identification Method With Multiple Properties 53.3.Flow GenerationA C++ program was developed to generate the flow samples. Unpack the packets, record the source IP, destination IP, source port, destination port and protocol, the packets with the same five parameters are thought to belong to the same flow, generate a flow record.When getting a packet, the flow records of the five parameters were checked, if there is a flow record of that packet, update the flow information, and otherwise generate a new flow record. The three handshakes was thought as the beginning of a TCP flow, and the FIN/RST as the end. Besides, if no further packets come during a period of T seconds, such as 90s, the flow is thought to be over. Similarly, if there are no packets come in 90s, a UDP flow is thought to be over. Record all the flow properties, includes the total time of a flow, total packets, total bytes, the arrival time interval, the packet size, payload size etc.3.4.Pre-classify flow sampleIn order to get the accurately pre-classified flow samples, a specific LAN was established. In different time period, one or several hosts were designated to run the specific P2P application or non-P2P application. Specific application type of the flow can be identified by its IP address.At the edge router of the LAN, mirror all the interface packets to a fixed interface of the router, capture the packets of the host directly connect to that interface for later use. Mark the type of the flow based on its IP information. The flow sample will be marked as P2P or non-P2P flow as SVM training samples.After getting the necessary pre-classified samples, the paper will focus on the introduction of the P2P flow identification method.3.5.SVM training process1) Preparation workThis paper uses the open sourced LibSVM [10] tool to train and test the flow sample. LibSVM is developed by professor hih–Jen of national Taiwan University. It is an effective recognition and regression software package, including three basic tools: SVM-train, SVM-predict, SVM-scale. SVM-train is most important tool, which is used to train the sample and generate the predict model. SVM-predict is used to predict the data, output the accuracy of the test sample and predict model. SVM-scale is used to transform the vector values into [-1, 1]. The LibSVM source code is downloaded from the website and compiled under Linux to generate the executive file. The experiment adopted the proposed SVM classification steps:Firstly, the flow samples are transformed into the required form, each flow sample is described as <label> : <index1> : <value1> : <index2> : <value2> : <index3> : <value3> … , ended by …\n‟. <label> is an integer marking the type. <index>:<value> gives a property value. <index> starts from 1 and <value> is a real number of the property value.Secondly, conduct the SVM-scale on the flow samples, the value of each property is restricted to range [-1,1] to simplify calculation complexity.2) Kernel function and parameters selectionAppropriate kernel function and parameters can make the mapped distribution region of the sample more focused, thus strengthening the "linear divisible" degree of samples in the property space, to increase the classification precision and generalization capability.For the same experiment data, using different kernel functions and parameters, the classification accuracy can vary widely, even for the same kernel function; classification precision also can have bigger difference with different parameters. The commonly used kernel functions [5] are:Linear kernel:6SVM Based P2P Traffic Identification Method With Multiple Properties( )( ) (10) ●Polynomial kernel:( )[( ) ] (11) ●Radical Basis function(RBF):( ){ } (12) ●Sigmoid tanh:( ) ( ( ) ) (13) According to the functional theory, as long as there is one kind of kernel function satisfying the Mercer condition, it corresponds to the inner product of the high dimensional space.Polynomial kernel function can always satisfy mercer condition, but the parameters of the function is more, and the parameter selection influences classification accuracy influence; Sigmoid kernel function can satisfy Mercer condition only under specific circumstance; RBF kernel function parameter has only two parameters, at the same time, it can satisfy the Mercer conditions, so select RBF as the kernel function of the experiment.After choosing RBF as the kernel function, the next step is to find the best parameter C and r. To simplify the experiment, use the recommended cross-validation tool “grid-search” tool provided by the LibSVM. In cross-validation, the training set is first divided into N subsets of equal size. Sequentially one subset is tested using the classifier trained on the remaining N-1 subsets. Thus, each instance of the whole training set is predicted once, the cross-validation accuracy is the percentage of data which are correctly classified. After the best(C, r) is found, the whole training set is trained again to generate the predict model.The pre-classified training samples must be evenly distributed, so as to ensure the reliability of classification results. By testing the known classification samples, compare the results obtained to the pre-define result to see whether the predict model meets the demand.3) Flow properties selectionIn the process of the flow sample generation, all the information of the flow sample are recoded, such as the application port, the source and destination IP, the number of packets, the duration of the flow, the total bytes. Not all of the flow properties will affect the classification accuracy, only a few of the key properties affect flow classification accuracy.A flow sample is described as:()*+is the ith flow sample, indicates the P2P flow type ,+1 is P2P flow,-1 is not P2P flow;is the ith property of the flow, the value is in range of[-1,+1], after conducting the SVM-scale.This paper adopts discriminators selection method [8] to choose the properties combination as the input vector. Firstly, choose one property as the basic property, record the classification accuracy of the SVM model, then sequentially add one property each time, record the corresponding classification accuracy, see whether the added flow property affect the classification accuracy. In this way, find out the flow properties exclusively that influence the classification result least to gain the properties combination that have the maximum classification accuracy.The experiment proved that some of flow properties have more influence on the classification accuracy than the others, which can be selected as the input vector of the flow classification model: the flow duration, totalSVM Based P2P Traffic Identification Method With Multiple Properties 7 packets, total bytes, average maximum and minimum packet size, protocol type, average maximum and minimum payload size, average maximum and minimum interval of packet arrival time.4.Method AnalysisSVM is a very mature theory; scientific research has proved its superiority in classification field. It also has made certain progress in its application in the flow identification. SVM method studies the known flow samples to produce a classification model to predict the unknown flow. It doesn‟t need to do statistical computation on the known samples which reduce the required sample number.This paper adopts the multidimensional flow properties as the input vector of the P2P flow classification model, which has better accuracy than the single property. Comparison between the multidimensional flow properties and single property is showed in TABLE I, which prove that the multidimensional flow properties as the input vector of the P2P flow classification has better effect.Table 1. Comparison between single property and multiple properties method1 Method name Based on the early packets size Based on multiple properties2 Flow generate Save packets length < 200kbytes Save packets with all length3 Pre-classify sample deep packet inspection Specific network4 Pre-classification accuracy Can‟t classify new and unknownP2P flowCan classify all the flow5 Property number Single property Multiple property6 Identification accuracy About 80% Above 90%SVM based single flow property P2P traffic identification has its restriction. The flow type cannot be decided by one flow property. The combinations of the flow properties define the right type of the P2P flow. Although the affection of different property on the classification accuracy varies, the experiment proved that the combination of them as the study input of the SVM model has better classification effect. Besides, the pre-classify method of deep packet inspection can works well only with the existing P2P flows, for the new or unknown P2P flow, it can‟t correctly mark the flow type. Hence, this paper builds an experimental environment. The designated IP run the specific network applications, so the flow type can be marked accurately according to its IP, which provide completely accurate pre-classified flow samples5.SummarizeFirstly, this paper gives a summary of P2P traffic identification, and then describes principles as well as advantages and disadvantages of each traffic classification method. After that a multi-dimensional vector-based SVM classification method is presented, which make use of the discriminatory selection to find the flow properties with the greatest impact on the classification accuracy as the input vector of the classification model, which has better classification accuracy than SVM classification based one-dimensional input vectors.Further work will focus on the improvement of the method, enabling it to be applied to wide area networks and implementation of real-time P2P traffic identification.8SVM Based P2P Traffic Identification Method With Multiple PropertiesAcknowledgmentThis work was supported by National Key Basic Research Program of China ("973" Program) (2009CB320406), National High Technology Research and Development Program of China ("863" Program) (2008AA01A317), the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No.60821001), and Beijing Municipal Commission of Education to build the project special. References[1]Sen, S.,Jia Wang. Analyze P2P traffic across large network. Networking, IEEE/ACM Transactions on Volume: 12, Issue: 2.2004.[2]Ohzahata S ,Hagiwara Y, Terada M ,et al ,A Traffic Identification Method and Evaluations for a Pure P2P Application[M] .Lecture Notes in Computer Science ,2005.[3]Hongbo Jiang, Andrew W. Moore, Zihui Ge, Shudong Jin, Jia Wang. Aug. self-learning IP traffic classification based on statistical flow characteristics. Proceedings of the 2007 SIGCOMM workshop on Internet network management, 2007.[4]Kompella S,Wieselthier, J.E., Ephremides, A., Sherali, H.D. cross-layer peer-to-peer traffic identification and optimization based on active networking. Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks and Workshops, 2008.[5]Nello C, John S T. An introduction to Support Vector Machines and other Kernel-based Learning Methods. Cambridge University Press, 2004.[6]Gabr iel Gómez Sena, Pablo Belzarena, Early trafﬁc classiﬁcation using Support Vector Machines, Proceeding LANC '09 Proceedings of the 5th International Latin American Networking Conference,2009 [7]Rui Wang, Yang Liu, Yue-xiang Yang, Hai-long Wang. A new method for P2P Traffic Identification Based on Support vector Machine. AIML 06 International Conference, 13 - 15 June 2006, Sharm El Sheikh, Egypt [8]R. Yuan Z. Li, X. Guan, Accurate classification of the internet traffic based on the SVM method, in: Proceedings of the 42th IEEE International Conference on Communications (ICC 2007), June 2007[9]PAN S R,FU M ,SHI C Q. Application of the Supporting Vector Machine in P2P Traffic Identification, COMPU TER EN GINEERING & SCIENCE, Vol132 ,No12 ,2010[10]C hih-Chung Chang and Chi-Jen Lin. LIBSVM-A Library for support Vector Machines.http://www.csie.ntu.tw/~cjlin/libsvm/。

基于SVM的流量识别算法研究与应用

基于SVM的流量识别算法研究与应用一、引言网络安全作为当前互联网发展的一个重要方面，一直备受社会关注。

其中，流量识别作为网络安全的基础和核心技术之一，能够对网络流量进行分析和识别，进而发现和解决网络安全威胁，保障网络安全。

与此同时，随着网络带宽的不断增大和网络应用的多样化，流量识别技术也不断发展，从最初的基于端口号的分类到现在的深度学习模式，其技术难度和识别效果也在不断提高。

本文将围绕基于SVM的流量识别算法展开研究和应用探讨。

二、SVM算法原理及其应用SVM，即支持向量机，是一种广泛应用于分类和回归分析的算法。

它基于统计学习理论，通过将数据拟合到高维空间上，然后找到能够最大化类别边界的超平面，以达到对数据的分类目的。

而在流量识别中，SVM算法的应用具有一定的优越性。

因为在网络中，不同类型的流量通常具有不同的特征和行为模式，因此可以将网络流量数据高维度表示为数据特征向量，通过对特征向量的训练，得到SVM分类器模型，然后通过该模型实现网络流量的分类和识别。

三、SVM算法在流量识别中的应用实践具体而言，SVM算法在流量识别中的应用可以分为以下几个步骤：1、特征提取：通过网络流量抓包工具，获取网络数据包，并通过预处理方式将其处理成符合SVM算法训练要求的数据格式，比如向量形式。

2、数据预处理：将获得的向量数据样本根据特征分解等数学方法转化为向量空间模型，然后通过标准化、归一化等方法对向量进行预处理，以处理数据标准化问题。

3、训练模型：通过SVM算法对预处理后的数据集进行训练，以构建SVM模型。

具体而言，通过对训练数据进行分类和回归分析，生成支持向量，并根据支持向量构建分离超平面，最终得到SVM分类器。

4、流量识别：利用已训练好的SVM模型对输入的流量数据进行分类和识别，以判断其所属类型并做相应处理。

比如可以通过对恶意流量进行数据包拦截和流量控制，以保证网络安全。

四、SVM算法的优势与不足虽然SVM算法在流量识别中具有一定的优势，但同时也存在一些不足之处。

基于多分类支持向量机的P2P流量识别模型的设计

接口。
降低这些负面影响的危害程度。中，其如何准确识别
ＰＰ量成为ＰＰ量管理中重要的问题。２流２流
按照发展历程。传统的ＰＰ量识别方法可以２流
归结为如下几类：口识别法、Ｐ（层数据包检端ＤＩ深测）法、于行为特征的识别方法和机器学习的方方基法。端口识别技术和ＤＩ别技术都具有一定的局Ｐ识
该系统有２种运行模式：训练模式和识别模式。
在训练模式下。入已知的训练样本流量，过预处输经理、征选取、征向量表示模块进入ＳＭ训练模特特Ｖ块，进行训练，将训练结果入库；并在识别模式下，输入待测网络流量样本，过预处理、征选取、经特特征向量表示模块进入ＳＶＭ分类器模块，行未知流量进的识别，可判断出是否为ＰＰ２流量。
数据采集层为该系统的最底层，主要功能是对
ＳＭ的输入向量进行获取和可学习化处理。识别功Ｖ能层是该系统的中间一层，也是本文重点讨论的部
压力等。目前迫切需要有效的ＰＰ量管理技术以２流
分。制管理层是该系统的最高层，控该层提供相关的管理决策和更新支持向量库等功能．作为管理者的
目前几种经常研究的核函数有：性核、线多项式核、

基于AdaBoost-SVM的P2P流量识别方法

摘要：针对传统的P2P 流量识别技术存在识别率低和误判率高的缺点，将机器学习中AdaBoost 算法的良好分类能力和SVM 的泛化能力结合起来，提出一种基于AdaBoost-SVM 组合算法的P2P 网络流量识别模型，将SVM 作为AdaBoost 的基分类器，运用最小近邻法计算支持向量与训练集的样本间的距离实现分类进行P2P 流量识别。

最后，以4种P2P 流量数据为研究对象在MATLAB 上进行仿真，仿真结果表明，提出的AdaBoost-SVM 的组合算法在P2P 网络流量的分类性能和分类准确率上都优于单纯的AdaBoost 和SVM ，组合算法的P2P 流量平均识别率高达98.7%，远高于AdaBoost 和SVM 的识别率。

关键词：对等网络流量，支持向量机，分类器，分类能力，泛化能力中图分类号：TP393.08文献标识码：A基于AdaBoost-SVM 的P2P 流量识别方法*刘悦，李雪（开封大学信息工程学院，河南开封475004）P2P Traffic Identification Method Based on AdaBoost and SVMLIU Yue ，LI Xue（School of Information Engineering ，Kaifeng University ，Kaifeng 475004，China ）Abstract ：The traditional P2P traffic identification technology has shortcomings of low recognitionrate and high rate of false positives ，this paper combines AdaBoost algorithm with good generalizationability of SVM classification together ，Proposed P2P network traffic identification model based on a combination of AdaBoost -SVM algorithm.The minimum distance is calculated using the nearestneighbor method and support vector samples of the training set to achieve the classification between P2P traffic identification.Finally ，taking four kinds of P2P traffic data for example ，simulation results show that ，AdaBoost and combinations of the SVM algorithm is proposed in the classification performance and classification accuracy is better than pure AdaBoost and SVM ，the average recognition rate of Combination Algorithm was up to 98.7%，much higher than the recognition rate of AdaBoost and SVM.Key words ：P2P traffic ，support vector machine ，classifier ，classification capabilities ，generalization ability0引言随着对等网络技术的快速发展，P2P 技术被广泛应用于流媒体传输、文件共享以及即时通信等领域。

基于SVM的P2P流量识别

龙源期刊网
基于ＳＶＭ的Ｐ２Ｐ流量识别
作者：万力盘善荣傅明
来源：《计算技术与自动化》2009年第01期
摘要：根据P2P应用从最初的采用固定的端口号，发展到动态端口，再发展到伪装端口，甚至发展到现在的一些采用加密技术的具有反侦察意识的新型的P2P应用这样一个变化快的特点。

再利用支持向量机分类的本质，提出一种基于SVM的P2P流量识别方法。

通过实验证明，该方法具有较高的识别率，对未知协议的P2P识别精度也很高，说明采用支持向量机技术进行P2P流量识别的有效性。

关键词：SVM；P2P；流量识别。

采用两阶段策略模型(KTSVM)的P2P流量识别方法

ＰＰ流量识别方法２
丁要军，蔡皖东
（．１西北工业大学计算机学院，１１９西安；．７０２，２成阳师范学院信息工程学院，１００陕西咸阳）７２０，
摘要：针对识别加密ＰＰ网络流量比较困难的问题，出一种基于Ｋ均值和直推式支持向量机２提（ＳＴＶＭ）半监督学习模型— — 两阶段策略模型（Ｓ的ＫＴＶＭ，ｋｍｅｎａｅｒｎｄｃｉｅｓｐｏｔ－ａｓｂｓｄｔａｓｕｔｕｐｒｖｖｃｏｃｉｅ，ｅｔｒｍａｈｎ）以提高ＰＰ流量的识别精度．模型首先使用Ｋ均值半监督聚类算法计算训练２该
ａｄａｃｒｃｆＴＳｎｃｕａｙｏＶＭｒｍｐｏｅ．Ａｎｉｐｒａｔａｖｎａｅｏｈｄｌｓｔａｈｄｌａａｅｉｒｖｄｍｏｔｎｄａｔｇｆｔｅｍｏｅｈｔｔｅｍｏｅｎｉｃｂｒｉｅｙｂｔｂｌｄｓｍｐｅｎｎａｅｅａｐｅ。ａｄｔｅｍｏｅＳｓｉｂｅｆｒｔｅｉｅ — ｅｔａｎｄｂｏｈｌｅｅａｌｓａｄｕｌｂｌｄｓｍｌｓｎｈｄｌｉｕｔｌｏｈｄｎａａｔｉａｉｎｏ２ｒｆｉｔａＳｄｆｉｕｔｔｅｌｂｌｄｉｃｔｆＰＰｔａｆｈｔｉｉｆｌｏｂａｅｅ．ＥｘｅｉｎａｅｕｔｈｗｈｔｔｅｐｏｏｅｆｏｃｃｐｒｍｅｔｌｓｌｓｓｏｔａｈｒｐｓｄｒｍｏｅＳｂｔｅｈｎＴＳｄ１ｉｅｔｒｔａＶＭｎＶＭｏｅｓｉｃｕａｙａｄｓａｉｔａｄＳｍｄｌｎａｃｒｃｎｔｂｌｙ，ａｄｔａｓａｆｅｔｅｉｎｈｔｉｉｎｅｆｃｉｔｖｗａｏｉｒｖｈｃｕａｙｏ２ｒｆｉｄｎｉｉａｉｎｙｔｍｐｏｅｔｅａｃｒｃｆＰＰｔａｆｉｅｔｃｔ．ｃｆｏ

基于SVM分类算法的网络流量分析技术研究

基于SVM分类算法的网络流量分析技术研究网络安全问题一直是互联网发展过程中不可忽视的问题。

网络攻击手段层出不穷，一旦被攻击，企业或个人都将面临严重的后果。

因此，网络安全一直是各大企业和政府机构非常关心的问题。

而如何通过技术手段提高网络安全防御效果就成为了一项重要的研究课题。

其中，网络流量分析技术越来越被大家所重视，而基于SVM分类算法的网络流量分析技术更是受到了广泛关注。

一、 SVM分类算法SVM是一种二分类模型，主要用来解决分类问题。

SVM分类算法的核心思想是将输入空间映射到高维特征空间，在特征空间中寻找到一个最优的分离超平面，从而实现对数据进行分类。

以上述方案为例，我们可以通过SVM分类器对数据进行训练，找出最优的分类超平面。

在实际应用中，可以将样本数据按照不同特征进行划分，通过特征工程来提高SVM分类器的分类精度。

二、基于SVM算法的网络流量分析技术基于SVM分类算法的网络流量分析技术主要应用于网络安全监测与分析、网络内容分类、网络用户行为识别等方面。

下面我们将分别从这三个方面来详细介绍基于SVM的网络流量分析技术。

1、网络安全监测与分析网络攻击是企业和个人最为担忧的网络安全问题之一。

通过对网络流量进行监测和分析，可以有效地识别出网络攻击行为，并对其进行及时的防御和响应。

基于SVM的网络流量分析技术可以通过特征工程和训练模型，将正常流量和攻击流量进行分类，从而实现对网络攻击行为的检测和分析。

例如，可以通过构建不同的特征提取器，对网络流量数据进行处理，提取出包括传输速率、带宽等相关特征，利用SVM分类器对流量进行分类，实现对网络攻击行为的识别。

2、网络内容分类网络内容分类是指通过对网络内容进行分类，以便于用户获得更为精准的信息。

例如，对于搜索引擎来说，需要对搜索词进行分类，以便于给用户提供更为准确的搜索结果。

而基于SVM的网络流量分析技术可以通过对不同网络数据的分类，实现对网络内容的分类。

例如，可以对网络视频进行分类，提取包括视频码流、视频大小等特征进行分类，帮助用户快速找到自己想要的视频内容。

一种基于行为特征和SVM的P2P流量识别模型

关键词：为特征；行支持向量机；２；ＰＰ网络流量
中图分类号：Ｐ９Ｔ３３文献标识码：Ａ文章编号：０８—３３２１）３— ０９— ６１０４Ｘ（００００７０
０引言
当前，２ＰＰ网络技术迅速发展，网络规模急剧扩张．统计，２据ＰＰ网络流量已经占据了互联网流量总数的
收稿日期：００—０２１６—１２作者简介：李鹏（９３一）男，１８，河南开封人，助教．
７９
分类超平面．１明了它的基本思想．图表
图１支持向量机最优分类超平面示意图
图１中，两类样本分别用实心点和空心点来表示；Ｙ３类超平面用Ｈ来表示；两类样本中各自距离超平面
ＰＰ流量特征，２构建ＢＰ网络，通过该网络的足够训练，得到相关的测试结果．但是神经网络的层数和神经元个数难以确定，容易陷入局部极小，出现过学习现象．文献［］６提出使用支持向量机的方法进行ＰＰ流量５［］２识别．Ｖ在避免局部最优解，服 “ 数灾难 ” 解决小样本、ＳＭ克维，高维输入空间的ＰＰ流量识别问题上体现出２
开展ＰＰ网络的ＱＳ控制研究就显得非常必要．２ｏ而实现ＱＳ控制也成为ＰＰ网络管理的重要特征．ｏ２
要实现ＰＰ网络的ＱＳ控制管理，必须清楚地了解ＰＰ网络当前的工作情况，２ｏ就２以便及时对网络采取相应的管理措施．就需要对ＰＰ流量进行识别，２这２ＰＰ流量识别是实施有效的ＱＳ控制的基础．ｏ因此，深入研

基于优化SVM的P2P协议识别

第２卷第７期８
２１０１年７月
计算机应用研究
ＡｐｌｃｔｏｓａｃｆＣｏｕｅｓｐｉａｉｎＲｅｅｒｈｏｍｐｔｒ
Ｖ０＿８Ｎｏ７ｌ２．
Ｊ１２１ｕ．０１
基于优化ＳＶＭ的Ｐ２Ｐ协议识别水
随着计算机通信网络的不断发展，通过网络共享资源的ＰＰ应用越来越广泛。虽然ＰＰ应用从很大程度上缓解了Ｃ２２／
用以及非ＰＰ应用的分类，２进而实现对ＰＰ应用的监管。不２
同的协议具有不同的特征，本文通过获取数据包会话流的各种属性，对其进行统计分析，通过特征提取，择出最能识别出各选种应用的多种属性，利用粒子群算法优化的支持向量并机进行分类。
毛灵，陈兴蜀，吴仲光，谭骏，杜敏
（四川大学计算机学院，成都６０６）１０５摘要：针对ＰＰ应用提出了一种采用ＤＩ２Ｆ深度流分析的方法，通过还原会话流，提取ＰＰ２数据流的各种属性
特征，用ＧｉＳａｈ遗传算法、子群算法三种不同算法优化的支持向量机对网络数据流进行分类。通过实采ｒｅｒ、ｄｃ粒验测试，ＰＰ与非ＰＰ的多种应用中，用支持向量机进行设计的分类器分类准确率较高，在９％以上，在２２使均０最
ｓ架构模型中服务器的压力，但是ＰＰ应用也存在很大的安全２隐患：首先ＰＰ应用占用了大量的网络带宽，响其他应用的２影

适于P2P网络流量识别的SVM快速增量学习方法

增量学习就是如何优化确定一个缩减样本集合Ｄ降
低增量学习时间复杂度。文献［１０］论证了Ｄ眦中违背Ｈ．。的ＫＫＴ条件的样本集合Ｄ对Ｄ影响满足以下定理：定理１：若Ｄ不为空集，则Ｄ中必存在部分或全
性能好、解的全局最优性等优势．已在Ｐ２Ｐ网络流量识别中得到初步应用。文献ｆ７～８１提出使用ＳＶＭ机器学习
方法对Ｐ２Ｐ网络流量进行识别；文献『９１将小波理论与ＳＶＭ结合实现对Ｐ２Ｐ网络流量进行识别
？
文章编号：１００７ — １４２３（２０１４）１５ — ０００３ — ０４ＤＯＩ：１０．３９６９６．ｉｓｓｎ．１００７ — １４２３．２０１４．１５．００１
／
适于Ｐ２Ｐ网络流量识别的ＳＶＭ快速增量学习方法
应用层签名［￣３３ｐ２Ｐ流量识别的准确率大为降低
近年来．较多研究人员尝试将基于流量统计特征
的机器学习方法引入到Ｐ２Ｐ流量识别中。其中．支持
向量机（Ｓｕｐｐｏ￣ＶｅｃｔｏｒＭａｃｈｉｎｅ．ＳＶＭ）由于其具有泛化
增量学习方法。在对违背ＫａｍＳｈ — Ｋｕｈｎ — Ｔｕｃｋｅｒ条件的新增正负样本集分别进行聚类分析基础上，运用聚类簇中心对

基于SVM的P2P网贷平台风险评价体系构建

量资本的竞相追逐。

导致这一现象发生的原因是这两家共享单车企业拥有较大规模的用户数量，能够以此获得众多资本的青睐，获得资本投入的这两家企业又通过各种营销手段（二元包月、骑行有礼等）不断抢占市场份额，使得众多的中小单车企业用户数锐减，进而最终导致纷纷濒临或已经破产。

总而言之，这一时期共享单车市场具有的特征有：首先，提供共享单车服务的厂商相对较少，主要有两家（ofo共享单车和摩拜单车）占有绝对的市场份额；其次，共享单车企业所提供的服务基本上是具有同一性；再次，共享单车的定价具有相互依存性；最后，其他共享单车企业想要进入这一市场已经变得十分困难。

因此，这一时期的共享单车市场是一个非常典型的“双寡头垄断市场”。

2 共享单车行业的预测目前共享单车行业是一个需要不断投资的行业，资本逐渐汇集到了摩拜单车和ofo共享单车这两家，一线城市基本被这两家企业占据。

根据经济学原理分析，在竞争市场上，如果从经营过程中得到的收益低于它的总成本，那么该企业就应该退出市场，目前所有的共享单车都没有盈利，投入的资金全靠资本注入。

2017年中旬，ofo、摩拜两家接连宣布获得巨额E轮融资，使得ofo单车以及摩拜单车这两家企业得到支持继续发展下去，从ofo共享单车和摩拜单车活跃用户规模、新用户、使用频次和时长的稳定增长可以看出ofo 和摩拜两家单车的品牌效应日益凸显，逐渐拉大了与其竞争者的差距，行业领先地位正在进一步扩大，后入者生存空间越来越小。

到2017年底，共享单车市场仍然是一个不断烧钱的行业，但从长远看来，共享单车这个市场的潜力是巨大的，拥有相当高的资产回报率，这也是为何有众多资本追逐的原因。

随着共享单车行业寡头垄断的持续，我们发现对ofo和摩拜两家共享单车企业注入资本的投资机构实力都不可小觑。

然而，由于共享单车提供服务同质性以及服务人群具有高度一致性，所以两家企业极大的可能会达成某种合作或合并，就像是当年滴滴和快滴最终走向合并一样。

一种基于损失函数的SVM算法在P2P流量检测中的应用_吴敏

第36卷　第12期2009年12月计算机科学Computer Science V ol .36No .12Dec 2009到稿日期:2009-03-16　返修日期:2009-06-14 本文受江苏省自然科学基金(BK2008451),国家高科技863项目(2007AA01Z404,2007AA01Z478),南京市高科技项目(2007软资127),江苏省博士创新项目(C X08B -086Z )和南京邮电大学青蓝工程项目(NY206034,NY208011)资助。

吴　敏(1976-),女,博士生,讲师,主要研究方向为P2P 技术、流量检测、分布式计算、计算机密码学、人工智能,E -mail :w umin @nju pt .edu .cn ;王汝传(1943-),男,教授,博士生导师,CCF 会员,主要研究方向为计算机软件理论、移动代理技术、分布式计算和网格计算。

一种基于损失函数的SVM 算法在P 2P 流量检测中的应用吴　敏　王汝传(南京邮电大学计算机科学学院　南京210003)摘　要　P2P 流量逐渐成为互联网流量的重要组成部分,但在对Inte rne t 起巨大推动作用的同时,也带来了因资源过度占用而引起的网络拥塞以及安全隐患等问题,妨碍了网络业务的正常开展。

首先介绍了各种P2P 流量识别方法及其优缺点,然后提出一种基于损失函数机制的支持向量机算法,用于实时P2P 流量检测,并构建了一个基于本算法的检测控制模型。

实验结果显示,该算法更符合P2P 流量的实际检测要求,具有更好的检测精度。

关键词　对等网络,流量检测,损失函数,支持向量机中图法分类号　T P393.08 文献标识码　A Risk Function -based SVM Algorithm and its Application in P 2P -traffic DetectionW U M in W AN G Ru -chuan(Departmen t of Compu ter ,Nanjing University of Pos ts and Telecommunications ,Nanjin g 210003,China )A bstract P2P tr affic has taken g reat po rtio ns in the netwo rk traffic .While having a significant impact o n the Inte rne t ,it bring s se rious problem s such as ne tw o rk congestio n and traffic hindrance caused by the excessive o ccupatio n in the bandw idth .T he pape r fir stly int roduced me tho ds in ide ntify ing P2P traffic and their cha racters ,then put for war ds a Suppo r t Vecto r M achine (SV M )algo rithm based on risk f unction w hich can be used to identify P2P traffic .Finally ,amode l w as set up to fulfill the identifica tion of P2P traffic .Experimental results sho w that the metho d is mo re suitable for pr actical r equirements of tr affic identificatio n and has a mo re accura te precision .Keywords P2P ,T r affic identificatio n ,Risk functio n ,Suppor t vector machine 1　引言随着P2P 网络技术在20世纪90年代后期的兴起,P2P 流量逐渐成为互联网流量的重要组成部分。

P2P流量综合识别方法的研究

P2P流量综合识别方法的研究摘要P2P的网络传输优势将是民航未来网络传输的发展方向，而P2P流量占用了大量互联网带宽资源，为保证网络的正常运行，有必要对P2P流量加以识别并适当控制。

本文提出一种利用贝叶斯分类技术对网络中P2P流量进行分类的方法，结合深层数据包载荷特征识别和端口识别技术构建了P2P流量识别器。

关键词：P2P；贝叶斯；网络流量；流量控制；识别器0 引言随着网络技术的迅速发展，P2P技术得到了广泛的应用，其传输优势必将是民航未来网络传输的发展方向。

P2P技术不断发展的同时，各种P2P业务所产生的网络流量成为网络带宽的最大消费者，一定程度上影响了其他网络业务正常开展。

对P2P网络流量进行科学的管理和控制，已成为网络管理者面临的重要课题之一。

本文将探讨一种基于流量特征检测和深层数据分析的精确匹配方法，结合深层数据包载荷特征识别和端口识别技术构建P2P流量识别器，为网络管理者对P2P流量管理提供一种可行的方案。

1 P2P流量综合识别法P2P流量综合识别法是运用端口识别、流量特征和深层数据包检测共同对P2P流量进行分类识别的方法。

在流量识别的开始阶段使用端口识别技术，把网络流量中的一些常规网络业务流量（如www、FTP等）分离出来，去除这些不需要进行识别的常规网络流量，为后面的流量识别分类工作做好准备。

对剩余的网络流量，运用流量特征检测技术识别。

在进行识别时，结合数据挖掘技术对P2P 网络流量进行分析，获取P2P流量产生的特征属性集，用这些流量特征集来识别新的P2P流量。

最后通过深层数据包载荷检测技术对识别出来的P2P流量进行精确分类。

首先获取网络流量数据包，让其进入缓存队列，并对数据包进行完整的信息提取，获取网络流的五元组信息。

其次，对缓存队列里的数据包采用IP地址识别，端口识别，TCP/UDP识别技术进行流量识别，识别出一些常规的网络流量和一些采用固定端口进行流量传输的P2P业务。

第三，对于仍没有识别出来的网络流量，采用贝叶斯（Naïve Bayes，简记为NB）分类技术来进行识别，识别出具有P2P流量特征的网络流量，并对这些网络流量进行分类标识，未能识别出具体类型的P2P流量放到下一步去识别。

基于多分类支持向量机的P2P流量识别模型的设计

基于多分类支持向量机的P2P流量识别模型的设计
张莉
【期刊名称】《电信快报：网络与通信》
【年(卷),期】2010(000)010
【摘要】文章研究了SVM(支持向量机)在P2P流量识别中的应用技术.首先介绍了一个基于SVM的P2P流量识别方法,对网络中的P2P流量进行识别,接着对经典1-vs-all多分类SVM算法进行了改进,提出了一个新的基于MC-SVM(多分类支持向量机)的分类判别方法,用来把之前所识别出的未知具体应用层分类的P2P流量进行应用层分类,最后通过真实的网络流量数据的实验,证明了其可行性.
【总页数】3页(P43-45)
【作者】张莉
【作者单位】南京邮电大学计算机学院,江苏省南京市210003
【正文语种】中文
【相关文献】
1.基于流记录偏好度的多分类器融合流量识别模型 [J], 董仕;丁伟
2.一种基于多维支持向量机的P2P网络流量识别模型 [J], 杨光;马英瑞
3.基于流量与行为特征的P2P流量识别模型 [J], 邬书跃;余杰;樊晓平
4.基于签名的P2P流量识别模型的设计与实现 [J], 杜江;易鹤声
5.基于支持向量机的P2P流量管理模型设计 [J], 杜经纬
因版权原因，仅展示原文概要，查看原文内容请购买。

基于支持向量机的P2P流量管理模型设计

基于支持向量机的P2P流量管理模型设计杜经纬【期刊名称】《吉首大学学报（自然科学版）》【年(卷),期】2015(000)004【摘要】Network traffic recognition of P2P is one of the significant problems in P2P research .A classifi‐cation management model based on SVM for P2P is thus proposed for its management .Firstly ,the data is obtained from P2P network traffic;then the obtained sample data is input to the SVM for training ;fi‐nally ,the test sample is input to the SVM to realize its classification .The simulation experiment shows the method in this paper has the high detection rate and low false negative rate .%对P2 P 的网络流量进行识别是P2 P研究领域中的一个重大难题，为了实现对其管理，提出了一种基于支持向量机（SVM ）的P2P流量分类管理模型。

首先获取P2P网络流量数据，然后将获取的样本数据输入SVM 并对SVM 进行训练，最后将测试样本数据输入SVM进行P2P流量分类管理。

仿真实验证明了该方法具有较高的检测率和较低的漏报率。

【总页数】4页(P26-29)【作者】杜经纬【作者单位】运城学院计算机科学与技术系，山西运城044000【正文语种】中文【中图分类】TP393【相关文献】1.基于多分类支持向量机的P2P流量识别模型的设计 [J], 张莉2.一种基于多维支持向量机的P2P网络流量识别模型 [J], 杨光;马英瑞3.基于方差分析和支持向量机技术的P2P流量检测 [J], 吴敏;王汝传4.基于K均值和双支持向量机的P2P流量识别方法 [J], 郭伟;王西闯;肖振久5.基于小波支持向量机的P2P网络流量识别算法 [J], 刘悦;郭拯危因版权原因，仅展示原文概要，查看原文内容请购买。

基于SVM和Adaboost解决实时流量识别问题

基于SVM和Adaboost解决实时流量识别问题全良添【摘要】针对当前实时流量识别技术上的不足，本文基于支持向量机算法（SVM）和Adaboost算法，提出了一种实时流量识别算法。

这种方法将SVM算法使用在Adaboost算法框架中，通过Adaboost算法来提高SVM算法的对流量样本学习能力，改善了SVM算法在实时流量识别中的准确率，从而改善识别器的性能。

仿真实验证明，通过设定算法的迭代次数，这种方法的实时流量识别准确率能够达到85%以上。

%This paper proposes a real time traffic identification algorithm based on based on support vector machine (SVM) and Adaboost algorithm, to solve the accuracy problem in this ifeld. This algorithm could improve the learning ability of SVM by using Adaboost algorithm, thereby improving the performance of the SVM recognizer in real-time trafifc recognition. Simulation results showed that the accuracy of real-timetrafifc identiifcation could be more than 85%by using this algorithm.【期刊名称】《软件》【年(卷),期】2013(000)009【总页数】4页(P61-64)【关键词】计算机应用技术;实时流量识别;SVM;Adaboost;准确率【作者】全良添【作者单位】北京邮电大学网络技术研究院北京市 100876【正文语种】中文【中图分类】TP399上个世纪中期以来，信息网络技术给人类社会带来了极大的变化。