A Network Anomaly Detection Method Based on Transduction Scheme
NDM Network Anomaly Detection
Without an intranet system for dynamic monitoring, network problems can easily occur on a large scale.
A unified network defense system cannot be built while security and management functions are dispersed.
User Management
Network access
Users who are unauthorized, or who have changed their IP or MAC address, are denied Internet access.
Identity integration
NDM integrates external servers such as AD with its local database to provide synchronous authentication and identity import, and it also provides identity management through its built-in RADIUS server.
Incident Management
Event sources: localhost security log; legitimate IP/MAC; switch SNMP traps; syslog from third-party network devices
Event log
Functions: definition of non-standard syslog formats; event correlation; event management; unit protection system
Effective Hybrid Intrusion Detection System: A Layered Approach (IJCNIS-V7-N3-5)
Abebe Tesfahun, D. Lalitha Bhaskari
AUCE (A), Andhra University, Visakhapatnam, AP, India
Email: abesummit@, lalithabhaskari@yahoo.co.in
Abstract: Although different intrusion detection techniques have been proposed in the literature, most of them consider standalone misuse or anomaly intrusion detection systems. However, by taking advantage of both kinds of system, a better hybrid intrusion detection system can be developed. In this paper, we present an effective hybrid layered intrusion detection system for detecting both previously known and zero-day attacks. In particular, we propose a two-layer system that combines misuse and anomaly intrusion detection. The first layer consists of a misuse detector which can detect
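The two-layer idea in the abstract can be sketched as a pipeline: a signature (misuse) layer screens each event first, and an anomaly layer scores whatever it lets through. This is a minimal illustration under stated assumptions, not the authors' system. The signature list, the single numeric feature `bytes_sent`, and the z-score rule used in layer 2 are all hypothetical choices.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Event:
    payload: str        # raw request text inspected by the misuse layer
    bytes_sent: float   # hypothetical numeric feature inspected by the anomaly layer


class LayeredDetector:
    """Two-layer hybrid detector: signature matching first, anomaly scoring second."""

    def __init__(self, signatures, normal_bytes, z_threshold=3.0):
        self.signatures = list(signatures)          # layer 1: known-attack patterns
        self.mu = mean(normal_bytes)                # layer 2: profile of normal traffic
        self.sigma = pstdev(normal_bytes) or 1.0    # avoid division by zero
        self.z_threshold = z_threshold

    def classify(self, event: Event) -> str:
        # Layer 1 (misuse detection): flag anything that matches a known signature.
        if any(sig in event.payload for sig in self.signatures):
            return "known-attack"
        # Layer 2 (anomaly detection): only traffic that passed layer 1 is scored,
        # so unseen (zero-day) attacks can still surface as deviations from normal.
        z = abs(event.bytes_sent - self.mu) / self.sigma
        return "anomaly" if z > self.z_threshold else "normal"
```

Ordering the layers this way means cheap signature matching handles known attacks, and the anomaly model only sees residual traffic.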
Anomaly Detection Method for Modbus TCP Function Code Traffic Based on the CUSUM Algorithm
4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015)
Anomaly Detection Approach Based on Function Code Traffic by Using CUSUM Algorithm
Ming Wan a*, Wenli Shang b, Peng Zeng c
Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China
Key Laboratory of Networked Control System, Chinese Academy of Sciences, Shenyang, China
Keywords: Anomaly detection; Modbus/TCP; Function code traffic; Cumulative sum
Abstract. There is increasing consensus that the security issues of today's industrial control systems must be resolved. To this end, this paper proposes an anomaly detection approach based on function code traffic to detect abnormal Modbus/TCP communication behavior efficiently. The approach analyzes Modbus/TCP communication packets in depth and extracts the function code from each packet. Based on changes in the function code traffic, it uses the Cumulative Sum (CUSUM) algorithm for change point detection and generates an alarm. Our simulation results show that the proposed approach is feasible and effective in providing security for industrial control systems. We also discuss some advantages and drawbacks of this approach.
Introduction
Nowadays, industrial control systems have become an important part of many critical infrastructures, such as power, water, oil, gas, and transportation. With the development of modern networking, computing, and control technologies, the deep integration of industrialization and informatization is regarded by both academia and industry as an inevitable trend. In particular, the "Industry 4.0" revolution defined by Germany further emphasizes the essential role of networking technology [1].
However, the introduction of networking technology has broken the original isolation of industrial control systems and has brought security problems into them [2]. Although there are various security methods for regular IT systems, traditional security methods cannot be applied directly to networked control systems [3]. There are two general approaches to improving security in industrial control systems. One is communication control or access control, whose typical application is the industrial firewall [4]. However, because of manual rule setting and real-time performance requirements, this approach has been used only to a limited extent. The other is intrusion detection for industrial control systems [5,6], which is effective at identifying network attacks and can raise an alarm when the system comes under serious attack. As a bypass approach for monitoring abnormal behavior, intrusion detection technology has attracted great interest from industry and researchers. Intrusion detection can be divided into two categories, misuse detection and anomaly detection, and the approach proposed in this paper falls into the latter.
Anomaly detection technology for industrial control systems can be divided into three categories [7,8]: statistics-based, knowledge-based, and machine-learning-based approaches. By monitoring industrial communication behavior, all three can detect attacks, raise alarms, and take defensive measures before the system suffers from various attacks. Among statistics-based approaches, Reference [9] uses a sequential detection model to detect aberrant communication behavior in control systems. References [10] and [11] use the CUSUM algorithm to compute statistics on communication traffic in industrial control systems and to locate abnormal change points.
However, these statistical analyses target only generic industrial communication traffic and cannot analyze communication packets in depth according to the industrial protocol specification. In this paper, we propose an anomaly detection approach based on function code traffic. Following the Modbus/TCP protocol specification, the approach analyzes Modbus/TCP communication packets in depth and uses the function code traffic to detect abnormal Modbus/TCP communication behavior. Based on changes in the function code traffic, it uses the Cumulative Sum (CUSUM) algorithm for change point detection and generates an alarm.
Modbus/TCP and Vulnerability Analysis
Modbus/TCP is an open, application-layer industrial communication protocol that uses a typical master-slave communication mode: a Modbus master sends a request message to a Modbus slave, and the slave responds to this message accordingly. As shown in Fig. 1, a Modbus/TCP packet consists of three main parts: the MBAP (Modbus Application Protocol) header, the Modbus function code, and the data. The MBAP header is a special header used to identify the Modbus application data unit. The function code is a flag field that specifies the operation to perform and tells the slave which function to execute.
The data field can be regarded as the parameters of the function code and specifies the data used to perform the operation. The vulnerabilities of this protocol have been increasingly exposed in recent years [12,13] and can be summarized as follows. First, Modbus/TCP lacks authentication: any Modbus master can use an illegal IP address and a function code to establish a Modbus session. Second, Modbus/TCP does not consider authorization: any Modbus master can perform any operation, even by using invalid function codes. Finally, Modbus/TCP lacks integrity checking, so communication data may be tampered with; for example, an attacker can change a function code into another, illegal function code.
Anomaly Detection Approach Based on Function Code Traffic
Studying anomaly detection for industrial control systems is highly necessary. However, industrial communication traffic is high-dimensional, which makes abnormal communication behavior hard to detect. We therefore use function code traffic for anomaly detection, because it is simple and one-dimensional yet indirectly reflects industrial communication behavior. Our approach first captures industrial communication packets and extracts the Modbus/TCP packets. It then analyzes these packets in depth and obtains the function code in each packet. On this basis, it performs a statistical analysis to form the function code traffic in each specified time interval. Finally, it applies the CUSUM algorithm to the function code traffic to detect change points; when a change point appears, a corresponding alarm is generated.
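The capture-and-count steps above can be sketched as follows. This is a minimal sketch that assumes each TCP payload holds exactly one Modbus/TCP ADU, whereas real captures may need TCP reassembly. The function names and the `(timestamp, payload)` input format are hypothetical. The MBAP layout used here comes from the Modbus/TCP specification: a 2-byte transaction ID, a 2-byte protocol ID equal to 0, a 2-byte length, and a 1-byte unit ID, all big-endian. The function code is the byte immediately after the MBAP header.

```python
import struct
from collections import Counter, defaultdict

# MBAP header: transaction id, protocol id, length, unit id (big-endian, 7 bytes)
MBAP = struct.Struct(">HHHB")


def function_code(adu: bytes):
    """Return the function code of one Modbus/TCP ADU, or None if it is malformed."""
    if len(adu) < MBAP.size + 1:
        return None
    _tid, proto, length, _unit = MBAP.unpack_from(adu)
    # Protocol id 0 means Modbus; length counts the unit id plus the PDU (>= 2 bytes).
    if proto != 0 or length < 2:
        return None
    return adu[MBAP.size]


def function_code_traffic(packets, interval):
    """Count function codes per time interval.

    packets: iterable of (timestamp_seconds, tcp_payload) pairs.
    Returns {interval_index: Counter({function_code: count})}.
    """
    bins = defaultdict(Counter)
    for ts, payload in packets:
        fc = function_code(payload)
        if fc is not None:
            bins[int(ts // interval)][fc] += 1
    return dict(bins)
```

The per-interval value $x_i$ used by CUSUM can then be taken as `sum(traffic[i].values())` (all function codes) or as the count of a single code, such as 0x05 (write single coil), depending on which traffic is being monitored.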
The CUSUM algorithm can be described as follows [14]. Assume that the time sequence $x_1, x_2, \ldots, x_{v-1}$ consists of independent, identically distributed variables with Gaussian distribution $N(0,1)$, and that $x_v, x_{v+1}, \ldots, x_n$ are independent, identically distributed variables with Gaussian distribution $N(\delta,1)$. Here $v$ ($v<n$) is an unknown change point, and $x_i$ is the number of function codes in the $i$-th time interval. Testing the no-change hypothesis ($v=\infty$) against a change at some $v$, the log-likelihood ratio statistic is

$$Z_n = \max_{1 \le v < n+1} \sum_{i=v}^{n} \left( x_i - \frac{\delta}{2} \right) \tag{1}$$

Eq. (1) gives the most basic CUSUM statistic. Let $h$ ($h>0$) be a chosen threshold, which may be determined empirically through experiments. If $Z_i \le h$ for $i = 1, 2, \ldots, n-1$, the first $n-1$ values are under normal conditions; if $Z_n > h$, an anomaly has occurred and an alarm should be generated. Equivalently, if there exists an integer $r$ with $0 \le r \le n-1$ such that

$$\sum_{i=0}^{r} x_{n-i} - (r+1)\frac{\delta}{2} > h,$$

then an anomaly has occurred and an alarm should be generated.

The equations above describe the basic CUSUM algorithm. However, they assume that $\{x_n\}$ are independent Gaussian random variables. This assumption does not hold for network traffic measurements because of seasonality, trends, and time correlations [16]. To remove such non-stationary behavior, the work in [15] improves the basic CUSUM algorithm and computes $Z_n$ as

$$Z_n = \left[ Z_{n-1} + \frac{\alpha \mu_{n-1}}{\sigma^2} \left( x_n - \mu_{n-1} - \frac{\alpha \mu_{n-1}}{2} \right) \right]^+, \qquad Z_0 = 0 \tag{2}$$

In this equation, $[y]^+ = \max(y, 0)$. The parameter $\alpha$ is an amplitude percentage parameter, which intuitively corresponds to the most probable percentage increase of the mean rate after a change has occurred, and $\sigma^2$ is the variance of $x_n$. The mean $\mu_n$ is computed as an exponentially weighted moving average (EWMA) of previous measurements:

$$\mu_n = \beta \mu_{n-1} + (1-\beta) x_n \tag{3}$$

where $\beta$ is the EWMA factor.
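Eqs. (2) and (3) translate directly into a streaming loop, with an alarm raised whenever $Z_n > h$. The sketch below is a hedged illustration, not the authors' code. The initial mean `mu0` and the variance `sigma2` are assumed to be estimated from attack-free training data, and the parameter values in the test are arbitrary.

```python
def nonparametric_cusum(xs, alpha, beta, sigma2, h, mu0):
    """Improved CUSUM of Eq. (2) with the EWMA mean of Eq. (3).

    Returns a list of (Z_n, alarm) pairs, where alarm is 1 if Z_n > h, else 0.
    """
    z = 0.0      # Z_0 = 0
    mu = mu0     # mu_{n-1}; initial value assumed to come from training data
    results = []
    for x in xs:
        gain = alpha * mu / sigma2
        # Eq. (2): accumulate evidence of a mean increase, clipped at zero.
        z = max(0.0, z + gain * (x - mu - alpha * mu / 2))
        results.append((z, 1 if z > h else 0))
        # Eq. (3): update the EWMA *after* using mu_{n-1} for Z_n.
        mu = beta * mu + (1 - beta) * x
    return results
```

The EWMA is updated only after $Z_n$ is computed. As a result, each step compares $x_n$ with the mean of the preceding measurements, as Eq. (2) requires.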
Thus, the conditions for generating an alarm can be summarized as follows:

$$f(Z_n) = \begin{cases} 1, & \text{if } Z_n > h; \\ 0, & \text{otherwise.} \end{cases} \tag{4}$$

In Eq. (4), 1 indicates that an anomaly in the monitored sequence $\{x_n\}$ has been identified and an alarm is generated, while 0 indicates that the sequence $\{x_n\}$ is normal.

However, the CUSUM algorithm has a drawback [17]: after an anomaly or attack ends, CUSUM continues to generate false alarms for a long time. This is caused by the accumulation effect of CUSUM. The increase in $Z_n$ caused by attack traffic is much greater than the decrease contributed by normal traffic. To resolve this issue, our approach uses the following rule to revoke an alarm:

$$f(Z_n) = 0, \quad \text{if } Z_n \ge h \text{ and } x_i < \varphi\,\mu_{v-2} \tag{5}$$

where $\varphi$ is an amplitude with $\varphi > 1$. Assume that an anomalous behavior occurs at time $v$, and that $x_i$ is the detected function code traffic in the $i$-th time interval, with $i > v$. $\mu_{v-2}$ is the traffic mean of the first $v-2$ time intervals, calculated by Eq. (3). The main idea of Eq. (5) is to revoke the alarm when the traffic $x_i$ falls below $\varphi\mu_{v-2}$ while $Z_n \ge h$. To revoke an alarm more accurately, the condition $x_i < \varphi\mu_{v-2}$ can be strengthened to

$$\sum_{j=0}^{k-1} \mathbf{1}\{x_{i+j} < \varphi\,\mu_{v-2}\} \ge \theta \tag{6}$$

where $\theta$ is a positive integer and $k > \theta > 1$. Eq. (6) revokes the alarm when, among $k$ consecutive intervals, the number of intervals satisfying $x_i < \varphi\mu_{v-2}$ reaches $\theta$. After the alarm is revoked, $Z_n$ is also reset to a value between 0 and $h$.
Performance Evaluations
In the simulation experiment, we build a small SCADA system whose communication is based on Modbus/TCP. As shown in Fig. 2, the whole process can be described as follows. When valve switches A and B are turned on, materials A and B flow into the container in turn through them to produce material C.
When material C in the container reaches the upper level point, valve switches A and B are turned off and valve switch C is turned on. When material C is exhausted and the level reaches the lower level point, valve switch C is turned off. This process repeats every 5 minutes.
Fig. 2 Simulation experiment topology
To detect abnormal communication behavior, we deploy a monitoring computer on the industrial switch to capture the packets exchanged between the supervisory control layer and the control unit layer. We carry out two experiments, one under normal conditions and one under abnormal conditions. Under normal conditions, we run the simulation for 120 minutes. Fig. 3(a) shows the communication traffic captured by the monitoring computer per minute, and Fig. 3(b) shows the corresponding function code traffic. These two figures show that the communication traffic is complex and fluctuating, whereas the function code traffic varies periodically and reflects each run of the process. Under abnormal conditions, we launch two attacks, one at the 30th minute and one at the 80th minute. The attacker sends 50 Modbus/TCP packets with the write-coil function code at the 30th minute and 30 such packets at the 80th minute. We then apply our anomaly detection approach to the corresponding function code traffic. Fig. 4(a) plots the communication traffic after the attacks. The attack traffic is hidden in the normal communication traffic, so the attacks cannot be identified from the communication traffic alone. Fig. 4(b) plots the alarm points in the function code traffic after the attacks. It shows that the proposed approach detects the abnormal behavior and generates alarms when the attacks occur.
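The alarm-revocation rule of Eqs. (5) and (6) can be layered on the same CUSUM loop. The sketch below is one possible reading, not the authors' implementation. It takes $\mu_{v-2}$ as the EWMA value two intervals before the alarm was raised. It counts the indicator over a sliding window of the last `k` intervals. It resets $Z_n$ to a hypothetical `reset` value (0 by default) inside $[0, h]$. The series in the usage note is synthetic and only demonstrates the mechanics: a baseline of 10 with a three-interval burst of 40. It is not the experiment's data.

```python
from collections import deque


def cusum_with_revocation(xs, alpha, beta, sigma2, h, mu0, phi, k, theta, reset=0.0):
    """Non-parametric CUSUM (Eqs. 2-4) plus the alarm-revocation rule (Eqs. 5-6).

    Returns one alarm flag f(Z_n) per interval.
    """
    z, mu = 0.0, mu0
    mu_hist = []               # mu_hist[n] = EWMA mean after interval n (Eq. 3)
    v = None                   # interval at which the current alarm was raised
    window = deque(maxlen=k)   # last k indicators 1{x_i < phi * mu_{v-2}}
    flags = []
    for n, x in enumerate(xs):
        z = max(0.0, z + (alpha * mu / sigma2) * (x - mu - alpha * mu / 2))  # Eq. (2)
        if z > h:
            if v is None:                      # alarm just raised: remember v
                v = n
                window.clear()
            else:
                ref = mu_hist[v - 2] if v >= 2 else mu0   # pre-attack mean mu_{v-2}
                window.append(x < phi * ref)
                if sum(window) >= theta:       # Eq. (6); Z_n >= h holds in this branch
                    z, v = reset, None         # revoke and reset Z_n into [0, h]
                    window.clear()
        else:
            v = None
            window.clear()
        flags.append(1 if z > h else 0)        # Eq. (4)
        mu = beta * mu + (1 - beta) * x        # Eq. (3)
        mu_hist.append(mu)
    return flags
```

Consider `xs = [10.0]*30 + [40.0]*3 + [10.0]*30` with `alpha=0.5, beta=0.9, sigma2=4.0, h=10.0, mu0=10.0, phi=1.5, k=5, theta=3`. The alarm is raised at index 30. It is revoked at index 35, once three post-burst intervals fall below 1.5 times the pre-attack mean. Without revocation, the accumulated $Z_n$ keeps the alarm on through index 36.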
In summary, our approach is feasible and effective in identifying and diagnosing network anomalies in industrial control systems, and it has clear advantages over anomaly detection based on raw communication traffic.
Fig. 4 Under normal condition
Conclusion
This paper proposes an anomaly detection approach based on function code traffic. The basic idea behind it is simple: anomalous communication behavior in an industrial control system is identified by detecting anomalies in the function code traffic. We first analyze the Modbus/TCP protocol and its vulnerabilities. We then present the detailed design of our approach, including the CUSUM algorithm. Finally, we evaluate the approach in a simulation experiment. The results show that our approach is feasible and effective in providing security for industrial control systems. We also discuss some drawbacks of our approach as directions for future research.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant No. 61501447) and by an independent project of the Key Laboratory of Networked Control System, Chinese Academy of Sciences: Research on abnormal behavior modeling, online intrusion detection and self-learning method in industrial control network.
References
[1] H. Kagermann, W. Wahlster, J. Helbig, Recommendations for implementing the strategic initiative INDUSTRIE 4.0, Final Report, http://www.plattform-i40.de/finalreport2013, 2013.
[2] B. Genge, C. Siaterlis, I. N. Fovino, et al., A cyber-physical experimentation environment for the security analysis of networked industrial control systems, Computer and Electrical Engineering, 38(5) (2012) 1146-1161.
[3] C. Shao, L. G. Zhong, An information security solution scheme of industrial control system based on trusted computing, Information and Control, 44(5) (2015) 628-633.
[4] S. S. Zhang, W. L. Shang, M. Wan, et al., Security defense module of Modbus TCP communication based on region/enclave rules, Computer Engineering and Design, 35(11) (2014) 3701-3707.
[5] A. Carcano, A. Coletta, M. Guglielmi, et al., A multidimensional critical state analysis for detecting intrusions in SCADA systems, IEEE Transactions on Industrial Informatics, 7(2) (2011) 179-186.
[6] A. Anoop, M. S. Sreeja, New genetic algorithm based intrusion detection system for SCADA, International Journal of Electronics Communication and Computer Engineering, 2(2) (2013) 171-175.
[7] S. M. Papa, V. S. S. Nair, A behavioral intrusion detection system for SCADA systems, Southern Methodist University, 2013.
[8] B. Zhu, S. Sastry, SCADA-specific intrusion detection/prevention systems: a survey and taxonomy, The 1st Workshop on Secure Control Systems (SCS), 2010.
[9] A. A. Cardenas, S. Amin, Z. S. Lin, Attacks against process control systems: risk assessment, detection, and response, The 6th ACM Symposium on Information, Computer and Communications Security, Hong Kong, 2011, pp. 355-366.
[10] Y. G. Zhang, H. Zhao, L. N. Wang, A non-parametric CUSUM intrusion detection method based on industrial control model, Journal of Southeast University (Natural Science Edition), A01 (2012) 55-59.
[11] M. Wei, K. Kim, Intrusion detection scheme using traffic prediction for wireless industrial networks, Journal of Communications and Networks, 14(3) (2012) 310-318.
[12] N. Goldenberg, A. Wool, Accurate modeling of Modbus/TCP for intrusion detection in SCADA systems, International Journal of Critical Infrastructure Protection, 6(2) (2013) 63-75.
[13] T. H. Kobayashi, A. B. Batista, A. M. Brito, et al., Using a packet manipulation tool for security analysis of industrial network protocols, IEEE Conference on Emerging Technologies and Factory Automation, Patras, 2007, pp. 744-747.
[14] M. Wan, H. K. Zhang, T. Y. Wu, et al., Anomaly detection and response approach based on mapping requests, Security and Communication Networks, 7 (2014) 2277-2292.
[15] V. A. Siris, F. Papagalou, Application of anomaly detection algorithms for detecting SYN flooding attacks, 2004 IEEE Global Telecommunications Conference (GLOBECOM'04), Dallas, 2004, pp. 2050-2054.
[16] J. L. Hellerstein, F. Zhang, P. Shahabuddin, A statistical approach to predictive detection, International Journal of Computer and Telecommunications Networking, 35(1) (2001) 77-95.
[17] H. H. Takada, U. Hofmann, Application and analyses of cumulative sum to detect highly distributed denial of service attacks using different attack traffic patterns, /dissemination/newsletter7.pdf, 2004.
Network Anomalies
• Automated fault diagnosis and troubleshooting (root cause analysis)
Link 1 output traffic (May 2001)
Original SNMP traffic data
Regular component
Stochastic component
Routing Analysis
• BGP: interdomain routing protocol
• Internal route monitor to all routers
• Advantages:
– Uncorrelated errors, correlated anomalies
– Low false alarm rate, high detection rate
– Simple and robust
– Scalable, automated, self-training
• List of known events over a year
– Notified by operations
– Considered important
– Perfect detection accuracy
Evaluation of Individual Algorithms
Data set
1. Holt-Winters
2. Decomposition-based algorithm
– Decompose into 4 components:
Anomaly Detection: A Survey
Research on an Anomaly Detection Method Based on Multi-dimensional Clustering Mining
0 Introduction
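With the continuous development of the Internet, DDoS attacks [1], network worms [2], and similar threats account for an increasing share of current network traffic. New forms of attack and new worms appear every year, posing serious challenges to network management. When DDoS attacks, network worms, and similar events occur, they also cause traffic anomalies. Anomalous behavior in the network can therefore be discovered by mining and analyzing anomalous traffic.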
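Figure 1 IP clustering tree
The protocol and port dimensions have simple hierarchies, whereas the IP dimension has a more complex one. Figure 1 shows an IP clustering tree in which the circled nodes are significant nodes, and the tree contains a large amount of redundant information. Let the traffic threshold be H and the compression threshold be C, with H = C = 20. Node 10.10.16.8/31 has traffic 50, which exceeds 20, so it is a significant node. However, its traffic exactly equals the sum of its two children's traffic, so it is a redundant node. Node 10.10.16.8/30 has traffic 60, which exceeds 20, so it is also significant. However, its traffic differs from that of its significant child by only 10, which is less than the compression threshold of 20, so it is redundant as well.
The following compression algorithm removes the redundant information. If both children are significant, the parent is necessarily significant and is compressed away. If neither child is significant, the parent is kept as a significant cluster only if its traffic exceeds the threshold H; otherwise it is compressed. If exactly one child is significant, the parent is kept as a significant cluster only if the difference between its traffic and that of the significant child exceeds the threshold C; otherwise it is compressed. With C = H = 20, compressing the tree in Figure 1 yields the tree in Figure 2, and the number of significant clusters drops from 7 to 4.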
Figure 2 Compressed IP clustering tree
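The three compression rules can be expressed as a post-order pass over a binary prefix tree. This is a minimal sketch of one reading of the rules. A node counts as significant when its traffic exceeds H, leaf traffic is given, and a parent's traffic is the sum of its children's. The `Node` class and the `compress` function are hypothetical names. The example reproduces only the two nodes discussed in the text, not the full trees of Figures 1 and 2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    prefix: str
    traffic: float = 0.0                        # leaves carry measured traffic
    children: List["Node"] = field(default_factory=list)


def compress(node: Node, H: float, C: float, kept: List[str]):
    """Post-order pass; returns (traffic, is_significant) and appends kept prefixes."""
    if not node.children:
        significant = node.traffic > H
        if significant:
            kept.append(node.prefix)
        return node.traffic, significant
    results = [compress(child, H, C, kept) for child in node.children]
    node.traffic = sum(t for t, _ in results)           # parent traffic = sum of children
    sig_traffic = [t for t, s in results if s]
    if len(sig_traffic) == len(results):                # all children significant: compress
        keep = False
    elif not sig_traffic:                               # no significant child: keep if > H
        keep = node.traffic > H
    else:                                               # some significant: keep if residual > C
        keep = node.traffic - sum(sig_traffic) > C
    if keep:
        kept.append(node.prefix)
    return node.traffic, node.traffic > H
```

Suppose leaves 10.10.16.8/32 and 10.10.16.9/32 carry 25 each and leaf 10.10.16.10/31 carries 10. Then `compress` reports only the two /32 leaves. Node 10.10.16.8/31 (traffic 50) is compressed because both of its children are significant. Node 10.10.16.8/30 (traffic 60) is compressed because its residual over the significant child is 10, which does not exceed C.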
2 Multi-dimensional Traffic Clustering of Data Streams
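A multi-dimensional traffic clustering tree is a mesh-like structure composed of several single-dimensional clustering trees. As shown in Figure 3, the upper-right part is a prefix tree representing the traffic of different departments of a school (such as E and M). The upper-left part is a protocol tree representing different TCP and UDP traffic. Combining these two single-dimensional clustering trees yields the multi-dimensional clustering tree in the lower half of Figure 3. For each single-dimensional clustering tree, proceeding from top to bottom and from left to right,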
Anomaly Detection Clustering
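Neural Network Analysis: Neural networks can learn. After appropriate training on intrusion data and normal data, a neural network can recognize the occurrence of abnormal behavior. This approach is now widely used in credit card fraud detection.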
Advantages and Disadvantages of Anomaly Detection
Advantages of anomaly detection:
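The main advantage of anomaly detection is that there is no need to build a database entry, with a corresponding solution, for every attack signature. The database therefore grows more slowly, and data matching runs faster than in misuse detection. Anomaly detection mainly uses learning techniques to learn user behavior. Only a data model of a normal user needs to be extracted for comparison, which saves the time needed to define and enter data.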
The Rise of Anomaly Detection
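According to reports from the CERT Coordination Center in the United States, attacks have increased exponentially over the past few years. The most common intrusion detection approach today is misuse detection. It builds attack patterns from previously known events and then matches against them to find behavior patterns that differ from normal behavior. Its drawback is that the signature database or detection system must be updated frequently. If a current attack is not in the attack-pattern data, it cannot be detected. Because of this limitation, combining data mining methods with anomaly detection has recently attracted wide attention and research.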
Conceptual diagram: data clustering and labeling
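Clustering algorithms are unsupervised learning methods, so the information contained in each cluster, and what the cluster represents, cannot be known in advance. As the conceptual diagram of data clustering and labeling below shows, the clustering result alone still cannot determine the behavior pattern of test data. The system therefore still needs a labeling technique that marks each cluster as either normal or an attack pattern. This set of labeled clusters becomes the core of the anomaly detection system in our experiments. The labeled clusters can then be matched against test data to predict its behavior pattern.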
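The cluster-then-label workflow described above can be sketched with plain k-means followed by majority-vote labeling. This is an illustrative assumption, not the system described in the text. The farthest-point seeding, the majority-vote rule, and the default label for an empty cluster are choices made here to keep the example deterministic and dependency-free.

```python
import math
from collections import Counter


def nearest(centroids, p):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], p))


def init_farthest(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the farthest point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(c, p) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=20):
    """Unsupervised step: plain k-means on unlabeled points."""
    centroids = init_farthest(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster becomes empty
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def label_clusters(centroids, points, labels):
    """Labeling step: each cluster takes the majority label of the training points it holds."""
    votes = [Counter() for _ in centroids]
    for p, y in zip(points, labels):
        votes[nearest(centroids, p)][y] += 1
    return [v.most_common(1)[0][0] if v else "normal" for v in votes]


def predict(centroids, cluster_labels, p):
    """Classify a test point by the label of its nearest labeled cluster."""
    return cluster_labels[nearest(centroids, p)]
```

Clustering itself never looks at the labels. The labels are only used afterwards to name each cluster, which matches the two-stage process described in the text.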
An anomaly intrusion detection method by clustering normal user behavior
Sang Hyun Oh and Won Suk Lee
Department of Computer Science, Yonsei University, 134 Seodaemoon-gu, Shinchon-dong, Seoul, 120-749, Korea
e-mail: {osh,leewo}@amadeus.yonsei.ac.kr
Intrusion detection methodology is classified into a misuse detection model [11, 12, 13] and an anomaly detection model [1, 2, 3, 14, 15, 16, 17]. The misuse detection model utilizes the well-known weaknesses of a target domain. However, intrusion methods have evolved into more sophisticated forms, and many new intrusion methods are being invented as well. As a result, handling well-known intrusion methods individually is no longer enough to preserve the security of a target domain. The anomaly detection model has been studied to cope with this problem. For anomaly detection, previous work has concentrated on statistical techniques [1, 2, 3]. Various features can be considered to represent the characteristics of an activity in an audit data set, for example CPU usage, the frequency of a system call, and the number of file accesses. Different features are relevant depending on the type of activity. The typical statistical analysis system is NIDES [2], developed at SRI. In NIDES, the term "measure" denotes a feature, and the abnormality of each measure is examined independently. NIDES models the historical behavior of a user in terms of various features and generates a long-term profile containing a statistical summary for each feature. To detect an anomaly, the user's on-line activities are summarized into a short-term profile, which is then compared with the user's long-term profile. If the difference between the two profiles is large enough, the on-line activities are considered anomalous behavior. The strong point of statistical analysis is that it can generate a concise profile containing only a statistical
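The long-term versus short-term profile comparison can be sketched with per-feature means and standard deviations. This is a simplified stand-in and not NIDES's actual statistic. Each feature (measure) is tested independently against a z-score threshold `k`, which is a hypothetical parameter. The feature names in the test are illustrative.

```python
from statistics import mean, pstdev


def long_term_profile(history):
    """history: past sessions as dicts {feature: value}. Returns {feature: (mean, std)}."""
    return {
        f: (mean([h[f] for h in history]), pstdev([h[f] for h in history]))
        for f in history[0]
    }


def short_term_profile(recent):
    """Summarise the user's recent on-line activity as per-feature means."""
    return {f: mean([r[f] for r in recent]) for f in recent[0]}


def anomalous_features(long_term, short_term, k=3.0):
    """Examine each measure independently; flag those deviating by more than k std devs."""
    flagged = []
    for f, (mu, sd) in long_term.items():
        deviation = abs(short_term[f] - mu) / (sd if sd > 0 else 1e-9)
        if deviation > k:
            flagged.append(f)
    return flagged
```

Testing each feature separately mirrors NIDES's per-measure examination, but it ignores correlations between features.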
Network anomaly detection
Patent title: Network anomaly detection
Inventors: James Peroulas, Poojita Thukral, Dutt Kalapatapu, Andreas Terzis, Krishna Sayana
Application No.: US16397082; filing date: 2019-04-29
Publication No.: US10891546B2; publication date: 2021-01-12
Abstract: A method for detecting network anomalies includes receiving a control message from a cellular network and extracting one or more features from the control message. The method also includes predicting a potential label for the control message using a predictive model configured to receive the one or more extracted features from the control message as feature inputs. Here, the predictive model is trained on a set of training control messages where each training control message includes one or more corresponding features and an actual label. The method further includes determining that a probability of the potential label satisfies a confidence threshold. The method also includes analyzing the control message to determine whether the control message corresponds to a respective network performance issue. When the control message impacts network performance, the method includes communicating the network performance issue to a network entity responsible for the network performance issue.
Applicant: Google LLC; address: Mountain View, CA, US; nationality: US
Agency: Honigman LLP; agent: Brett A. Krueger
Anomaly Detection Method Based on Multi-resolution Grid
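LIU Wenfen1, MU Xiaodong1, HUANG Yuehua1,2
1. Guangxi Key Laboratory of Cryptography and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
2. College of Computer Science and Engineering, Guilin University of Aerospace Technology, Guilin, Guangxi 541004, China
Abstract: As an important data mining technique, anomaly detection is widely used in data analysis.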
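However, existing anomaly detection algorithms often require different parameters to be tuned for different data to achieve good detection results, and their time efficiency on large datasets is unsatisfactory.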
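Grid-based anomaly detection can effectively solve the time-efficiency problem for low-dimensional data. However, its detection accuracy depends heavily on the grid partition scale and the density threshold parameter. This parameter is not robust and does not generalize well to different types of datasets.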
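To address these problems, an anomaly detection method based on multi-resolution grids is proposed. The method introduces a robust submatrix partition parameter that divides high-dimensional data into multiple low-dimensional subspaces. Anomaly detection is then carried out in the subspaces, which ensures applicability to high-dimensional data. The method partitions the data with multi-resolution grids, from sparse to dense, and weighs the local outlier factors of each data point across grids of different scales. It finally outputs a ranking of global outlier scores.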
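Experimental results show that the newly introduced submatrix partition parameter is robust. The method adapts well to high-dimensional data and achieves good detection performance on multiple public datasets. It provides an efficient solution to problems related to anomaly detection in high-dimensional data.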
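A rough sketch of the subspace-plus-multi-resolution idea follows. It is a simplification under stated assumptions, not the authors' algorithm. The parameter `group_size` stands in for the submatrix partition parameter, and dimensions are grouped consecutively. Instead of local outlier factors, each point is scored by the inverse occupancy of its grid cell. These scores are summed over all subspaces and over resolutions from coarse to fine.

```python
from collections import Counter


def subspaces(n_dims, group_size):
    """Split feature indices into consecutive low-dimensional groups.
    group_size stands in for the paper's submatrix partition parameter (assumption)."""
    return [list(range(i, min(i + group_size, n_dims))) for i in range(0, n_dims, group_size)]


def cell_of(p, dims, lo, hi, r):
    """Grid cell of point p in subspace `dims` at resolution r (r cells per axis)."""
    return tuple(
        min(int((p[d] - lo[d]) / ((hi[d] - lo[d]) or 1.0) * r), r - 1) for d in dims
    )


def multi_resolution_scores(points, group_size=2, resolutions=(2, 4, 8)):
    """Higher score = sparser neighbourhood across subspaces and grid resolutions."""
    n_dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(n_dims)]
    hi = [max(p[d] for p in points) for d in range(n_dims)]
    scores = [0.0] * len(points)
    for dims in subspaces(n_dims, group_size):
        for r in resolutions:  # coarse (sparse) to fine (dense) grids
            counts = Counter(cell_of(p, dims, lo, hi, r) for p in points)
            for i, p in enumerate(points):
                scores[i] += 1.0 / counts[cell_of(p, dims, lo, hi, r)]
    return scores
```

Sorting the points by descending score gives a global outlier ranking. Because each subspace has only `group_size` dimensions, the number of occupied cells stays manageable even when the full data are high-dimensional.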
Keywords: anomaly detection; multi-resolution grid; high-dimensional data; subspace; data mining
Document code: A; CLC number: TP311.13; doi: 10.3778/j.issn.1002-8331.1908-0188
Citation: LIU Wenfen, MU Xiaodong, HUANG Yuehua. Anomaly detection method based on multi-resolution grid. Computer Engineering and Applications, 2020, 56(17): 78-85.
Funding: National Natural Science Foundation of China (No. 61862011); Natural Science Foundation of Guangxi (No. 2018GXNSFAA138116); Research Project of Guangxi Key Laboratory of Cryptography and Information Security (No. GCIS201704); Graduate Innovation Project of Guilin University of Electronic Technology (No. 2019YCXS052).
ANOMALY DETECTION METHOD AND SYSTEM AND MAINTENANCE METHOD AND SYSTEM
Patent title: ANOMALY DETECTION METHOD AND SYSTEM AND MAINTENANCE METHOD AND SYSTEM
Inventors: Yasuhiko Matsunaga, Junichi Takeuchi, Takayuki Nakata
Application No.: US11914156; filing date: 2006-06-02
Publication No.: US20090052330A1; publication date: 2009-02-26
Abstract: A network management apparatus in a mobile communication network holds a communication quality index for normal operation. It periodically receives communication quality measurement results from a radio base station control apparatus. For each radio cell, a connection failure count f for a connection request count a is obtained as a measurement result. Letting p be the call loss rate in normal operation, the apparatus computes the upper probability B that the connection failure count, modeled by a binomial distribution, becomes larger than f. The negative logarithm of B is used as the score of the degree of abnormality. A communication anomaly is detected when this score exceeds a predetermined threshold. Maintenance control is then executed according to the calculated score, which appropriately avoids a fault of the communication system. This makes it possible to calculate the degree of abnormality from measurements of the communication quality index while taking statistical reliability into account, and to execute maintenance corresponding to that degree of abnormality.
Applicants: Yasuhiko Matsunaga, Junichi Takeuchi, Takayuki Nakata; address: Tokyo, JP (all); nationality: JP (all)
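The abnormality score in this abstract can be computed directly. In this sketch, the upper probability is taken as $B = P(X \ge f)$ for $X \sim \mathrm{Binomial}(a, p)$, which includes the observed count. The abstract's wording ("larger than f") could also be read as $P(X > f)$, which would only move the start of the sum by one. The threshold value in the test is hypothetical. Terms are summed in log space so that large request counts do not overflow.

```python
import math


def upper_tail_probability(a: int, f: int, p: float) -> float:
    """B = P(X >= f) for X ~ Binomial(a, p), assuming 0 < p < 1."""
    if f <= 0:
        return 1.0
    log_coeff = math.lgamma(a + 1)
    total = 0.0
    for k in range(f, a + 1):
        # log of C(a, k) * p^k * (1 - p)^(a - k), computed without huge integers
        log_pmf = (log_coeff - math.lgamma(k + 1) - math.lgamma(a - k + 1)
                   + k * math.log(p) + (a - k) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total


def abnormality_score(a: int, f: int, p: float) -> float:
    """Negative log of the upper probability; larger means more abnormal."""
    return -math.log(max(upper_tail_probability(a, f, p), 1e-300))  # guard against 0


def is_anomalous(a: int, f: int, p: float, threshold: float) -> bool:
    return abnormality_score(a, f, p) > threshold
```

Because the score depends on both f and a, the same failure count is judged less abnormal in a cell with many connection requests than in a cell with few. This reflects the "statistical reliability" the abstract refers to.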
Research on a Network Anomaly Detection Method Based on Bayesian Networks
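1 Introduction
Since Denning [1] first proposed the concept of anomaly detection in 1987, anomaly detection has quickly become a research hotspot in the field of intrusion detection.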
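Many researchers have studied anomaly detection with different methods (such as data mining and artificial intelligence) and from different aspects (such as system logs, process scheduling sequences, and network traffic). However, few have applied Bayesian networks to network anomaly detection.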
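Sebyala et al. [2] applied Bayesian networks to anomaly detection for proxylets. They built a Bayesian network with only three variables, based on CPU and memory utilization, as a proxylet classification model. Their Bayesian network is overly simple, and they gave no concrete detection method or detection results.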
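Zhang Kun et al. [3] used Bayesian networks as classifiers and deployed these classifiers as distributed intrusion detection agents to build an intrusion detection system for large networks. However, their test results showed that the method was not satisfactory.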
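Because existing Bayesian-network anomaly detection models and methods are unsatisfactory, this paper proposes a new network anomaly detection method based on Bayesian networks.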
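2 Related Definitions and Lemmas
2.1 Data
Definition 1. A random variable, or simply a variable, is a function defined on a sample space. It is usually denoted by an uppercase letter, such as $X$, $Y$, or $Z$.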
The values of a random variable are usually denoted by lowercase letters, such as $x$, $y$, or $z$.
The set of all possible values of a random variable $X$ is called its domain, or state space, and is denoted $\Omega_X$.
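A random variable with a discrete state space is called a discrete random variable, and a random variable with a continuous state space is called a continuous random variable.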
Unless otherwise stated, all variables in this paper are discrete random variables.
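The size of the state space $\Omega_X$ of a discrete random variable, that is, the number of possible values it contains, is denoted $|\Omega_X|$.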
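Let $\Theta = \{X_1, \ldots, X_n\}$ be a set of discrete random variables. Let $x_{ij}$ be the $j$-th value of variable $X_i$, and let the state space of $X_i$ be $\Omega_{X_i}$, abbreviated $\Omega_i$. Then $\Omega_i = \{x_{i1}, \ldots, x_{im}\}$.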
The size of $\Omega_i$ is $|\Omega_i| = m$.
For any two variables $X_i$ and $X_j$ with $i \ne j$, $|\Omega_i|$ is not necessarily equal to $|\Omega_j|$.
Definition 2. Let $\Theta = \{X_1, \ldots, X_n\}$ be a set of discrete random variables.
A vector formed by the states of all or some of the variables in $\Theta$ is called a data sample or data vector, or simply a sample (or vector), and is denoted $d$.
The size of a data sample $d$, that is, the number of variables it contains, is denoted $|d|$.
Research on Machine-Learning-Based Network Anomaly Detection and Prevention
Chapter 1: Introduction
With the rapid development of the Internet, network security has become a significant concern for individuals, organizations, and governments. The increasing number of cyber-attacks highlights the need for proactive measures to detect and prevent network anomalies. In recent years, machine learning techniques have shown great potential in addressing this issue. This research explores the application of machine learning to detecting and preventing network anomalies.
Chapter 2: Network Anomalies
2.1 Definition and Types of Network Anomalies
Network anomalies are abnormal patterns or behaviors that deviate from the expected normal state of a network. They can be categorized into various types, such as network intrusions, denial-of-service (DoS) attacks, network traffic anomalies, and network-based malware.
2.2 Challenges in Network Anomaly Detection
Detecting network anomalies poses several challenges due to the increasing complexity and diversity of cyber-attacks. These challenges include the high volume and velocity of network data, the presence of unknown and evolving anomalies, and the need for real-time detection without degrading network performance.
Chapter 3: Machine Learning Techniques for Network Anomaly Detection
3.1 Supervised Learning
Supervised learning algorithms use labeled training data to train a model that classifies network traffic as normal or anomalous. Popular techniques include support vector machines (SVM), random forests, and neural networks. These algorithms can achieve high accuracy, but they rely heavily on labeled data, which can be costly and time-consuming to obtain.
3.2 Unsupervised Learning
Unsupervised learning algorithms aim to detect anomalies without prior knowledge of normal and abnormal instances. Clustering algorithms such as k-means and DBSCAN group similar instances together and identify outliers as potential anomalies.
However, they may generate false positives and struggle to distinguish between different types of anomalies.
3.3 Reinforcement Learning
Reinforcement learning algorithms learn from interactions with an environment to make informed decisions. In network anomaly detection, reinforcement learning can be used to create adaptive models that dynamically update their detection strategies to handle new and evolving network attacks.
Chapter 4: Feature Selection and Extraction
4.1 Network Data Representation
Network data can be represented in various formats, such as raw packet-level data or aggregated flow-level data. Feature selection is critical for extracting relevant information from the data and reducing dimensionality. Techniques such as principal component analysis (PCA), information gain, and genetic algorithms can be used to select informative features.
4.2 Feature Engineering
Feature engineering transforms existing features and creates new ones that better represent the underlying patterns of network traffic. Statistical measures, domain knowledge, and expert insight are used to engineer features that capture the characteristics of normal and abnormal network behavior.
Chapter 5: Evaluation Metrics and Performance Analysis
5.1 Evaluation Metrics
Various evaluation metrics are used to assess network anomaly detection systems, including accuracy, precision, recall, and F1 score. A comprehensive evaluation framework helps researchers and practitioners compare approaches and select the most suitable one for a specific network environment.
5.2 Performance Analysis
Real-world network datasets are used to evaluate the performance of machine learning algorithms for network anomaly detection.
Comparative analysis and statistical methods are then used to analyze the results, identify limitations, and propose improvements for future research.

Chapter 6: Network Anomaly Prevention
While detection is crucial, prevention plays an equally important role in network security. This chapter explores preventive measures including access control, firewalls, intrusion prevention systems, and network segmentation. Machine learning techniques can be integrated into these measures to make them more effective and adaptable in detecting and responding to emerging threats.

Chapter 7: Conclusion
Machine learning offers promising solutions for network anomaly detection and prevention. Combining supervised, unsupervised, and reinforcement learning techniques with effective feature selection and engineering can significantly improve the accuracy and efficiency of anomaly detection. However, continued research and development are needed to keep pace with evolving network attacks and to strengthen the overall resilience of network security systems.
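Chapter 5.1 names accuracy, precision, recall, and F1 score without giving formulas. As a minimal sketch, assuming binary labels where 1 means "anomalous" (the function name `detection_metrics` is illustrative, not from the text), all four can be computed from confusion-matrix counts. The false positive rate is added because the other papers collected here report it as the false alarm rate.

```python
def detection_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a binary anomaly detector (positive = anomalous)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    def ratio(num, den):
        # Report 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called detection rate / true positive rate
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": ratio(fp, fp + tn),
    }
```

For example, with labels `[1, 1, 1, 0, 0, 0, 0, 0]` and predictions `[1, 1, 0, 1, 0, 0, 0, 0]` (two hits, one miss, one false alarm), accuracy is 0.75, precision, recall, and F1 are all 2/3, and the false positive rate is 0.2.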
Anomaly Detection and Processing Method Based on Real-Ship Structural Monitoring Data
Equipment Environmental Engineering, Vol. 20, No. 9, September 2023, p. 152
SUN Meng-dan, WANG Xue-liang, WU Guo-qing, YAO Ji, JIANG Zhen-tao
(China Ship Scientific Research Center, Wuxi, Jiangsu 214082, China)
CLC number: TP39; Document code: A; Article ID: 1672-9242(2023)09-0152-08
DOI: 10.7643/issn.1672-9242.2023.09.017
ABSTRACT: To obtain high-quality real-ship structural monitoring data and reduce signal errors caused by external factors, this work develops a fast and accurate anomaly processing method for preprocessing real-ship structural monitoring data. The method combines the Z-score anomaly detection method from statistics with the average-value calculation commonly used in data processing, achieving high-precision, fast anomaly processing of ship monitoring data. Validation data containing abnormal phenomena were created from normal signals to verify the accuracy of the new method. Finally, the processed validation data were used for structural statistical calculations, including signal filtering, signal component extraction, and signal feature value calculation, and the calculation errors were compared. Compared with traditional signal processing methods such as the Hampel filter and the Smooth smoothing function, the Z-score anomaly detection and average-value method achieved the highest anomaly processing accuracy, and the structural signal statistics calculated from its output were the most accurate, with an average calculation error below 10%.
In conclusion, the Z-score anomaly detection and average-value method is suitable for handling anomalies in real-ship structural monitoring data and plays an important role in mining the value of structural data.
KEY WORDS: real ship structure monitoring data; Z-score anomaly detection; average method; error analysis; accuracy verification; calculation of statistical values
Received: 2023-07-19; Revised: 2023-09-06
Citation: SUN Meng-dan, WANG Xue-liang, WU Guo-qing, et al. Anomaly Detection and Processing Method Based on Real Ship Structure Monitoring Data[J]. Equipment Environmental Engineering, 2023, 20(9): 152-159.
Developing ship technology is especially important for keeping pace with the growth of world shipping and for strengthening the exploitation of marine resources.
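The abstract does not spell out how the Z-score test and the averaging step are combined. The sketch below is one plausible reading, not the authors' exact procedure: samples whose |z| exceeds a threshold (3 is a common convention, assumed here) are flagged, and each flagged sample is replaced by the mean of its unflagged neighbours within a small window (`half_window`, also an assumption).

```python
import statistics


def zscore_repair(signal, threshold=3.0, half_window=2):
    """Flag |z| > threshold samples and replace each with the mean of nearby normal samples.

    Returns (repaired_signal, flagged_indices).
    """
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)
    if std == 0:
        return list(signal), []  # a constant signal has no outliers

    flagged = [i for i, v in enumerate(signal) if abs(v - mean) / std > threshold]
    flagged_set = set(flagged)
    repaired = list(signal)
    for i in flagged:
        lo, hi = max(0, i - half_window), min(len(signal), i + half_window + 1)
        neighbours = [signal[j] for j in range(lo, hi) if j not in flagged_set]
        # Fall back to the global mean if every neighbour is itself an outlier.
        repaired[i] = statistics.fmean(neighbours) if neighbours else mean
    return repaired, flagged
```

On a 40-sample test signal oscillating between 0.8 and 1.2 with a single injected spike of 50.0 at index 20, only the spike is flagged, and it is replaced by the average of its four neighbours (1.05). Averaging local neighbours rather than the global mean keeps slow trends in a stress record intact.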
Research on Network Intrusion Detection Method Based on Machine Learning
Communication Network Technology · DOI: 10.19399/j.cnki.tpt.2023.02.057
SUN Yukun1, HAN Yubiao2
(1. Sinochem Transportation Construction Group Operation Management (Shandong) Co., Ltd., Jinan 250014, China; 2. Shandong Electronic Information Products Inspection Institute, Jinan 250014, China)
Abstract: Because traditional methods achieve low accuracy, detection rate, and F1 score when detecting network intrusion data, a network intrusion detection method based on machine learning is proposed. The transmission volume of network intrusion data is estimated from how that volume changes. By initializing the parameters of the machine learning algorithm, the probability matrix of the network intrusion data extraction results is obtained. The feature vector for network intrusion detection is then used as the input of the machine learning algorithm to build a network intrusion detection model, which performs the detection. Experimental results show that the proposed method effectively improves the detection rate and F1 score when detecting network intrusion data and has better detection performance.
Keywords: machine learning algorithm; network intrusion; feature extraction; transmission quantity; detection method; observation vector
0 Introduction
As Internet technology develops, networks are frequently subject to intrusions; detecting network intrusion data and behavior therefore helps keep networks secure [1-3].
Network Traffic Anomaly Detection Based on Random Projection and Clustering
Computer Simulation, Vol. 36, No. 3, March 2019; Article ID: 1006-9348(2019)03-0289-05
CLC number: TP393.08; Document code: B
LIU Ya-ting1, WANG Yong-cheng2, JIANG Yan-fei1, GU Yuan-tao1
(1. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; 2. Southwest Electronics and Telecommunication Technology Research Institute, Chengdu, Sichuan 610041, China)
ABSTRACT: This paper studies network traffic anomaly detection for network security. Traditional anomaly detection methods suffer from poor real-time performance, strict requirements on the data distribution, a low true positive rate, and a high false positive rate. To solve these problems, a new combined method is adopted that integrates a sliding time window, multiple random projections, and an unsupervised clustering algorithm. We first aggregate network traffic by random projection to obtain the time series of the objects. Kmeans++ clustering detection is then applied to the traffic series in each sliding window to obtain multiple alarm sets. Next, an intersection operation over these sets determines the final anomaly set. Experiments on the MAWILab dataset show that the new detection method has a higher true positive ratio and a lower false positive ratio and can detect network anomalies in real time.
KEY WORDS: Network traffic anomaly detection; Random projection; Clustering; Time series; Intersection
1 Introduction
With the rapid development and spread of Internet technology, network security problems have become increasingly prominent. Large numbers of threats and security risks, such as Trojan programs, worms, and DDoS attacks, disrupt the normal functioning of society and sustained economic development.
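The pipeline in this abstract (random projection to aggregate traffic, clustering per window, intersection of the candidate sets) can be illustrated for a single sliding window. This is a simplified sketch under stated assumptions, not the authors' implementation: each "random projection" is modelled as a random assignment of flow keys (for example source IPs) to buckets, a deterministic two-cluster 1-D k-means with min/max initialisation stands in for the paper's Kmeans++, and the high-traffic cluster of buckets forms each candidate anomaly set. Function names and parameter values are illustrative.

```python
import random


def two_means_high(values, iters=50):
    """1-D k-means with k = 2; returns a mask marking values in the high cluster."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [False] * len(values)  # no spread, nothing stands out
    for _ in range(iters):
        low_group = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high_group = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low_group) / len(low_group)
        new_hi = sum(high_group) / len(high_group)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return [abs(v - lo) > abs(v - hi) for v in values]


def detect_window(counts, projections=3, buckets=8, seed=0):
    """counts: {flow_key: packets in this window}. Returns keys flagged by every projection."""
    rng = random.Random(seed)
    candidate_sets = []
    for _ in range(projections):
        # One random projection: map every key to a random bucket and aggregate.
        bucket_of = {key: rng.randrange(buckets) for key in sorted(counts)}
        totals = [0] * buckets
        for key, n in counts.items():
            totals[bucket_of[key]] += n
        high = two_means_high(totals)
        candidate_sets.append({key for key, b in bucket_of.items() if high[b]})
    # A normal key only survives if it collides with the anomaly in every projection.
    return set.intersection(*candidate_sets)
```

With 40 ordinary sources sending 10 packets each and one source sending 5,000, the heavy source is always in the result, while an ordinary source survives only if it shared the heavy source's bucket in all three projections (probability (1/8)^3 per source with these settings). That is the reason for intersecting several projections instead of trusting one.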
Network Anomaly Detection Based on Improved K-Means
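Design and Application
XU Xiang (Education Information Center, Guangdong Ocean University, Zhanjiang, Guangdong …)
Detecting massive network data traffic can reveal abnormal behavior in the network, such as Probe … determines the clustering centers with considerable randomness, so the clustering results are not accurate enough. To address this problem, a method based on an improved K-Means clustering algorithm is proposed. It performs network data clustering tasks in a distributed manner to save clustering time, and then … the clustering centers of the algorithm, improving the accuracy of network anomaly data clustering.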
Experimental results show that the detection rate of the method stays at around …, so the method has high practical value.
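Keywords: … model; minimum spanning tree; clustering center; anomaly detection
Abstract: Detection of massive network data traffic can find abnormal behavior in the network … etc. The determination of clustering centers by the traditional K-Means algorithm has great randomness …
… clustering results of the … algorithm: the distance between each object and each clustering center is computed, the … function partitions the results by category, and data of the same category are grouped into one cluster; the algorithm obtains the output of the iterative computation of the clustering function on the network anomaly sample data.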
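In this process, … consistent data …, the new clustering centers of the K-Means algorithm are computed, and the iteration continues in the same way until the final result is output. The … algorithm-based network anomaly data clustering algorithm performs hard clustering on the basis of a prototype objective function and is highly representative of this class of clustering algorithms. Specifically, initial clustering centers are determined, and computations between these initial centers and the sample objects yield new clustering centers, completing accurate … of the data samples. … K-Means mining algorithm … In addition, … the clustering centers used.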
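The minimum connected subgraph of a complete graph that contains all of its vertices is its minimum spanning tree; every vertex of the complete graph is included in the minimum spanning tree. To determine clustering centers from this definition, the network anomaly traffic data are treated as the vertices of a complete graph, and the resulting weighted complete graph is expressed as: … where H is the length between connected vertices. Based on this principle, the clustering result of the network anomaly traffic is determined through the spanning tree.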
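The extracted text does not preserve the paper's formula for the weighted complete graph, so the following is a generic sketch of the idea it describes, with assumed details: Euclidean distances serve as edge weights, Prim's algorithm builds the minimum spanning tree, the k−1 longest tree edges are cut, and the mean of each resulting component seeds a standard K-Means run. Function names are illustrative, and `k` is assumed to be at most the number of points.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (weight, u, v) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_seed_centers(points, k):
    """Cut the k-1 longest MST edges and use each component's mean as an initial center."""
    kept = sorted(minimum_spanning_tree(points))[: len(points) - k]
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for _, u, v in kept:
        root[find(u)] = find(v)
    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups.values()]


def kmeans(points, centers, iters=100):
    """Standard Lloyd iterations starting from the given centers."""
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        updated = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
                   for i, g in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers, clusters
```

For two well-separated groups such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with `k=2`, the longest MST edge is the bridge between the groups, so the seeds are already the two group means and K-Means converges in one pass. Seeding from the tree structure removes the random-initialisation dependence the abstract criticises.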
Network Traffic Anomaly Detection Based on Time Series Analysis
LI Yan
Abstract: A network traffic anomaly detection model based on time series analysis is proposed to detect network traffic anomalies accurately and keep the network operating normally. Wavelet analysis decomposes the network traffic, according to the similarity of the traffic data, into components at smaller scales. The grey model and the Markov model of time series analysis are then used to detect traffic anomalies in the high-frequency and low-frequency components, respectively; their detection results are fused by wavelet analysis and evaluated in a network traffic anomaly simulation experiment. The results show that the time series analysis model has a simple workflow and a higher detection rate for network traffic anomalies, its false alarm rate is lower than that of other network traffic anomaly detection models, and it achieves better real-time performance.
Journal: Modern Electronics Technique, 2017, Vol. 40, No. 7; 4 pages (pp. 85-87, 91)
Keywords: network system; traffic anomaly detection; grey model; wavelet analysis
Author affiliation: School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, Jiangxi 333403, China
Language: Chinese; CLC numbers: TN915.07-34; TP391
With the continuous development and maturation of computer technology, networks carry more and more kinds of services, such as video and images, and have become a major communication carrier [1].
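Of the pipeline in this abstract, the grey-model step is the easiest to show in isolation. The sketch below fits the textbook GM(1,1) model to a sliding window of positive traffic counts, forecasts the next value, and flags points whose relative deviation from the forecast exceeds a tolerance. The window length, the 30% tolerance, and the flagging rule are assumptions; the wavelet decomposition, the Markov model for the other component, and the fusion step are omitted.

```python
import math


def gm11_fit(x0):
    """Least-squares fit of GM(1,1): x0(k) + a*z1(k) = b. Returns (a, b)."""
    x1 = [sum(x0[: i + 1]) for i in range(len(x0))]  # accumulated generating operation
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]  # background values
    y = x0[1:]
    m = len(z1)
    sz, sy = sum(z1), sum(y)
    szz = sum(z * z for z in z1)
    szy = sum(z * v for z, v in zip(z1, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)  # regression of y on z1
    intercept = (sy - slope * sz) / m
    return -slope, intercept  # y = -a*z + b


def gm11_forecast(x0, a, b):
    """One-step-ahead forecast of the next raw value after x0."""
    if abs(a) < 1e-12:
        return b  # degenerate case: the accumulated series grows linearly

    def x1_hat(k):  # time response of the whitened equation, k = 0, 1, 2, ...
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    n = len(x0)
    return x1_hat(n) - x1_hat(n - 1)


def grey_anomalies(series, window=6, tol=0.3):
    """Indices whose value deviates from the GM(1,1) forecast by more than tol (relative)."""
    flagged = []
    for t in range(window, len(series)):
        history = series[t - window:t]
        a, b = gm11_fit(history)
        forecast = gm11_forecast(history, a, b)
        if abs(series[t] - forecast) > tol * abs(forecast):
            flagged.append(t)
    return flagged
```

On the series `[100, 102, 104, 106, 108, 110, 112, 114, 116, 500, 120]`, the forecasts for indices 6 to 8 stay close to the observed values, while the burst of 500 at index 9 is flagged. Note that the point right after a burst may also be flagged, because the burst then sits inside the fitting window.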
Research on Noise Suppression Technology for Magnetic Detection and Positioning Systems
Published: 2022-12-12T06:33:46.544Z; Source: Science and Technology, 2022, Issue 16; Authors: GUAN Yexin, ZHAO Wenchun
GUAN Yexin, ZHAO Wenchun (1. No. 710 Research Institute, China State Shipbuilding Corporation, Yichang, Hubei 443000; 2. Magnetism Research Center, Yichang, Hubei 443000)
Abstract: Based on magnetic dipole theory, this paper uses the BAS algorithm to build a noise suppression model for a magnetic detection and positioning system, establishes mathematical models for the different noise sources of underwater moving platforms, and verifies the feasibility of the noise suppression technique through virtual simulations in which random noise is added. The proposed technique has good application value for noise suppression on underwater moving platforms and is of great significance to the development of underwater non-acoustic detection technology in China.
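The abstract names the BAS (beetle antennae search) algorithm without giving its update rule. For orientation, the sketch below implements the generic BAS iteration as commonly stated in the optimisation literature; it is not the paper's noise model. At each step a random unit direction is drawn, the cost is sampled at two "antennae" on either side of the current point, and the point moves a fixed step away from the higher-cost side, while antenna length and step size shrink over time. The quadratic cost is a hypothetical stand-in for a dipole-fitting objective, and all parameter values are assumptions.

```python
import math
import random


def bas_minimize(cost, x0, antenna=1.0, step=1.0, decay=0.95, iters=300, seed=0):
    """Generic beetle antennae search; returns the best point found and its cost."""
    rng = random.Random(seed)
    x = list(x0)
    best_x, best_cost = list(x), cost(x)
    for _ in range(iters):
        # Random unit direction for the two antennae.
        b = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in b)) or 1.0
        b = [v / norm for v in b]
        right = [xi + antenna * bi for xi, bi in zip(x, b)]
        left = [xi - antenna * bi for xi, bi in zip(x, b)]
        diff = cost(right) - cost(left)
        sign = (diff > 0) - (diff < 0)
        # Move away from the antenna that senses the higher cost.
        x = [xi - step * bi * sign for xi, bi in zip(x, b)]
        current = cost(x)
        if current < best_cost:
            best_x, best_cost = list(x), current
        antenna = decay * antenna + 0.01  # antenna length keeps a small floor
        step *= decay
    return best_x, best_cost
```

BAS needs only two cost evaluations per step and no gradient, which is why it is attractive when the cost comes from a magnetic-field model evaluated against noisy measurements.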
ISSN 1000-9825, CODEN RUXUEW    E-mail: jos@
Journal of Software, Vol.18, No.10, October 2007, pp.2595−2604    DOI: 10.1360/jos182595    Tel/Fax: +86-10-62562563
© 2007 by Journal of Software. All rights reserved.

A Network Anomaly Detection Method Based on Transduction Scheme∗
LI Yang1,2+, FANG Bin-Xing1, GUO Li1, CHEN You1,2
1(Institute of Computing Technology, The Chinese Academy of Sciences, Beijing 100080, China)
2(Graduate University, The Chinese Academy of Sciences, Beijing 100049, China)
+ Corresponding author: Phn: +86-10-62600951, Fax: +86-10-62600905, E-mail: liyang@
Li Y, Fang BX, Guo L, Chen Y. A network anomaly detection method based on transduction scheme. Journal of Software, 2007,18(10):2595−2604. /1000-9825/18/2595.htm

Abstract: Network anomaly detection has been an active and difficult research topic in the field of intrusion detection for many years. So far, a high false alarm rate, the need for high-quality data to model normal patterns, and the drop in detection rate caused by "noisy" data in the training set have kept it from performing as well as expected in practice. This paper presents a novel network anomaly detection method based on an improved TCM-KNN (transductive confidence machines for K-nearest neighbors) machine learning algorithm, which can effectively detect anomalies using only normal data for training. A series of experiments on the well-known KDD Cup 1999 dataset demonstrates that, while maintaining a high detection rate, it achieves a lower false positive rate and, in particular, higher confidence than traditional anomaly detection methods.
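In addition, even when the training dataset is contaminated by "noisy" data, the proposed method still maintains good detection performance. Furthermore, it can be optimized without obvious loss of detection performance by training on a small dataset and by employing feature selection to avoid the "curse of dimensionality".
Key words: network security; anomaly detection; strangeness; TCM (transductive confidence machines); TCM-KNN (transductive confidence machines for K-nearest neighbors) algorithm
∗ Supported by the National Natural Science Foundation of China under Grant No.60573134; the National Information Security 242 Project of China under Grant No.2005C39
Received 2006-10-10; Accepted 2007-01-23
CLC number: TP393    Document code: A

An intrusion detection system is an important component of a network security defense system. It collects and analyzes key information from networks and hosts to detect events that violate security policies or constitute attacks, and raises alarms for the events it detects. At present, there are two main intrusion detection techniques: misuse detection and anomaly detection [1].

Misuse detection rests on the premise that any known attack can be expressed with some pattern or signature description. A misuse detection system encodes known attack signatures and system vulnerabilities and stores them in a knowledge base; the intrusion detection system (IDS) matches monitored events against the attack patterns in the knowledge base, and when a match is found, an intrusion is assumed and the corresponding mechanism is triggered. The advantages of this technique are that efficient, targeted intrusion detection systems can be built and the false alarm rate is low. Its drawbacks are that it cannot handle unknown intrusions or variants of known intrusions, attack signatures are hard to extract, and the knowledge base must be updated continually.

Anomaly detection assumes that the normal working patterns of the protected object are known and relatively stable, and that when an intrusion occurs, the behavior pattern of the user or system changes to some degree. The usual approach is to build a normal profile of the system or user corresponding to "normal activity". During detection, the anomaly detector generates the current activity profile and compares it with the normal profile; when the two deviate significantly, an intrusion is assumed and the corresponding mechanism is triggered. Anomaly detection is relatively system-independent and broadly applicable. Its greatest advantage is that it may detect attack methods never seen before, without being limited to known vulnerabilities as misuse detection is; however, its false alarm rate is too high.

The idea of anomaly detection was first proposed by Denning [2]: security violations can be detected by monitoring the system audit records for abnormal patterns of system usage. The idea was soon applied to network anomaly detection. Network anomaly detection methods are further divided into two categories [3]: supervised anomaly detection and unsupervised anomaly detection. In the former, the system is given a dataset consisting entirely of normal samples together with a set of unlabeled samples, and the task is to determine whether the unlabeled data deviate from the normal data; probabilistic statistical analysis [4], artificial immune algorithms [5], and data mining methods [6] belong to this category. However, these methods need a completely "clean" dataset to build the model, which often cannot be obtained under complex network conditions, so they are not widely used in practice. In the latter, the system is usually given an unlabeled training set without knowing which data are normal and which are anomalous, and the goal is to find the anomalous samples. The cluster-based estimation algorithm, the improved K-nearest neighbor method, and the one-class SVM (support vector machines) method proposed by Eskin et al. [7] at Columbia University belong to this category. These methods are applicable more widely than supervised ones: they do not need a completely "clean" dataset, only that normal data far outnumber anomalous data in the training set (typically about 98.5% to 99% normal and 1% to 1.5% anomalous). Among them, the one-class SVM method reaches a detection rate of 98%, but its false alarm rate is also high (up to 10%).

Addressing the strengths and weaknesses of these two classes of methods, this paper proposes a new anomaly detection method based on the TCM-KNN (transductive confidence machines for K-nearest neighbors) algorithm. Grounded in Kolmogorov's theory of algorithmic randomness, the algorithm is an effective confidence-based machine learning method that has been widely applied in pattern recognition [8], fraud detection, and outlier detection [9], with good practical results. This paper applies it to anomaly-based intrusion detection for the first time and improves it. Extensive experiments on the well-known KDD Cup 1999 dataset verify its effectiveness. Compared with similar anomaly detection methods, it greatly reduces the false alarm rate while ensuring a high detection rate. More importantly, it maintains high detection performance both when the training set is contaminated by "noisy" data and when only a "small sample" training set is available.

Section 1 introduces the theoretical background of the TCM-KNN algorithm. Section 2 describes the network anomaly detection method based on this algorithm. Section 3 presents the experimental results with analysis and comparison. Section 4 concludes the paper.

1 Theoretical Background of the TCM-KNN Algorithm
In statistical learning theory, transduction usually means that the class of a sample is predicted directly from all samples in the training data, rather than, as in traditional induction, from a general rule derived from the training data [9]. The concept is widely used in machine learning because it only requires the iid assumption (the samples to be classified and the training data are independent and identically distributed), and it needs to know neither the type nor the parameters of the data distribution. Transductive confidence machines (TCM) [8] use Kolmogorov's theory of algorithmic randomness to build a broadly applicable confidence mechanism for machine learning, which measures how credible it is that a sample belongs to each of the existing classes. The confidence mechanism in TCM is based on randomness tests. However, Martin-Löf proved [9] that such randomness tests are not computable, so the confidence must be estimated with a computable randomness test function that satisfies Kolmogorov's theory of algorithmic randomness. The value of this test function is called the P-value. We usually define the P-value as the probability that the sample to be classified belongs to the sample space of each existing class: the larger its value for a class, the more likely the sample belongs to that class.

TCM-KNN combines the classical K-nearest-neighbor classifier with TCM and classifies observation points against an already classified dataset by computing distances (in this paper, distances between samples are always computed on their feature vectors). To compute the P-value of a test sample in TCM-KNN, we therefore define a measure called strangeness.
Definition 1.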
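The strangeness $\alpha_{iy}$ of test sample i with respect to class y is defined as
$$\alpha_{iy}=\frac{\sum_{j=1}^{k}D_{ij}^{y}}{\sum_{j=1}^{k}D_{ij}^{-y}} \tag{1}$$
where $D_{i}^{y}$ denotes the sequence of distances between sample i and all samples in class y, and $D_{ij}^{y}$ is the j-th shortest distance in this sequence; likewise, $D_{i}^{-y}$ denotes the sequence of distances between sample i and all samples of the other classes (every class except y), and $D_{ij}^{-y}$ is the j-th shortest distance in it. The parameter k is the number of nearest neighbors considered.

The definition shows that strangeness is built on the distances between sample feature vectors in feature space. Samples of the same class are similar, so their feature vectors cluster together and the distances between them are small; samples of different classes are dissimilar, so their feature vectors are dispersed and the distances between them are large. Strangeness is therefore the ratio of the sum of the k smallest distances from test sample i to the samples of the class it would join, to the sum of the k smallest distances from i to the samples of the other classes. Definition 1 combines strangeness with the K-nearest-neighbor method and uses the Euclidean distance between samples:
$$\mathrm{distance}(Y_1,Y_2)=\sqrt{\sum_{j=1}^{|Y_1|}\left(Y_{1j}-Y_{2j}\right)^2} \tag{2}$$
where $Y_1$ and $Y_2$ denote two samples (each represented by its feature vector), $Y_{ij}$ is the j-th feature of feature vector $Y_i$, and $|Y_i|$ is the number of features of $Y_i$.
Combined with Definition 1, the P-value in TCM-KNN is computed as follows.
Definition 2. The P-value of test sample i with respect to class y is
$$p(\alpha_i)=\frac{\#\{j:\alpha_j\ge\alpha_i\}}{n+1} \tag{3}$$
where # denotes the cardinality of a set, usually the number of elements of a finite set; $\alpha_i$ is the strangeness of the test sample; n is the number of samples in the set; and $\alpha_j$ is the strangeness of any sample in the set. The P-value can thus be computed as j/(n+1), where j is the number of samples in class y whose strangeness is at least that of test sample i. Samples are usually processed one at a time. The P-value lies in [0, 1], and the larger it is, the more likely sample i belongs to class y.

The TCM-KNN algorithm built on Definitions 1 and 2 is essentially a classification algorithm: it tries to assign a sample to one of the existing classes. During the computation, when the distance between any sample of some class in the training set and the sample to be classified is smaller than the largest of the k shortest distances used to compute that training sample's strangeness, the strangeness values of the samples in that class must be recomputed, and the P-value of the sample to be classified recomputed accordingly (note that the sample to be classified has one P-value for each class in the training set). Finally, the sample is assigned to the class with the largest P-value, and the confidence of this classification is 1 − (the second largest P-value). The pseudocode of the classic TCM-KNN algorithm is as follows.
Algorithm 1.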
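Classic TCM-KNN algorithm.
Parameters: k (number of nearest neighbors), m (number of training samples), c (number of existing classes)
Input: r (sample to be tested)
Output: class_id (class label of the sample)
/* algorithm start */
for i = 1 to m {
    compute D_i^y and D_i^{-y} for each training sample according to Definition 1, and store them;
    compute the strangeness α of each training sample by Eq. (1), and store it;
}
for j = 1 to c {
    for each sample t in class j:
        if (D_tk^j > dist(t, r))
            add r to class j and recompute the strangeness α of t by Eq. (1);
    for each sample t not in class j:
        if (D_tk^{-j} > dist(t, r))
            add r to class j and recompute the strangeness α of t by Eq. (1);
    compute the strangeness of r with respect to class j;
    compute the P-value of r with respect to class j;
}
assign r to the class with the largest P-value; the confidence of this classification is 1 − (second largest P-value);
return class_id;
/* algorithm end */

2 Network Anomaly Detection Model Based on the Improved TCM-KNN Algorithm
The TCM-KNN algorithm of Section 1 is essentially a confidence-based classification method. It only requires the learning samples to be independent and identically distributed, and it needs to know neither the type nor the parameters of the sample distribution, so it is broadly applicable. This weak precondition also makes it easy to combine with other machine learning algorithms. It differs further in that it does not derive a general decision rule from the training samples and then make a hard either/or judgment on every unknown sample; the learning need not take place on a closed set of pattern classes, and decisions rest on the relative magnitudes of the confidences under different class hypotheses. Applying it to anomaly-based intrusion detection, however, requires further improvement, mainly because anomaly detection does not supply detailed attack data in advance from which to build the corresponding classes. Unlike pattern recognition and misuse detection, it is therefore not a simple classification problem but a judgment of whether new data are anomalous with respect to an established "normal pattern". This section details our improvements and the overall structure of a network anomaly detection framework based on them.

2.1 The Improved TCM-KNN Algorithm
The task of network anomaly detection is to judge whether new data are normal or anomalous using a model built from a normal training set. Following the decision requirements of TCM-KNN, we define the training set for anomaly detection as a set of samples with normal behavior patterns extracted from network data. Each sample is represented by its feature vector, whose features can be IP address pairs, port numbers, protocol types, TCP connection statistics, and so on. This closely resembles the KDD Cup 1999 dataset, each record of which contains 41 extracted features. The anomaly detection task is then to judge whether the feature vector of new data is anomalous with respect to the normal training set. Moreover, in anomaly detection the training set contains only one class (normal data), not several, so, as discussed in Section 1, the strangeness, which is central to the algorithm, must be redefined as follows.
Definition 3. The strangeness $\alpha_{iy}$ of test sample i with respect to the normal class y is
$$\alpha_{iy}=\sum_{j=1}^{k}D_{ij}^{y} \tag{4}$$
where the symbols have the same meanings as in Eq. (1). With this definition, the strangeness of samples outside the normal class is far larger than that of samples in the normal class, so anomalous data are effectively "isolated" from normal data. The definition was previously used by Daniel [9] and Angiulli [10] with good discrimination in practice, so this paper adopts it as well; the experimental results in Section 3 confirm its effectiveness for anomaly detection.
The P-value is computed from the strangeness exactly as in Definition 2. Algorithm 2 gives the pseudocode of the proposed improved TCM-KNN algorithm for anomaly detection. The detection procedure can be summarized as follows: given a normal training set and a batch of test samples, compute the strangeness of each test sample and, using the precomputed strangeness of every sample in the normal training set, obtain its P-value with respect to the normal training set. If this value is below a predefined threshold τ (usually 0.05), the sample is judged anomalous with confidence 1−τ (usually 95%); otherwise it is judged normal.
Algorithm 2.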
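Improved TCM-KNN algorithm for anomaly detection.
Parameters: k (number of nearest neighbors), m (number of training samples), τ (confidence threshold)
Input: r (sample to be tested)
Output: normal or abnormal
/* algorithm start */
for i = 1 to m {
    compute D_i^y for each training sample according to Definition 1, and store it;
    compute the strangeness α of each training sample by Eq. (4), and store it;
}
compute the strangeness of r by Eq. (4);
compute the P-value of r by Eq. (3);
if (p ≤ τ)
    judge r abnormal with confidence (1−τ); return abnormal;
else
    judge r normal with confidence (1−τ); return normal;
/* algorithm end */

We briefly analyze the time complexity of this algorithm. First, determining the strangeness of every sample in the normal training set costs O(m²). Second, computing the strangeness of s test samples costs O(sm), while computing the corresponding P-value costs only O(m). The expensive first computation can be done once offline and reused many times in actual anomaly detection, so it need not be repeated at decision time; only the last two computations (O(sm)+O(m)) are performed when judging. Because our detection mode judges one sample at a time, the time cost is dominated by the size of the dataset and the dimensionality of the feature vectors, both of which can be controlled in practice. Section 3 shows experimentally that training on a "small sample" and reducing dimensionality are feasible for this method.

2.2 Anomaly Detection Framework Based on the TCM-KNN Algorithm
This section builds a network anomaly detection framework on the improved TCM-KNN algorithm to show how the method can be used in practice. As shown in the lower half of Fig. 1, the training phase, which builds the normal training set for the actual application, consists mainly of the following:
• Normal data collection: collect network data that reflect the normal behavior of applications, excluding anomalous behavior, to build the normal behavior dataset used for anomaly detection in the detection phase;
• Data selection: to limit the heavy computation caused by an overly large normal training set in the distance calculations, sample the normal network data in a targeted way for the application, for example by selecting representative traffic per protocol (HTTP, FTP, SMTP, etc.) or by selecting statistics at fixed time intervals (e.g., the ratio of SYN to ACK/SYN packets in the network within 2 seconds), rather than recording metadata and statistics for all traffic at all times;
• Feature selection and vectorization: to avoid the "curse of dimensionality" that distance calculations may encounter, apply feature selection to the defined features, then map all normal data to feature vectors and add them to the normal training set, which serves as the baseline for anomaly detection.
The upper half of Fig. 1 shows the main work of the detection phase:
• Data collection: the network data collection module collects raw network data from the monitored network segment, mainly the link-layer, network-layer, and transport-layer metadata of the traffic;
• Data preprocessing: because the improved TCM-KNN algorithm operates on independent points represented by feature vectors, and the normal training set likewise consists of many independent, identically distributed points, this module turns the raw data from the collection module into feature vectors built from the features predefined for the application and passes them on;
• Anomaly detection (based on TCM-KNN): this module uses the improved TCM-KNN algorithm to judge new data one by one against all feature vectors in the normal training set.
Fig.1 An anomaly detection framework based on TCM-KNN

3 Experiments and Analysis of Results
In this section we verify the effectiveness of the proposed anomaly detection method. For persuasiveness and convenience, we use the KDD Cup 1999 benchmark dataset, which is widely recognized and used in the research community. Because the dataset is complete, it already covers most of the training-phase work shown in Fig. 1; we only need to extract data and select features, as detailed later in this section. We first compare the anomaly detection performance of our method with the Cluster, K-nearest neighbors (KNN), and one-class SVM methods, which are well known in unsupervised anomaly detection, and with the commonly used neural-network-based method and the SVM method based on quarter-sphere space partitioning. We then evaluate our method when the normal data in the training set are contaminated by "noisy" data (attack data). Finally, we test its performance when computation time is reduced by training on a "small sample" normal training set and by reducing the dimensionality of the training samples. The evaluation metrics are the internationally used detection rate (true positive rate, TP) and false alarm rate (false positive rate, FP).

3.1 Experimental Dataset
The KDD Cup 1999 dataset used in this paper contains about 4,900,000 records, each extracted according to 41 predefined features from raw network data obtained by simulating attacks in a military network environment. They are feature vectors describing network connection statistics and fall into five classes: four classes of attack data, DoS, Probe, R2L, and U2R (24 attack types in total), plus normal data. For the comparison with unsupervised anomaly detection methods, we randomly extracted 196,485 normal records and 2,050 attack records (covering all four attack classes), so that attacks make up about 1% of the data, mainly to meet the requirements of the unsupervised methods [7]. For the later experiments comparing performance under "noisy" data and under reduced computation time with normal performance, we further processed this randomly extracted dataset; the specific compositions are given in the corresponding experiments.

3.2 Data Preprocessing
The 41 features of this dataset are of two main types: numeric and nominal. To apply the TCM-KNN algorithm, the numeric data must first be normalized, because Euclidean distances between feature vectors are computed and differences in value ranges could let one numeric feature dominate another. Normalization proceeds as follows. First, compute the mean and standard deviation of each feature over the training samples:
$$\mathrm{mean}[j]=\frac{1}{n}\sum_{i=1}^{n}\mathrm{instance}_i[j] \tag{5}$$
$$\mathrm{standard}[j]=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(\mathrm{instance}_i[j]-\mathrm{mean}[j]\right)^2} \tag{6}$$
where $\mathrm{instance}_i[j]$ is the j-th attribute of training sample i and n is the number of samples. Then transform the samples of the training set as
$$\mathrm{newinstance}[j]=\frac{\mathrm{instance}[j]-\mathrm{mean}[j]}{\mathrm{standard}[j]} \tag{7}$$
Eq. (7) converts an attribute value into the number of standard deviations by which it deviates from the mean, mapping each attribute from its own value space into a standard value space. Nominal attributes such as protocol type and service type are normalized by the frequency of each value within the attribute's value space, which confines them to the range 0 to 1.

3.3 Comparison with Related Work
To compare our method with the three well-known unsupervised anomaly detection algorithms proposed by Eskin et al. [7] at Columbia University, we used the 196,485 randomly extracted normal records and 2,050 attack records with ten-fold cross-validation and obtained the ROC (receiver operating characteristic) curves in Fig. 2 (each curve is obtained by varying the threshold of the corresponding algorithm). Our method achieves a high detection rate while keeping the false positive rate relatively low.
Fig.2 ROC curves for contrast experiment (true positive rate vs. false positive rate; curves: TCM-KNN, Cluster, KNN, One-Class SVM)
To further demonstrate its effectiveness, we used the 2,050 attack records above as an independent test set and compared, per attack class, the detection performance of our method with that of the well-performing neural-network-based methods [11,12] and the quarter-sphere SVM method [13]; the results are listed in Table 1. Our method clearly outperforms the other methods on each attack class. However, its detection of U2R and R2L attacks still needs improvement: these attacks behave very much like normal traffic and are hard to distinguish, so their detection rate is not yet satisfactory. This will be the focus of our future work.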
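Algorithm 2 together with the numeric preprocessing of Section 3.2 can be sketched end to end as below. The sketch follows Eqs. (3), (4), and (7) as described in the text, with these assumptions: only numeric features are handled (the frequency encoding of nominal attributes is omitted); the standard deviation uses the sample divisor n−1, which is an assumption about Eq. (6); a training sample's strangeness is computed against the other training samples, excluding itself; and the P-value counts training samples whose strangeness is at least that of the test sample, divided by n+1. Class and function names are illustrative.

```python
import math


def fit_scaler(train):
    """Per-feature mean and standard deviation (divisor n - 1) of the numeric training data."""
    n, dims = len(train), len(train[0])
    mean = [sum(row[j] for row in train) / n for j in range(dims)]
    std = [math.sqrt(sum((row[j] - mean[j]) ** 2 for row in train) / (n - 1)) if n > 1 else 0.0
           for j in range(dims)]
    return mean, std


def scale(row, mean, std):
    """Eq. (7): express each value as a number of standard deviations from the mean."""
    # A constant feature (std == 0) is only centred, to avoid dividing by zero.
    return [(v - m) / s if s > 0 else v - m for v, m, s in zip(row, mean, std)]


def strangeness(x, others, k):
    """Eq. (4): sum of the k smallest Euclidean distances from x to the normal samples."""
    return sum(sorted(math.dist(x, o) for o in others)[:k])


class TcmKnnDetector:
    def __init__(self, normal_rows, k=3):
        self.mean, self.std = fit_scaler(normal_rows)
        self.normal = [scale(r, self.mean, self.std) for r in normal_rows]
        self.k = k
        # O(m^2) offline step: strangeness of each training sample against the others.
        self.train_alpha = [
            strangeness(x, self.normal[:i] + self.normal[i + 1:], k)
            for i, x in enumerate(self.normal)
        ]

    def p_value(self, row):
        """Eq. (3): #{j : alpha_j >= alpha_r} / (n + 1) over the training strangeness values."""
        alpha_r = strangeness(scale(row, self.mean, self.std), self.normal, self.k)
        count = sum(1 for a in self.train_alpha if a >= alpha_r)
        return count / (len(self.train_alpha) + 1)

    def classify(self, row, tau=0.05):
        """Algorithm 2: abnormal if p <= tau (judged with confidence 1 - tau), else normal."""
        return "abnormal" if self.p_value(row) <= tau else "normal"
```

With a 5×5 grid of normal points spaced 0.1 apart, a query at the grid centre gets a P-value of 25/26 and is judged normal, whereas a point at (5.0, 5.0) is farther from the normal set than any training sample, gets P-value 0, and is judged abnormal at the paper's usual τ = 0.05. As the complexity analysis notes, only `p_value` runs at detection time; the O(m²) strangeness table is built once.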