Data Mining: Anomaly Detection (Compilation)

Input Data
• Most common form of data handled by anomaly detection techniques is record data
– Univariate
– Multivariate
Engine Temperature (univariate example)
192 195 180 199 19 177 172 285 195 163
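As a minimal sketch of what detection on univariate record data can look like, the snippet below scores the engine temperature readings listed above with a modified z-score (median and median absolute deviation). The slides do not prescribe this method; the function name `robust_z_scores` and the 3.5 cut-off are assumptions made for illustration only.

```python
from statistics import median

# Engine temperature readings from the slide (univariate record data).
readings = [192, 195, 180, 199, 19, 177, 172, 285, 195, 163]


def robust_z_scores(values):
    """Return modified z-scores based on the median and the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the modified z-score is undefined")
    # 0.6745 rescales the MAD so the score is comparable to a standard
    # z-score when the data are roughly normally distributed.
    return [0.6745 * (v - med) / mad for v in values]


THRESHOLD = 3.5  # assumed cut-off (a common rule of thumb, not from the slides)

scores = robust_z_scores(readings)
anomalies = [v for v, s in zip(readings, scores) if abs(s) > THRESHOLD]
print(anomalies)
```

The median and MAD are used here instead of the mean and standard deviation because the extreme readings themselves would otherwise inflate the spread estimate and partly mask each other.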
Speaker: Wentao Li
Outline
• Definition
• Application
• Methods
– Time is limited, so this talk only sketches the overall picture of anomaly detection; for more detail, please refer to the paper.
What are Anomalies?
• An anomaly is a pattern in the data that does not conform to the expected behavior
• An anomaly is a data object that deviates significantly from the normal objects, as if it were generated by a different mechanism
• Map
– Related areas (theory)
– Application (practice)
– Problem formulation
• Detection effect +
Aspects of Anomaly Detection Problem
• Nature of input data
Key Challenges
• Defining a representative normal region is challenging
• The boundary between normal and outlying behavior is often not precise
• Availability of labeled data for training/validation
• The exact notion of an outlier is different for different application domains
• Data might contain noise
• Normal behavior keeps evolving
• Appropriate selection of relevant features
Related problems
• Outliers are different from noise
– Noise is random error or variance in a measured variable
– Noise should be removed before outlier detection
– Outliers are interesting: they violate the mechanism that generates the normal data
• Outlier detection vs. novelty detection: a novel pattern is treated as an outlier at an early stage, but is later merged into the model
Multivariate record data (example)
Tid  SrcIP          Start time  Dest IP         Dest Port  Number of bytes  Attack
1    206.135.38.95  11:07:20    160.94.179.223  139        192              No
2    206.163.37.95  11:13:56    160.94.179.219  139        195              No
3    206.163.37.95  11:14:29    160.94.179.217  139        180              No
• Anomalies are also referred to as outliers, exceptions, peculiarities, surprises, etc.
• Anomalies translate to significant (often critical) real-life entities
– Cyber intrusions
– Credit card fraud
– Faults in mechanical systems
Anomaly Detection: An Introduction
Source of slides:
– Tutorial at the American Statistical Association (ASA 2008)
– Jiawei Han, Data Mining: Concepts and Techniques
– Tutorial at the European Conference on Principles and Practice of Knowledge Discovery in Databases
• Nature of input data
– What are the characteristics of the input data?
• Availability of supervision
– Number of labels
• Type of anomaly: point, contextual, structural
– Type of anomaly
• Output of anomaly detection
– Score vs. label
• Evaluation of anomaly detection techniques
– What kind of detection is good?
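The "score vs. label" and evaluation aspects listed above can be sketched as follows: a detector that outputs scores needs a threshold to produce labels, and the resulting labels can then be compared with ground truth. The scores, the ground-truth labels, the two thresholds, and the helper names `to_labels` and `precision_recall` are all hypothetical and chosen only for illustration; none of them come from the slides.

```python
# Hypothetical anomaly scores for six records (higher = more anomalous)
# and hypothetical ground-truth labels; neither comes from the slides.
scores = [0.10, 0.85, 0.20, 0.95, 0.60, 0.05]
truth = [False, True, False, True, False, False]


def to_labels(scores, threshold):
    """Turn a score output into a label output by thresholding."""
    return [s >= threshold for s in scores]


def precision_recall(predicted, truth):
    """Compute precision and recall of predicted anomaly labels."""
    tp = sum(p and t for p, t in zip(predicted, truth))
    fp = sum(p and not t for p, t in zip(predicted, truth))
    fn = sum(t and not p for p, t in zip(predicted, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


for threshold in (0.5, 0.8):  # assumed thresholds
    labels = to_labels(scores, threshold)
    precision, recall = precision_recall(labels, truth)
    print(threshold, labels, round(precision, 2), round(recall, 2))
```

A score output leaves the choice of threshold to the analyst, so the same detector can trade false alarms against missed anomalies; a label output fixes that choice inside the technique.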