Big Data and Data Mining Training Lecture Notes: Deviation Detection
The large increase in m1 in group s1 was caused by an increase in m3, which was caused by a rise in m5, primarily in sector s13.
Report Generation
▪ Automatic generation of business-user-oriented reports
▪ Drill down through the search space
▪ Generate a finding for each measure
  ▪ deviation from previous period
  ▪ deviation from norm
  ▪ deviation projected for next period, if no action
▪ Focus on what is actionable!
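The three deviation types listed above can be illustrated with a minimal sketch. The class and function names are hypothetical, and the linear projection is an assumption made only for illustration; the slides do not say how KEFIR projects the next period.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Deviations of one measure in one sector (illustrative names only)."""
    measure: str
    sector: str
    from_previous: float  # relative change versus the previous period
    from_norm: float      # relative gap versus the expected (norm) value
    projected: float      # projected relative change next period, if no action


def relative_change(new: float, old: float) -> float:
    """Return (new - old) / old, the fractional change from old to new."""
    if old == 0:
        raise ValueError("reference value must be non-zero")
    return (new - old) / old


def make_finding(measure: str, sector: str,
                 previous: float, current: float, norm: float) -> Finding:
    # Assumption: the trend simply continues, so the next value is
    # current + (current - previous). This is not taken from the slides.
    projected_value = current + (current - previous)
    return Finding(
        measure=measure,
        sector=sector,
        from_previous=relative_change(current, previous),
        from_norm=relative_change(current, norm),
        projected=relative_change(projected_value, current),
    )


# Made-up numbers, used only to exercise the functions.
finding = make_finding("admission rate per 1000", "Inpatient admissions",
                       previous=100.0, current=120.0, norm=110.0)
print(finding)
```

Drilling down through the search space would then amount to calling `make_finding` for every combination of measure and sector.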
Problem: Healthcare Costs
▪ Healthcare costs in the US: 1 out of every 7 GDP dollars, and rising
▪ potential problems: fraud, misuse, …
▪ understanding where the problems are is the first step to
▪ Natural language generation with template matching
▪ Graphics
▪ delivered via browser
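Template-based generation can be sketched as filling fixed sentence fragments with the names of measures and sectors. The fragments below are modeled on the sample explanation sentence in these slides; the constant and function names are hypothetical, and KEFIR's actual templates are not shown in the source.

```python
# Sentence fragments with named slots (hypothetical templates).
HEAD = "The large increase in {measure} in group {group}"
FIRST_CAUSE = " was caused by an increase in {measure}"
NEXT_CAUSE = ", which was caused by a rise in {measure}"
TAIL = ", primarily in sector {sector}."


def render_explanation(measures, group, sector):
    """Fill the templates for a chain of measures, top-level measure first."""
    if len(measures) < 2:
        raise ValueError("need a measure and at least one cause")
    parts = [HEAD.format(measure=measures[0], group=group),
             FIRST_CAUSE.format(measure=measures[1])]
    # Every further measure in the chain is reported as a deeper cause.
    for cause in measures[2:]:
        parts.append(NEXT_CAUSE.format(measure=cause))
    parts.append(TAIL.format(sector=sector))
    return "".join(parts)


print(render_explanation(["m1", "m3", "m5"], group="s1", sector="s13"))
```

A real report generator would presumably choose wording such as "large increase" or "rise" from the size and sign of each deviation; here the wording is fixed to keep the sketch short.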
Sample KEFIR pages
Overview Inpatient admissions
Status
▪ Prototype implemented in GTE in 1995
Then Utilization review is needed in the area of admission certification.
Expected Savings: 20%
Explanation
A measure is explained by finding the path of related measures with the highest impact
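One way to read this is as a walk through a graph of related measures that always follows the measure with the highest impact. The sketch below assumes a greedy descent; the `components` mapping, the `impact` values, and the function name are all hypothetical, and the slides do not specify the exact search KEFIR uses.

```python
def explain_path(measure, components, impact):
    """Follow, from `measure`, the related measure with the highest impact
    at each step, until a measure with no further related measures is reached."""
    path = [measure]
    seen = {measure}  # guards against cycles in the relation graph
    while True:
        candidates = [m for m in components.get(path[-1], []) if m not in seen]
        if not candidates:
            return path
        best = max(candidates, key=lambda m: impact[m])
        path.append(best)
        seen.add(best)


# Hypothetical relations: m1 depends on m2 and m3, m3 depends on m4 and m5.
components = {"m1": ["m2", "m3"], "m3": ["m4", "m5"]}
# Made-up impact values for the related measures.
impact = {"m2": 1.0, "m3": 4.0, "m4": 0.5, "m5": 3.0}

print(explain_path("m1", components, impact))  # ['m1', 'm3', 'm5']
```

The resulting path m1, m3, m5 has the same shape as the chain in the sample explanation sentence.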
Interestingness of Deviations
Impact: how much the deviation affects the bottom line
Savings Percentage: how much of the deviation from the norm can be expected to be saved by the action
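A simple way to combine the two quantities, assumed here for illustration, is to multiply them, so that interestingness is the amount an action could be expected to save. The function name and all figures except the 20% savings percentage are made up.

```python
def interestingness(impact: float, savings_percentage: float) -> float:
    """Estimated savings: bottom-line impact times the recoverable fraction.

    Assumption: the two quantities are combined by multiplication.
    """
    if not 0.0 <= savings_percentage <= 1.0:
        raise ValueError("savings_percentage must be a fraction between 0 and 1")
    return impact * savings_percentage


# (measure, impact in dollars, savings percentage); hypothetical values.
findings = [
    ("admission rate per 1000", 500_000.0, 0.20),
    ("hypothetical measure B", 800_000.0, 0.05),
    ("hypothetical measure C", 200_000.0, 0.30),
]

# Rank findings so the most actionable ones are reported first.
ranked = sorted(findings, key=lambda f: interestingness(f[1], f[2]), reverse=True)
for name, impact, pct in ranked:
    print(name, round(interestingness(impact, pct)))
```

Note that the finding with the largest impact (measure B) ranks last, because little of its deviation is expected to be recoverable.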
▪ KEFIR received GTE’s highest award for technical achievement in 1995
▪ Key business user left GTE in 1996 and system was no longer used
▪ Publication:
GTE Key Findings Reporter: KEFIR
▪ KEFIR Approach:
▪ Analyze all possible deviations
▪ Select interesting findings
▪ Augment key findings with:
  ▪ Explanations of plausible causes
  ▪ Recommendations of appropriate actions
Summarization and Deviation Detection -- What is new?
Outline
▪ Summarization
▪ KEFIR – Key Findings Reporter
▪ WSARE – What is Strange About Recent Events
▪ Selecting and Reporting What is Interesting: The KEFIR Application to Healthcare Data, C. Matheus, G. Piatetsky-Shapiro, and D. McNeill, in Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1996
▪ Convert findings to a user-friendly report with text and graphics
KEFIR Search Space
Drill-Down Example
What Change Is Important?
Deviation Detection
fixing them
▪ GTE – self-insured for medical costs
▪ GTE healthcare costs – $X00,000,000
▪ Task: Analyze employee health care data and generate a report that describes the major problems
What is New?
Old data
new data
Summarization
▪ Concisely summarize what is new, different, or unexpected
  ▪ with respect to previous values
  ▪ with respect to expected values
  ▪ …
Recommendations
Hierarchical recommendation rules define appropriate intervention strategies for important measures and study areas.
Example: If measure = admission rate per 1000 & study_area = Inpatient admissions & percent_change > 0.10
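The example rule can be sketched as a small data record plus a matcher. The condition comes from the example above, and the recommendation text and 20% expected savings come from the "Then" part of the rule shown elsewhere in these slides. The `Rule` class and `recommend` function are hypothetical names, and the hierarchy among rules is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    """One recommendation rule (hypothetical representation)."""
    measure: str
    study_area: str
    min_percent_change: float  # rule fires when percent_change exceeds this
    recommendation: str
    expected_savings: float    # fraction of the deviation expected to be saved


RULES = [
    Rule(
        measure="admission rate per 1000",
        study_area="Inpatient admissions",
        min_percent_change=0.10,
        recommendation=("Utilization review is needed in the area of "
                        "admission certification."),
        expected_savings=0.20,
    ),
]


def recommend(measure: str, study_area: str, percent_change: float,
              rules: Sequence[Rule] = RULES) -> Optional[Rule]:
    """Return the first rule whose condition matches the finding, else None."""
    for rule in rules:
        if (rule.measure == measure
                and rule.study_area == study_area
                and percent_change > rule.min_percent_change):
            return rule
    return None


# A 15% rise in admission rate (made-up value) triggers the rule.
hit = recommend("admission rate per 1000", "Inpatient admissions", 0.15)
if hit is not None:
    print(hit.recommendation, f"Expected Savings: {hit.expected_savings:.0%}")
```

A change of exactly 0.10 does not fire the rule, because the condition in the example is a strict inequality.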