STATISTICALLY SIGNIFICANT, BUT IS IT SIGNIFICANT?


What Researchers Really Want/Need to Know
• Is the effect we see in our data real, or is it due to chance?
• If we believe the effect is real, how large is the effect?
• Given the size of the effect, is it of any practical use?
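These three questions map onto three statistics: a p-value, an effect estimate, and a confidence interval. A minimal sketch in Python (standard library only; the two groups and their scores are hypothetical) computes all three for a difference in means:

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(42)
# Hypothetical data: outcome scores for a treatment and a control group
treatment = [random.gauss(52, 10) for _ in range(200)]
control = [random.gauss(50, 10) for _ in range(200)]

# How large is the effect? The unstandardized mean difference.
diff = mean(treatment) - mean(control)

# Is it plausibly chance? A large-sample z test on the difference.
se = (stdev(treatment) ** 2 / 200 + stdev(control) ** 2 / 200) ** 0.5
z = diff / se
p = 2 * (1 - NormalDist().cdf(abs(z)))

# What range of effects is consistent with the data? A 95% CI.
ci = (diff - 1.96 * se, diff + 1.96 * se)
print(f"diff={diff:.2f}, p={p:.3f}, CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

Note that only the first two questions are answered by the computation; whether a difference of a couple of points matters is a practical judgment, not a statistical one.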
American Psychological Association Task Force
• The APA convened a task force in response to the 1994 American Psychologist article by Jacob Cohen, “The Earth is Round (p<.05)”
Yates’s View
In 1951, the statistician Frank Yates observed that use of the NHST--
“has caused scientific research workers to pay undue attention to the results of the tests of significance they perform on their data, and too little to the estimates of the magnitude of the effects they are investigating. . . . The emphasis on tests of significance, and the consideration of the results of each experiment in isolation, have had the unfortunate consequence that scientific workers have often regarded the execution of a test of significance on an experiment as the ultimate objective.”
APA Task Force Suggestions
Hypothesis Tests
“It is hard to imagine a situation in which a dichotomous accept-reject decision is better than reporting an actual p value or, better still, a confidence interval. Never use the unfortunate expression ‘accept the null hypothesis.’ Always provide some effect size estimate when reporting a p value.”
APA Task Force Suggestions
Effect Sizes
“Always present effect sizes for primary outcomes. If the units of measurement are meaningful on a practical level (e.g., number of cigarettes smoked per day), then we usually prefer an unstandardized measure (regression coefficient or mean difference) to a standardized measure (r or d). It helps to add brief comments that place these effect sizes in a practical and theoretical context.”
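The task force's distinction can be made concrete with its own cigarettes-per-day example (the numbers below are hypothetical). The unstandardized effect is the raw mean difference in the outcome's own units; the standardized effect rescales it by a pooled standard deviation (Cohen's d):

```python
import math
from statistics import mean, stdev

# Hypothetical outcome: cigarettes smoked per day in two groups
treated = [18, 15, 20, 12, 16, 14, 17, 13]
control = [22, 19, 24, 18, 21, 20, 23, 17]

# Unstandardized effect: mean difference, in cigarettes per day
raw_diff = mean(treated) - mean(control)  # -4.875 cigarettes/day

# Standardized effect: Cohen's d, the difference in pooled-SD units
pooled_sd = math.sqrt((stdev(treated) ** 2 + stdev(control) ** 2) / 2)
d = raw_diff / pooled_sd
```

The raw difference ("about five fewer cigarettes per day") is immediately interpretable by a practitioner, which is why the task force prefers it when the units are meaningful.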
• Significant results are often not replicated
• P values can be made arbitrarily small by using large samples
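The large-sample point can be seen analytically: fix a trivially small standardized effect and the p-value still collapses as n grows. A sketch using the normal approximation for a two-group z test (the effect size and sample sizes are illustrative):

```python
from math import sqrt
from statistics import NormalDist

d = 0.05  # a trivially small standardized mean difference
ps = {}
for n_per_group in (100, 10_000, 1_000_000):
    z = d * sqrt(n_per_group / 2)  # approximate z for two equal groups
    ps[n_per_group] = 2 * (1 - NormalDist().cdf(z))
    print(n_per_group, ps[n_per_group])  # p shrinks toward 0 as n grows
```

With n = 100 per group this tiny effect is nowhere near significant; with n = 1,000,000 it passes any conventional threshold, even though the effect itself has not changed.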
Two More Issues
• If the significance level is 0.05, then 1 in 20 tests of a true null hypothesis will appear significant by chance. Because medical research results that reject the null hypothesis are more likely to be published, the false positive rate in the literature is likely to be higher
• In survey research with long lists of questions, false positives are a certainty
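The multiple-testing arithmetic behind the survey point is simple: with k independent tests of true null hypotheses at alpha = 0.05, the chance of at least one false positive is 1 - 0.95^k. The values of k below are illustrative:

```python
alpha = 0.05
probs = {}
for k in (1, 10, 20, 100):  # number of independent null tests, e.g. survey items
    probs[k] = 1 - (1 - alpha) ** k  # P(at least one false positive)
    print(k, round(probs[k], 3))
```

By 100 items the probability of at least one spurious "finding" exceeds 99%, which is why unadjusted significance tests on long question lists all but guarantee false positives.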
• Assumed that when results are significant, the experiments would be replicated to confirm the results
• Concluded that a large p-value indicated that a larger sample size was needed to improve the effect estimate
• The task force addressed the problems of NHST and developed guidelines to be followed in APA journals.
• The 1999 article in the American Psychologist, “Statistical Methods in Psychology Journals: Guidelines and Explanations,” contains a set of recommendations. The three following slides contain quotes from this article.
• Some equate the rejection of the null hypothesis with effect size
• Assuming that rejection of the null hypothesis implies that the result is generalizable
• Sample size
• Significance level of the test
• Magnitude of the difference between the null and alternative hypotheses
• Power of the significance test
“Critical tests of this kind may be called tests of significance, and when such tests are available we may discover whether a second sample is or is not significantly different from the first.”
STATISTICALLY SIGNIFICANT,
BUT IS IT SIGNIFICANT?
EVALUATING AND PRESENTING RESULTS THAT ARE MEANINGFUL
Nora Galambos, PhD Office of Institutional Research Stony Brook University
Hypothesis Testing Basics
APA Task Force Suggestions
Interval Estimates “Interval estimates should be given for any effect sizes involving principal outcomes. Provide intervals for correlations and other coefficients of association or variation whenever possible.”
“What’s Wrong with NHST?”
“What we want to know is ‘Given these data, what is the probability that Ho is true?’ But as most of us know, what it tells us is ‘Given that Ho is true, what is the probability of these (or more extreme) data?’ These are not the same…”
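Cohen's point can be made concrete with Bayes' rule. Under illustrative assumed numbers (90% of tested hypotheses truly null, power 0.80, alpha 0.05), the probability that H0 is true given a significant result is nowhere near 0.05:

```python
prior_null = 0.90   # assumed share of tested hypotheses that are truly null
alpha = 0.05        # P(significant | H0 true)
power = 0.80        # P(significant | H0 false)

# Bayes' rule: P(H0 true | significant result)
p_significant = prior_null * alpha + (1 - prior_null) * power
p_null_given_sig = prior_null * alpha / p_significant
print(round(p_null_given_sig, 2))  # 0.36: over a third of "discoveries" are null
```

The p-value conditions on H0 being true; the quantity researchers care about conditions on the observed result, and the two can differ by a large factor depending on the prior plausibility of the hypotheses being tested.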
Confidence intervals can be used to compare results across studies.
Just What is Statistical Significance?
A Brief Review
• The purpose of this review is to demonstrate the relationship between:
Null Hypothesis Significance Testing (NHST)
The English statistician and geneticist Ronald Fisher originated the significance testing concept, writing the following in his 1925 book, Statistical Methods for Research Workers:
In Defense of Fisher
• Realized that small studies cannot be used as the final confirmation of results
• Operated under the philosophy that a few false positive results were better than possibly missing something useful
Jacob Cohen
Other Issues and Problems with NHST
• Equating statistical significance with scientific importance
• Assuming that “no relationship” is proven when the result is not significant or the test fails to reject
Association Abundance
Sterne, J. A. C., & Smith, G. D. (2001). Sifting the evidence: What’s wrong with significance tests? British Medical Journal, 322(7280), 226-231.