English Language Education Measurement and Evaluation: Validity


3.2 Criterion-related validity
Definition: Classification

Definition:


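Another approach to test validity is to see how far results on the test agree with those provided by some independent and highly dependable assessment of the candidate's ability (for example, the correlation between the section scores on a CET practice test and those on the actual CET). The criterion is the reference standard against which the validity of a test is checked, and criterion-related validity is validity defined relative to that standard. It is expressed as the correlation between scores on the test and scores on the criterion. In other words, the criterion-related validity of a test is the degree to which a group of candidates' scores on that test agree with the same group's results on another, recognized test of high validity.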
3.3 Construct validity

A test, part of a test, or a testing technique is said to have construct validity if it can be demonstrated that it measures just the ability which it is supposed to measure.
There are two kinds of criterion-related validity:

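Concurrent validity: Concurrent validity is established when the test and criterion are administered at about the same time. (Example: suppose we design an oral test. Because there is not enough time to test all the language functions required by the curriculum standards, each candidate is given a 10-minute oral test. But we cannot be sure that a 10-minute oral test reflects the candidate's actual speaking ability. So we first design a 45-minute test that covers more of the language functions and use it as the criterion oral test. We then correlate the results of the 10-minute oral test with this criterion to determine the validity of the 10-minute test.)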

3.1 Content validity

A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned.

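Measure each ability separately, according to the hypothesis:
[Figure: the hypothesis divides writing ability into punctuation, grammar and genre; each sub-ability is measured with multiple-choice, gap-filling and error-correction items. Scores shown in the figure: 70, 66, 68, 55, 60, 62, 85, 80, 82.]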
3.4 Face Validity


A test is said to have face validity if it looks as if it measures what it is supposed to measure. e.g. An oral test that does not require candidates to speak would not have face validity.
Reader Activities:
Consider any tests with which you are familiar. Assess each of them in terms of the various kinds of validity that have been presented in this topic.
e.g. Could a proficiency test predict a student’s ability to cope with a graduate course at a British university?
Discussion: Validating a placement test. How?
An oral achievement test
Course objectives: a large number of ‘functions’
45 minutes: full test scores
10 minutes?: shortened test scores
Evidence for concurrent validity: Degree of agreement

The word ‘construct’ refers to any underlying ability (or trait) which is hypothesized in a theory of language ability.


For example, when testing reading ability, we hypothesize that the ability to read involves a set of sub-abilities, such as guessing unknown word meanings from the context. Then we should refer to the research literature to establish whether or not such an ability exists and can be measured in a part of the test. If we are able to demonstrate that we are indeed measuring that ability, then we can say the test has construct validity.

Calculation of degree of agreement: correlation coefficient (possible values: -1 to 1; the higher, the better)
1: perfect (positive) agreement
0: zero agreement
-1: negative agreement
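The slides do not say which correlation coefficient is meant. A common choice for two sets of test scores, and the assumption made here, is the Pearson product-moment coefficient. The symbols are introduced only for illustration: x_i is candidate i's score on the test being validated, y_i is the same candidate's score on the criterion, and n is the number of candidates.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

Under this definition r is 1 when the two sets of scores rise together along a straight line, -1 when one falls along a straight line as the other rises, and close to 0 when there is no linear relationship between them.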
A comparison of test specification and test content is the basis for judgments as to content validity.

The importance of content validity
Firstly, the greater a test’s content validity, the more likely it is to be an accurate measure of what it is supposed to measure.
Secondly, a test in which areas of the specification are not represented at all is likely to have a harmful backwash effect. Areas which are not tested are likely to become ignored in teaching and learning.

In order to judge whether or not the test has content validity, we need a specification of the skills or structures, etc. that it is meant to cover.
Hypothesis: Our theory of writing tells us that underlying writing ability are a number of sub-abilities: control of punctuation, sensitivity to styles, structures, coherence and cohesion…. In order to measure these sub-abilities, we administer a pilot test which involves two steps:
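Steps for testing writing ability with multiple-choice items (indirect testing):
1. Take a group of candidates who are sitting the test for the first time and sample their writing ability comprehensively (have them answer multiple-choice items covering punctuation, grammar, genre, etc.).
2. Compare the test results with the writing samples: if the correlation between the two is high, multiple-choice items can be used to test writing ability.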
1. We obtain extensive samples of the writing ability of the group to whom the test is first administered, and have these reliably scored.
2. We then compare scores on the pilot test with the scores given for the samples of writing. If the agreement is high, we are measuring writing ability with the test.
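The two steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the scores are invented, "reliably scored" is modelled as the mean of two hypothetical raters, and agreement is taken to be the Pearson correlation coefficient. The helper name `pearson_r` and all variable names are hypothetical; none of them come from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    cov = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    ss_x = sum(dx * dx for dx in dev_x)
    ss_y = sum(dy * dy for dy in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical data for six candidates (invented for illustration only).
pilot_scores = [48, 57, 63, 71, 79, 90]  # indirect (multiple-choice) pilot test
rater_a = [50, 58, 65, 68, 78, 88]       # writing samples, first rater
rater_b = [54, 56, 63, 72, 80, 84]       # writing samples, second rater

# Step 1: one score per writing sample, here the mean of the two raters
# (an assumption standing in for "reliably scored").
sample_scores = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]

# Step 2: agreement between the pilot test and the writing samples.
r = pearson_r(pilot_scores, sample_scores)
print(f"agreement (Pearson r) = {r:.2f}")
```

How high the coefficient must be before the pilot test is accepted is a judgment that depends on the purpose of the test; the code only reports the value.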
Topic 3 Validity
Validity
Definition: A test is said to be valid if it measures accurately what it is intended to measure.
Types of validity:
Content validity
Criterion-related validity
Construct validity
Face validity
A test of oral ability for a high-level diplomatic post
It is not rational to run the risk of appointing someone with insufficient oral ability.
Lower agreement NOT allowed


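Construct validation is the process of gathering evidence that a test really does measure the psychological attribute we want it to measure. If a test is shown to measure the ability we intend to measure, it has good construct validity. Construct validation is thus a process of formulating a hypothesis and then confirming it.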
How do we know that the test is measuring writing ability?
A brief interview forming part of a placement test
Lower agreement allowed

Calculating the correlation coefficient in SPSS.

Predictive validity: This concerns the degree to which a test can predict candidates’ future performance.
End !