Each year, multiple versions of each of the
six IELTS modules (Listening, Academic Reading, General Training
Reading, Academic Writing, General Training Writing, and Speaking)
are released for use by centres testing IELTS candidates.
Reliability estimates for the objectively and subjectively scored
modules used in 2010 are reported below.
Reliability of objectively-scored modules (Reading and Listening)
The reliability of Listening and Reading tests
is reported using Cronbach's alpha, a reliability estimate which
measures the internal consistency of the 40-item test. Reliability
values are given below for the Listening and Reading versions
released in 2010 that had sufficient candidate responses to yield
meaningful estimates:
| Module (All Academic and General Training versions) | Version | Alpha |
| Listening version | 487 | 0.89 |
| Listening version | 488 | 0.91 |
| Listening version | 489 | 0.90 |
| Listening version | 490 | 0.91 |
| Listening version | 491 | 0.90 |
| Listening version | 492 | 0.90 |
| Listening version | 493 | 0.91 |
| Listening version | 494 | 0.90 |
| Listening version | 495 | 0.90 |
| Listening version | 496 | 0.89 |
| Listening version | 497 | 0.90 |
| Listening version | 498 | 0.89 |
| Listening version | 499 | 0.91 |
| Listening version | 500 | 0.92 |
| Listening version | 501 | 0.90 |
| Listening version | 502 | 0.89 |
| Listening version | 503 | 0.92 |
| Listening version | 504 | 0.92 |
| Listening version | 505 | 0.91 |
| Listening version | 506 | 0.89 |
| Listening version | 507 | 0.91 |
| Listening version | 508 | 0.91 |
| Listening version | 509 | 0.91 |
| Listening version | 510 | 0.91 |
| Listening version | 511 | 0.91 |
| Listening version | 512 | 0.92 |
| Listening version | 513 | 0.91 |
| Listening version | 514 | 0.92 |
| Listening version | 515 | 0.90 |
| Listening version | 516 | 0.92 |
| Listening version | 517 | 0.91 |
| Listening version | 518 | 0.93 |
| Listening version | 519 | 0.92 |
| Listening version | 520 | 0.90 |
| Listening version | 521 | 0.90 |
| Listening version | 522 | 0.91 |
| Listening version | 523 | 0.91 |
| Listening version | 524 | 0.89 |
| Listening version | 525 | 0.90 |
| Listening version | 526 | 0.91 |
| Listening version | 527 | 0.90 |
| Listening version | 528 | 0.90 |
| Listening version | 529 | 0.91 |
| Listening version | 530 | 0.91 |
| Listening version | 531 | 0.90 |
| Listening version | 532 | 0.91 |
| Listening version | 533 | 0.92 |
| Listening version | 534 | 0.87 |
| Average Alpha across versions | | 0.91 |
| Module | Version | Alpha |
| General Training reading version | 372 | 0.87 |
| General Training reading version | 373 | 0.90 |
| General Training reading version | 374 | 0.89 |
| General Training reading version | 375 | 0.90 |
| General Training reading version | 376 | 0.91 |
| General Training reading version | 377 | 0.91 |
| General Training reading version | 378 | 0.91 |
| General Training reading version | 379 | 0.91 |
| General Training reading version | 380 | 0.92 |
| General Training reading version | 381 | 0.92 |
| General Training reading version | 382 | 0.93 |
| General Training reading version | 383 | 0.91 |
| General Training reading version | 384 | 0.89 |
| General Training reading version | 385 | 0.92 |
| General Training reading version | 386 | 0.90 |
| General Training reading version | 387 | 0.92 |
| General Training reading version | 388 | 0.93 |
| General Training reading version | 389 | 0.92 |
| General Training reading version | 390 | 0.94 |
| General Training reading version | 391 | 0.91 |
| General Training reading version | 392 | 0.90 |
| General Training reading version | 393 | 0.92 |
| General Training reading version | 394 | 0.91 |
| General Training reading version | 395 | 0.92 |
| Average Alpha across versions | | 0.91 |
| Module | Version | Alpha |
| Academic reading version | 487 | 0.88 |
| Academic reading version | 488 | 0.90 |
| Academic reading version | 489 | 0.92 |
| Academic reading version | 490 | 0.90 |
| Academic reading version | 491 | 0.90 |
| Academic reading version | 492 | 0.92 |
| Academic reading version | 493 | 0.89 |
| Academic reading version | 494 | 0.87 |
| Academic reading version | 495 | 0.91 |
| Academic reading version | 496 | 0.88 |
| Academic reading version | 497 | 0.92 |
| Academic reading version | 498 | 0.91 |
| Academic reading version | 499 | 0.84 |
| Academic reading version | 500 | 0.91 |
| Academic reading version | 501 | 0.90 |
| Academic reading version | 502 | 0.89 |
| Academic reading version | 503 | 0.89 |
| Academic reading version | 504 | 0.89 |
| Academic reading version | 505 | 0.93 |
| Academic reading version | 506 | 0.91 |
| Academic reading version | 507 | 0.92 |
| Academic reading version | 508 | 0.85 |
| Academic reading version | 509 | 0.86 |
| Academic reading version | 510 | 0.90 |
| Academic reading version | 511 | 0.89 |
| Academic reading version | 512 | 0.90 |
| Academic reading version | 513 | 0.91 |
| Academic reading version | 514 | 0.91 |
| Academic reading version | 515 | 0.88 |
| Academic reading version | 516 | 0.92 |
| Academic reading version | 517 | 0.90 |
| Academic reading version | 518 | 0.91 |
| Academic reading version | 519 | 0.91 |
| Academic reading version | 520 | 0.91 |
| Academic reading version | 521 | 0.89 |
| Academic reading version | 522 | 0.92 |
| Academic reading version | 523 | 0.90 |
| Academic reading version | 524 | 0.90 |
| Academic reading version | 525 | 0.92 |
| Academic reading version | 526 | 0.91 |
| Academic reading version | 527 | 0.90 |
| Academic reading version | 528 | 0.89 |
| Academic reading version | 529 | 0.89 |
| Academic reading version | 530 | 0.91 |
| Academic reading version | 531 | 0.89 |
| Academic reading version | 532 | 0.89 |
| Academic reading version | 533 | 0.90 |
| Academic reading version | 534 | 0.90 |
| Average Alpha across versions | | 0.90 |
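The alpha values tabulated above are internal-consistency estimates. As a minimal sketch of how such a value could be computed from item-level scores, the following uses the standard Cronbach's alpha formula. The function name and the toy response matrix are illustrative assumptions, not IELTS data or the procedure used to produce the published figures.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of candidates' item scores.

    `responses` is a list of rows, one per candidate; each row holds
    that candidate's score on every item (e.g. 0 or 1 per item).
    """
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across candidates.
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    # Variance of candidates' total scores.
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy example (hypothetical): 4 candidates, 3 dichotomous items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(cronbach_alpha(toy), 2))  # 0.75
```

For a real 40-item test, each row would hold 40 item scores and the number of rows would be the number of candidates who took that version.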
The figures reported for Listening and Reading
modules indicate the expected levels of reliability for tests
containing 40 items. On the basis of these reliability figures, an
estimate of the standard error of measurement (SEM) may be
calculated for these modules using the following formula:

$$\mathrm{SEM} = s_t \sqrt{1 - r_{xx'}}$$

where $s_t$ is the standard deviation of the test and $r_{xx'}$ is
the reliability of the test.
Table 1: Mean, standard deviation and standard error of measurement
of Listening and Reading (2010)
| Module | Mean | Standard deviation | SEM |
| Listening | 6.04 | 1.30 | 0.389 |
| Academic Reading | 5.97 | 1.21 | 0.382 |
| General Training Reading | 5.74 | 1.37 | 0.412 |
The SEM should be interpreted in terms of the
final band scores reported for Listening and Reading modules (which
are reported in half-bands).
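As a rough check on Table 1, the SEM can be recomputed as the standard deviation multiplied by the square root of one minus the reliability. The sketch below pairs each module's standard deviation from Table 1 with its average alpha from the reliability tables; that pairing and the function name are assumptions made for illustration. Because the inputs are rounded to two decimals, the results agree with Table 1 at two decimal places but may differ slightly in the third.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = sd * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1 - reliability)


# (standard deviation from Table 1, average alpha across versions)
modules = {
    "Listening": (1.30, 0.91),
    "Academic Reading": (1.21, 0.90),
    "General Training Reading": (1.37, 0.91),
}

for name, (sd, alpha) in modules.items():
    sem = standard_error_of_measurement(sd, alpha)
    print(f"{name}: SEM = {sem:.2f}")
# Listening: SEM = 0.39
# Academic Reading: SEM = 0.38
# General Training Reading: SEM = 0.41
```

An SEM of about 0.4 of a band is slightly less than the half-band step in which Listening and Reading scores are reported.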
Reliability of subjectively-scored modules (Writing and Speaking)
The reliability of the Writing and Speaking
modules cannot be reported in the same manner as for
Reading/Listening because they are not item-based; candidates'
writing and speaking performances are rated by trained and
standardised examiners according to detailed descriptive criteria
and rating scales. The assessment criteria used for rating Writing
and Speaking performance are described in the IELTS 2006 Handbook.
Benchmarked example writing performances and CD-based speaking
performances at different levels can be found, along with examiner
comments, in the IELTS official practice materials which can be
ordered from the IELTS website. User-oriented band descriptors
describing levels of Writing and Speaking performance are also
available on the website. In addition, the “IELTS Scores Explained”
DVD provides information specifically tailored to organizations
wanting a detailed description of IELTS scores. This information
helps in setting appropriate standards of English proficiency.
Reliability of rating is assured through the
face-to-face training and certification of examiners, all of whom
must undergo retraining and recertification every two years. A
Professional Support Network (PSN) manages and standardizes the
examiner cadre, including face-to-face examiner monitoring as well
as distance monitoring (using recordings of the Speaking tests). A
‘jagged profile’ system maintains a further check on the global
reliability of IELTS performance assessment. Routine targeted
double marking identifies the level of divergence (i.e., jagged
profile) between Writing and/or Speaking scores and Reading and
Listening scores. This process allows for the identification of
possible misclassified candidates. The jagged profile system is
also combined with ‘Targeted sample monitoring’ to further identify
possible faulty ratings by examiners. Selected centres worldwide
are required to provide a sample of examiners' marked tapes and
scripts. Tapes and scripts are then second-marked by a team of
IELTS Principal Examiners and assistant Principal Examiners.
Principal Examiners monitor for quality of both test conduct and
rating, and feedback is returned to each test centre. The outcomes
that emerge from these reliability measures feed back into examiner
retraining and continually build on quality management and
assurance systems for IELTS.
Experimental generalisability studies were
carried out as part of the IELTS Speaking and Writing Revision
Projects to investigate the reliability of ratings (Shaw, 2004;
Taylor & Jones, 2001). More recent G-studies based on examiner
certification data showed coefficients of 0.81-0.89 for
Speaking and 0.80-0.83 for Writing.
The IELTS exam contains four components upon
which an overall band score is awarded. Thus an estimate of
composite reliability offers a useful measure for overall test
reliability. Following Feldt & Brennan (1989), composite
reliability estimates were calculated based on test data from 2009.
To generate an appropriately cautious estimate, minimum alpha
values were used for the objectively marked papers, and
G-coefficients for the single rater condition on subjectively
marked scores. The composite reliability estimate for the Academic
module was 0.95 and produced a composite SEM of 0.22. For General
Training, the composite reliability was 0.96 with a SEM of 0.24. If
average rather than minimum values are used for the objective paper
alphas, composite reliability becomes 0.96 for both versions.
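One common way to estimate the reliability of a weighted composite is to subtract the weighted error variance of the components from the variance of the composite score. The sketch below follows that general approach and assumes equal weights and uncorrelated measurement errors; whether it matches the exact procedure behind the figures above is an assumption. The function name and every standard deviation, reliability and correlation in the example are hypothetical placeholders, not IELTS data.

```python
import math


def composite_reliability(weights, sds, reliabilities, correlations):
    """Return (reliability, SEM) of a weighted composite score.

    Assumes measurement errors are uncorrelated across components.
    correlations[i][j] is the observed-score correlation between
    components i and j (1.0 on the diagonal).
    """
    n = len(weights)
    # Composite variance: sum over i, j of w_i * w_j * s_i * s_j * r_ij.
    composite_var = sum(
        weights[i] * weights[j] * sds[i] * sds[j] * correlations[i][j]
        for i in range(n)
        for j in range(n)
    )
    # Composite error variance: sum of w_i^2 * s_i^2 * (1 - reliability_i).
    error_var = sum(
        (weights[i] * sds[i]) ** 2 * (1 - reliabilities[i]) for i in range(n)
    )
    return 1 - error_var / composite_var, math.sqrt(error_var)


# Hypothetical inputs for four equally weighted components.
weights = [0.25, 0.25, 0.25, 0.25]
sds = [1.3, 1.2, 0.9, 1.0]
reliabilities = [0.87, 0.84, 0.80, 0.81]
correlations = [
    [1.0, 0.7, 0.6, 0.6],
    [0.7, 1.0, 0.6, 0.6],
    [0.6, 0.6, 1.0, 0.6],
    [0.6, 0.6, 0.6, 1.0],
]
reliability, sem = composite_reliability(weights, sds, reliabilities, correlations)
print(f"composite reliability: {reliability:.2f}, composite SEM: {sem:.2f}")
```

Because the components are positively correlated, the composite variance grows faster than the summed error variance, which is why a composite can be more reliable than any single component.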
References
Feldt, LS & Brennan, RL (1989) Reliability. In RL Linn (Ed), Educational measurement (3rd ed, 105-146). New York: Macmillan.
Shaw, SD (2004) IELTS writing: revising assessment criteria and scales (Phase 3). Research Notes 16, 3-7.
Taylor, L & Jones, N (2001) Revising the IELTS Speaking Test. Research Notes 4, 9-12.