Test performance

Each year, multiple versions of each of the six IELTS sections (Listening, Academic Reading, General Training Reading, Academic Writing, General Training Writing, and Speaking) are released for use by centres testing IELTS test takers. Reliability estimates for the tests used in 2019 are reported below.

Reliability of Listening and Reading sections

The reliability of Listening and Reading tests is reported using Cronbach's alpha, a reliability estimate which measures the internal consistency of the 40-item test. The Listening and Reading versions released in 2019 that had enough test taker responses to yield meaningful reliability estimates are reported below:

Section (All Academic and General Training tests)  Version  Alpha
 Listening version  19818  0.92
 Listening version  19819  0.88
 Listening version  19820  0.89
 Listening version  19821  0.90
 Listening version  19822  0.90
 Listening version  19823  0.91
 Listening version  19824  0.90
 Listening version  19825  0.92
 Listening version  19826  0.92
 Listening version  19827  0.91
 Listening version  19828  0.91
 Listening version  19829  0.91
 Listening version  19831  0.89
 Listening version  19832  0.90
 Listening version  19833  0.93
 Listening version  19834  0.92
 Average Alpha across versions    0.91

 

 Section  Version  Alpha
 General Training Reading   19667  0.92
 General Training Reading  19668  0.91
 General Training Reading  19670  0.92
 General Training Reading  19671  0.92
 General Training Reading  19672  0.93
 General Training Reading  19673  0.90
 General Training Reading  19674  0.92
 General Training Reading  19676  0.92
 General Training Reading  19679  0.93
 General Training Reading  19680  0.92
 General Training Reading  19681  0.93
 General Training Reading  19682  0.93
 General Training Reading  19683  0.93
 General Training Reading  19684  0.92
 General Training Reading  19685  0.91
 General Training Reading  19689  0.92
 Average Alpha across versions    0.92

 

Section  Version  Alpha
 Academic Reading version  19852  0.92
 Academic Reading version  19853  0.91
 Academic Reading version  19854  0.92
 Academic Reading version  19855  0.92
 Academic Reading version  19856  0.90
 Academic Reading version  19857  0.93
 Academic Reading version  19858  0.90
 Academic Reading version  19879  0.90
 Academic Reading version  19880  0.91
 Academic Reading version  19878  0.91
 Academic Reading version  19877  0.91
 Academic Reading version  19873  0.91
 Academic Reading version  19874  0.89
 Academic Reading version  19872  0.92
 Academic Reading version  19871  0.92
 Academic Reading version  19869  0.91
 Academic Reading version  19870  0.91
 Average Alpha across versions    0.91
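The alpha values in the tables above are internal-consistency estimates. As a rough illustration of how such a value is obtained, the following is a minimal sketch of the standard Cronbach's alpha formula applied to a matrix of item scores. The function name `cronbach_alpha` and the toy data are assumptions made for this example; they are not IELTS data or IELTS code.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a table of item scores.

    responses: one row per test taker, each row a list of item scores
    (for example 0/1 on a dichotomously scored test).
    """
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across test takers.
    item_vars = [pvariance([row[i] for row in responses]) for i in range(n_items)]
    # Variance of the total scores across test takers.
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 5 test takers, 4 items (not IELTS data).
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(toy)  # approximately 0.8 for this toy data
```

A real 40-item version would use one row per test taker and 40 columns. Alpha rises as items covary more strongly relative to their individual variances.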

   

The figures reported for Listening and Reading sections indicate the expected levels of reliability for tests containing 40 items. On the basis of these reliability figures, an estimate of the standard error of measurement (SEM) may be calculated for these sections using the following formula:

$$\mathrm{SEM} = s_t \sqrt{1 - r_{xx'}}$$

where $s_t$ is the standard deviation of the test and $r_{xx'}$ is the reliability of the test.

Table 1 Mean, standard deviation and standard error of measurement of Listening and Reading (2019)

 Section  Mean  SD  Alpha  SEM
 Listening  6.43  1.30  0.91  0.39
 Academic Reading  6.2  1.21  0.91  0.37
 General Training Reading  6.41  1.41  0.92  0.43
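The SEM column can be approximated from the SD and Alpha columns using the standard relationship SEM = SD × √(1 − reliability). The sketch below applies it to the Listening row; the function name is an assumption made for this example. Because the published SD and alpha are rounded to two decimals, recomputing from the table need not reproduce the published SEM exactly for every row.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = sd * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Listening row of Table 1: SD 1.30, alpha 0.91.
listening_sem = standard_error_of_measurement(1.30, 0.91)
print(round(listening_sem, 2))  # 0.39
```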

The SEM should be interpreted in terms of the final band scores reported for Listening and Reading sections (which are reported in half-bands).

Reliability of Writing and Speaking sections

For more information about the assessment criteria used for rating Writing and Speaking performance, see the detailed description of IELTS scoring. Benchmarked example writing performances and CD-based speaking performances at different levels can be found, along with examiner comments, in the IELTS official practice materials. In addition, you can order the “IELTS Scores Explained” DVD, which provides information specifically tailored to organisations wanting a detailed description of IELTS scores. This information helps in setting appropriate standards of English proficiency.

Reliability of rating is assured through the face-to-face training and certification of examiners, all of whom must undergo retraining and recertification every two years. A Professional Support Network (PSN) manages and standardises the examiner cadre, including face-to-face examiner monitoring as well as distance monitoring (using recordings of the Speaking tests).

A ‘jagged profile’ system maintains a further check on the global reliability of IELTS performance assessment. Routine targeted double marking identifies the level of divergence (i.e., the jagged profile) between Writing and/or Speaking scores and Reading and Listening scores. This process allows possibly misclassified test takers to be identified. The jagged profile system is also combined with ‘Targeted sample monitoring’ to further identify possibly faulty ratings by examiners. Selected centres worldwide are required to provide a sample of examiners' marked tapes and scripts, which are then second-marked by a team of IELTS Principal Examiners and assistant Principal Examiners. Principal Examiners monitor the quality of both test conduct and rating, and feedback is returned to each test centre. The outcomes of these reliability measures feed back into examiner retraining and continually build on the quality management and assurance systems for IELTS.
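The jagged profile check described above can be pictured as a simple divergence rule. The sketch below is purely illustrative: the comparison against the Listening/Reading average, the 2.0-band threshold, and all names are hypothetical assumptions, since the source does not state how divergence is actually measured or what cut-off is used.

```python
from dataclasses import dataclass


@dataclass
class SectionScores:
    listening: float
    reading: float
    writing: float
    speaking: float


def jagged_sections(scores, threshold=2.0):
    """Return the examiner-rated sections whose band differs from the
    Listening/Reading average by at least `threshold` bands.

    The rule and the default threshold are hypothetical, chosen only
    to illustrate the idea of flagging scripts for double marking.
    """
    objective_mean = (scores.listening + scores.reading) / 2
    flagged = []
    for name in ("writing", "speaking"):
        if abs(getattr(scores, name) - objective_mean) >= threshold:
            flagged.append(name)
    return flagged


# Hypothetical test taker: Writing is 2.5 bands below the objective average.
candidate = SectionScores(listening=7.0, reading=7.0, writing=4.5, speaking=6.5)
to_double_mark = jagged_sections(candidate)  # ["writing"]
```

Under such a rule, a flagged section would be sent for second marking rather than treated as an error in itself, since a genuine skills imbalance also produces a jagged profile.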

Generalisability studies based on examiner certification data show coefficients of 0.83-0.86 for Speaking and 0.81-0.89 for Writing.

The IELTS test consists of four sections, on the basis of which an overall band score is awarded. Thus, an estimate of composite reliability offers a useful measure of overall test reliability. Composite reliability estimates have been calculated following Feldt & Brennan (1989). To generate an appropriately cautious estimate, alpha values were used for the objectively marked papers, and G-coefficients for the single-rater condition were used for the subjectively marked scores. The composite reliability estimate for both the Academic and General Training tests was 0.96, with a composite SEM of 0.23.
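To make the idea of composite reliability concrete, the sketch below uses the standard formula for the reliability of a weighted composite, assuming measurement errors are uncorrelated across sections: one minus the ratio of summed weighted error variances to the composite variance. This is an illustration of that general formula, not necessarily the exact computation used for the figures above. The function name and every input value in the usage example are hypothetical; the source does not publish Writing and Speaking standard deviations or the correlations between sections.

```python
import math


def composite_reliability(sds, reliabilities, correlations, weights=None):
    """Reliability and SEM of a weighted composite of section scores.

    sds: standard deviation of each section.
    reliabilities: reliability estimate of each section.
    correlations: square matrix of correlations between sections.
    weights: weight of each section in the composite (default 1.0 each).
    """
    n = len(sds)
    weights = weights or [1.0] * n
    # Composite variance: sum over all pairs of w_i * w_j * cov(i, j).
    composite_var = sum(
        weights[i] * weights[j] * correlations[i][j] * sds[i] * sds[j]
        for i in range(n)
        for j in range(n)
    )
    if composite_var <= 0:
        raise ValueError("composite variance must be positive")
    # Error variance: errors are assumed uncorrelated across sections.
    error_var = sum(
        (weights[i] * sds[i]) ** 2 * (1.0 - reliabilities[i]) for i in range(n)
    )
    return 1.0 - error_var / composite_var, math.sqrt(error_var)


# Hypothetical inputs for four equally weighted sections (not IELTS data).
sds = [1.3, 1.2, 0.9, 1.0]
rels = [0.91, 0.91, 0.85, 0.84]
corr = [
    [1.0, 0.7, 0.6, 0.6],
    [0.7, 1.0, 0.6, 0.6],
    [0.6, 0.6, 1.0, 0.6],
    [0.6, 0.6, 0.6, 1.0],
]
rho, sem = composite_reliability(sds, rels, corr, weights=[0.25] * 4)
```

Because the sections correlate positively, the composite variance grows faster than the summed error variance, which is why a composite can be more reliable than any single section.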

References
Feldt, LS & Brennan, RL (1989) Reliability. In RL Linn (Ed), Educational measurement (3rd ed, 105-146). New York: Macmillan
Shaw, SD (2004) IELTS writing: revising assessment criteria and scales (Phase 3). Research Notes 16, 3-7
Taylor, L & Jones, N (2001) Revising the IELTS Speaking Test. Research Notes 4, 9-12