 
IELTS | Researchers - Test performance 2011

 

Each year, multiple versions of each of the six IELTS components (Listening, Academic Reading, General Training Reading, Academic Writing, General Training Writing, and Speaking) are released for use by centres testing IELTS candidates. Reliability estimates for the objectively and subjectively scored modules used in 2011 are reported below.

 

Reliability of objectively-scored components (Reading and Listening)

The reliability of the Listening and Reading tests is reported using Cronbach's alpha, a reliability estimate that measures the internal consistency of the 40-item test. The following Listening and Reading versions released in 2011 had enough candidate responses to yield meaningful reliability estimates:

 

Component                       Version   Alpha
Listening                       535       0.90
Listening                       536       0.90
Listening                       537       0.92
Listening                       538       0.89
Listening                       539       0.91
Listening                       540       0.91
Listening                       541       0.92
Listening                       542       0.91
Listening                       543       0.89
Listening                       544       0.89
Listening                       545       0.92
Listening                       546       0.89
Listening                       547       0.89
Listening                       548       0.91
Listening                       549       0.93
Listening                       550       0.92
Average alpha across versions             0.91

 

 

Component                       Version   Alpha
General Training Reading        401       0.92
General Training Reading        402       0.92
General Training Reading        403       0.91
General Training Reading        404       0.90
General Training Reading        405       0.92
General Training Reading        406       0.92
General Training Reading        407       0.92
General Training Reading        408       0.93
General Training Reading        409       0.94
General Training Reading        410       0.92
General Training Reading        411       0.94
General Training Reading        412       0.92
General Training Reading        413       0.92
General Training Reading        414       0.92
General Training Reading        415       0.91
General Training Reading        416       0.94
Average alpha across versions             0.92

 

 

Component                       Version   Alpha
Academic Reading                535       0.93
Academic Reading                536       0.88
Academic Reading                537       0.90
Academic Reading                538       0.88
Academic Reading                539       0.89
Academic Reading                540       0.89
Academic Reading                541       0.91
Academic Reading                542       0.90
Academic Reading                543       0.90
Academic Reading                544       0.89
Academic Reading                545       0.91
Academic Reading                546       0.91
Academic Reading                547       0.91
Academic Reading                548       0.90
Academic Reading                549       0.89
Academic Reading                550       0.89
Average alpha across versions             0.90
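As an illustration of how an alpha value of this kind is obtained, the following is a minimal sketch of the standard Cronbach's alpha calculation on a candidates-by-items score matrix. The function name `cronbach_alpha` and the toy response data are assumptions made for this example; they are not IELTS data, and the sketch is not a description of the software used to produce the figures above.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a candidates-by-items score matrix.

    `responses` is a list of rows, one per candidate; each row holds that
    candidate's item scores (here 1 = correct, 0 = incorrect).
    """
    n_items = len(responses[0])
    # Variance of each item (column) across candidates.
    item_variances = [
        pvariance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Variance of the candidates' total scores.
    total_variance = pvariance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Invented toy data: 4 candidates, 3 items (a real paper has 40 items).
toy_responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
print(round(cronbach_alpha(toy_responses), 2))  # 0.75
```

Alpha rises as items vary together across candidates, which is why it is read as a measure of internal consistency.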

 

 

The figures reported for Listening and Reading modules indicate the expected levels of reliability for tests containing 40 items. On the basis of these reliability figures, an estimate of the standard error of measurement (SEM) may be calculated for these modules using the following formula:

 

$$\mathrm{SEM} = s_t \sqrt{1 - r_{xx'}}$$

where $s_t$ is the standard deviation of the test and $r_{xx'}$ is the reliability of the test.

 

 

Table 1 Mean, standard deviation and standard error of measurement of Listening and Reading (2011)

 

Component                   Mean   Standard deviation   SEM
Listening                   6.1    1.3                  0.390
Academic Reading            5.9    1.2                  0.379
General Training Reading    5.7    1.5                  0.424

 

The SEM should be interpreted in terms of the final band scores reported for the Listening and Reading components (which are reported in half-bands).
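The SEM values in Table 1 can be checked with a short calculation. The sketch below applies the standard relation SEM = standard deviation × sqrt(1 − reliability), assuming that the average alpha across versions is the reliability value used for each component; the source does not state this explicitly, but the assumption reproduces the tabled figures. The function and variable names are chosen for this example only.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


# (standard deviation, average alpha across versions) as reported for 2011.
components = {
    "Listening": (1.3, 0.91),
    "Academic Reading": (1.2, 0.90),
    "General Training Reading": (1.5, 0.92),
}

for name, (sd, alpha) in components.items():
    print(f"{name}: {standard_error_of_measurement(sd, alpha):.3f}")
# Listening: 0.390
# Academic Reading: 0.379
# General Training Reading: 0.424
```

Each SEM is below half a band, the smallest step on the reporting scale for these components.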


 

Reliability of subjectively-scored components (Writing and Speaking)

The reliability of the Writing and Speaking components cannot be reported in the same manner as for Reading/Listening because they are not item-based; candidates' writing and speaking performances are rated by trained and standardised examiners according to detailed descriptive criteria and rating scales. The assessment criteria used for rating Writing and Speaking performance are described in the IELTS 2006 Handbook.

 

Benchmarked example writing performances and CD-based speaking performances at different levels can be found, along with examiner comments, in the IELTS official practice materials, which can be ordered from the IELTS website. User-oriented band descriptors describing levels of Writing and Speaking performance are also available on the website. In addition, the “IELTS Scores Explained” DVD provides information specifically tailored to organizations wanting a detailed description of IELTS scores. This information helps in setting appropriate standards of English proficiency.


 

Reliability of rating is assured through the face-to-face training and certification of examiners, all of whom must undergo retraining and recertification every two years. A Professional Support Network (PSN) manages and standardizes the examiner cadre, including face-to-face examiner monitoring as well as distance monitoring (using recordings of the Speaking tests).

A ‘jagged profile’ system maintains a further check on the global reliability of IELTS performance assessment. Routine targeted double marking identifies the level of divergence (i.e., the jagged profile) between a candidate's Writing and/or Speaking scores and their Reading and Listening scores, which allows possibly misclassified candidates to be identified. The jagged profile system is also combined with ‘targeted sample monitoring’ to further identify possibly faulty ratings by examiners. Selected centres worldwide are required to provide a sample of examiners' marked tapes and scripts, which are then second-marked by a team of IELTS Principal Examiners and Assistant Principal Examiners. Principal Examiners monitor the quality of both test conduct and rating, and feedback is returned to each test centre. The outcomes of these reliability measures feed back into examiner retraining and continually build on the quality management and assurance systems for IELTS.
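The idea behind a jagged profile check can be sketched in a few lines. The source does not give the actual rule or threshold, so everything specific here is an assumption for illustration: the `Candidate` class, the `is_jagged` function, the comparison against the mean of the Listening and Reading bands, and the 2.0-band threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    listening: float
    reading: float
    writing: float
    speaking: float


def is_jagged(candidate, threshold=2.0):
    """Flag a candidate whose Writing or Speaking band diverges from the
    objectively scored components by at least `threshold` bands.

    Both the comparison rule and the default threshold are hypothetical.
    """
    objective = (candidate.listening + candidate.reading) / 2
    return (
        abs(candidate.writing - objective) >= threshold
        or abs(candidate.speaking - objective) >= threshold
    )


# Invented band scores, for illustration only.
candidates = [
    Candidate(listening=7.0, reading=7.0, writing=4.5, speaking=7.0),
    Candidate(listening=6.0, reading=6.5, writing=6.0, speaking=6.5),
]
flagged = [c for c in candidates if is_jagged(c)]
print(len(flagged))  # 1
```

In a scheme of this kind, flagged scripts or recordings would be sent for a second rating rather than rescored automatically.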


 

Experimental generalisability studies were carried out as part of the IELTS Speaking and Writing Revision Projects to investigate the reliability of ratings (Shaw, 2004; Taylor & Jones, 2001). More recent G-studies based on examiner certification data showed coefficients of 0.83-0.86 for Speaking and 0.81-0.89 for Writing.


 

The IELTS test contains four components, on the basis of which an overall band score is awarded. An estimate of composite reliability therefore offers a useful measure of overall test reliability. Following Feldt & Brennan (1989), composite reliability estimates were calculated based on test data from 2009. To generate an appropriately cautious estimate, minimum alpha values were used for the objectively marked papers, and G-coefficients for the single-rater condition were used for the subjectively marked scores. The composite reliability estimate for the Academic module was 0.96, with a composite SEM of 0.22. For General Training, the composite reliability was 0.96, with an SEM of 0.23. If average rather than minimum values are used for the objective paper alphas, composite reliability is also 0.96 for both modules.
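For readers unfamiliar with composite reliability, the sketch below shows one standard formulation for a weighted composite whose component errors are uncorrelated: the composite error variance is the weighted sum of component error variances, and reliability is one minus the ratio of error variance to composite variance. This is an illustration of the general technique, not a reconstruction of the calculation reported above. The equal weights, the Writing and Speaking standard deviations, and every correlation of 0.6 are hypothetical inputs, so the result is not expected to reproduce the published figures.

```python
import math


def composite_reliability(sds, reliabilities, corr, weights=None):
    """Return (reliability, SEM) of a weighted composite of components.

    Assumes measurement errors are uncorrelated across components.
    `corr` is the matrix of observed correlations between components.
    """
    n = len(sds)
    if weights is None:
        weights = [1.0 / n] * n  # equal weighting (an assumption)
    # Observed variance of the composite score.
    composite_var = sum(
        weights[i] * weights[j] * sds[i] * sds[j] * corr[i][j]
        for i in range(n)
        for j in range(n)
    )
    # Error variance of the composite: weighted sum of component error variances.
    error_var = sum(
        (weights[i] * sds[i]) ** 2 * (1.0 - reliabilities[i]) for i in range(n)
    )
    return 1.0 - error_var / composite_var, math.sqrt(error_var)


# Order: Listening, Academic Reading, Writing, Speaking.
sds = [1.3, 1.2, 1.0, 1.1]  # last two values are hypothetical
reliabilities = [0.89, 0.88, 0.81, 0.83]  # lowest reported alphas / G-coefficients
corr = [[1.0 if i == j else 0.6 for j in range(4)] for i in range(4)]  # hypothetical

reliability, sem = composite_reliability(sds, reliabilities, corr)
print(f"composite reliability: {reliability:.2f}, composite SEM: {sem:.2f}")
```

Because component scores are positively correlated while their errors are assumed independent, the composite is typically more reliable than any single component, which is consistent with a composite value above the individual alphas and G-coefficients.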


 

References

Feldt, LS & Brennan, RL (1989) Reliability. In RL Linn (Ed), Educational measurement (3rd ed, 105-146). New York: Macmillan.

Shaw, SD (2004) IELTS writing: revising assessment criteria and scales (Phase 3). Research Notes 16, 3-7.

Taylor, L & Jones, N (2001) Revising the IELTS Speaking Test. Research Notes 4, 9-12.
