IELTS | Researchers - Test performance 2010

 

Each year, multiple versions of each of the six IELTS modules (Listening, Academic Reading, General Training Reading, Academic Writing, General Training Writing, and Speaking) are released for use by centres testing IELTS candidates. Reliability estimates for the objectively and subjectively scored modules used in 2010 are reported below.

 

Reliability of objectively-scored modules (Reading and Listening)

The reliability of the Listening and Reading tests is reported using Cronbach's alpha, a reliability estimate that measures the internal consistency of the 40-item test. The Listening and Reading material released in 2010 that had sufficient candidate responses to yield meaningful reliability estimates is reported below:

 

Module (All Academic and General Training versions)    Alpha
Listening version 487                                  0.89
Listening version 488                                  0.91
Listening version 489                                  0.90
Listening version 490                                  0.91
Listening version 491                                  0.90
Listening version 492                                  0.90
Listening version 493                                  0.91
Listening version 494                                  0.90
Listening version 495                                  0.90
Listening version 496                                  0.89
Listening version 497                                  0.90
Listening version 498                                  0.89
Listening version 499                                  0.91
Listening version 500                                  0.92
Listening version 501                                  0.90
Listening version 502                                  0.89
Listening version 503                                  0.92
Listening version 504                                  0.92
Listening version 505                                  0.91
Listening version 506                                  0.89
Listening version 507                                  0.91
Listening version 508                                  0.91
Listening version 509                                  0.91
Listening version 510                                  0.91
Listening version 511                                  0.91
Listening version 512                                  0.92
Listening version 513                                  0.91
Listening version 514                                  0.92
Listening version 515                                  0.90
Listening version 516                                  0.92
Listening version 517                                  0.91
Listening version 518                                  0.93
Listening version 519                                  0.92
Listening version 520                                  0.90
Listening version 521                                  0.90
Listening version 522                                  0.91
Listening version 523                                  0.91
Listening version 524                                  0.89
Listening version 525                                  0.90
Listening version 526                                  0.91
Listening version 527                                  0.90
Listening version 528                                  0.90
Listening version 529                                  0.91
Listening version 530                                  0.91
Listening version 531                                  0.90
Listening version 532                                  0.91
Listening version 533                                  0.92
Listening version 534                                  0.87
Average Alpha across versions                          0.91

 

 

Module                                  Alpha
General Training reading version 372    0.87
General Training reading version 373    0.90
General Training reading version 374    0.89
General Training reading version 375    0.90
General Training reading version 376    0.91
General Training reading version 377    0.91
General Training reading version 378    0.91
General Training reading version 379    0.91
General Training reading version 380    0.92
General Training reading version 381    0.92
General Training reading version 382    0.93
General Training reading version 383    0.91
General Training reading version 384    0.89
General Training reading version 385    0.92
General Training reading version 386    0.90
General Training reading version 387    0.92
General Training reading version 388    0.93
General Training reading version 389    0.92
General Training reading version 390    0.94
General Training reading version 391    0.91
General Training reading version 392    0.90
General Training reading version 393    0.92
General Training reading version 394    0.91
General Training reading version 395    0.92
Average Alpha across versions           0.91

 

 

Module                           Alpha
Academic reading version 487     0.88
Academic reading version 488     0.90
Academic reading version 489     0.92
Academic reading version 490     0.90
Academic reading version 491     0.90
Academic reading version 492     0.92
Academic reading version 493     0.89
Academic reading version 494     0.87
Academic reading version 495     0.91
Academic reading version 496     0.88
Academic reading version 497     0.92
Academic reading version 498     0.91
Academic reading version 499     0.84
Academic reading version 500     0.91
Academic reading version 501     0.90
Academic reading version 502     0.89
Academic reading version 503     0.89
Academic reading version 504     0.89
Academic reading version 505     0.93
Academic reading version 506     0.91
Academic reading version 507     0.92
Academic reading version 508     0.85
Academic reading version 509     0.86
Academic reading version 510     0.90
Academic reading version 511     0.89
Academic reading version 512     0.90
Academic reading version 513     0.91
Academic reading version 514     0.91
Academic reading version 515     0.88
Academic reading version 516     0.92
Academic reading version 517     0.90
Academic reading version 518     0.91
Academic reading version 519     0.91
Academic reading version 520     0.91
Academic reading version 521     0.89
Academic reading version 522     0.92
Academic reading version 523     0.90
Academic reading version 524     0.90
Academic reading version 525     0.92
Academic reading version 526     0.91
Academic reading version 527     0.90
Academic reading version 528     0.89
Academic reading version 529     0.89
Academic reading version 530     0.91
Academic reading version 531     0.89
Academic reading version 532     0.89
Academic reading version 533     0.90
Academic reading version 534     0.90
Average Alpha across versions    0.90
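
To illustrate how an alpha value of the kind tabulated above is obtained, here is a minimal sketch of Cronbach's alpha for dichotomously scored items. The function name `cronbach_alpha` and the five-candidate, four-item response matrix are hypothetical, chosen only to keep the arithmetic easy to follow; an operational Listening or Reading test has 40 items and far more candidates.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Return Cronbach's alpha for a candidates-by-items score matrix.

    `responses` is a list of rows, one per candidate; each row holds that
    candidate's item scores (1 = correct, 0 = incorrect).
    """
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")

    # Variance of each item across candidates.
    item_variances = [
        pvariance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Variance of the candidates' total scores.
    total_variance = pvariance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have no variance")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy data: 5 candidates x 4 items (not IELTS data).
toy_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(toy_responses), 2))  # 0.8
```

Alpha rises as the items vary together: the more of the total-score variance that comes from shared variation between items rather than from the items individually, the closer the estimate is to 1.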

 

 

The figures reported for Listening and Reading modules indicate the expected levels of reliability for tests containing 40 items. On the basis of these reliability figures, an estimate of the standard error of measurement (SEM) may be calculated for these modules using the following formula:

 

\[ \mathrm{SEM} = s_t \sqrt{1 - r_{xx'}} \]

where $s_t$ is the standard deviation of the test and $r_{xx'}$ is the reliability of the test.

 

 

Table 1 Mean, standard deviation and standard error of measurement of Listening and Reading (2010)

 

                            Mean    Standard deviation    SEM
Listening                   6.04    1.30                  0.389
Academic Reading            5.97    1.21                  0.382
General Training Reading    5.74    1.37                  0.412

 

The SEM should be interpreted in terms of the final band scores reported for Listening and Reading modules (which are reported in half-bands).
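
As a worked check, the sketch below recomputes the SEM for each module from the standard deviations in Table 1 and the average alpha values reported above, using SEM = standard deviation × sqrt(1 − reliability). The function name is hypothetical, and pairing Table 1 with the average alphas is an assumption of this sketch; the source does not state which reliability value was used.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = sd * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# (standard deviation, average alpha, published SEM) as reported above.
# Using the *average* alpha here is an assumption of this sketch.
published = {
    "Listening": (1.30, 0.91, 0.389),
    "Academic Reading": (1.21, 0.90, 0.382),
    "General Training Reading": (1.37, 0.91, 0.412),
}

for module, (sd, alpha, reported_sem) in published.items():
    sem = standard_error_of_measurement(sd, alpha)
    print(f"{module}: computed {sem:.3f}, reported {reported_sem:.3f}")
```

The computed values agree with the published ones to within about 0.001 of a band; the small differences are presumably due to rounding in the published standard deviations and alphas.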

 

Reliability of subjectively-scored modules (Writing and Speaking)

The reliability of the Writing and Speaking modules cannot be reported in the same manner as for Reading and Listening because they are not item-based; candidates' writing and speaking performances are rated by trained and standardised examiners according to detailed descriptive criteria and rating scales. The assessment criteria used for rating Writing and Speaking performance are described in the IELTS 2006 Handbook. Benchmarked example writing performances and CD-based speaking performances at different levels can be found, along with examiner comments, in the IELTS official practice materials, which can be ordered from the IELTS website. User-oriented band descriptors describing levels of Writing and Speaking performance are also available on the website. In addition, the “IELTS Scores Explained” DVD provides information specifically tailored to organisations wanting a detailed description of IELTS scores. This information helps in setting appropriate standards of English proficiency.

 

Reliability of rating is assured through the face-to-face training and certification of examiners, all of whom must undergo retraining and recertification every two years. A Professional Support Network (PSN) manages and standardises the examiner cadre, including face-to-face examiner monitoring as well as distance monitoring (using recordings of the Speaking tests). A ‘jagged profile’ system maintains a further check on the global reliability of IELTS performance assessment. Routine targeted double marking identifies the level of divergence (i.e., jagged profile) between Writing and/or Speaking scores and Reading and Listening scores. This process allows possibly misclassified candidates to be identified. The jagged profile system is also combined with ‘Targeted sample monitoring’ to further identify possible faulty ratings by examiners. Selected centres worldwide are required to provide a sample of examiners' marked tapes and scripts. Tapes and scripts are then second-marked by a team of IELTS Principal Examiners and assistant Principal Examiners. Principal Examiners monitor the quality of both test conduct and rating, and feedback is returned to each test centre. The outcomes of these reliability measures feed back into examiner retraining and continually build on quality management and assurance systems for IELTS.
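
The jagged profile check described above can be pictured with a short sketch. Everything specific in it is an assumption made for illustration: the function name `is_jagged`, the comparison against the mean of the Listening and Reading bands, and the 2.0-band threshold are hypothetical and are not taken from IELTS procedures, which the source does not detail.

```python
def is_jagged(listening, reading, writing, speaking, threshold=2.0):
    """Flag a result whose Writing or Speaking band diverges from the
    mean of the objectively scored Listening and Reading bands.

    `threshold` (in bands) is a hypothetical value, not an IELTS figure.
    """
    objective_mean = (listening + reading) / 2
    return (
        abs(writing - objective_mean) >= threshold
        or abs(speaking - objective_mean) >= threshold
    )


# Hypothetical candidates: (listening, reading, writing, speaking).
print(is_jagged(7.0, 7.0, 4.5, 6.5))  # True: Writing is 2.5 bands lower
print(is_jagged(7.0, 7.0, 6.5, 6.5))  # False: both within 2 bands
```

A flagged result in this sketch would simply be routed for second marking; the flag itself says nothing about which score, if any, is in error.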

 

Experimental generalisability studies were carried out as part of the IELTS Speaking and Writing Revision Projects to investigate the reliability of ratings (Shaw, 2004; Taylor & Jones, 2001). More recent G-studies based on examiner certification data showed coefficients of 0.81-0.89 for Speaking and 0.80-0.83 for Writing.

The IELTS test comprises four components, on the basis of which an overall band score is awarded. An estimate of composite reliability therefore offers a useful measure of overall test reliability. Following Feldt & Brennan (1989), composite reliability estimates were calculated based on test data from 2009. To generate an appropriately cautious estimate, minimum alpha values were used for the objectively marked papers, and G-coefficients for the single-rater condition for the subjectively marked papers. The composite reliability estimate for the Academic module was 0.95, with a composite SEM of 0.22. For General Training, the composite reliability was 0.96, with an SEM of 0.24. If average rather than minimum values are used for the objective paper alphas, composite reliability becomes 0.96 for both versions.
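
The idea of a composite estimate can be sketched with the standard formula for the reliability of a weighted sum of components whose measurement errors are assumed to be uncorrelated: one minus the ratio of summed component error variance to composite variance. Whether this is exactly the form applied in the calculation above is an assumption. The inputs are also largely hypothetical: only the Listening and Academic Reading standard deviations come from Table 1, the reliabilities are the lowest 2010 alphas and the lower ends of the G-coefficient ranges quoted above, and the Writing and Speaking standard deviations, the inter-component correlations of 0.65 and the equal weights of 0.25 are invented for illustration. The output is therefore not expected to reproduce the published 2009 figures.

```python
import math


def composite_reliability(sds, reliabilities, correlations, weights=None):
    """Return (reliability, SEM) of a weighted sum of component scores.

    Assumes measurement errors are uncorrelated across components, so the
    composite error variance is the weighted sum of component error variances.
    """
    n = len(sds)
    if weights is None:
        weights = [1.0] * n

    # Variance of the weighted composite, from SDs and correlations.
    composite_variance = sum(
        weights[i] * weights[j] * sds[i] * sds[j] * correlations[i][j]
        for i in range(n)
        for j in range(n)
    )
    # Error variance contributed by each component: w^2 * s^2 * (1 - r).
    error_variance = sum(
        (weights[i] * sds[i]) ** 2 * (1.0 - reliabilities[i]) for i in range(n)
    )
    reliability = 1.0 - error_variance / composite_variance
    return reliability, math.sqrt(error_variance)


# Order: Listening, Academic Reading, Writing, Speaking.
sds = [1.30, 1.21, 1.00, 1.10]            # last two SDs are hypothetical
reliabilities = [0.87, 0.84, 0.80, 0.81]  # cautious values quoted above
rho = 0.65                                # hypothetical correlation
correlations = [[1.0 if i == j else rho for j in range(4)] for i in range(4)]
weights = [0.25] * 4                      # assumed equal weighting

reliability, sem = composite_reliability(sds, reliabilities, correlations, weights)
print(f"composite reliability {reliability:.2f}, composite SEM {sem:.2f}")
```

Because the composite pools four positively correlated components, its reliability comes out higher than that of any single component, which is why the composite figures reported above exceed the individual module estimates.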

 

References

Feldt, LS & Brennan, RL (1989) Reliability. In RL Linn (Ed), Educational measurement (3rd ed, 105-146). New York: Macmillan.

Shaw, SD (2004) IELTS writing: revising assessment criteria and scales (Phase 3). Research Notes 16, 3-7.

Taylor, L & Jones, N (2001) Revising the IELTS Speaking Test. Research Notes 4, 9-12.
