Psychometric Evidence for Using FSI Speaking Ratings in Indonesian Primary EFL Classrooms: Content Validity and Inter-Rater Reliability
Abstract
Reliable and valid speaking assessment is crucial for accurately interpreting young learners’ communicative competence in English as a Foreign Language (EFL). Although the Foreign Service Institute (FSI) Speaking Ratings are widely used in adult contexts, empirical evidence supporting their adaptation for primary school learners, particularly in Indonesia, remains limited. This study employed a quantitative psychometric validation design to examine the content validity and inter-rater reliability of an adapted FSI scale. Participants were six expert validators (two media experts, two language experts, and two material experts), two trained raters, and 30 Grade V students from a public primary school. The scale was contextually modified to suit young learners’ characteristics while retaining its five domains. Each student performed a 2–3-minute monologue based on visual prompts; the performances were video-recorded and scored independently by the two raters. Content validity was assessed using Aiken’s V, and inter-rater reliability was analyzed using a two-way random-effects Intraclass Correlation Coefficient (ICC) with absolute agreement. Aiken’s V coefficients ranged from 0.50 to 1.00, with a mean of 0.87 across 54 indicators, indicating strong content validity overall. The ICC results demonstrated consistent scoring between raters, suggesting satisfactory inter-rater reliability. The findings provide initial psychometric support for the adapted FSI Speaking Ratings in primary EFL contexts and for their use in making assessment more objective and standardized. Limitations include a small sample size, a limited number of raters, single-site data, and the absence of construct validity analysis. Future studies should address these constraints to strengthen generalizability and validation.
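The two statistics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis script. The 1 to 5 rating scale for Aiken's V, the single-rater form ICC(2,1), and all example scores are assumptions made for illustration; the abstract specifies only Aiken's V and a two-way random-effects ICC with absolute agreement.

```python
from statistics import mean


def aikens_v(ratings, lo=1, hi=5):
    """Aiken's V for one indicator: sum(r - lo) / (n * (hi - lo)).

    ratings: one rating per expert validator.
    lo, hi:  lowest and highest scale points (a 1-5 scale is assumed here).
    """
    n = len(ratings)
    return sum(r - lo for r in ratings) / (n * (hi - lo))


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: one row per student, one column per rater.
    """
    n = len(scores)       # number of students
    k = len(scores[0])    # number of raters
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Sums of squares from the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-student mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Hypothetical ratings from six validators for one indicator.
    print(aikens_v([4, 5, 4, 5, 4, 5]))
    # Hypothetical scores: three students, two raters, constant offset of 1.
    print(icc_2_1([[1, 2], [2, 3], [3, 4]]))
```

With six validators on a 1 to 5 scale, V = 0.50 corresponds to every expert choosing the scale midpoint. The absolute-agreement ICC falls below 1 when one rater is systematically stricter, even if both raters rank the students identically.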
DOI: https://doi.org/10.35445/alishlah.v18i1.9642
Copyright (c) 2026 Euis Yanah Mulyanah, Yudi Juniardi, Lukman Nulhakim
Al-Ishlah Jurnal Pendidikan

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.






