Assessing the fairness of a mathematical literacy test in Indonesia: Evidence from gender-based differential item functioning analysis
Kartianom Kartianom 1 2 * , Heri Retnawati 1, Kana Hidayati 1
1 Yogyakarta State University, Indonesia
2 IAIN Bone, Indonesia
* Corresponding Author

Abstract

Conducting a fair test is important for educational research. Unfair assessments can lead to gender disparities in academic achievement, ultimately resulting in disparities in opportunities, wages, and career choices. Differential Item Functioning (DIF) analysis provides evidence of whether a test is truly fair, that is, whether it neither harms nor benefits particular groups of students. To this end, this study aims to assess the fairness of a mathematics literacy test from a gender perspective using three DIF analysis approaches, namely the Cognitive Diagnostic Model (CDM), Classical Test Theory (CTT), and Item Response Theory (IRT), and to compare the results of the three approaches to examine their agreement in identifying DIF effects. This is a quantitative descriptive study; for the CDM approach, a retrofitting method (post-hoc analysis) was used. The sample consists of Indonesian students who participated in the PISA 2012 administration and were tested on Booklet 1, Booklet 3, Booklet 4, and Booklet 6. The Q-matrix used in this study comprised 12 items and 11 attributes. The results show that, among the 12 items analyzed, the CTT, IRT, and CDM approaches yielded different findings. The item with the largest DIF was identified by the Raju Unsigned Area Measures method in IRT and by the Wald test in the CDM approach, while the item with the smallest DIF was identified by the likelihood ratio test (LRT) in the CDM approach. Three items were flagged as DIF simultaneously by the CTT, IRT, and CDM methods, namely PM923Q01, PM923Q03, and PM924Q02. Items PM923Q01 and PM923Q03 favor male students, while item PM924Q02 favors female students.
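As context for the CTT-based approach mentioned in the abstract, the Mantel-Haenszel procedure (Mantel & Haenszel, 1959) compares the odds of a correct response between the reference group (e.g., male students) and the focal group (e.g., female students) within strata matched on total score. The following is a minimal illustrative sketch in Python; the function name and synthetic data are our own, and an operational analysis would use the dedicated R packages cited in the references rather than this hand-rolled version.

```python
import numpy as np

def mantel_haenszel_dif(responses, item, group):
    """Mantel-Haenszel DIF statistic for one dichotomous item.

    responses : (n_persons, n_items) array of 0/1 item scores
    item      : column index of the studied item
    group     : array of 0 (reference group) / 1 (focal group)
    Returns the ETS delta-scale statistic; by convention,
    |delta| >= 1.5 indicates moderate-to-large DIF, and a
    negative value indicates DIF against the focal group.
    """
    total = responses.sum(axis=1)              # matching variable: raw total score
    num = den = 0.0
    for score in np.unique(total):
        in_stratum = total == score
        ref = in_stratum & (group == 0)
        foc = in_stratum & (group == 1)
        if ref.sum() == 0 or foc.sum() == 0:   # stratum lacks one of the groups
            continue
        a = responses[ref, item].sum()         # reference group, correct
        b = ref.sum() - a                      # reference group, incorrect
        c = responses[foc, item].sum()         # focal group, correct
        d = foc.sum() - c                      # focal group, incorrect
        t = a + b + c + d
        num += a * d / t
        den += b * c / t
    alpha = num / den                          # common odds ratio across strata
    return -2.35 * np.log(alpha)               # ETS delta metric
```

In practice, the cited R packages (e.g., the framework of Magis et al., 2010) implement this procedure alongside the IRT- and CDM-based methods used in the study.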

Keywords

References

  • Abedalaziz, N. (2010). A gender-related differential item functioning of mathematics test items. The International Journal of Educational and Psychological Assessment, 5, 101-116.
  • Akbay, L. (2021). Impact of retrofitting and item ordering on DIF. Journal of Measurement and Evaluation in Education and Psychology, 12(2), 212–225. https://doi.org/10.21031/epod.886920
  • Başman, M., & Kutlu, Ö. (2020). Identification of differential item functioning on mathematics achievement according to the interactions of gender and affective characteristics by Rasch Tree Method. International Journal of Progressive Education, 16(2), 205–217. https://doi.org/10.29329/ijpe.2020.241.14
  • Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6). https://doi.org/10.18637/jss.v048.i06
  • De La Torre, J., & Chiu, C.-Y. (2016). A general method of empirical Q-matrix validation. Psychometrika, 81(2), 253–273. https://doi.org/10.1007/s11336-015-9467-8
  • Else-Quest, N. M., Hyde, J. S., & Linn, M. C. (2010). Cross-national patterns of gender differences in mathematics: A meta-analysis. Psychological Bulletin, 136(1), 103–127. https://doi.org/10.1037/a0018053
  • Eren, B., Gündüz, T., & Tan, Ş. (2023). Comparison of methods used in detection of DIF in cognitive diagnostic models with traditional methods: Applications in TIMSS 2011. Journal of Measurement and Evaluation in Education and Psychology, 14(1), 76-94. https://doi.org/10.21031/epod.1218144
  • Freudenthal, H. (1972). Mathematics as an educational task. Springer. https://doi.org/10.1007/978-94-010-2903-2
  • George, A. C., Robitzsch, A., Kiefer, T., Groß, J., & Ünlü, A. (2016). The R Package CDM for cognitive diagnosis models. Journal of Statistical Software, 74(2), 1-24. https://doi.org/10.18637/jss.v074.i02
  • Gierl, M. J., Alves, C., & Majeau, R. T. (2010). Using the attribute hierarchy method to make diagnostic inferences about examinees’ knowledge and skills in mathematics: an operational implementation of cognitive diagnostic assessment. International Journal of Testing, 10(4), 318–341. https://doi.org/10.1080/15305058.2010.509554
  • Hou, L., De La Torre, J., & Nandakumar, R. (2014). Differential item functioning assessment in cognitive diagnostic modeling: Application of the Wald test to investigate DIF in the DINA model. Journal of Educational Measurement, 51(1), 98–125. https://doi.org/10.1111/jedm.12036
  • Hou, L., Terzi, R., & De La Torre, J. (2020). Wald test formulations in DIF detection of CDM data with the proportional reasoning test. International Journal of Assessment Tools in Education, 7(2), 145–158. https://doi.org/10.21449/ijate.689752
  • Kaiser, G., & Zhu, Y. (2022). Gender differences in mathematics achievement: A secondary analysis of Programme for International Student Assessment data from Shanghai. Asian Journal for Mathematics Education, 1(1), 115–130. https://doi.org/10.1177/27527263221091373
  • Kang, C., Yang, Y., & Zeng, P. (2019). Q-matrix refinement based on item fit statistic RMSEA. Applied Psychological Measurement, 43(7), 527–542. https://doi.org/10.1177/0146621618813104
  • Li, C., Ma, C., & Xu, G. (2020). Learning large Q-matrix by restricted Boltzmann machines (arXiv:2006.15424). arXiv. https://doi.org/10.48550/ARXIV.2006.15424
  • Li, T., & Traynor, A. (2022). The use of cognitive diagnostic modeling in the assessment of computational thinking. AERA Open, 8, 1256. https://doi.org/10.1177/23328584221081256
  • Liu, Y., Yin, H., Xin, T., Shao, L., & Yuan, L. (2019). A comparison of differential item functioning detection methods in cognitive diagnostic models. Frontiers in Psychology, 10, 1137. https://doi.org/10.3389/fpsyg.2019.01137
  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Routledge. https://doi.org/10.4324/9780203056615
  • Ma, W., & De La Torre, J. (2020). GDINA: An R package for cognitive diagnosis modeling. Journal of Statistical Software, 93(14), 1-26. https://doi.org/10.18637/jss.v093.i14
  • Ma, W., Terzi, R., & De La Torre, J. (2021). Detecting differential item functioning using multiple-group cognitive diagnosis models. Applied Psychological Measurement, 45(1), 37–53. https://doi.org/10.1177/0146621620965745
  • Magis, D., Béland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42(3), 847–862. https://doi.org/10.3758/BRM.42.3.847
  • Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22(4), 719-748. https://doi.org/10.1093/jnci/22.4.719
  • Mehrazmay, R., Ghonsooly, B., & De La Torre, J. (2021). Detecting differential item functioning using cognitive diagnosis models: applications of the Wald test and likelihood ratio test in a university entrance examination. Applied Measurement in Education, 34(4), 262–284. https://doi.org/10.1080/08957347.2021.1987906
  • Moradi, Y., Baradaran, H., & Khamseh, M. E. (2016). Psychometric properties of the Iranian version of the Diabetes Numeracy Test-15. International Journal of Preventive Medicine, 7, 43. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4809128/
  • Niu, T. (2022). The impact of gender difference on major selections of Chinese college students. In A. Holl, J. Chen, & G. Guan (Eds.), Proceedings of the 2022 5th International Conference on Humanities Education and Social Sciences (ICHESS 2022) (pp. 216–224). Atlantis Press SARL. https://doi.org/10.2991/978-2-494069-89-3_25
  • OECD. (2018). PISA for development assessment and analytical framework: Reading, mathematics and science. OECD. https://doi.org/10.1787/9789264305274-en
  • OECD. (2023). PISA 2022 Results (Volume I): The state of learning and equity in education. OECD. https://doi.org/10.1787/53f23881-en
  • Ong, Y. M., Williams, J., & Lamprianou, I. (2015). Exploring crossing differential item functioning by gender in mathematics assessment. International Journal of Testing, 15(4), 337–355. https://doi.org/10.1080/15305058.2015.1057639
  • Paulsen, J., Svetina, D., Feng, Y., & Valdivia, M. (2020). Examining the impact of differential item functioning on classification accuracy in cognitive diagnostic models. Applied Psychological Measurement, 44(4), 267–281. https://doi.org/10.1177/0146621619858675
  • Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53(4), 495–502. https://doi.org/10.1007/BF02294403
  • Ravand, H., & Baghaei, P. (2020). Diagnostic classification models: recent developments, practical issues, and prospects. International Journal of Testing, 20(1), 24–56. https://doi.org/10.1080/15305058.2019.1588278
  • Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. Guilford Press.
  • Rutkowski, L., & Rutkowski, D. (2016). A call for a more measured approach to reporting and interpreting PISA results. Educational Researcher, 45(4), 252–257. https://doi.org/10.3102/0013189X16649961
  • Sanchis-Segura, C., Aguirre, N., Cruz-Gómez, Á. J., Solozano, N., & Forn, C. (2018). Do gender-related stereotypes affect spatial performance? Exploring when, how and to whom using a chronometric two-choice mental rotation task. Frontiers in Psychology, 9, 1261. https://doi.org/10.3389/fpsyg.2018.01261
  • Shanmugam, S. K. S. (2018). Determining gender differential item functioning for mathematics in coeducational school culture. Malaysian Journal of Learning and Instruction, 15(2), 83–109. https://doi.org/10.32890/mjli2018.15.2.4
  • Svetina, D., Dai, S., & Wang, X. (2017). Use of cognitive diagnostic model to study differential item functioning in accommodations. Behaviormetrika, 44(2), 313–349. https://doi.org/10.1007/s41237-017-0021-0
  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361–370. https://doi.org/10.1111/j.1745-3984.1990.tb00754.x
  • Terzi, R., & Sen, S. (2019). A nondiagnostic assessment for diagnostic purposes: Q-matrix validation and item-based model fit evaluation for the TIMSS 2011 assessment. Sage Open, 9(1). https://doi.org/10.1177/2158244019832684
  • Wang, W., Song, L., Ding, S., Meng, Y., Cao, C., & Jie, Y. (2018). An EM-Based method for Q-matrix validation. Applied Psychological Measurement, 42(6), 446–459. https://doi.org/10.1177/0146621617752991
  • Wu, X., Wu, R., Chang, H.-H., Kong, Q., & Zhang, Y. (2020). International comparative study on PISA mathematics achievement test based on cognitive diagnostic models. Frontiers in Psychology, 11, 2230. https://doi.org/10.3389/fpsyg.2020.02230
  • Yildirim, O. (2019). Detecting gender differences in PISA 2012 mathematics test with differential item functioning. International Education Studies, 12(8), 59. https://doi.org/10.5539/ies.v12n8p59

License

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.