The Chat GPT Develops Multiple Choice Questions (MCQs) for Postgraduate Specialty Assessment – A Reality or a Myth?


  • Faridah Amir Ali, Department of Neurosurgery, Liaquat National Hospital and Medical College, Karachi
  • Salman Yousuf Sharif, Liaquat National Hospital & Medical College
  • Madiha Ata, Department of Family Medicine, Indus University of Health Sciences, Karachi
  • Nirali Patel, Department of Neurosurgery, Children’s National Medical Center, Washington DC, USA
  • Muhammad Rafay, Department of Neurosurgery, Liaquat National Hospital and Medical College, Karachi
  • Hasan Raza Syed, Department of Neurosurgery, Children’s National Medical Center, Washington DC, USA
  • Saima Perwaiz Iqbal, Department of Family Medicine, Shifa College of Medicine, Islamabad, Pakistan



Objective:  Multiple choice questions (MCQs) are a valuable assessment tool, but constructing them to match learning objectives requires expertise. Artificial intelligence (AI) tools such as ChatGPT might offer an alternative. A previous study compared MCQs developed by ChatGPT and by faculty for medical programs. The present study compares faculty-developed MCQs with ChatGPT-developed MCQs for a postgraduate program.

Material & Methods:  Specific learning objectives were extracted from one module of a medical program and one module of a surgical program. For each learning objective, one mid-level faculty member and the AI software each developed an MCQ built around a clinical scenario. For each specialty, two experts in the subject and in medical education, blinded to the source of the questions, used a standardized online tool to rate the technical and content quality of the MCQs in five domains: item, vignette, question stem, response options, and overall quality.

Results:  For medicine and allied specialties, 23 MCQs were assessed in each set. There was no significant difference between the two sets in any individual variable, in the overall quality of the MCQs, or in the odds of a decision to accept the MCQ. For surgery and allied specialties, two sets of 24 MCQs were assessed. There was no difference between the sets in the “Item” and “Vignette” domains. In the “Question stem” domain, faculty-developed MCQs were more grammatically correct (p = 0.02). There was no difference in overall quality or in the odds of a decision to accept.

Conclusions:  AI's impact on education is undeniable. Our findings indicate that faculty outperformed ChatGPT in specific domains, although overall question quality was comparable. More research is needed, but ChatGPT could streamline assessment development and save faculty substantial time.

Original Articles