Artificial intelligence-simplified information to advance reproductive genetic literacy and health equity

This article summarizes a paper on the application of artificial intelligence and large language models in reproductive genetics. By comparing how well four mainstream models simplify patient educational materials, the paper explores practical ways to broaden public understanding of reproductive genetics and improve equity in care. The work offers a new approach to the uneven use of services that results from overly complex information in reproductive genetics, with implications for health literacy and the appropriate use of medical resources.

Research Background

Reproductive genetic testing and counseling aim to reduce the burden of genetic disease through personalized plans, but in practice they are held back by two core problems: the complexity of the testing technologies themselves and the poor design of patient educational materials (PEMs). Existing PEMs are often too specialized for the general population to understand. As a result, some patients decline necessary tests because they lack usable information, while others overuse technologies that lack clear supporting evidence. The imbalance is most pronounced in low- and middle-income regions, where child mortality from congenital diseases remains high and low uptake of reproductive genetic testing makes the situation worse. With online health information now a primary reference for pregnant women, simplifying PEMs to improve accessibility and promote health equity has become increasingly urgent.

Model Introduction

The study selected four mainstream large language models (LLMs) for comparison. GPT-3.5, an early and widely adopted model, represents baseline language processing capability. GPT-4, its successor, is known for stronger reasoning and handles complex tasks well. Copilot is described as performing well at generating medical content, especially at rewording specialist information. Gemini is described as strong at creating structured educational materials and producing clear, understandable medical content. All four models received the same prompts so that the simplification task was standardized, yielding 120 simplified versions from 30 original PEMs (30 texts × 4 models).

Experimental Design

The study adopted a comparative observational design with three main steps. First, text screening: 30 PEMs were selected from authoritative platforms such as the WHO and Johns Hopkins Medicine, covering 6 key topics including reproductive genetic counseling, first-trimester screening, and amniocentesis. Each topic included texts from different sources to ensure diversity, and all texts targeted the general population. Second, model processing: the four LLMs simplified these texts using the same prompts. Third, evaluation: quantitative indicators were used to analyze text readability, and 30 reproductive genetics experts assessed clinical reliability. In addition, the study developed an open-access graphical user interface that supports real-time text simplification and readability analysis, to make clinical use easier.
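The model-processing step can be sketched as a loop over every combination of model and text. This is only an illustration under stated assumptions: the prompt wording, the `simplify` callable, and the `PEM-01` style identifiers are hypothetical placeholders, not the study's actual prompt or any vendor's API.

```python
from itertools import product
from typing import Callable

# Model names as listed in the article.
MODELS = ["GPT-3.5", "GPT-4", "Copilot", "Gemini"]

# Hypothetical prompt wording; the study's actual prompt is not given here.
PROMPT_TEMPLATE = (
    "Rewrite the following patient education text in plain language, "
    "keeping all key medical information:\n\n{text}"
)


def run_simplification(
    pems: dict[str, str],
    simplify: Callable[[str, str], str],
) -> list[dict]:
    """Send the same prompt template to every model for every PEM."""
    outputs = []
    for model, (pem_id, text) in product(MODELS, pems.items()):
        prompt = PROMPT_TEMPLATE.format(text=text)
        outputs.append(
            {"model": model, "pem_id": pem_id, "simplified": simplify(model, prompt)}
        )
    return outputs


def fake_simplify(model: str, prompt: str) -> str:
    """Stand-in for a real model call (assumption: replace with an actual client)."""
    return f"[{model}] simplified text"


# Placeholder texts standing in for the 30 original PEMs.
pems = {f"PEM-{i:02d}": f"Original text {i}" for i in range(1, 31)}
results = run_simplification(pems, fake_simplify)
print(len(results))  # 30 texts x 4 models = 120
```

Passing the model call in as a function keeps the loop independent of any particular provider and makes the 30 × 4 = 120 bookkeeping easy to check.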

Indicator Introduction

The evaluation covers three dimensions, which together are intended to capture the effect of simplification.

Readability indicators: Five validated tools were used to measure text comprehensibility from different perspectives. The Flesch Reading Ease formula scores overall ease of reading from average sentence length and average syllables per word; the Gunning Fog Index combines average sentence length with the share of complex words to reflect how “obscure” a text is; the SMOG Index estimates the educational level required for comprehension from the number of polysyllabic words; the Coleman–Liau Index relies on letters per word and sentences per word rather than syllables; and the Linsear Write formula weighs easy (short) words against hard (long) words. Together they provide a comprehensive measure of text complexity.
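To make the formulas concrete, here is a minimal sketch of four of the five indices using their standard published coefficients. It is not the study's tooling: the syllable counter is a rough vowel-group heuristic, "complex word" is approximated as any word with three or more estimated syllables, and the function names are chosen for illustration. Linsear Write is omitted for brevity.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels (heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" (as in "gene") as non-syllabic.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> dict:
    """Basic counts shared by the readability formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)  # simplification: letters only
    syllables = [count_syllables(w) for w in words]
    return {
        "sentences": len(sentences),
        "words": len(words),
        "letters": sum(len(w) for w in words),
        "syllables": sum(syllables),
        "polysyllables": sum(1 for s in syllables if s >= 3),
    }


def readability(text: str) -> dict:
    """Compute four readability indices; assumes at least one sentence and word."""
    s = text_stats(text)
    words_per_sentence = s["words"] / s["sentences"]
    syllables_per_word = s["syllables"] / s["words"]
    letters_per_100 = 100 * s["letters"] / s["words"]
    sentences_per_100 = 100 * s["sentences"] / s["words"]
    return {
        # Higher score = easier to read.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # The remaining three approximate a school grade level.
        "gunning_fog": 0.4
        * (words_per_sentence + 100 * s["polysyllables"] / s["words"]),
        # SMOG is defined on 30-sentence samples; the factor 30/sentences rescales.
        "smog": 1.0430 * math.sqrt(s["polysyllables"] * 30 / s["sentences"]) + 3.1291,
        "coleman_liau": 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8,
    }


simple = "The test is safe. It checks the baby for genetic conditions."
dense = (
    "Preimplantation genetic diagnostics necessitates comprehensive "
    "multidisciplinary consultation."
)
print(readability(simple))
print(readability(dense))
```

Because the syllable counter is only approximate, the absolute scores will differ somewhat from those of dedicated readability tools; the relative ordering of a plain and a dense text is what the sketch is meant to show.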

Text feature indicators: These capture structural changes before and after simplification: word count (reflecting conciseness), proportion of complex words (measuring vocabulary difficulty), proportion of long sentences (indicating sentence structure complexity), total number of sentences (showing how sentences were split), and proportion of passive voice (affecting directness of expression). Together they help explain how each model goes about simplifying a text.
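The text features above could be extracted along the following lines. This is a sketch under explicit assumptions rather than the study's method: the 20-word threshold for a "long" sentence, the three-syllable threshold for a "complex" word, and the regex-based passive-voice detector are all illustrative choices, and the passive heuristic will miss irregular participles and flag some false positives.

```python
import re

# Heuristic passive detector: a form of "be" followed by a word ending in -ed or -en.
BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"
PASSIVE_RE = re.compile(rf"\b{BE_FORMS}\s+\w+(?:ed|en)\b", re.IGNORECASE)


def estimate_syllables(word: str) -> int:
    """Rough syllable estimate: number of vowel groups (heuristic)."""
    return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)


def text_features(
    text: str,
    long_sentence_words: int = 20,  # assumed threshold for a "long" sentence
    complex_syllables: int = 3,     # assumed threshold for a "complex" word
) -> dict:
    """Compute simple structural features of a text."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    n_sentences = len(sentences) or 1  # avoid division by zero on empty input
    n_words = len(words) or 1

    complex_words = [w for w in words if estimate_syllables(w) >= complex_syllables]
    long_sentences = [s for s in sentences if len(s.split()) > long_sentence_words]
    passive_sentences = [s for s in sentences if PASSIVE_RE.search(s)]

    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "complex_word_share": len(complex_words) / n_words,
        "long_sentence_share": len(long_sentences) / n_sentences,
        "passive_share": len(passive_sentences) / n_sentences,
    }


example = "The sample is collected by a nurse. You get results in two weeks."
features = text_features(example)
print(features)
```

Running the same function on an original PEM and on its simplified version gives a before-and-after comparison for each feature.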

Expert evaluation indicators: Experts scored each text on three core dimensions: “accuracy” (consistency between the simplified text and the original), “completeness” (retention of key medical information), and “relevance of omission” (whether the omitted content is non-essential). Together, these ensure that simplification does not sacrifice clinical value.

Research Results

All LLMs significantly improved PEM readability, bringing texts that originally required a higher educational level down to a more accessible reading level. Copilot and Gemini were particularly effective at enhancing readability, substantially reducing linguistic complexity. In terms of clinical reliability, however, GPT-4 performed best, receiving high expert ratings for accuracy, completeness, and relevance of omission. In contrast, some models improved readability markedly but omitted key information through over-simplification, which lowered their expert scores. This result highlights the critical balance between “simplification” and “accuracy”: texts must be understandable without losing core medical content.

Research Innovations

The study makes several contributions to reproductive genetics. It is the first to systematically compare the performance of multiple mainstream LLMs in simplifying PEMs, filling a research gap in this field. Through large-sample expert evaluation (30 experts scoring 120 texts), it provides solid verification of the clinical reliability of LLM-generated content. The open-access graphical user interface, which integrates text simplification and readability analysis, offers a practical tool for clinical practice. Most importantly, the study confirms the potential of LLMs to balance readability with content completeness, providing a new method for improving health literacy.

Research Limitations

The study has several limitations. Key information may be omitted during simplification, so over-simplified texts could mislead readers. The evaluation relies mainly on expert judgment and lacks feedback from real patients, making it difficult to verify directly how simplified texts affect patient decisions. The research materials are primarily in English, and applicability to other languages has not been verified, which may limit use in multilingual regions. Finally, some original PEMs may have been included in the LLMs’ training data, introducing a potential familiarity bias that affects the objectivity of the results.

Clinical Significance and Prospects

The study provides important insights for reproductive medical practice. LLMs can serve as valuable assistants to healthcare providers by quickly generating easily understandable PEMs. This helps people with low health literacy grasp reproductive genetic testing information, reduces the overuse or abandonment of tests caused by gaps in understanding, and thereby promotes health equity. Future research should incorporate patient feedback to verify the impact of simplified texts on actual decision-making; expand to more languages to address information accessibility for non-English speakers; continue to optimize LLMs so that they simplify more effectively while retaining key information; and establish strict human oversight to ensure the technology is applied safely, so that technological progress and quality of care improve together.

Reference

Marjan Naghdi, Ping Cao, Rick Essers, Malou Heijligers, Aimee D C Paulussen, Arie van der Lugt, Robert A C Ruiter, Wendy A G van Zelst-Stams, Andres Salumets, Masoud Zamani Esteki, Artificial intelligence-simplified information to advance reproductive genetic literacy and health equity, Human Reproduction, 2025