Can AI assist embryologists when only low-quality embryos are available for selection?

This study addresses a practical challenge in assisted reproduction: how to select the embryo with the highest implantation potential when only poor-quality embryos are available for transfer. It designs experiments around this clinical pain point, tests the value of AI in this specific scenario, offers an objective basis for clinical decision-making when “no high-quality embryos are available,” and lays a foundation for applying AI in specialized scenarios of assisted reproduction.

Research Background

In assisted reproductive technology (ART), embryo quality assessment is a key step in determining implantation success. Traditional morphological assessment relies on the subjective judgment of embryologists and shows significant inter- and intra-observer variability; evaluation is particularly challenging for poor-quality embryos. Studies have shown that poor-quality embryos can still achieve implantation and pregnancy rates of 10%-30%, but traditional grading systems struggle to identify which of them have genuine pregnancy potential. The combination of time-lapse imaging systems (TLS) and artificial intelligence (AI) has provided new tools for embryo assessment. However, previous studies have mostly evaluated embryos in general and have rarely targeted the common clinical scenario of “only poor-quality embryos available for transfer.” This study explores the value of the AI system EMBRYOLY in evaluating poor-quality embryos, aiming to fill this gap.

Research Design

The study data were collected from 3,214 oocyte retrieval cycles across 15 reproductive centers in 4 countries between 2019 and 2024, involving 15,767 embryos and covering 3 time-lapse imaging systems (Embryoscope+, GERI, MIRI). Data from 7 independent clinics that were not involved in the initial algorithm training were included to verify generalization ability. The data comprised dynamic embryo images, patients’ clinical information, and transfer outcomes; embryos cultured for less than 4 days were excluded.

The study employed three core algorithms of the EMBRYOLY system. GardNet automatically grades embryo morphology on the Gardner scale; poor-quality embryos were defined as those with expansion stage ≤2 (B1-B2), or with expansion ≥3 but with inner cell mass (ICM) and/or trophectoderm (TE) graded as C. The FH-ranking model, based on a Transformer architecture, predicts the probability of clinical pregnancy from dynamic embryo images. The hybrid model combines FH-ranking scores with patients’ clinical characteristics to generate a personalized transfer confidence. The analysis used four data subsets to assess the consistency between EMBRYOLY and embryologists’ decisions, the correlation between FH-ranking scores and pregnancy outcomes, the impact of AI on pregnancy efficiency, and the pregnancy rates of embryos of different qualities recommended by the hybrid model.
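The poor-quality definition above can be written as a small rule. The following is a minimal sketch in Python, not GardNet itself: the function name `is_poor_quality` and its argument layout (an integer expansion stage plus letter grades for ICM and TE) are assumptions made for illustration, and only the threshold logic comes from the definition stated in the study.

```python
from typing import Optional


def is_poor_quality(expansion: int,
                    icm: Optional[str] = None,
                    te: Optional[str] = None) -> bool:
    """Apply the study's stated rule for a poor-quality embryo.

    Hypothetical helper: the name and argument layout are illustrative
    assumptions, not part of the EMBRYOLY system.
    """
    # Expansion stage <= 2 (B1-B2) is poor quality regardless of ICM/TE.
    if expansion <= 2:
        return True
    # From expansion 3 upward, both grades are needed to apply the rule.
    if icm is None or te is None:
        raise ValueError("ICM and TE grades are required when expansion >= 3")
    # Poor quality if ICM and/or TE is graded C.
    return icm.upper() == "C" or te.upper() == "C"


print(is_poor_quality(2))            # early blastocyst
print(is_poor_quality(4, "A", "C"))  # expanded, TE graded C
print(is_poor_quality(4, "A", "B"))  # expanded, no C grade
```

Raising an error when grades are missing, rather than defaulting to "not poor", is a design choice of this sketch so that incomplete records are not silently classified.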

Figure: Data collection and exclusion criteria

Key Findings

The study showed that in 29% of embryo cohorts, embryologists could only select poor-quality embryos for transfer, which underlines the clinical need for evaluating such embryos. In terms of decision consistency, EMBRYOLY and embryologists agreed on the preferred poor-quality embryo in 66% of cases, well above the 17% agreement expected by chance; agreement was 71% for single embryo transfer and 49% for multiple embryo transfer. FH-ranking scores were significantly associated with clinical pregnancy (OR=1.03, P<0.001) and live birth (OR=1.02, P<0.001) for poor-quality embryos, and the association remained stable in subgroup analyses across multiple independent clinics. In simulated AI-assisted decision-making for cohorts where only poor-quality embryos could be selected, embryologists deciding alone achieved a first-cycle pregnancy rate of 37% and a time to pregnancy of 1.78 cycles. Following EMBRYOLY’s recommendations, the first-cycle pregnancy rate rose to 61% (P=0.02) and the time to pregnancy fell to 1.44 cycles (P=0.01), corresponding to a 65% relative increase in first-cycle pregnancy rate and a 19% reduction in cycles. The hybrid model analysis indicated that only 10% of poor-quality embryos were rated as high transfer confidence (compared with 29% of non-poor-quality embryos). Even so, the pregnancy rate of high-confidence poor-quality embryos reached 46%, which was not statistically different from the 50% observed for high-confidence non-poor-quality embryos (P=0.54), suggesting that AI can identify poor-quality embryos with “hidden potential.”
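The 65% and 19% figures are relative changes, not percentage-point differences. As a quick check, the short Python sketch below recomputes them from the rates and cycle counts reported above; the helper name `relative_change` is an assumption introduced only for this illustration.

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a fraction of `before`."""
    return (after - before) / before


# First-cycle pregnancy rate: embryologists alone vs. following EMBRYOLY.
rate_gain = relative_change(0.37, 0.61)

# Cycles to pregnancy: embryologists alone vs. following EMBRYOLY.
cycle_drop = relative_change(1.78, 1.44)

print(f"First-cycle pregnancy rate: {rate_gain:+.0%}")  # +65%
print(f"Cycles to pregnancy: {cycle_drop:+.0%}")        # -19%
```

In absolute terms the first-cycle pregnancy rate rose by 24 percentage points (37% to 61%); the 65% figure expresses that gain relative to the 37% baseline.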

Figure: Relationship between FH-ranking score and the outcome of poor-quality embryo transfer

Research Significance

The study has both clinical and academic significance. Clinically, it is the first to confirm the practical value of an objectively trained AI system in evaluating poor-quality embryos. Because 29% of embryo cohorts have only poor-quality embryos available for transfer, EMBRYOLY addresses a substantial clinical need, helping embryologists make more precise decisions in scenarios where traditional grading systems may “misjudge.” The 66% agreement between EMBRYOLY and embryologists supports the reliability of the AI, which can also serve as a “second opinion” that reduces subjective bias and provides a supplementary reference in complex scenarios such as multiple embryo transfer. The gain in pregnancy efficiency is notable: a 65% relative increase in first-cycle pregnancy rate and a 19% reduction in cycles would reduce the treatment burden and psychological pressure on patients while improving the resource efficiency of assisted reproductive technology. Moreover, the pregnancy rate of high-confidence poor-quality embryos (46%) is comparable to that of high-confidence non-poor-quality embryos (50%). This challenges the assumption that “poor quality equals low potential” and provides a basis for retaining more potentially transferable embryos and avoiding unnecessary embryo discard.

Figure: Number of cycles to clinical pregnancy and first-cycle clinical pregnancy rate for poor-quality embryos, with embryologists deciding alone versus with the help of the FH-ranking model

Limitations

Despite its value, the study has limitations. First, poor-quality embryos were identified by the GardNet algorithm, which has been validated at 83% accuracy but was not fully manually reviewed by embryologists; a small number of misclassifications may therefore affect the basis of the subsequent analyses. Second, the definition of poor-quality embryos is based on the Gardner grading system, while other international grading systems such as ASEBIR and Istanbul exist; conclusions based on a single standard may limit applicability in centers that use different grading systems. Third, double embryo transfer (DET) accounts for 16% of cases. Pregnancy outcomes in these cases may reflect more than one embryo, which makes it difficult to isolate the contribution of a single embryo and introduces noise into the evaluation of the FH-ranking model’s ranking ability. In addition, the data carry some selection bias: only embryos cultured for at least 4 days were included (to meet Gardner grading criteria), whereas some embryos in clinical practice are transferred at an earlier stage, so the AI’s performance in that scenario was not verified. Finally, approximately 5% of cases involved vitrified oocytes and were treated as an independent cohort, which is consistent with clinical practice but may affect data homogeneity.

Figure: Clinical pregnancy rates and model transfer confidence for poor-quality versus non-poor-quality embryos

Future Perspectives

Future research can address these limitations in several ways. First, expand the data scope to include centers that use grading systems such as ASEBIR and Istanbul, to verify the applicability of AI under different standards. Second, add manual review by embryologists of the poor-quality embryos identified by AI, to improve the accuracy of the underlying data. Third, design dedicated analyses for double embryo transfer, using follow-up data from single embryo transfer to improve the model’s evaluation ability in multiple embryo scenarios. In addition, integrating single-cell transcriptomics, epigenetics, and other molecular data to explore biological markers of pregnancy potential in poor-quality embryos could improve the mechanistic interpretability of AI evaluation, and developing adapted evaluation models for embryos cultured for less than 4 days would cover a wider range of clinical scenarios. Ultimately, continued algorithm optimization and expanded validation could help make AI-assisted evaluation of poor-quality embryos a clinical routine, further improving the precision and success rate of assisted reproductive technology.

Reference

T Roque, N Dissler, A Duval, D Nogueira, C Geoffroy-Siraudin, A Boussommier, O-250 Never a lost cause: can artificial intelligence (AI)-powered ranking help decrease time to pregnancy when only poor-quality embryos are available?, Human Reproduction, Volume 39, Issue Supplement_1, July 2024, deae108.297, https://doi.org/10.1093/humrep/deae108.297
