This paper presents a synthetic data generation tool for automated sperm analysis. Unlike existing Generative Adversarial Network (GAN)-based methods, the tool generates realistic sperm microscopy images and videos through a graphical user interface, without requiring large real datasets or neural network training. It also automatically provides complete annotations suitable for classification, detection, segmentation, and tracking tasks. However, the visual realism of the generated images still lags behind that of GAN-based methods, and manual parameter adjustment may pose a usability barrier for non-expert users. With further optimization, it could become an important tool for addressing data scarcity.
Hernández-Ferrándiz D, Pantrigo JJ, Montalvo S, Cabido R. AndroGen: Open-source synthetic data generation for automated sperm analysis. Comput Methods Programs Biomed. 2025 Oct 27;274:109132. doi: 10.1016/j.cmpb.2025.109132. Epub ahead of print. PMID: 41172583.

Research Background and Objectives
Automated sperm analysis systems rely on machine learning models, which require large-scale and diverse image datasets for training. However, acquiring real annotated microscopic samples is a costly and time-consuming process, often constrained by privacy issues. To address this challenge, this paper proposes AndroGen—an open-source software tool that rapidly generates realistic sperm microscopic images without the need for real samples or complex artificial intelligence training. By simulating the morphological characteristics and movement patterns of real sperm, the tool provides a flexible and customizable data solution for the Computer-Assisted Sperm Analysis (CASA) field, helping researchers overcome bottlenecks in data collection and annotation.
Experimental Design
To verify the practical effectiveness of AndroGen, three representative public datasets were selected as references: two human sperm datasets (SVIA and VISEM-Tracking) and one boar sperm dataset (BOSS-Track). These datasets differ significantly in image style, sperm density, and imaging conditions, which allows the tool’s adaptability to be tested broadly. The evaluation had two parts. First, two widely used image similarity metrics, Fréchet Inception Distance (FID) and Kernel Inception Distance (KID), were used to quantitatively compare synthetic and real images; for both metrics, lower values indicate closer distributions. Intra-dataset splits provided a lower bound and inter-dataset comparisons an upper bound against which the synthetic scores were judged. Second, domain experts visually inspected the synthetic images to assess their realism. Performance was tested on a standard office computer (8-core processor, 16 GB RAM) by batch-generating images with varying sperm counts (50–250 per frame) to measure the tool’s generation speed and stability.
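The bounding logic described above can be sketched in a few lines. This is a minimal illustration, not the paper's evaluation code: it computes a single-pass unbiased KID estimate with the cubic polynomial kernel usually associated with that metric, and it runs on toy Gaussian vectors instead of Inception features. The function names (`poly_kernel`, `kid`, `toy_features`) and all data are assumptions made for this example; practical KID implementations typically also average the estimate over several random subsets.

```python
import random


def poly_kernel(x, y):
    """Cubic polynomial kernel k(x, y) = (x.y / d + 1)^3, the usual KID kernel."""
    d = len(x)
    return (sum(a * b for a, b in zip(x, y)) / d + 1.0) ** 3


def kid(feats_x, feats_y):
    """Unbiased squared-MMD estimate between two sets of feature vectors."""
    m, n = len(feats_x), len(feats_y)
    # Within-set terms skip the diagonal (i == j) to keep the estimate unbiased.
    k_xx = sum(poly_kernel(feats_x[i], feats_x[j])
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(poly_kernel(feats_y[i], feats_y[j])
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_xy = sum(poly_kernel(x, y) for x in feats_x for y in feats_y) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


def toy_features(n, mean, rng, dim=4):
    """HYPOTHETICAL stand-in for Inception features: Gaussian vectors."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
real_a_split1 = toy_features(50, 0.0, rng)   # two splits of the same "dataset"
real_a_split2 = toy_features(50, 0.0, rng)
real_b = toy_features(50, 3.0, rng)          # a clearly different "dataset"
synthetic_a = toy_features(50, 0.3, rng)     # synthetic data imitating dataset A

lower = kid(real_a_split1, real_a_split2)    # intra-dataset reference
upper = kid(real_a_split1, real_b)           # inter-dataset reference
score = kid(synthetic_a, real_a_split1)      # synthetic vs. its target dataset

print(f"lower={lower:.3f}  synthetic={score:.3f}  upper={upper:.3f}")
```

A synthetic score that sits well below the inter-dataset reference and near the intra-dataset one suggests the generator has captured the target dataset's characteristics. Because the estimate is unbiased, it can come out slightly negative for near-identical distributions.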
Experimental Evaluation
Experimental results support AndroGen’s performance on several fronts. Quantitatively, the FID and KID scores show that synthetic images are considerably closer to their corresponding real datasets than different real datasets are to one another, indicating that the synthetic images capture the distinctive characteristics of each target dataset. Images generated to mimic the BOSS dataset achieved the highest similarity, and experts could barely distinguish them from real ones by visual inspection. Qualitative analysis further indicates that the synthetic images reproduce the visual features of each dataset, including the clear contours of sperm heads in SVIA, background texture details in VISEM, and natural light and shadow effects of cells and debris in the BOSS dataset. In terms of performance, generating a single image on a standard computer takes only 1.14–1.30 seconds, and processing time scales linearly with sperm count. This shows that the tool runs efficiently on conventional hardware, making it suitable for large-scale dataset production.
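To make the linear-scaling claim concrete, the sketch below fits a straight line to (sperm count, seconds per image) pairs and uses it to estimate how long a large batch would take. It is an illustration under a stated assumption: the summary reports only the ranges 50–250 sperm per frame and 1.14–1.30 seconds, so pairing the two range endpoints is a guess made for this example, not a reported measurement. The helper names `fit_line` and `estimate_hours` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_hours(n_images, sperm_per_frame, slope, intercept):
    """Projected wall-clock hours to generate n_images at a given density."""
    seconds_per_image = slope * sperm_per_frame + intercept
    return n_images * seconds_per_image / 3600.0


# ASSUMPTION: the endpoints of the reported ranges are paired with each other
# (50 sperm -> 1.14 s, 250 sperm -> 1.30 s). Replace with measured timings.
counts = [50, 250]
seconds = [1.14, 1.30]

slope, intercept = fit_line(counts, seconds)
hours = estimate_hours(10_000, 250, slope, intercept)
print(f"{slope * 1000:.2f} ms per extra sperm, {intercept:.2f} s fixed cost, "
      f"{hours:.1f} h for 10,000 dense frames")
```

Under that assumption, most of the per-image cost is fixed overhead and each additional sperm adds well under a millisecond, so a 10,000-image dataset at the highest density would take a few hours on the same machine.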

Research Innovations
The study makes three main contributions. First, it operates without real samples: unlike traditional methods that rely on large real image datasets for training, AndroGen generates data by adjusting biological parameters alone, which sidesteps the privacy and sample acquisition problems of real data. Second, it automatically generates complete annotations: the system simultaneously produces all necessary annotation information, such as sperm positions, contour boundaries, and movement trajectories, which can be used directly to train algorithms for classification, counting, morphological analysis, and motion tracking, removing the need for tedious manual annotation. Third, it enables fine-grained cross-species simulation: it incorporates biological databases of sperm from various animals (including humans, horses, and pigs), covering normal morphologies and common abnormalities (e.g., neck bending, tail curling, cytoplasmic residues), and simulates the movement speed and oscillation patterns of real sperm to generate realistic dynamic videos.
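The reason annotations come for free in a simulator is that the ground truth is simply the simulation state. The sketch below illustrates that idea only and is not AndroGen's code or data model: each simulated cell is reduced to an elliptical head with a heading and speed, and every frame emits a track ID, a center, and an axis-aligned bounding box. The `Cell` class, `simulate`, and all parameter values are hypothetical; a real generator would also render pixels, tails, and segmentation masks.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Cell:
    track_id: int
    x: float            # head center, pixels
    y: float
    heading: float      # radians
    speed: float        # pixels per frame
    semi_major: float   # head ellipse semi-axes, pixels
    semi_minor: float


def bounding_box(cell):
    """Axis-aligned box (x_min, y_min, width, height) of the rotated head ellipse."""
    c, s = math.cos(cell.heading), math.sin(cell.heading)
    half_w = math.sqrt((cell.semi_major * c) ** 2 + (cell.semi_minor * s) ** 2)
    half_h = math.sqrt((cell.semi_major * s) ** 2 + (cell.semi_minor * c) ** 2)
    return (cell.x - half_w, cell.y - half_h, 2 * half_w, 2 * half_h)


def simulate(n_cells, n_frames, width, height, seed=0):
    """Move cells frame by frame and record one annotation per cell per frame."""
    rng = random.Random(seed)
    cells = [Cell(i, rng.uniform(0, width), rng.uniform(0, height),
                  rng.uniform(0, 2 * math.pi), rng.uniform(1.0, 5.0), 6.0, 3.0)
             for i in range(n_cells)]
    annotations = []
    for frame in range(n_frames):
        for cell in cells:
            # The annotation is read directly from the simulation state.
            annotations.append({"frame": frame,
                                "track_id": cell.track_id,
                                "center": (cell.x, cell.y),
                                "bbox": bounding_box(cell)})
            # Advance the cell: small random turn, then a step along the heading.
            cell.heading += rng.gauss(0.0, 0.2)
            cell.x = (cell.x + cell.speed * math.cos(cell.heading)) % width
            cell.y = (cell.y + cell.speed * math.sin(cell.heading)) % height
    return annotations


annotations = simulate(n_cells=5, n_frames=10, width=640, height=480)
print(len(annotations), "annotations; first:", annotations[0])
```

Grouping the records by `track_id` yields trajectories for tracking tasks, while the per-frame boxes serve detection and counting, all without any manual labeling step.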
Research Limitations and Future Directions
Currently, the visual realism of synthetic images still has room for improvement compared to certain deep learning methods. Future plans include integrating image style optimization techniques to enhance visual realism. Additionally, parameter adjustment may present a learning curve for non-professional users; intelligent recommendation functions will be developed to automatically optimize settings based on user goals. While adding entirely new sperm morphologies requires professional programming support, the team will provide detailed tutorials and templates to lower extension barriers and encourage more researchers to participate in improvements.
Research Significance
AndroGen provides researchers in reproductive medicine and computer science with a fast and reliable data generation solution. In clinical settings, the tool can produce educational materials and standardized test datasets at low cost, supporting quality control and the training of junior technicians. In scientific research, it has been verified that the generated images can be directly used to train deep learning models and develop novel sperm analysis algorithms. By lowering data barriers, this open-source software could accelerate the innovation and adoption of automated sperm analysis technology, ultimately improving the efficiency and accuracy of male infertility diagnosis and treatment.