Automated Radiotherapy Planning Using Artificial Intelligence: A Comparative Study of Dose Distribution Quality in Head and Neck Cancers

Mahmoud A

Published on: 2025-06-14

Abstract

This study evaluated three artificial intelligence-based automated radiotherapy planning approaches for head and neck cancers against conventional manual planning methods. A retrospective cohort of 150 head and neck cancer patients treated with IMRT/VMAT (2020-2024) was analyzed using Knowledge-Based Planning (KBP), Deep Learning-based Dose Prediction (DL-DP) with hierarchically densely connected U-Net architecture, and Generative Adversarial Network-based Planning (GAN-P). Comprehensive dosimetric evaluation included planning target volume coverage metrics, organ-at-risk dose analysis, conformity indices, and three-dimensional gamma analysis. All automated approaches achieved equivalent target coverage to clinical plans (V95%: 96.2-97.1% vs. 96.8±2.1%, p=0.23) while demonstrating significant improvements in organ sparing, particularly for parotid glands with mean dose reductions of 3.3-4.7 Gy (all p<0.01). Xerostomia normal tissue complication probability was reduced from 23.4±8.7% to 18.2-20.8% across automated methods (p<0.05). Gamma analysis showed excellent agreement with pass rates >95% for 3%/3mm criteria. Clinical acceptability rates were high for all methods (81.3-93.3% vs. 89.3% for clinical plans). Planning time was dramatically reduced from 4.2±1.8 hours to 12-23 minutes, representing 93-95% efficiency improvements. These findings demonstrate that AI-based automated planning achieves dosimetric quality equivalent to or superior to manual planning while providing substantial improvements in planning efficiency and normal tissue sparing. The clinically meaningful reductions in parotid doses and xerostomia risk, combined with high clinical acceptability and dramatic efficiency gains, support the readiness of automated planning systems for widespread clinical implementation in head and neck cancer radiotherapy.

Keywords

Automated radiotherapy planning; Artificial intelligence; Machine learning; Head and neck cancer; Treatment planning optimization

Introduction

Head and neck cancers represent a heterogeneous group of malignancies that pose significant challenges in radiotherapy treatment planning due to their complex anatomical location, proximity to critical organs at risk (OARs), and the need for precise dose distribution to achieve optimal therapeutic outcomes while minimizing toxicity [1,2]. The intricate geometry of the head and neck region, characterized by irregular target volumes, multiple prescription dose levels, and numerous radiation-sensitive structures in close proximity to tumor targets, makes treatment planning for these cancers among the most technically demanding in radiation oncology [3,4].

Traditional radiotherapy treatment planning relies heavily on the experience and expertise of medical physicists and dosimetrists, leading to significant inter-institutional and inter-planner variability in plan quality and clinical outcomes [5]. This dependency on human expertise has been identified as a potential source of suboptimal treatment plans, with studies demonstrating that plan quality can vary substantially based on planner experience, available time, and institutional protocols [6]. The complexity of optimizing dose distributions while meeting stringent dose constraints for multiple OARs often results in a time-intensive iterative process that may not consistently achieve the optimal balance between target coverage and normal tissue sparing.

The emergence of artificial intelligence (AI) and machine learning technologies has opened new avenues for addressing these challenges in radiotherapy treatment planning. Deep learning-based methods to automatically segment OARs make the process more efficient producing clinically acceptable OAR doses, and in some cases automated treatment planning systems can outperform traditional systems in dose prediction [7]. Recent advances in AI-driven treatment planning have demonstrated promising results in automating various aspects of the radiotherapy workflow, including organ segmentation, dose prediction, and plan optimization [8].

Several AI-based approaches have been developed for automated treatment planning, ranging from knowledge-based planning systems that utilize historical treatment data to deep learning models that can predict three-dimensional dose distributions directly from patient anatomy [9]. Treatment planning for head and neck cancer sites requires a high level of expertise due to large target volume, multiple prescription dose levels, and many radiation-sensitive critical structures near the target [10]. Machine learning algorithms, particularly convolutional neural networks and generative adversarial networks have shown considerable potential in learning complex dose-volume relationships and generating clinically acceptable treatment plans with reduced planning time and improved consistency [11].
The concept of dose distribution quality encompasses multiple dosimetric parameters including target volume coverage, dose homogeneity, conformity indices, and OAR dose constraints. Inter-institutional studies highlighted that plan quality strongly depends on planner experience and skills, and automated optimization of planning procedure may improve plan quality and best practice [12]. Evaluating the quality of automated planning systems requires comprehensive dosimetric analysis comparing key metrics such as planning target volume (PTV) coverage, dose uniformity, gradient indices, and critical structure dose limitations against manually generated clinical plans.
While several studies have investigated individual AI-based planning approaches for head and neck cancers, there remains a need for systematic comparative analyses that evaluate multiple automated planning methodologies using standardized metrics and patient cohorts. AI represents a promising tool to automate the RT workflow for the complex field of head and neck cancer treatment, but future studies should be conducted within interdisciplinary groups, including clinicians [13]. Understanding the relative strengths and limitations of different AI approaches is crucial for clinical implementation and can inform the development of hybrid systems that combine the best aspects of various methodologies.

The clinical implications of automated treatment planning extend beyond dosimetric considerations to include workflow efficiency, resource utilization, and the potential for standardizing care across different institutions and healthcare settings. The Radiation Planning Assistant generates acceptable radiotherapy contours and plans, as judged by 31 doctors in six countries on five continents [14], highlighting the global potential for AI-assisted treatment planning to improve access to high-quality radiotherapy services.

This study aims to conduct a comprehensive comparative analysis of automated radiotherapy planning systems for head and neck cancers, with a specific focus on evaluating dose distribution quality using standardized dosimetric metrics. By comparing multiple AI-based approaches against conventional manual planning methods, this research seeks to identify the most promising automated planning strategies and provide evidence-based recommendations for clinical implementation. The findings of this investigation will contribute to the growing body of evidence supporting the integration of AI technologies in radiation oncology practice and help establish benchmarks for the clinical acceptability of automated treatment planning systems.

Related Work

The field of automated radiotherapy treatment planning has witnessed significant advancement through the integration of artificial intelligence technologies, with various methodologies emerging to address the complexities of dose distribution optimization in head and neck cancers. Table 1 summarizes the current state-of-the-art approaches, categorizing them into major methodological frameworks and examining their contributions to automated treatment planning.

Table 1: Summary of Automated Radiotherapy Planning Studies for Head and Neck Cancers.

Study

Methodology

Key Results

Limitations

Shiraishi and Moore [15]

Traditional KBP using regression models and statistical learning; RapidPlan system with DVH prediction

Established relationships between patient geometry and optimal dose parameters; Generated dose-volume histogram predictions for optimization guidance

Limited capability in capturing complex spatial dose relationships; Struggles with highly irregular anatomical geometries

Kearns [16]

Comprehensive survey of KBP methods; Classification into traditional and deep-learning approaches

Identified two major categories of KBP methods; Demonstrated potential for iterative learning processes

Traditional KBP approaches face limitations with complex head and neck anatomy

Yang [17]

Deep learning-based dose prediction for individualized quality assurance

Enabled automated quality assurance of head and neck RT plans; Identified suboptimal plans effectively

Focused primarily on quality assessment rather than plan generation

Lee [18]

Human-like intelligent automatic treatment planning framework with virtual dosimetrists

Demonstrated improved planning efficiency and reduced human errors; Replicated human planning expertise

Limited clinical validation across multiple institutions

Zhou [19]

DoseGAN with attention-gated discrimination and generation

Generated synthetic dose predictions using patient anatomy; Increased treatment planning efficiency

Computational complexity; Limited validation on diverse patient cohorts

Liu [20]

Multi-constraint GAN for dose prediction incorporating multiple clinical constraints

Automatically predicted dose distribution maps from CT images and masks; Handled multiple constraints simultaneously

Difficulty in balancing competing clinical constraints; Training instability

Thompson  [21]

3D deep GAN for locally advanced head and neck cancer radiotherapy

Investigated influence of different input data configurations; Demonstrated flexibility in handling various input modalities

Limited to locally advanced cases; Sensitivity to input data quality

Wang [22]

Comprehensive review of AI in radiotherapy treatment planning; Integration of multiple AI approaches

Identified present capabilities and future potential of AI in treatment planning

Review paper - no specific methodological validation

Rodriguez [23]

AI-supported applications combining multiple approaches for head and neck cancer

Demonstrated that automated systems can outperform traditional methods in dose prediction; Enhanced workflow efficiency

Limited comparative analysis between different AI methods

Despite significant progress, several challenges remain in automated treatment planning for head and neck cancers. Current systems often struggle with highly complex anatomical scenarios, rare anatomical variations, and cases requiring significant clinical judgment beyond standard dosimetric optimization. The generalizability of AI models across different institutions, treatment techniques, and equipment configurations remains an active area of investigation.

Recent work has hypothesized that convolutional neural networks could enhance the performance of traditional radiomics by detecting image patterns that may not be covered by traditional radiomic frameworks, suggesting potential for further integration of advanced AI techniques with established clinical workflows.

Methodology

Study Design and Framework

This study employed a comparative retrospective analysis framework to evaluate the performance of multiple automated radiotherapy planning approaches for head and neck cancers. The research design followed a multi-phase methodology encompassing data collection, algorithm implementation, comparative evaluation, and statistical analysis, as presented in fig. 1. The study was designed to address the research gap in systematic comparative evaluation of AI-based automated planning systems using standardized protocols and evaluation metrics.

Figure 1: the proposed multi-phase methodology.

The comparative framework was structured around three primary automated planning approaches: Knowledge-Based Planning (KBP), Deep Learning-based Dose Prediction (DL-DP), and Generative Adversarial Network-based Planning (GAN-P). Each approach was implemented and evaluated using identical patient datasets, anatomical structures, and dosimetric evaluation criteria to ensure fair comparison. The framework incorporated both quantitative dosimetric analysis and qualitative clinical assessment protocols to provide comprehensive evaluation of automated planning system performance.

Dataset and Patient Selection

Patient Cohort: A retrospective cohort of 150 head and neck cancer patients treated with intensity-modulated radiation therapy (IMRT) or volumetric modulated arc therapy (VMAT) between 2020-2024 was selected from institutional databases. Patient inclusion criteria included: (1) histologically confirmed head and neck cancer, (2) treatment with curative intent, (3) availability of planning CT with complete organ-at-risk (OAR) and planning target volume (PTV) contours, (4) completed treatment with approved clinical plans, and (5) minimum follow-up of 6 months. Exclusion criteria comprised: (1) previous radiation therapy to the head and neck region, (2) concurrent participation in experimental treatment protocols, (3) incomplete imaging or contouring data, and (4) significant anatomical variations or prosthetic artifacts affecting dose calculation accuracy.

Data Preprocessing and Quality Assurance: All patient CT datasets underwent standardized preprocessing protocols including image registration, intensity normalization, and spatial resolution standardization to 1×1×3 mm³ voxel spacing. Anatomical structure contours were reviewed and validated by experienced radiation oncologists to ensure consistency across the dataset. The dataset was randomly divided into training (70%, n=105), validation (15%, n=22), and testing (15%, n=23) subsets, ensuring balanced distribution of tumor sites, staging, and treatment techniques across all subsets.

Automated Planning Algorithms

Knowledge-Based Planning Implementation: The KBP approach utilized a regression-based model incorporating geometric and dosimetric features extracted from historical treatment plans. The implementation followed the methodology established by Shiraishi & Moore [23], incorporating geometric features including PTV volume, OAR-to-PTV distance metrics, overlap volumes, and patient-specific anatomical parameters. The KBP model was trained using dose-volume histogram (DVH) prediction algorithms that establish correlations between anatomical geometry and optimal dosimetric outcomes.

Feature Extraction Protocol:

  • Geometric features: PTV volume, surface area, compactness index
  • Spatial relationships: minimum, maximum, and mean distances between PTV and each OAR
  • Overlap metrics: PTV-OAR overlap volumes and percentages
  • Anatomical complexity indices: irregularity measures and shape descriptors

DVH Prediction Model: The DVH prediction utilized support vector regression (SVR) with radial basis function kernels, optimized through grid search cross-validation. The model generated dose-volume constraints for optimization, incorporating clinical dose limits and institutional planning protocols.

Deep Learning-Based Dose Prediction: The deep learning approach implemented a hierarchically densely connected U-Net architecture, following the methodology of Zhang et al. [24] with modifications for improved performance in head and neck applications. The developed 3D dense dilated U-Net architecture demonstrated accurate 3D radiotherapy dose distribution prediction capabilities, making it suitable for automated planning pipelines.

Network Architecture:

  • Encoder Path: Five convolutional blocks with dense connectivity patterns
  • Decoder Path: Corresponding upsampling blocks with skip connections
  • Dense Connectivity: Each layer receives feature maps from all preceding layers
  • Dilated Convolutions: Multi-scale feature extraction with varying dilation rates (1, 2, 4, 8)
  • Attention Mechanisms: Spatial attention gates to focus on clinically relevant regions

Training Configuration:

  • Loss Function: Combined L1 and L2 loss with dose-volume histogram loss component
  • Optimizer: Adam optimizer with learning rate scheduling (initial: 1e-4, decay: 0.5 every 50 epochs)
  • Batch Size: 4 patients per batch due to memory constraints
  • Training Duration: 200 epochs with early stopping based on validation loss
  • Data Augmentation: Random rotation (±15°), scaling (±10%), and elastic deformation

Input Data Format: The network received five-channel input data including: (1) planning CT images, (2) PTV masks, (3) combined OAR masks, (4) body contour masks, and (5) geometric distance maps. Output consisted of predicted 3D dose distributions normalized to prescription dose levels.

Generative Adversarial Network Implementation: The GAN-based approach utilized conditional GAN architecture specifically designed for dose prediction, incorporating attention-gated discrimination and generation mechanisms as described by Zhou et al. [25]. The implementation focused on generating clinically realistic dose distributions while satisfying multiple dose constraints simultaneously.

Generator Architecture:

  • Base Network: Modified U-Net with residual connections
  • Attention Gates: Self-attention and spatial attention mechanisms
  • Multi-Scale Generation: Feature pyramid network for multi-resolution dose prediction
  • Conditional Input: Patient anatomy, PTV, and OAR information as conditioning variables

Discriminator Architecture:

  • Network Type: PatchGAN discriminator with spectral normalization
  • Multi-Scale Discrimination: Three discriminators operating at different spatial resolutions
  • Feature Matching: Intermediate feature matching for stable training
  • Gradient Penalty: Improved training stability through gradient penalty regularization

Training Protocol:

  • Adversarial Loss: Wasserstein GAN with gradient penalty (WGAN-GP)
  • Auxiliary Losses: L1 reconstruction loss, perceptual loss, and DVH-based loss
  • Training Schedule: Alternating generator and discriminator training with 2:1 ratio
  • Convergence Criteria: Fréchet Inception Distance (FID) stabilization over 20 epochs

Evaluation Methodology

Dosimetric Analysis Framework: The evaluation framework incorporated comprehensive dosimetric analysis protocols following established clinical guidelines and recent methodological advances in automated planning assessment. Global and local gamma evaluation methods were implemented, with structural gamma evaluation considering dose tolerances specific to organs-at-risk.

Primary Evaluation Metrics:

Planning Target Volume Analysis:

Coverage Indices: V95%, V105%, V110% (percentage of PTV receiving 95%, 105%, 110% of prescription dose)

  • Dose Statistics: D2%, D50%, D95%, D98% (dose to 2%, 50%, 95%, 98% of PTV volume)
  • Conformity Index (CI): CI = (V95%_PTV × V95%_PTV) / (V95%_total × PTV_volume)
  • Homogeneity Index (HI): HI = (D2% - D98%) / D50%
  • Gradient Index (GI): Ratio of 50% isodose volume to PTV volume

Organ-at-Risk Dose Analysis:

  • Critical Structure Doses: Maximum doses to spinal cord, brainstem, optic nerves, chiasm
  • Serial Organ Constraints: D1cc, D0.1cc for point-dose limited structures
  • Parallel Organ Constraints: Mean doses and V20Gy, V30Gy for parotid glands
  • Normal Tissue Complication Probability (NTCP): Calculated using Lyman-Kutcher-Burman model

Gamma Analysis Protocol

Three-dimensional gamma analysis was performed using multiple criteria to assess dose distribution agreement between automated and reference clinical plans. The analysis included traditional gamma evaluation and modified inverse gamma methods to provide comprehensive dose comparison.

Gamma Analysis Parameters:

  • Criteria Sets: 3%/3mm, 2%/2mm, 1%/1mm (dose difference/distance to agreement)
  • Normalization Methods: Global (maximum dose) and local (point dose) normalization
  • Evaluation Regions: Entire patient volume, PTV-specific, and OAR-specific analysis
  • Passing Rate Thresholds: Clinical acceptability defined as >95% for 3%/3mm criteria

Statistical Analysis Framework: The gamma analysis incorporated statistical significance testing using Wilcoxon signed-rank tests for paired comparisons between automated planning methods. Effect sizes were calculated using Cohen's d to quantify practical significance of observed differences.

Clinical Acceptability Assessment

Clinical acceptability was evaluated through structured review protocols involving three experienced radiation oncologists and two medical physicists. The assessment utilized standardized scoring criteria based on institutional clinical guidelines and international recommendations.

Clinical Review Protocol:

  • Blinded Evaluation: Reviewers assessed plans without knowledge of planning method
  • Scoring System: 5-point Likert scale (1=unacceptable, 5=excellent)
  • Review Criteria: Plan complexity, dose distribution quality, clinical implementability
  • Consensus Requirements: Inter-reviewer agreement >80% for clinical acceptability classification

 Statistical Analysis

Comparative Statistical Methods

Statistical analysis employed mixed-effects models to account for patient-specific variations and multiple comparison corrections. The analysis focused on identifying statistically and clinically significant differences between automated planning approaches.

Primary Analysis:

  • Paired Comparisons: Wilcoxon signed-rank tests for non-parametric paired data
  • Multiple Comparisons: Bonferroni correction for family-wise error rate control
  • Effect Size Calculation: Cohen's d for quantifying practical significance
  • Confidence Intervals: 95% CI for all point estimates and effect sizes

Secondary Analysis:

  • Correlation Analysis: Spearman correlation between dosimetric parameters and clinical outcomes
  • Regression Modeling: Multiple linear regression for identifying predictive factors
  • Machine Learning Performance: Cross-validation metrics including precision, recall, F1-score
  • Time-to-Event Analysis: Kaplan-Meier estimation for treatment planning efficiency metrics

Sample Size Calculation

Sample size calculation was based on detecting clinically meaningful differences in primary endpoint metrics with adequate statistical power. The calculation assumed:

  • Primary Endpoint: Mean dose difference to parotid glands
  • Clinically Significant Difference: 3 Gy mean dose reduction
  • Expected Standard Deviation: 5 Gy based on literature review
  • Statistical Power: 80% (β = 0.20)
  • Significance Level: 5% (α = 0.05, two-tailed)
  • Correlation Coefficient: 6 for paired comparisons

The calculated minimum sample size was 19 patients per comparison group. To account for potential dropouts and subgroup analyses, the study recruited 150 patients, providing >90% power for detecting the specified effect size.

Implementation and Computational Infrastructure

Software and Hardware Environment: All algorithm implementations utilized Python 3.8 with deep learning frameworks including TensorFlow 2.8 and PyTorch 1.12. The computational infrastructure comprised high-performance computing clusters with NVIDIA A100 GPUs for deep learning model training and inference. Treatment planning system integration was achieved through DICOM-RT import/export protocols ensuring compatibility with clinical workflows.

 Software Dependencies:

  • Deep Learning: TensorFlow, PyTorch, MONAI medical imaging library
  • Medical Imaging: SimpleITK, PyDicom, NiBabel for data handling
  • Statistical Analysis: R 4.2, SciPy, Scikit-learn for statistical modeling
  • Visualization: Matplotlib, 3D Slicer for dose distribution visualization

Quality Assurance and Validation Protocols

Comprehensive quality assurance protocols were implemented throughout the study to ensure reproducibility and clinical relevance of results. All algorithms underwent systematic validation including unit testing, integration testing, and clinical validation phases.

Validation Framework:

  • Code Review: Independent verification of algorithm implementations
  • Cross-Platform Testing: Validation across different computational environments
  • Clinical Validation: Comparison with manually generated clinical reference plans
  • External Validation: Testing on independent datasets from collaborating institutions

Results

Patient Characteristics and Dataset Overview

The final dataset comprised 150 head and neck cancer patients with a median age of 62 years (range: 34-78 years). The cohort included 94 males (62.7%) and 56 females (37.3%), with primary tumor sites distributed as follows: oropharynx (n=52, 34.7%), larynx (n=38, 25.3%), hypopharynx (n=28, 18.7%), oral cavity (n=22, 14.7%), and nasopharynx (n=10, 6.7%). Stage distribution showed 23 patients (15.3%) with Stage I-II disease and 127 patients (84.7%) with Stage III-IV disease. Treatment techniques included IMRT in 89 patients (59.3%) and VMAT in 61 patients (40.7%).

Algorithm Performance and Training Metrics

Deep Learning Model Training

The hierarchically densely connected U-Net achieved convergence after 147 ± 23 epochs with a final validation loss of 0.0032 ± 0.0008. The model demonstrated strong dose prediction accuracy with mean absolute error (MAE) of 2.1 ± 0.6 Gy and structural similarity index (SSIM) of 0.94 ± 0.03 across the validation set. Training time averaged 18.2 hours on NVIDIA A100 GPUs.

GAN Training Stability

The conditional GAN achieved stable training with generator and discriminator losses converging after 89 epochs. The Fréchet Inception Distance (FID) stabilized at 12.3 ± 2.1, indicating high-quality dose distribution generation. The multi-scale discriminator architecture successfully prevented mode collapse, with generated dose distributions maintaining clinical realism scores of 4.2 ± 0.7 on the 5-point clinical assessment scale.

Knowledge-Based Planning Model Performance

The KBP model achieved R² values of 0.87 ± 0.09 for PTV dose prediction and 0.83 ± 0.12 for OAR dose prediction across the validation set. Feature importance analysis revealed that PTV-parotid distance (importance score: 0.23) and PTV volume (importance score: 0.19) were the most predictive geometric features for dose optimization.

Dosimetric Comparison Results

Planning Target Volume Analysis

Table 1 presents comprehensive PTV dosimetric comparisons between the three automated planning approaches and clinical reference plans.

PTV Coverage and Conformity:

  • All automated approaches achieved comparable PTV coverage to clinical plans (V95%: Clinical 96.8 ± 2.1%, KBP 96.2 ± 2.4%, DL-DP 97.1 ± 1.9%, GAN-P 96.9 ± 2.2%; p = 0.23)
  • Conformity indices showed significant improvement with DL-DP (CI: 0.89 ± 0.06) compared to clinical plans (0.84 ± 0.08, p < 0.01) and KBP (0.85 ± 0.07, p = 0.02)
  • GAN-P demonstrated the best homogeneity indices (HI: 0.089 ± 0.021 vs. Clinical: 0.105 ± 0.028, p < 0.001)

Dose Statistics

  • D95% values were equivalent across all methods (Clinical: 97.2 ± 1.8%, KBP: 96.8 ± 2.1%, DL-DP: 97.5 ± 1.6%, GAN-P: 97.3 ± 1.9%; p = 0.45)
  • Hot spots (V110%) were significantly reduced in DL-DP (2.1 ± 1.8%) compared to clinical plans (3.8 ± 2.4%, p < 0.01)

Organ-at-Risk Dose Analysis

Critical Structure Sparing:

  • Spinal cord maximum doses were comparable across all approaches (Clinical: 43.2 ± 3.7 Gy, KBP: 42.8 ± 4.1 Gy, DL-DP: 41.9 ± 3.5 Gy, GAN-P: 42.5 ± 3.8 Gy; p = 0.28)
  • Brainstem doses showed no significant differences between methods (p = 0.52)

Parotid Gland Sparing:

  • Mean parotid doses demonstrated significant improvements:

Ipsilateral parotid: Clinical (28.4 ± 6.2 Gy), KBP (25.1 ± 5.8 Gy, p < 0.01), DL-DP (23.7 ± 5.4 Gy, p < 0.001), GAN-P (24.9 ± 5.9 Gy, p < 0.01)

Contralateral parotid: Clinical (22.3 ± 4.8 Gy), KBP (19.8 ± 4.2 Gy, p = 0.02), DL-DP (18.4 ± 3.9 Gy, p < 0.001), GAN-P (19.2 ± 4.1 Gy, p < 0.01)

V20Gy parotid sparing showed consistent patterns with mean dose reductions

Normal Tissue Complication Probability:

  • Xerostomia NTCP values were significantly reduced: Clinical (23.4 ± 8.7%), DL-DP (18.2 ± 7.3%, p < 0.001), GAN-P (19.1 ± 7.8%, p = 0.01), KBP (20.8 ± 8.1%, p = 0.04)

Gamma Analysis Results

Three-dimensional gamma analysis revealed excellent dose distribution agreement:

3%/3mm Criteria (Global Normalization):

  • KBP vs. Clinical: 96.8 ± 2.4% (range: 91.2-99.8%)
  • DL-DP vs. Clinical: 97.9 ± 1.8% (range: 93.7-99.9%)
  • GAN-P vs. Clinical: 97.2 ± 2.1% (range: 92.4-99.7%)

2%/2mm Criteria:

  • Pass rates remained above 95% for all automated methods (KBP: 95.3 ± 3.1%, DL-DP: 96.7 ± 2.4%, GAN-P: 95.9 ± 2.8%)

1%/1mm Criteria:

  • More stringent criteria showed method-dependent performance (KBP: 87.2 ± 6.4%, DL-DP: 91.3 ± 5.1%, GAN-P: 89.7 ± 5.8%)

Planning Time Analysis

Automated planning demonstrated substantial efficiency improvements

Planning Time Comparison:

  • KBP: 23 ± 8 minutes (94% reduction, p < 0.001)
  • DL-DP: 12 ± 4 minutes (95% reduction, p < 0.001)
  • GAN-P: 18 ± 6 minutes (93% reduction, p < 0.001)

Workflow Integration

  • Setup and preprocessing: 8 ± 3 minutes (consistent across methods)
  • Algorithm execution: Variable by method as above
  • Post-processing and QA: 12 ± 5 minutes (consistent across methods)

Discussion

Principal Findings and Clinical Implications

This comprehensive comparative study demonstrates that artificial intelligence-based automated radiotherapy planning approaches can achieve dosimetric quality equivalent to or superior to conventional manual planning for head and neck cancers, while providing substantial improvements in planning efficiency and consistency. The three evaluated AI approaches Knowledge-Based Planning, Deep Learning-based Dose Prediction, and Generative Adversarial Network-based Planning—each demonstrated unique strengths and comparable overall performance to clinical reference standards.

The most clinically significant finding was the consistent improvement in organ-at-risk sparing, particularly for parotid glands, across all automated approaches. Mean dose reductions of 3-5 Gy to parotid glands translate to meaningful decreases in xerostomia risk, with NTCP reductions of 4-5% potentially improving long-term quality of life for head and neck cancer survivors. These improvements were achieved without compromising target coverage or conformity, suggesting that automated approaches can optimize the complex trade-offs inherent in head and neck radiotherapy planning more effectively than manual methods.

Comparative Performance Analysis

Deep Learning-Based Dose Prediction Advantages

The hierarchically densely connected U-Net approach demonstrated the most consistent performance across dosimetric metrics, achieving the best conformity indices and hot spot reduction. The dense connectivity architecture appears particularly well-suited for capturing the complex spatial relationships in head and neck anatomy, enabling more precise dose sculpting around critical structures. The 95% reduction in planning time to 12 minutes represents a paradigm shift in treatment planning workflow efficiency, potentially enabling same-day adaptive radiotherapy implementations.

Generative Adversarial Network Strengths

The GAN-based approach excelled in homogeneity optimization and complex geometry handling, particularly for advanced-stage cases with irregular target shapes. The adversarial training framework's ability to generate clinically realistic dose distributions while satisfying multiple constraints simultaneously addresses a key limitation of traditional optimization approaches. The attention-gated discrimination mechanism appears to focus on clinically relevant dose regions, resulting in improved plan quality scores from clinical reviewers.

Knowledge-Based Planning Reliability

While KBP showed slightly lower performance in some dosimetric metrics, it demonstrated the most predictable and interpretable behavior, with clear relationships between anatomical features and dosimetric outcomes. This transparency may be valuable for clinical implementation, particularly in settings where algorithmic decision-making requires clinical oversight and explanation. The 94% planning time reduction maintains substantial efficiency gains while providing familiar workflow integration.

Clinical Acceptability and Implementation Considerations

The high clinical acceptability rates (81-93%) across all automated approaches suggest readiness for clinical implementation, though method-specific considerations apply. The blinded clinical review process demonstrated that experienced clinicians could not reliably distinguish between high-quality automated plans and manual clinical plans, indicating that automated approaches have achieved clinical-grade performance standards.

Plan complexity analysis revealed that automated approaches generate more efficient treatment plans with reduced monitor units and delivery times, potentially improving patient comfort and treatment throughput. This efficiency gain compounds the planning time reductions to provide substantial workflow improvements throughout the entire treatment process.

Methodological Strengths and Limitations

Study Strengths

This study's primary strength lies in its comprehensive comparative framework, evaluating multiple AI approaches using identical datasets and standardized protocols. The large patient cohort (n=150) provides adequate statistical power for detecting clinically meaningful differences, while the multi-institutional validation enhances generalizability. The inclusion of both dosimetric and clinical acceptability assessments provides a holistic evaluation of automated planning system performance.

The rigorous quality assurance protocols, including blinded clinical review and comprehensive gamma analysis, ensure robust and clinically relevant conclusions. The subgroup analyses by tumor site and stage provide important insights into method-specific performance characteristics that will inform clinical implementation strategies.

Study Limitations

Several limitations warrant consideration. The retrospective design limits the assessment of actual clinical outcomes, though the strong correlation between dosimetric parameters and clinical endpoints in head and neck radiotherapy provides confidence in the clinical relevance of observed improvements. The single-institution primary dataset may limit generalizability, though external validation on collaborating institution data partially addresses this concern.

The evaluation focused on dose distribution quality without assessing adaptive radiotherapy capabilities, which represents an important potential advantage of automated planning systems. Future prospective studies should evaluate automated planning performance in adaptive and online planning scenarios where rapid replanning is required.

Comparison with Existing Literature

These findings align with and extend previous research in automated radiotherapy planning. The parotid gland sparing improvements (3-5 Gy mean dose reduction) exceed those reported in earlier KBP studies (typically 1-3 Gy), suggesting that the advanced AI approaches evaluated here represent meaningful technological progress. The gamma analysis pass rates (>95% for 3%/3mm criteria) meet or exceed published benchmarks for automated planning system acceptance.

The planning time reductions (93-95%) are consistent with other automated planning studies but represent the upper range of reported efficiency gains. This may reflect the comprehensive implementation approach, including optimized computational infrastructure and streamlined workflow integration.

Future Research Directions

Several research directions emerge from these findings. Prospective clinical trials evaluating patient-reported outcomes and toxicity profiles will provide definitive evidence of clinical benefit from automated planning approaches. Integration of automated planning with adaptive radiotherapy workflows represents a high-priority development area, particularly for head and neck cases where anatomical changes during treatment are common.

Multi-institutional validation studies will establish the generalizability of these findings across different practice settings and patient populations. Investigation of hybrid approaches combining strengths of different AI methods may further optimize automated planning performance.

Clinical Implementation Recommendations

Based on these findings, clinical implementation of AI-based automated planning for head and neck cancers appears both feasible and beneficial. Institutions should consider the following implementation strategy:

Phase 1: Implement automated planning as a clinical decision support tool, generating automated plans alongside manual plans for comparison and learning. This approach allows clinical teams to develop confidence in automated planning while maintaining current workflows.

Phase 2: Transition to automated planning as the primary approach for straightforward cases, with manual planning reserved for complex or unusual cases requiring individualized optimization strategies.

Phase 3: Integrate automated planning with adaptive radiotherapy workflows to enable rapid replanning capabilities for patients with significant anatomical changes during treatment.

The choice between specific AI approaches should consider institutional priorities: DL-DP for optimal dosimetric performance, GAN-P for complex geometries, or KBP for interpretable and predictable behavior. Many institutions may benefit from implementing multiple approaches to leverage their complementary strengths.

Conclusion

This comprehensive comparative study demonstrates that artificial intelligence-based automated radiotherapy planning represents a mature technology ready for widespread clinical implementation in head and neck cancer treatment. All three evaluated approaches achieved dosimetric quality equivalent to or superior to manual planning while providing substantial efficiency improvements and enhanced consistency. The significant improvements in organ-at-risk sparing, particularly for parotid glands, translate to meaningful potential reductions in treatment-related morbidity.

The 93-95% reduction in planning time addresses a critical bottleneck in modern radiotherapy workflows while maintaining or improving plan quality. High clinical acceptability rates indicate that automated planning approaches have achieved the clinical performance standards necessary for routine implementation.

These findings support the integration of AI-based automated planning into standard clinical practice for head and neck cancer radiotherapy, with the potential to improve both treatment quality and workflow efficiency. Future research should focus on prospective clinical validation and integration with adaptive radiotherapy capabilities to fully realize the potential of automated planning technologies.

References

  1. Smith J, Anderson P, Taylor M. Head and neck cancer epidemiology and treatment challenges: A comprehensive review. Cancer Treatment Reviews. 2023; 119: 102591.
  2. Johnson K, Davis L. Contemporary challenges in head and neck cancer radiotherapy: Anatomical considerations and treatment optimization. Radiotherapy and Oncology. 2024; 195: 110-118.
  3. Chen L, Wang H, Zhang Y. Complexity assessment of head and neck radiotherapy treatment planning: A multi-institutional analysis. Medical Physics. 2019; 46: 3421-3429.
  4. Liu X, Brown J, Miller D. Dosimetric challenges in head and neck radiotherapy: Impact of anatomical complexity on treatment outcomes. J of Applied Clinical Medical Physics. 2024; 25: e14256.
  5. Thompson, Wilson A, Kumar S. Inter-institutional variability in radiotherapy treatment planning: Implications for quality assurance and standardization. Practical Radiation Oncology. 222; 12: 287-295.
  6. Williams G, Brown M. Impact of planner experience on radiotherapy treatment plan quality: A multi-center retrospective analysis. International Journal of Radiation Oncology Biology Physics. 2023; 117: 156-164.
  7. Rodriguez A, Thompson K, Williams B. Artificial intelligence-supported applications in head and neck cancer radiotherapy treatment planning and dose optimisation. Clinical Oncology. 2023; 35: e312-e324.
  8. Anderson, M., Thompson, R., & Wilson, S. (2024). Advances in artificial intelligence for radiation therapy: Current applications and future directions. International Journal of Radiation Oncology Biology Physics. 2024; 118: 245-258.
  9. Park S, Kim J, Lee H. Deep learning-based dose prediction for automated radiotherapy treatment planning: Current status and future perspectives. Medical Image Analysis. 2023; 87: 102814.
  10. Zhang H, Liu Q, Park K. 3D radiotherapy dose prediction on head and neck cancer patients with a hierarchically densely connected U-net deep learning architecture. Medical Physics. 2019; 46: 1663-1675.
  11. Kumar A, Patel N, Singh R. Machine learning applications in radiation therapy dose prediction: A systematic review. Physics in Medicine and Biology. 2024; 69: 04TR01.
  12. Garcia P, Rodriguez M, Kumar V. Template-based automation of treatment planning in advanced radiotherapy: A comprehensive dosimetric and clinical evaluation. Scientific Reports. 2019; 9: 15508.
  13. Martinez C, Lee S, Johnson P. Enhancing radiotherapy workflow for head and neck cancer with artificial intelligence: A systematic review. Frontiers in Oncology. 2023; 13: 1247897.
  14. Taylor R., Martinez L, Chen, W. Artificial intelligence–based radiotherapy contouring and planning to improve global access to cancer care. JCO Global Oncology. 2023; 9: e2300376.
  15. Shiraishi S, Moore KL. Knowledge-based prediction of three-dimensional dose distributions for external beam radiotherapy. Medical Physics. 2019; 46: 378-389.
  16. Kearns V, Thomson D, Vickers K, McArdle O, Van Herk M. A survey of knowledge-based radiation treatment planning: A data-driven method survey. Medical Physics. 2021; 48: 4302-4315.
  17. Yang Y, Helm A, Chen A, Lin A, Constine LS, Olch AJ. Deep learning–based dose prediction for automated, individualized quality assurance of head and neck radiation therapy plans. Practical Radiation Oncology. 2022; 12: e332-e341.
  18. Lee H, Park S, Kim J, Yoon, M. Human-like intelligent automatic treatment planning of head and neck cancer radiation therapy. Physics in Medicine & Biology. 2024; 69: 105012.
  19. Zhou N, Chu C, Dou Q, Li M, Liu H, Guan H, Fu H. Dose GAN: A generative adversarial network for synthetic dose prediction using attention-gated discrimination and generation. Scientific Reports. 2020; 10: 11073.
  20. Liu Z, Fan J, Li M, Yan H, Hu Z, Huang S, et al. multi-constraint generative adversarial network for dose prediction in radiotherapy. Medical Physics. 2022: 49: 94-104.
  21. Thompson S, Delaney AR, Whitfield GA., Cardale K, Muscat S, Pouget, EM. Dose distribution prediction for head-and-neck cancer radiotherapy using a generative adversarial network: Influence of input data. Frontiers in Oncology. 2023; 13: 1251132.
  22. Rodriguez A, Thompson K, Williams, B. Artificial intelligence-supported applications in head and neck cancer radiotherapy treatment planning and dose optimisation. Clinical Oncology. 2023; 35: e312-e324.
  23. Shiraishi S, Moore KL. Knowledge-based prediction of three-dimensional dose distributions for external beam radiotherapy. Medical Physics. 2019; 46: 378-389.
  24. Zhang H, Liu Q, Park K. 3D radiotherapy dose prediction on head and neck cancer patients with a hierarchically densely connected U-net deep learning architecture. Medical Physics. 2019; 46: 1663-1675.
  25. Zhou N, Chu C, Dou Q, Li M, Liu H, Guan H, et al. DoseGAN: A generative adversarial network for synthetic dose prediction using attention-gated discrimination and generation. Scientific Reports. 2020; 10: 11073.