Population Size and Cigarette Consumption: A Machine Learning Perspective
Adebayo OP, Ahmed I, Ogunjimi OA and Oyeleke KT
Published on: 2025-06-20
Abstract
This study develops a machine learning model to predict cigarette pack sales from economic and demographic variables. A gradient boosting model achieved a test RMSE of 0.530 and explained 70.6% of the variance in sales (R² = 0.706). Feature importance analysis identified price as the dominant predictor, accounting for 55.7% of total feature importance, followed by tax (12.7%) and population (11.5%). Training was halted by early stopping at the 10th iteration to limit overfitting. Some variables, such as CPI, contributed negligibly, which suggests the model could be simplified. These findings quantify how pricing and taxation relate to consumption patterns and offer actionable insights for public health policy and retail strategy. By pairing predictive accuracy with interpretable feature importance metrics, the study provides a data-driven framework for understanding cigarette market dynamics, highlights the role of price sensitivity in tobacco consumption, and lays methodological groundwork for future sales forecasting models.
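For readers less familiar with the two accuracy figures quoted in the abstract, the following minimal Python sketch shows how RMSE and R² are conventionally computed from observed and predicted values. It is an illustration only, not the authors' code: the function names `rmse` and `r_squared` and the sample values are hypothetical, and the numbers are made up rather than drawn from the study's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: variation the predictions fail to capture.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variation of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical values for illustration only (not the study's data).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.5, 2.5]

print("RMSE:", round(rmse(observed, predicted), 3))
print("R^2:", round(r_squared(observed, predicted), 3))
```

Read this way, an R² of 0.706 means the residual sum of squares on the test set is about 29.4% of the total sum of squares, while the RMSE of 0.530 is expressed in the same units as the sales variable the model predicts.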