Advanced Insurance Risk Modeling for Pseudo-New Customers Using Balanced Ensembles and Transfomer Architectures

Handle

https://riunet.upv.es/handle/10251/236401

Cita bibliográfica

Solly, FL.; Soriano-González, Raquel; Juan, Angel A.; Guerrero, A. (2026). Advanced Insurance Risk Modeling for Pseudo-New Customers Using Balanced Ensembles and Transfomer Architectures. Risks. 14(4). https://doi.org/10.3390/risks14040091

Titulación

Resumen

[EN] In insurance portfolios, classifying customers without a prior history at a given company is particularly challenging due to the absence of historical behavior, extreme class imbalance, heavy-tailed loss distributions, and strict operational constraints. Traditional machine learning approaches, including the baseline methodology proposed in previous studies, typically optimize global predictive accuracy and therefore fail to capture business-critical outcomes, especially the identification of high-risk clients. This study extends the existing approach by evaluating two complementary business-aware classification strategies: (i) a balanced bagging ensemble specifically designed to handle class imbalance and maximize expected profit under explicit customer-omission constraints, and (ii) a lightweight Transformer-based architecture capable of learning richer feature representations. Both approaches incorporate the asymmetric financial cost structure of insurance and operate under operational selection limits. The empirical analysis is conducted on a proprietary large-scale auto insurance dataset comprising 51,618 customers and is complemented by validation on nine synthetic datasets to assess robustness. Model performance is evaluated using statistical tests (ANOVA, Friedman, and pair-wise comparisons) together with business-oriented metrics. The results show that both proposed approaches consistently outperform the baseline methodology (p < 0.001) in terms of profit, with the ensemble offering a better balance of performance and efficiency, while the Transformer shows stronger robustness and generalization under data perturbations. The balanced ensemble provides the most favourable trade-off between predictive performance, robustness, interpretability, and computational efficiency, making it suitable for deployment in regulated insurance environments, while the Transformer achieves competitive results and exhibits stronger generalization under data perturbations. The proposed approach aligns machine learning with actuarial portfolio optimization by explicitly integrating profit-driven objectives and operational constraints, offering two practical and scalable solutions for risk-based decision-making in real-world insurance settings.

Fuente

Risks

Enlaces relacionados

URL