Utilizing large language models (LLMs) for quantitative reasoning-intensive tasks within the (re)insurance sector

Yilin Hao et al.

Annals of Actuarial Science · 2025 · https://doi.org/10.1017/s1748499525100079 · article

AJG 1 · ABDC A

Weight

0.50

What the paper says

The rise of large language models (LLMs) has marked a substantial leap toward artificial general intelligence. However, the utilization of LLMs in the (re)insurance sector remains a challenging problem because of the gap between general capabilities and domain-specific requirements. Two prevalent methods for domain specialization of LLMs are prompt engineering and fine-tuning. In this study, we evaluate the efficacy of LLMs, enhanced with prompt engineering and fine-tuning techniques, on quantitative reasoning tasks within the (re)insurance domain. We find that (1) compared to prompt engineering, fine-tuning with a task-specific calculation dataset provides a remarkable leap in performance, even exceeding the performance of larger pre-trained LLMs; (2) when the available task-specific calculation data are limited, supplementing LLMs with a domain-specific knowledge dataset is an effective alternative; and (3) enhanced reasoning capabilities, rather than mere computational skills, should be the primary focus for LLMs when tackling quantitative tasks. Moreover, the fine-tuned models demonstrate a consistent aptitude for common-sense reasoning and factual knowledge, as evidenced by their performance on public benchmarks. Overall, this study demonstrates the potential of LLMs to serve as AI assistants and solve quantitative reasoning tasks in the (re)insurance sector.


Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact	0.50 × 0.40 = 0.200
M · momentum	0.50 × 0.15 = 0.075
V · venue signal	0.50 × 0.05 = 0.025
R · text relevance †	0.50 × 0.40 = 0.200

† Text relevance is estimated at 0.50 on the detail page; for your query’s actual relevance score, open this paper from a search result.
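
The breakdown above can be reproduced with a short sketch. It assumes the evidence weight is a plain weighted sum of the four component scores, which is inferred from the table (the four rows add up to the 0.50 shown) rather than stated on the page. The names `BALANCED_WEIGHTS` and `evidence_weight` are hypothetical and chosen only for illustration. `Decimal` is used so that products such as 0.50 × 0.15 come out as exactly 0.075 instead of a binary floating-point approximation.

```python
from decimal import Decimal

# Balanced-mode weights as listed on the page (F / M / V / R).
BALANCED_WEIGHTS = {
    "F": Decimal("0.40"),  # citation impact
    "M": Decimal("0.15"),  # momentum
    "V": Decimal("0.05"),  # venue signal
    "R": Decimal("0.40"),  # text relevance
}


def evidence_weight(scores, weights=BALANCED_WEIGHTS):
    """Return (per-component contributions, total) for a weighted sum.

    Assumption: the total is simply the sum of score * weight per component.
    """
    contributions = {
        key: Decimal(str(scores[key])) * weight
        for key, weight in weights.items()
    }
    return contributions, sum(contributions.values())


# Component scores shown on the detail page (all 0.50).
scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}
contributions, total = evidence_weight(scores)

for key, value in contributions.items():
    print(key, value)
print("total", total)
```

Because the weights sum to 1.00, identical component scores of 0.50 give a total of 0.50 under this assumption, matching the evidence weight reported above.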