Beyond the stars: predicting guest satisfaction through NLP and machine learning of hotel reviews

Eslam Ahmed Fathy et al.

Journal of Hospitality and Tourism Technology · 2026 · article
https://doi.org/10.1108/jhtt-09-2025-0771
AJG 1 · ABDC B
Weight 0.50

Abstract

Purpose: This study presents a framework for applying natural language processing techniques to analyze and classify hotel customer reviews.

Design/methodology/approach: Using a data set of over 500,000 hotel reviews, a supervised machine learning model is developed to predict whether a review is good or bad based on its textual content. The approach involves a comprehensive data preprocessing pipeline, including tokenization, stop-word removal and lemmatization. For feature engineering, sentiment analysis scores (from the Valence Aware Dictionary and sEntiment Reasoner, VADER), basic text metrics and text vectorization techniques such as Doc2Vec and TF-IDF are combined.

Findings: Given the significant class imbalance in the data set, with a very low percentage of negative reviews, model performance is evaluated using the precision–recall curve and the average precision (AP) metric, which are better suited to such scenarios than the traditional receiver operating characteristic curve. The final model, a random forest classifier, achieved an AP of 0.37, demonstrating its effectiveness in identifying the minority class of negative reviews. The results indicate that sentiment analysis features are the most influential in predicting reviewer satisfaction.

Practical implications: The framework allows negative reviews to be recognized in time to take action, facilitating immediate service recovery and proactive reputation management. At a strategic level, the model can be applied to uncover recurring operational issues, track customer satisfaction trends and derive marketing insights from the reviews.

Originality/value: This paper provides a foundation for developing automated systems that enable hotels to better understand and respond to customer feedback in real time.
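To make the evaluation metric in the abstract concrete, here is a minimal sketch of how average precision can be computed from ranked classifier scores. It is not the authors' code: the function name `average_precision`, the toy labels and the toy scores are illustrative assumptions, with label 1 standing for a negative review (the minority class). The sketch also assumes no two reviews share the same score.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@k taken at every rank k where a positive item appears.

    Assumes binary labels (1 = positive class) and no tied scores.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive labels")

    # Rank items from the highest score (most likely positive) to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at this rank; recall rises by 1/n_pos
    return total / n_pos


# Toy data (assumed): 10 reviews, 2 of them negative (label 1).
y_true = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.10, 0.40, 0.90, 0.20, 0.80, 0.05, 0.30, 0.15, 0.25, 0.12]

prevalence = sum(y_true) / len(y_true)
print(average_precision(y_true, scores))  # 0.75
print(prevalence)                         # 0.2
```

On this toy data the two negative reviews land at ranks 1 and 4, giving an AP of 0.75 against a prevalence of 0.20. For a ranking that is no better than random, AP sits roughly at the prevalence of the positive class, so a figure such as the reported 0.37 is best read against the share of negative reviews in the data set, which the abstract describes only as very low.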

Cite this paper

https://doi.org/10.1108/jhtt-09-2025-0771

Or copy a formatted citation

@article{eslam2026,
  title        = {{Beyond the stars: predicting guest satisfaction through NLP and machine learning of hotel reviews}},
  author       = {Fathy, Eslam Ahmed and others},
  journal      = {Journal of Hospitality and Tourism Technology},
  year         = {2026},
  doi          = {10.1108/jhtt-09-2025-0771},
}



Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact: 0.50 × 0.4 = 0.20
M · momentum: 0.50 × 0.15 = 0.07
V · venue signal: 0.50 × 0.05 = 0.03
R · text relevance †: 0.50 × 0.4 = 0.20

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
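
The breakdown above can be checked with a short sketch. It assumes the overall evidence weight is the plain weighted sum of the four components, which is what the listed products suggest but which the page does not state outright. The names `WEIGHTS`, `SCORES` and `evidence_weight` are hypothetical, and the numbers are copied from the breakdown.

```python
# Balanced-mode weights and component scores, copied from the breakdown above.
WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}
SCORES = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}


def evidence_weight(scores: dict, weights: dict) -> float:
    """Weighted sum of component scores (assumed aggregation rule)."""
    return sum(scores[k] * weights[k] for k in weights)


# Per-component contributions before any display rounding.
contributions = {k: SCORES[k] * WEIGHTS[k] for k in WEIGHTS}
total = evidence_weight(SCORES, WEIGHTS)

print(contributions)
print(round(total, 2))
```

Under that assumption the unrounded contributions of M and V are 0.075 and 0.025, so the displayed 0.07 and 0.03 are rounded values, and the four contributions sum to the listed evidence weight of 0.50.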