How Well Do Ratings Reflect Sentiment? Evidence From a Large Italian Review Corpus
Nicolò Biasetton et al.
Abstract
Understanding whether numerical ratings reliably reflect the sentiment expressed in user‐generated product reviews is critical for accurate interpretation of online feedback. Although star ratings provide immediate, quantifiable signals to consumers and businesses, they may not fully convey the nuanced sentiment contained in the text. We therefore investigate the relationship between review ratings and underlying sentiment using a large corpus of Italian online product reviews. Since review corpora typically lack explicit sentiment labels, we develop a predictive framework for sentiment: a BERT‐based encoder (specifically, AlBERTo) fine‐tuned on our large, domain‐specific corpus, combined with a multi‐task CORAL ordinal‐regression head trained on a sample with multiple human annotations. Finally, we use Correspondence Analysis to compare user ratings with the predicted sentiment scores. Our sentiment model shows strong performance on the validation set when evaluated on a five‐point ordinal scale, achieving MAE below 0.62 and RMSE below 0.82. The comparison between ratings and sentiment predictions shows that ratings and textual sentiment are generally aligned at the extreme and neutral points, but notable discrepancies exist for mid‐scale evaluations, where ratings often fail to capture underlying textual nuances.
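The abstract does not give implementation details, but the core of CORAL ordinal regression is easy to illustrate: a K-point ordinal label is recast as K−1 binary "is the label greater than k?" subtasks, and a prediction is decoded by counting how many subtasks fire. A minimal NumPy sketch, with hypothetical function names, also reproduces the MAE/RMSE metrics the abstract reports:

```python
import numpy as np

def coral_targets(label, num_classes=5):
    """Extended binary targets for CORAL: target[k] = 1 iff label > k.
    Labels are 0-indexed ranks 0..num_classes-1 (a 5-point scale
    yields 4 binary subtasks)."""
    return (label > np.arange(num_classes - 1)).astype(int)

def coral_decode(probs):
    """Predicted rank = number of subtasks with P(label > k) > 0.5."""
    return int(np.sum(np.asarray(probs) > 0.5))

def mae_rmse(y_true, y_pred):
    """Ordinal error metrics of the kind reported in the abstract."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err))), float(np.sqrt(np.mean(err ** 2)))

# Rank 3 on a 5-point scale -> binary targets [1, 1, 1, 0]
print(coral_targets(3))                        # [1 1 1 0]
# Monotone subtask probabilities decode back to the rank
print(coral_decode([0.9, 0.8, 0.6, 0.2]))      # 3
```

In the actual model the K−1 logits share one weight vector and differ only in rank-ordered biases, which guarantees the decoded predictions are rank-consistent; the sketch above covers only the target encoding and decoding steps.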