The role of reliability in experiments
Jeffrey N. Rouder et al.
Abstract
We are concerned about an emphasis on reliability in the analysis of psychology experiments. Experiments have two elements of sample size, the number of individuals and the number of replicate trials within a task, and this complicates reliability measures. To account for these elements, we distinguish among three levels of analysis: (1) A foundational level that centers task properties without recourse to either element of sample size. An example statistic is the intraclass correlation, which is a proportion of variances defined without reference to sample sizes. (2) An intermediate level that centers the number of trials but not the number of individuals. An example statistic on this level is reliability, which describes variabilities with reference to numbers of trials but not numbers of individuals. (3) A final level that centers both the numbers of individuals and trials. An example quantity is the uncertainty in a correlation coefficient, which, ideally, reflects sample size limits in both individuals and trials. Reliability describes an intermediate level and is useful neither for communicating foundational task properties nor for interpreting correlations. We advocate that researchers consider all three levels and highlight the role of hierarchical models in doing so.
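The contrast between the first two levels can be sketched numerically. The block below is a minimal illustration and not the authors' method: it assumes a simple variance-components setup in which each person's score is the mean of `n_trials` trials, with a hypothetical between-person variance `var_between` and trial-level noise variance `var_within`. Under those assumptions the intraclass correlation depends only on the two variances, while reliability also changes with the number of trials.

```python
def intraclass_correlation(var_between: float, var_within: float) -> float:
    """Proportion of variance due to individuals; no sample sizes involved."""
    return var_between / (var_between + var_within)


def reliability(var_between: float, var_within: float, n_trials: int) -> float:
    """Reliability of a person's mean over n_trials trials (assumed setup).

    Averaging over trials shrinks the noise variance to var_within / n_trials,
    so the value depends on the number of trials but not on the number of
    individuals.
    """
    return var_between / (var_between + var_within / n_trials)


def spearman_brown(icc: float, n_trials: int) -> float:
    """Same reliability, written as a function of the ICC (Spearman-Brown)."""
    return n_trials * icc / (1 + (n_trials - 1) * icc)


# Hypothetical variances chosen only for illustration.
icc = intraclass_correlation(var_between=1.0, var_within=9.0)
rel_1 = reliability(var_between=1.0, var_within=9.0, n_trials=1)
rel_9 = reliability(var_between=1.0, var_within=9.0, n_trials=9)
```

With these illustrative values the intraclass correlation stays fixed at 0.1, whereas reliability rises from 0.1 with one trial to 0.5 with nine trials. The same task can therefore be reported as more or less reliable depending only on how many trials were run.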