Editorial: 25 YeaRs (Special Issue)

Tilman M. Davies

Australian and New Zealand Journal of Statistics2026https://doi.org/10.1111/anzs.70046article

ABDC A

Weight

0.50

What the paper says

The year 2000 saw many notable events, beyond signalling the turn of a new millennium spared self-aware microwave ovens by an anticlimactic ‘Y2K Bug’ (Li et al. 1999). The first rough draft of the human genome was announced (International Human Genome Sequencing Consortium 2001). The final restrictions on GPS accuracy were lifted for civilian use (Sherman 2004). Sony Computer Entertainment released the Playstation 2 (Perry 1999). Brad Pitt and Jennifer Aniston wed (Wilson 2000). And Version 1.0.0 of R, a modest coding environment built on the S syntax (Becker et al. 1988), created by Ross Ihaka and Robert Gentleman in the early 1990s (Ihaka and Gentleman 1996), was officially released. Now, as we leave 2025 and enter 2026, it is fitting that we celebrate the expansion of an open-source language whose impact, not only in statistics but in a plethora of scientific, industrial and even commercial applications, is difficult to overstate. The origin story of R is well known, the language having been born at the University of Auckland from a practical frustration with the tools then available for teaching and doing modern statistics. Given there are already many good accounts in various forms (see e.g., Thieme 2018; Smith 2020; Chambers 2020), I shall refrain from retelling that history here. Rather, it suffices to highlight two design choices that, in retrospect, were decisive: R was open source, and it was built to be extended. Together, these turned R from a useful environment into a scalable community resource—maintained today by the dedicated volunteers of the R Core Team1—a resource in which new ideas could be readily translated into shareable tools, and shared tools could become shared practice. The timing couldn't have been better, with R's emergence coming right at the dawn of the ‘data science’ era. The subsequent demand to understand relationships in large and complex datasets meant that practitioners needed a portable way to communicate new ideas and share developments in statistics and data analysis. R's interpreted nature geared to statistical modelling, unmatched graphical tools and package-based extensibility, combined with an open-source existence and purchase price of $0.00, clinched it. The stage was clearly set, and perhaps none of us should be surprised at the result. Indeed, the pedagogical origin has left an obvious legacy: today there is a remarkable ecosystem of introductory and advanced textbooks that treat R as the default medium for learning statistics and data analysis (a wonderful effort by Baruffa 2025, offers a searchable list of well over 400 at https://www.bigbookofr.com). The eight invited articles that follow, which I shall briefly introduce here, offer a set of complementary views from authors who will be familiar to many R users. Together, they span the practical scaffolding that keeps R scalable, the language ideas that give it its character, the interfaces through which many now experience ‘doing R’ and the applied domains where R has not only carried methods into practice but sometimes helped create them. I extend my gratitude to all authors who took the valuable time to share their knowledge and contribute these papers, and to all associate editors and anonymous peer-reviewers who helped shape them into their final forms. I trust all readers will enjoy these articles as much as I have done. Kicking off the issue are Hornik and Ligges (2026), who recount the development of the Comprehensive R Archive Network—CRAN—as the quiet infrastructure enabling R's growth: being not only a repository, but a disciplined mechanism for distribution, documentation and quality control. Although it is by no means the only package repository for R developers (readers will likely be familiar with Bioconductor for instance, see Gentleman et al. 2004), CRAN is arguably the predominant reason that tens of thousands of independent contributions coalesce into something that feels like a single, usable platform. If CRAN represents the ecosystem's infrastructure, the language semantics are its core. Ihaka (2026) revisits the mechanics that underlie everyday R code—environments, scoping and evaluation—showing how R's semantics enable both interactive analysis and powerful abstractions. It is a reminder that R's success is not only social and institutional but also technical: the language was built to support the way statisticians actually work. On that foundation sit the interfaces through which many users encounter R day to day. Wickham (2026) offers a personal account of the ‘tidyverse’ as an interface layer: a philosophy of data analysis expressed through design choices about data structures, verbs and conventions. Tanaka (2026) follows by exploring the design of the tidyverse, identifying interface features that help explain its adoption in teaching and practice, and distilling principles to assist developers in designing their own packages. Beyond data wrangling, R's strengths extend to communication and diagnostic practice. Murrell (2026) explores the often-underappreciated problem of text in statistical graphics, bringing attention to thorny issues of typography, layout and rendering as central to communication rather than mere ornamentation. Li et al. (2026) present tools for automated residual-plot assessment, pairing an R package with a Shiny application to support diagnostics that are both principled and accessible. It is a concrete example of R enabling a package-and-app ecosystem that turns methodological ideas into tools that can be used and taught at scale. These contributions sit squarely in R's long tradition of making analysis, presentation and interpretation part of the same computational workflow. The payoff is perhaps clearest in applied areas, where shared software accelerates shared science. A good example of this is found in one of, if not the, largest contributed package on CRAN by code volume (now actually a family of multiple sub-packages by necessity) known as spatstat—a comprehensive suite of functions for modelling spatial point patterns. Baddeley et al. (2026) highlight how R has served not just as an implementation and dissemination medium for their spatstat behemoth, but as the catalyst for solving various methodological problems in spatial statistics and uncovering new ones. Finally, with R now undeniably a central player in data science in all its forms, it is natural to ask how it sits alongside other languages. With the benefit of decades of experience in computer science and statistical computing, Matloff (2026) engages the inevitable R–Python comparison, pushing the discussion towards the practical questions of workflow, pedagogy and what we actually mean when we talk about ‘data science’. It is easy, in surveying R's scope and reflecting on its impact, to speak only in broad terms of ecosystems, science and institutions. But R's role is also personal: it lives daily in the training of students, the habits of researchers and the work of analysts of all different walks of life. In wrapping up this editorial, I therefore find myself thinking back to the year 2000, where I would make my way to high school in semi-rural Western Australia as a teenager entirely unaware of the existence of Ross and Robert's creation and the way it, along with its maintainers and contributors, would shape my own education and career. If you happened to exist then, I invite you to also think back to ‘the before’…where were you when R was first released? The author has nothing to report.

Open paper page →

Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact	0.50 × 0.4 = 0.20
M · momentum	0.50 × 0.15 = 0.07
V · venue signal	0.50 × 0.05 = 0.03
R · text relevance †	0.50 × 0.4 = 0.20

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.