A systematic review of generative AI in education: Empirical insights from a human–AI interaction perspective

Zhiping Liang et al.

British Journal of Educational Technology · 2026 · https://doi.org/10.1111/bjet.70055 · article

AJG 2 · ABDC A

Weight

0.50

What the paper says

With the increasing application of GenAI in education, researchers and practitioners are paying more and more attention to its effectiveness and impact in teaching and learning. This can be evidenced by the increasing number of literature reviews on this topic published in recent years. However, these literature reviews seldom analysed the effectiveness of GenAI‐powered educational applications from the human–AI interaction perspective, which is a widely recognized critical factor influencing the effectiveness of such GenAI‐powered educational applications. In response, this study systematically reviews 56 empirical studies on the application of GenAI in education. To explicitly address this gap from the human–AI interaction perspective, we analysed the reviewed studies using the AIED‐HCD framework, which conceptualizes three human–AI interaction modes along the dimensions of human control and AI automation, and examined educational contextual factors and educational tasks supported by GenAI to assess how interaction modes vary across teaching and learning contexts. In addition, we conducted a sensitivity analysis to evaluate the robustness of findings across these modes. We demonstrated that, although current educational practices remain cautious towards interaction modes with a high level of AI automation, the mode characterized by both high human control and high AI automation has begun to emerge as a trend, demonstrating promising potential for integrating the respective strengths of humans and AI. Furthermore, sensitivity analysis reveals that many studies lack sufficient detail in their statistical reporting, and the reported effect sizes often fall below the thresholds required for acceptable statistical power. 
Based on these findings, we recommend: (i) beyond exploring how to improve the practical use of AI automation under human supervision and control, educational researchers and practitioners should carefully choose and implement suitable human–AI interaction settings according to the specific context of use, with higher levels of AI automation applied only when supported by appropriate task design and pedagogical guidance; (ii) researchers should improve methodological transparency by estimating appropriate sample sizes and testing assumptions to ensure the reliability of empirical findings.

Practitioner notes

What is already known about this topic

Generative artificial intelligence can efficiently analyse vast amounts of textual information and perform complex natural language processing and generation tasks, demonstrating powerful language intelligence capabilities.

Generative artificial intelligence is increasingly being integrated into various educational systems, and with its exceptional capabilities, its potential to support educational applications is gradually being explored and put into practice.

What this paper adds

Based on the AIED‐HCD conceptual framework, this study systematically classifies current GenAI‐based empirical research by examining two key dimensions, namely the extent to which human control and automation through GenAI are enabled, and offers a holistic perspective on the current state of research.

A sensitivity analysis was conducted to examine whether the empirical evidence reported in current GenAI‐based studies demonstrates sufficient statistical power to support future research and practical implementation.

Implications for practice and/or policy

Stay continuously informed about the latest advancements in AI technologies and carefully verify their suitability for different interaction modes to enhance educational user experiences and deliver systematically measurable improvements and efficiencies to the educational landscape.

Current GenAI‐based empirical studies should include more detailed statistical reporting, such as whether assumption tests were conducted and the specific results of effect sizes, to support empirical evidence and enhance transparency.
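
To make the idea of a sensitivity analysis concrete, here is a minimal sketch of one common form: given a fixed sample size, find the smallest standardized effect (Cohen's d) that a study could detect with a chosen power. It assumes a two-sided comparison of two equal-sized independent groups and uses a normal approximation. The function name `min_detectable_d`, the defaults, and the example group size are illustrative assumptions, not the procedure the paper reports.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest Cohen's d detectable with the given power.

    Assumption: two independent groups of equal size, two-sided test,
    normal approximation (not the exact t-based calculation).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


# Hypothetical study with 30 participants per group
mde_30 = min_detectable_d(30)
print(f"n = 30 per group: minimum detectable d is about {mde_30:.2f}")
```

Under these assumptions, a study with 30 participants per group can reliably detect only effects of roughly d = 0.72 or larger, so a smaller reported effect from such a study would be underpowered.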


Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact	0.50 × 0.40 = 0.200
M · momentum	0.50 × 0.15 = 0.075
V · venue signal	0.50 × 0.05 = 0.025
R · text relevance †	0.50 × 0.40 = 0.200

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
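
The breakdown above appears to be a weighted sum of four component scores. The sketch below reproduces the arithmetic under that assumption. The combination rule and the names `WEIGHTS` and `evidence_weight` are illustrative, since the page does not state how the score is computed.

```python
# Component weights as listed for "Balanced mode"
WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}


def evidence_weight(scores, weights=WEIGHTS):
    """Assumed rule: sum of (component score x component weight)."""
    return sum(scores[k] * weights[k] for k in weights)


# Component scores shown on this page (text relevance is the 0.50 estimate)
scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}

contributions = {k: scores[k] * WEIGHTS[k] for k in WEIGHTS}
total = evidence_weight(scores)

for name, value in contributions.items():
    print(f"{name}: {value:.3f}")
print(f"total: {total:.2f}")
```

Because the four weights sum to 1.00 and every component score here is 0.50, the total is 0.50. The unrounded momentum and venue contributions are 0.075 and 0.025.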