Using multilabel classification neural network to detect intersectional DIF with small sample sizes

Yale Quan & Chun Wang

British Journal of Mathematical and Statistical Psychology, 2026 (article)
https://doi.org/10.1111/bmsp.70041
ABDC: B
Weight: 0.50

Abstract

This study introduces InterDIFNet, a multilabel classification neural network for detecting intersectional differential item functioning (DIF) in educational and psychological assessments, with a focus on small sample sizes. Unlike traditional marginal DIF methods, which often fail to capture the effects of intersecting identities and require large samples, InterDIFNet models uniform and non-uniform DIF across multiple intersectional groups simultaneously and utilizes an optimized thresholding procedure to balance power and Type 1 error control. A Monte Carlo simulation compared InterDIFNet to the Truncated Lasso Penalty (TLP) test and other intersectional DIF methods across varying sample sizes, numbers of groups and DIF prevalence rates. Results show that when trained using TLP features, InterDIFNet consistently achieved higher power than TLP while maintaining comparable Type 1 error control, particularly in scenarios with three or more intersectional groups. An empirical application to real assessment data further demonstrated the method's practical utility. InterDIFNet provides a scalable, data-driven solution for identifying intersectional DIF across multiple small sample groups.
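The abstract does not describe how the thresholding procedure works, so the following is only a generic illustration of the trade-off it names, not InterDIFNet's actual procedure. It is a minimal sketch, assuming a multilabel classifier that outputs one DIF probability per label and simulated data where the true DIF status is known. The function `pick_threshold`, the 0.01 cutoff grid, and the toy data are all hypothetical.

```python
import numpy as np

def pick_threshold(probs, labels, alpha=0.05):
    """Lowest cutoff on a 0.01 grid whose false-positive rate stays <= alpha.

    probs  : array (n_replications, n_labels) of predicted DIF probabilities
    labels : array of the same shape; 1 = DIF simulated, 0 = no DIF
    Assumes both classes occur in `labels`.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best = 1.0  # fallback if even the strictest grid cutoff exceeds alpha
    # Walk from strict to lenient: lowering the cutoff can only flag more
    # labels, so power and the false-positive rate never decrease.
    for t in np.linspace(0.99, 0.01, 99):
        fpr = (probs[labels == 0] >= t).mean()
        if fpr > alpha:
            break
        best = t
    flagged = probs >= best
    fpr = flagged[labels == 0].mean()    # empirical Type I error rate
    power = flagged[labels == 1].mean()  # empirical power
    return float(best), float(fpr), float(power)

# Toy data (hypothetical): null labels score in [0, 0.5), DIF labels in [0.6, 1.0].
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=(200, 10))
probs = np.clip(0.6 * labels + rng.uniform(0.0, 0.5, size=labels.shape), 0.0, 1.0)
threshold, fpr, power = pick_threshold(probs, labels)
```

Choosing the most lenient cutoff that still respects the nominal error rate gives the highest power available under that constraint. A real application would tune the cutoff on held-out simulated replications, not on the data being tested.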


Cite this paper

https://doi.org/10.1111/bmsp.70041

Or copy a formatted citation

@article{yale2026,
  title        = {{Using multilabel classification neural network to detect intersectional DIF with small sample sizes}},
  author       = {Quan, Yale and Wang, Chun},
  journal      = {British Journal of Mathematical and Statistical Psychology},
  year         = {2026},
  doi          = {10.1111/bmsp.70041},
}




Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact: 0.50 × 0.4 = 0.20
M · momentum: 0.50 × 0.15 = 0.07
V · venue signal: 0.50 × 0.05 = 0.03
R · text relevance †: 0.50 × 0.4 = 0.20

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
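
The breakdown above reads as a weighted sum of four component scores. The page does not state the formula explicitly, so this is a minimal Python sketch that assumes that reading is correct. The names `WEIGHTS`, `SCORES`, and `evidence_weight` are illustrative and not part of any published API.

```python
# Balanced-mode weights and component scores as listed on this page.
WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}
SCORES = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}

def evidence_weight(scores, weights):
    """Return (total, per-component contributions) for a weighted sum."""
    contributions = {k: scores[k] * weights[k] for k in weights}
    return sum(contributions.values()), contributions

total, contributions = evidence_weight(SCORES, WEIGHTS)
print(f"{total:.2f}")  # prints 0.50
```

Before rounding, the M and V products are 0.075 and 0.025. The page appears to display them to two decimals as 0.07 and 0.03.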