Neural networks generalize on low complexity data

Sourav Chatterjee & Timothy Sudijono

Annals of Statistics · 2026 · https://doi.org/10.1214/25-aos2570 · article
AJG 4* · ABDC A*
Weight
0.50

Abstract

We show that feedforward neural networks with ReLU activation generalize on low complexity data, suitably defined. Given i.i.d. data generated from a simple programming language, the minimum description length (MDL) feedforward neural network which interpolates the data generalizes with high probability. We define this simple programming language, along with a notion of description length of such networks. We provide several examples on basic computational tasks, such as checking primality of a natural number. For primality testing, our theorem shows the following and more. Suppose that we draw an i.i.d. sample of n numbers uniformly at random from 1 to N. For each number x_i, let y_i = 1 if x_i is a prime and y_i = 0 if it is not. Then the interpolating MDL network accurately answers, with probability 1 − O((ln N)/n), whether a newly drawn number between 1 and N is a prime or not. Note that the network is not designed to detect primes; minimum description learning discovers a network which does so. Extensions to noisy data are also discussed, suggesting that MDL neural network interpolators can demonstrate tempered overfitting.
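The sampling setup in the primality example can be sketched in a few lines of Python. This is only an illustration of how the labeled sample (x_i, y_i) is drawn; it does not construct or search for the MDL network, and the names `is_prime` and `sample_primality_data` are hypothetical helpers, not taken from the paper.

```python
import random


def is_prime(x: int) -> bool:
    """Trial division; adequate for the small N used in an illustration."""
    if x < 2:
        return False
    d = 2
    while d * d <= x:
        if x % d == 0:
            return False
        d += 1
    return True


def sample_primality_data(n: int, N: int, seed: int = 0):
    """Draw n i.i.d. integers uniformly from {1, ..., N} and label each one.

    Returns (xs, ys) where ys[i] is 1 if xs[i] is prime and 0 otherwise.
    The seed parameter is an assumption added for reproducibility.
    """
    rng = random.Random(seed)
    xs = [rng.randint(1, N) for _ in range(n)]  # randint is inclusive on both ends
    ys = [1 if is_prime(x) else 0 for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = sample_primality_data(n=10, N=100)
    for x, y in zip(xs, ys):
        print(x, y)
```

An interpolating network in the sense of the abstract would be one that reproduces every label in `ys` on the inputs in `xs`; the theorem concerns how such a network of minimum description length behaves on a fresh draw from the same range.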

Cite this paper

https://doi.org/10.1214/25-aos2570

Or copy a formatted citation

@article{sourav2026,
  title        = {{Neural networks generalize on low complexity data}},
  author       = {Sourav Chatterjee and Timothy Sudijono},
  journal      = {Annals of Statistics},
  year         = {2026},
  doi          = {10.1214/25-aos2570},
}

Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact: 0.50 × 0.40 = 0.20
M · momentum: 0.50 × 0.15 = 0.075
V · venue signal: 0.50 × 0.05 = 0.025
R · text relevance †: 0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
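The breakdown above reads as a weighted sum of four component scores. The following is a minimal sketch under that assumption; the page shows only the individual products, so the aggregation rule, the function name `evidence_weight`, and the dictionary layout are all hypothetical.

```python
# Balanced-mode weights as listed on the page.
BALANCED_WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}


def evidence_weight(scores: dict, weights: dict = BALANCED_WEIGHTS) -> float:
    """Weighted sum of component scores (assumed aggregation rule)."""
    return sum(weights[key] * scores[key] for key in weights)


if __name__ == "__main__":
    # Every component is shown as 0.50 on this detail page.
    scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}
    print(round(evidence_weight(scores), 2))
```

Because the four weights sum to 1, equal component scores of 0.50 give an overall weight of 0.50, which agrees with the figure displayed on the page.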