A document of record.
This document is permalinked and versioned. Researchers may cite a specific version in published work. The methodology does not change without a version increment and a corrigendum notice.
Arbiter indexes business research selectively, not exhaustively. Every included work can be traced to a ranked venue. Four ranking lists define the T1 corpus (FT50, UTD24, ABDC, AJG); a T3 citation-neighbor tier is planned for v1.1.
All journals appearing on at least one of four authoritative ranking lists: the FT50 (Financial Times), UTD24 (University of Texas at Dallas), ABDC (Australian Business Deans Council), and AJG (Association of Business Schools Academic Journal Guide). Approximately 3,230 distinct journals after ISSN de-duplication — ABDC covers all four tiers (A*, A, B, C); AJG-exclusive journals (those not already in ABDC/FT50/UTD24) were added in April 2026. Year-stamped editions for reproducibility: ABDC 2022, FT50 2016, UTD24 2024, AJG 2024.
SSRN preprints from authors affiliated with AACSB-accredited institutions. Working papers are important in finance and economics; they are deferred to v1.1 to allow rigorous quality filtering before inclusion. The scope decision is documented in the product roadmap.
Papers cited by five or more distinct T1 works. This tier is designed to capture foundational methodological and theoretical works published outside ranked business journals — for example, statistical methods papers in psychology or foundational economics works in non-business venues. T3 is not in the v1.0 corpus; it is planned for v1.1.
Conference proceedings, book chapters, dissertations, technical reports, and works from venues on the maintained predatory publisher exclusion list. Retracted works are hidden by default (see §vii). Exclusions can be reviewed in the Refine panel; retracted works can be surfaced for fraud-research use cases.
Every query runs against a pre-indexed full-text search column. A relevance gate removes peripheral matches before scoring. Results are deterministic — the same query returns the same papers every time.
Arbiter runs every query against a stored full-text search column (PostgreSQL GIN index on titles, abstracts, and author metadata). The index is pre-computed at ingest time, so queries return in under one second rather than scanning each record at query time. Searches that match on abstract content rank higher than title-only matches because the abstract signals topical depth.
Papers must reach a minimum relevance score (ts_rank ≥ 0.05) to appear in results. Papers below this threshold matched only a peripheral term — a word in a passing reference, not the paper’s actual focus. The gate is applied in the database before scoring, so the result set contains only papers where the query topic is genuinely present.
The same query always returns the same paper set. Arbiter enforces this with a deterministic SQL ordering (rank descending, then paper ID ascending) combined with a composite score tiebreaker in application code. This makes search results reproducible — a researcher running the same query in three years will retrieve the same papers, assuming the corpus has not been updated. What varies: synthesis text is AI-generated and may differ in phrasing across runs; the factual claims are grounded in the same fixed paper set.
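The relevance gate and the deterministic ordering can be sketched in a few lines. This is an illustrative mirror of the behavior described above, not Arbiter's actual code — the real gate and primary sort run in SQL; field names like `tsRank` are assumptions:

```typescript
// Hypothetical shape of a search hit; field names are illustrative.
interface Hit {
  id: number;     // stable paper ID, used as the final tiebreaker
  tsRank: number; // PostgreSQL ts_rank relevance score
}

const RELEVANCE_GATE = 0.05; // below this, only a peripheral term matched

// Apply the gate, then order deterministically:
// rank descending, then paper ID ascending.
function gateAndOrder(hits: Hit[]): Hit[] {
  return hits
    .filter((h) => h.tsRank >= RELEVANCE_GATE)
    .sort((a, b) => (b.tsRank - a.tsRank) || (a.id - b.id));
}
```

Because ties on rank always fall back to the stable paper ID, two runs of the same query over the same corpus cannot disagree on ordering.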
Arbiter returns up to 1,000 papers that pass the relevance gate. All of these appear in the evidence panel and receive stance classification. The synthesis model receives only the top 20 papers by composite evidence score. This keeps synthesis focused on the strongest evidence rather than diluting it with hundreds of peripherally-relevant results.
The composite evidence weight combines four signals: query relevance, citation impact, recency momentum, and venue quality. It is not a citation count. It is not a journal impact factor. Journal prestige is shown on every paper but is deliberately excluded from the composite — the corpus is already the quality gate.
How directly the paper’s title and abstract address the specific query. Derived from ts_rank (PostgreSQL full-text relevance), normalized within the result set so that the most relevant paper in each search scores 1.0. This is the strongest signal in Balanced mode (ε = 0.40) because it answers the question “is this paper actually about what I asked?”
OpenAlex FWCI, normalized with a documented fallback ladder: FWCI → percentile → log-raw-count → 0.5 baseline. FWCI compares a paper’s citation count to other papers from the same field, year, and work type — an empirical study in management is compared to other empirical management studies, not to physics papers. Values above 10.0 are winsorized. Papers fewer than 18 months from publication receive a benefit-of-doubt floor of 0.5.
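The fallback ladder can be sketched as follows. Only the ladder order, the 10.0 winsorization cap, and the 18-month floor come from the text above; the scaling constants and field names are assumptions for illustration:

```typescript
const FWCI_CAP = 10.0;            // winsorization cap (documented)
const RECENCY_FLOOR_MONTHS = 18;  // benefit-of-doubt window (documented)

// Fallback ladder: FWCI → percentile → log of raw count → 0.5 baseline.
function citationImpact(p: {
  fwci?: number;         // OpenAlex FWCI, if available
  percentile?: number;   // citation percentile in [0, 1]
  citedByCount?: number; // raw citation count
  monthsOld: number;     // months since publication
}): number {
  let f: number;
  if (p.fwci !== undefined) {
    f = Math.min(p.fwci, FWCI_CAP) / FWCI_CAP; // winsorize, then scale (assumed)
  } else if (p.percentile !== undefined) {
    f = p.percentile;                           // already in [0, 1]
  } else if (p.citedByCount !== undefined) {
    // Assumed log scale; saturates around 1,000 citations.
    f = Math.min(Math.log1p(p.citedByCount) / Math.log1p(1000), 1);
  } else {
    f = 0.5;                                    // documented baseline
  }
  // Benefit-of-doubt floor for papers under 18 months old.
  return p.monthsOld < RECENCY_FLOOR_MONTHS ? Math.max(f, 0.5) : f;
}
```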
A time-decay and persistence signal. New papers (under 36 months) receive a benefit-of-doubt floor of 0.50 plus a boost for early citation velocity. Older papers are rewarded for having maintained citation velocity relative to their lifetime average — M captures lasting relevance, not raw age. A 2008 paper still being heavily cited in 2026 scores higher on M than a 2008 paper that peaked in 2012.
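A heavily hedged sketch of the momentum signal. Only the 36-month threshold and the 0.50 floor come from the text; the velocity formulas, the boost scale, and the ratio form for older papers are assumptions, not Arbiter's published formula:

```typescript
// M: momentum signal sketch (assumed functional forms).
function momentum(p: {
  monthsOld: number;
  recentPerMonth: number;   // citation velocity over a recent window
  lifetimePerMonth: number; // lifetime-average citation velocity
}): number {
  if (p.monthsOld < 36) {
    // Documented 0.50 floor plus an assumed boost for early citation velocity.
    return Math.min(0.5 + Math.min(p.recentPerMonth / 10, 0.5), 1);
  }
  // Persistence: recent velocity relative to the lifetime average (assumed
  // ratio form). A 2008 paper still cited heavily today scores near 1.
  if (p.lifetimePerMonth <= 0) return 0;
  return Math.min(p.recentPerMonth / p.lifetimePerMonth, 1);
}
```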
OpenAlex source 2yr_mean_citedness, normalized across the corpus. This is the least-weighted signal (δ = 0.05) and exists primarily to break ties between papers with otherwise similar R, F, and M scores. It accounts for the documented phenomenon that papers in higher-prestige venues within a tier receive more citations, controlling for paper quality.
J is computed from the paper’s highest tier across all ranking lists (FT50/UTD24 → 1.00; ABDC A* → 0.80; ABDC A → 0.65; ABDC B → 0.45; ABDC C → 0.25). It appears on the paper detail page so researchers can see which list a journal belongs to. J is deliberately excluded from the composite score: every paper in the corpus already passed a journal-tier filter at ingest, so using J in scoring would double-count prestige and cause high-tier journal papers with marginal relevance to outrank directly-relevant lower-tier papers.
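The tier mapping above is small enough to state exactly. The tier values are from this section; the type and function names are illustrative:

```typescript
// Journal-prestige tier values (documented above). J is displayed on the
// paper page but never enters the composite score.
type Tier = "FT50" | "UTD24" | "ABDC_A_STAR" | "ABDC_A" | "ABDC_B" | "ABDC_C";

const TIER_J: Record<Tier, number> = {
  FT50: 1.0,
  UTD24: 1.0,
  ABDC_A_STAR: 0.8,
  ABDC_A: 0.65,
  ABDC_B: 0.45,
  ABDC_C: 0.25,
};

// A journal may appear on several lists; J is its highest tier value.
// Assumes at least one tier — every corpus paper passed the tier filter.
function journalPrestige(tiers: Tier[]): number {
  return Math.max(...tiers.map((t) => TIER_J[t]));
}
```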
weight = ε·R + β·F + γ·M + δ·V, clamped to [0, 1]

| Mode | ε (R) | β (F) | γ (M) | δ (V) | When to use |
|---|---|---|---|---|---|
| Balanced (default) | 0.40 | 0.40 | 0.15 | 0.05 | General business research queries |
| Prestige | 0.25 | 0.55 | 0.15 | 0.05 | Citation-weighted analysis; highly-cited work regardless of topic proximity |
| Impact | 0.30 | 0.50 | 0.15 | 0.05 | Identifying high-influence work regardless of venue |
Coefficient values are v1.0. The formula structure — additive, normalized, four factors, no journal prestige in the composite, no author signals — is locked; only coefficients may change between versions. J (journal prestige) is computed and displayed on every paper; it is not an input to the composite score.
A worked example — Balanced mode
Malmendier, U. & Tate, G. (2008). “Who makes acquisitions? CEO overconfidence and the market’s reaction.” Journal of Financial Economics, 89(1), 20–43. Query: “CEO overconfidence acquisitions”
R = 0.94 (ts_rank normalized; title + abstract directly on-topic)
F = 0.92 (FWCI 8.4, normalized; winsorization cap: 10.0)
M = 0.78 (2008 vintage; strong lasting citation velocity)
V = 0.85 (JFE source 2yr_mean_citedness, 99th percentile)
weight = 0.40×0.94 + 0.40×0.92 + 0.15×0.78 + 0.05×0.85
= 0.376 + 0.368 + 0.117 + 0.0425 = 0.9035 ≈ 0.904

J = 1.00 (FT50 · Journal of Financial Economics) — displayed on the paper detail page; not used in the composite above.
Arbiter generates a structured AI summary of the top 20 papers. The summary is not an opinion — it is a prose account of what the retrieved evidence says, organized by stance. Three reading levels let you choose how much research vocabulary the summary uses.
Arbiter generates synthesis at three reading levels selectable on the answer page. Essentials uses everyday language — no research jargon, just the core finding and what it means. Detailed introduces research terminology with brief explanations and discusses how studies relate to each other. Technical uses discipline-standard vocabulary, methodological detail, and direct engagement with conflicting evidence — suitable for inclusion in a dissertation literature review.
Synthesis is generated by OpenAI’s gpt-4.1-mini model via Vercel AI Gateway. Output is structured using the Vercel AI SDK’s streamObject function with a Zod schema: { headline, groups[{ stance, narrative, papers[] }] }. The headline is one sentence summarizing the key finding and its main caveat. Groups are narrative paragraphs organized by stance — papers that support the query answer, papers that challenge it, and papers that show it depends on conditions. Each group cites specific papers inline.
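The output shape can be mirrored in plain TypeScript types. The production code uses a Zod schema with `streamObject`; this type-only mirror is illustrative, and the stance string values shown here are assumptions (the display labels are documented in §stance below):

```typescript
// Illustrative mirror of the synthesis output schema.
type Stance = "supports" | "challenges" | "depends";

interface SynthesisGroup {
  stance: Stance;
  narrative: string; // prose paragraph citing specific papers inline
  papers: string[];  // identifiers of the papers this group cites
}

interface Synthesis {
  headline: string;  // one sentence: key finding plus its main caveat
  groups: SynthesisGroup[];
}

// A hypothetical instance, to show the shape:
const example: Synthesis = {
  headline: "Example headline stating one finding and one caveat.",
  groups: [{ stance: "supports", narrative: "…", papers: ["W123"] }],
};
```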
The synthesis prompt receives a work-type label with each paper: empirical study, literature review, or preprint. The model is instructed to weight these differently: meta-analyses and systematic reviews are highest weight (mention first, most authoritative for existence claims); empirical studies are weighted proportionally to replication count and sample scope; theoretical papers explain mechanisms and boundary conditions but are not treated as evidence that something happens. This prevents the synthesis from overstating single-study findings.
The synthesis model sees only titles, abstracts, authors, years, and work-type labels — not full paper text. It does not access the internet, follow citations, or reason about papers it was not given. The synthesis is a structured summary of the abstracts of the top 20 papers by composite evidence score. It is a starting point for literature engagement, not a substitute for reading primary sources.
Every paper in the result set is classified into one of three stances relative to your query. Seeing papers that challenge your assumption is the product working as intended — not a sign that the evidence is weak.
Every paper is classified into one of three stances relative to the search query. “Supports”: the paper’s findings are consistent with or favor the claim. “Challenges”: the findings contradict or question the claim. “Depends on…”: the relationship is contingent on a moderating variable, context, or condition. Display labels are “Supports” / “Challenges” / “Depends on…”.
Stance is classified by a separate model (OpenAI gpt-4.1-nano) in a dedicated request pipeline — it is not done by the synthesis model. Papers are batched 50 per request using generateObject with a Zod schema for structured output. The classifier receives each paper’s title, abstract, and the original search query. Papers without usable abstracts (missing, too short, or Cloudflare-blocked) may not receive a classification; this is why the stance counts sometimes read “N of M classified” rather than matching the total result count.
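The batching step is mechanical. The 50-per-request figure is documented above; `chunk` is a generic helper and the model call itself is omitted:

```typescript
const STANCE_BATCH_SIZE = 50; // papers per classification request

// Split a list into fixed-size batches; the final batch may be smaller.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// e.g. 120 gated papers → three requests of 50, 50, and 20 papers.
```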
Business research findings almost always come with conditions: industry, country, firm size, managerial context, time period. The classifier prompt instructs the model to favor conditional classification when findings are contingent, and the majority of papers in a result set will typically fall into this category. This is not a failure mode — it accurately reflects the nature of management research, which rarely produces unconditional universal claims.
Standard search engines return what you ask for. Arbiter’s synthesis explicitly instructs the model to find and name contradictory evidence. Seeing “6 Challenge” next to “119 Support” is the product working as intended, not a bug. Most novice literature reviews are unreliable because the researcher only found papers that agreed with their hypothesis. Arbiter fights this confirmation bias by making disagreement the same size and the same color as agreement.
Ranking lists change. Arbiter preserves historical membership so that a query made today produces the same result when replicated in three years.
Every ranking list edition is archived as a versioned data file with its effective date. Current editions: FT50 (2016), UTD24 (2024), ABDC (2022), AJG (2024). When a new edition is released, Arbiter ingests it as a new version; the prior version remains active until the user or institution migrates explicitly.
A paper’s ranking list membership is recorded at ingest time. If a journal is removed from a list in a subsequent edition, papers already indexed retain their historical membership. A researcher who needs “FT50 as of 2023” can specify that corpus explicitly.
Every citation export includes the methodology version and the list editions effective at query time. This follows the emerging norm in management research of including database-search parameters in methods appendices, and enables full query reproducibility in published work.
Trust is a feature, not a filter. Every paper carries an integrity status. Problematic works are visible — never silently removed — but clearly marked and penalized in evidence weight.
Cross-checked against the Retraction Watch database. Retracted papers are hidden from default search results; a toggle in the Refine panel surfaces them for researchers who study retraction itself. When shown, retracted papers display a danger-red badge and are excluded from composite evidence scoring entirely.
Papers under an active Expression of Concern are visible by default but carry a red EoC badge and a −0.25 penalty applied to their composite evidence weight. The penalty is shown in the weight breakdown panel on the paper detail screen — not silently applied. Researchers can see exactly why a score is lower than expected.
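The two integrity adjustments above combine into one small rule: retraction zeroes the score, an active EoC subtracts 0.25 (floored at 0). Function and status names are illustrative:

```typescript
type IntegrityStatus = "clean" | "eoc" | "retracted";

// Apply integrity adjustments to a composite evidence weight.
function adjustedWeight(composite: number, status: IntegrityStatus): number {
  if (status === "retracted") return 0;                  // excluded from scoring
  if (status === "eoc") return Math.max(composite - 0.25, 0); // visible penalty
  return composite;
}
```

The penalty appears in the weight breakdown panel rather than being applied silently, so the number a researcher sees always reconciles with the signals.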
Venues from publishers on a maintained exclusion list — derived from documented predatory publisher indices, updated quarterly — are excluded from T1 ranking eligibility. They may still appear as T3 citation neighbors if cited sufficiently by T1 works, but carry a visible venue-quality warning and receive the minimum V score (0.05).
Arbiter's methodology is not a black box, and it is not infallible. Users can flag any paper on four grounds. Flags are reviewed within five business days.
Incorrect authors, journal, year, or DOI. These are metadata errors typically traceable to source databases (OpenAlex, Crossref) and are corrected upstream and re-ingested.
The paper has been retracted but Arbiter has not yet received the Retraction Watch update. These flags are prioritized for same-day resolution.
A researcher believes the composite weight is materially incorrect — for example, due to a known citation-cartel inclusion or a venue reclassification not yet in the current list edition. These are logged for the quarterly methodology review.
A user identifies a venue not yet on the exclusion list that shows characteristics of predatory publishing. These are reviewed by the methodology team and escalated to the publisher exclusion process if warranted.
An honest account of the boundaries. Researchers should note these limitations alongside their use of Arbiter-retrieved evidence in published work.
Arbiter indexes titles, abstracts, keywords, and author metadata. It does not search the full text of papers. For works where the abstract is absent — common for pre-2000 publications — a “no abstract available” indicator is shown. Researchers requiring full-text search should supplement with publisher databases.
Arbiter does not use author h-index, institutional affiliation, or career-stage signals in evidence weighting. This is a deliberate methodological choice to avoid amplifying citation-cartel behavior. The composite weight is a paper signal, not an author signal.
OpenAlex coverage for papers published before 1990 is substantially less complete than for recent work. Arbiter inherits this limitation. Researchers conducting historical literature reviews should supplement with JSTOR and publisher archives for pre-1990 coverage.
The AI synthesis is a language model output based on retrieved abstracts. The same query may produce slightly different phrasing across runs because the model operates at non-zero temperature. The underlying paper set is fixed and deterministic; the prose summary is not. Synthesis is a starting point for literature engagement, not a substitute for reading primary sources. Citations are verifiable; the synthesis framing should be treated as one reasonable interpretation of the evidence.
T3 citation neighbors — works cited by five or more ranked-journal papers — are not in the v1.0 corpus. This means some important methodological papers (statistics papers in psychology, foundational economics works) may not appear even when directly relevant. Researchers should supplement with discipline-specific databases when foundational methodology coverage matters.
Arbiter provides complete bibliographic coverage across ranked journals from 1990. Abstract coverage is partial and varies substantially by publisher. A result of “not found” means not found in our indexed abstracts — not that no paper exists.
Arbiter sources abstracts from OpenAlex, Crossref, and Semantic Scholar. Several major publishers — including Elsevier, Springer Nature, and SAGE — restrict third-party abstract indexing or serve content behind Cloudflare protections that block automated retrieval. This affects 11 FT50 journals with coverage below 20%, including Accounting, Organizations and Society (4%), Journal of Accounting and Economics (5%), Organizational Behavior and Human Decision Processes (5%), and Journal of Financial Economics (7%).
Works where the abstract is absent are fully indexed by title, authors, venue, year, and citation count. They appear in search results but are matched on metadata only — not on abstract content. This may cause relevant papers to rank lower than their true relevance warrants.
All 49 FT50 journals currently indexed. Coverage varies by publisher — see explanation above. Live per-journal pages available in the journal directory. Data as of April 2026; updates automatically at Stage 4.
| Journal | Papers | Abstract % | Range |
|---|---|---|---|
| Journal of Business Ethics | 9,936 | 10% | 1990–2026 |
| Management Science | 9,538 | 88% | 1990–2026 |
| The American Economic Review | 7,318 | 80% | 1990–2026 |
| Journal of Applied Psychology | 5,235 | 68% | 1990–2026 |
| The Journal of Finance | 5,062 | 67% | 1990–2026 |
| Operations Research | 4,734 | 86% | 1990–2026 |
| Research Policy | 4,592 | 14% | 1990–2026 |
| Journal of Financial Economics | 4,179 | 7% | 1990–2026 |
| Strategic Management Journal | 3,976 | 81% | 1990–2026 |
| Production and Operations Management | 3,827 | 81% | 1992–2026 |
| Organization Studies | 3,639 | 60% | 1990–2026 |
| Academy of Management Journal | 3,574 | 71% | 1990–2026 |
| The Review of Financial Studies | 3,199 | 96% | 1990–2026 |
| Econometrica | 3,161 | 68% | 1990–2026 |
| Journal of Marketing | 3,144 | 81% | 1990–2026 |
| Academy of Management Review | 3,104 | 61% | 1990–2026 |
| Human Relations | 3,085 | 79% | 1990–2026 |
| Journal of Marketing Research | 3,038 | 72% | 1990–2026 |
| The Accounting Review | 3,017 | 87% | 1990–2026 |
| Harvard Business Review | 3,000 | 45% | 1990–2021 |
| Journal of Management | 2,959 | 90% | 1990–2026 |
| Journal of Management Studies | 2,898 | 80% | 1990–2026 |
| Journal of Political Economy | 2,842 | 72% | 1990–2026 |
| Organization Science | 2,725 | 88% | 1990–2026 |
| Human Resource Management | 2,675 | 62% | 1990–2026 |
| Journal of Consumer Research | 2,633 | 85% | 1990–2026 |
| Journal of International Business Studies | 2,513 | 8% | 1990–2026 |
| Journal of Financial and Quantitative Analysis | 2,398 | 98% | 1990–2026 |
| Organizational Behavior and Human Decision Processes | 2,349 | 5% | 1990–2026 |
| Marketing Science | 2,300 | 87% | 1990–2026 |
| Contemporary Accounting Research | 2,267 | 71% | 1990–2026 |
| MIS Quarterly | 2,257 | 83% | 1990–2026 |
| Journal of the Academy of Marketing Science | 2,206 | 9% | 1990–2026 |
| The Review of Economic Studies | 2,175 | 95% | 1990–2026 |
| Administrative Science Quarterly | 2,145 | 42% | 1990–2026 |
| Journal of Operations Management | 2,132 | 66% | 1990–2026 |
| Information Systems Research | 1,925 | 87% | 1990–2026 |
| Journal of Consumer Psychology | 1,852 | 82% | 1992–2026 |
| The Quarterly Journal of Economics | 1,833 | 93% | 1990–2026 |
| Entrepreneurship Theory and Practice | 1,807 | 83% | 1990–2026 |
| Manufacturing and Service Operations Management | 1,717 | 93% | 1999–2026 |
| Accounting, Organizations and Society | 1,699 | 4% | 1990–2026 |
| Journal of Management Information Systems | 1,674 | 86% | 1990–2026 |
| Journal of Accounting and Economics | 1,635 | 5% | 1990–2026 |
| Journal of Business Venturing | 1,633 | 10% | 1990–2026 |
| Journal of Accounting Research | 1,610 | 81% | 1990–2026 |
| Review of Accounting Studies | 1,156 | 17% | 1996–2026 |
| Strategic Entrepreneurship Journal | 649 | 77% | 2007–2026 |
| Review of Finance | 9 | 22% | 2003–2021 |
Journals below 15% abstract coverage are primarily Elsevier, SAGE, or Springer titles that restrict third-party abstract indexing. Bibliographic metadata (title, authors, venue, year, citations) is complete for all journals.
This document is a permalink. When citing Arbiter in a published methods section, cite this version specifically. If the methodology changes, a new version will be issued and this page will be updated with a forward link.
Arbiter. (2026). Arbiter evidence methodology, v1.0. Opus Vita. https://arbiter.ac/methodology/v1.0
@techreport{arbiter2026methodology,
author = {Arbiter},
title = {Arbiter evidence methodology, v1.0},
institution = {Opus Vita},
year = {2026},
month = {April},
url = {https://arbiter.ac/methodology/v1.0},
note = {Version 1.0. Accessed [date].}
}

“Literature evidence was retrieved and synthesized using Arbiter (v1.0; Arbiter, 2026), which indexes journals on the FT50, UTD24, ABDC, and AJG ranking lists and applies a composite evidence weight combining query relevance (R), field-weighted citation impact (F), recency momentum (M), and venue quality (V). The Balanced mode (ε=0.40, β=0.40, γ=0.15, δ=0.05) was used. Retracted papers were excluded by default.”
“The methodology page is the trust contract. We update it the way an academic journal updates a corrigendum.”
— Arbiter, v1.0 documentation notes
A printed snapshot of this page, generated at time of release. The live page is always authoritative; the PDF is for archival and offline reference.