Methodology v1.0 · April 2026

A document of record.

How Arbiter weighs evidence.

This document is permalinked and versioned. Researchers may cite a specific version in published work. The methodology does not change without a version increment and a corrigendum notice.

Cite this version →
i.The corpus

What Arbiter indexes.

Arbiter indexes business research selectively, not exhaustively. Every included work can be traced to a ranked venue. Four ranking lists define the T1 corpus (FT50, UTD24, ABDC, AJG); T2 working-paper and T3 citation-neighbor tiers are planned for v1.1.

T1
Ranked venues

All journals appearing on at least one of four authoritative ranking lists: the FT50 (Financial Times), UTD24 (University of Texas at Dallas), ABDC (Australian Business Deans Council), and AJG (Association of Business Schools Academic Journal Guide). Approximately 3,230 distinct journals after ISSN de-duplication — ABDC covers all four tiers (A*, A, B, C); AJG-exclusive journals (those not already in ABDC/FT50/UTD24) were added in April 2026. Year-stamped editions for reproducibility: ABDC 2022, FT50 2016, UTD24 2024, AJG 2024.

T2
Working papers — deferred to v1.1

SSRN preprints from authors affiliated with AACSB-accredited institutions. Working papers are important in finance and economics; they are deferred to v1.1 to allow rigorous quality filtering before inclusion. The scope decision is documented in the product roadmap.

T3
Citation neighbors — deferred to v1.1

Papers cited by five or more distinct T1 works. This tier is designed to capture foundational methodological and theoretical works published outside ranked business journals — for example, statistical methods papers in psychology or foundational economics works in non-business venues. T3 is not in the v1.0 corpus; it is planned for v1.1.

T4
Excluded works

Conference proceedings, book chapters, dissertations, technical reports, and works from venues on the maintained predatory publisher exclusion list. Retracted works are hidden by default (see §vii). Exclusions can be reviewed in the Refine panel; retracted works can be surfaced for fraud-research use cases.

ii.Retrieval

How queries become result sets.

i.
GIN full-text search

Arbiter runs every query against a stored full-text search column (PostgreSQL GIN index on titles, abstracts, and author metadata). The index is pre-computed at ingest time, so queries return in under one second rather than scanning each record at query time. Searches that match on abstract content rank higher than title-only matches because the abstract signals topical depth.

ii.
Relevance gate

Papers must reach a minimum relevance score (ts_rank ≥ 0.05) to appear in results. Papers below this threshold matched only a peripheral term — a word in a passing reference, not the paper’s actual focus. The gate is applied in the database before scoring, so the result set contains only papers where the query topic is genuinely present.
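As a concrete sketch — the table and column names here are assumptions for illustration, not Arbiter's actual schema — the indexed search and the relevance gate might be issued from application code like this:

```typescript
// Sketch of the gated full-text query. `papers.search_vector` is assumed to be
// the stored tsvector column (titles, abstracts, author metadata) backed by a
// GIN index, as described above.
const RELEVANCE_FLOOR = 0.05; // minimum ts_rank to pass the gate

function gatedSearchQuery(query: string): { text: string; values: [string, number] } {
  return {
    text: `
      SELECT id, title,
             ts_rank(search_vector, plainto_tsquery('english', $1)) AS rank
      FROM papers
      WHERE search_vector @@ plainto_tsquery('english', $1)
        AND ts_rank(search_vector, plainto_tsquery('english', $1)) >= $2
      ORDER BY rank DESC, id ASC  -- deterministic ordering
      LIMIT 1000
    `,
    values: [query, RELEVANCE_FLOOR],
  };
}
```

Applying the floor inside the WHERE clause keeps the gate in the database, so only rows where the query topic is genuinely present ever reach application-side scoring.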

iii.
Deterministic results

The same query always returns the same paper set. Arbiter enforces this with a deterministic SQL ordering (rank descending, then paper ID ascending) combined with a composite score tiebreaker in application code. This makes search results reproducible — a researcher running the same query in three years will retrieve the same papers, assuming the corpus has not been updated. What varies: synthesis text is AI-generated and may differ in phrasing across runs; the factual claims are grounded in the same fixed paper set.

iv.
Papers sent to synthesis vs. evidence display

Arbiter returns up to 1,000 papers that pass the relevance gate. All of these appear in the evidence panel and receive stance classification. The synthesis model receives only the top 20 papers by composite evidence score. This keeps synthesis focused on the strongest evidence rather than diluting it with hundreds of peripherally-relevant results.
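A minimal sketch of that split (interface and function names are mine, not Arbiter's): every gated paper stays in the evidence panel, while only the top 20 by composite weight are handed to the synthesis model.

```typescript
interface ScoredPaper {
  id: string;
  weight: number; // composite evidence score in [0, 1]
}

// All papers remain displayed and stance-classified; synthesis sees only
// the strongest 20, with paper ID as a deterministic tiebreaker.
function selectForSynthesis(papers: ScoredPaper[], limit = 20): ScoredPaper[] {
  return [...papers]
    .sort((a, b) => b.weight - a.weight || a.id.localeCompare(b.id))
    .slice(0, limit);
}
```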

iii.Evidence weight

How each paper earns its score.

The composite evidence weight combines four signals: query relevance, citation impact, recency momentum, and venue quality. It is not a citation count. It is not a journal impact factor. Journal prestige is shown on every paper but is deliberately excluded from the composite — the corpus is already the quality gate.

R
Text relevance to the query

How directly the paper’s title and abstract address the specific query. Derived from ts_rank (PostgreSQL full-text relevance), normalized within the result set so that the most relevant paper in each search scores 1.0. Together with citation impact, it carries the largest coefficient in Balanced mode (ε = 0.40) because it answers the question “is this paper actually about what I asked?”

F
Field-weighted citation impact

OpenAlex FWCI, normalized with a documented fallback ladder: FWCI → percentile → log-raw-count → 0.5 baseline. FWCI compares a paper’s citation count to other papers from the same field, year, and work type — an empirical study in management is compared to other empirical management studies, not to physics papers. Values above 10.0 are winsorized. Papers fewer than 18 months from publication receive a benefit-of-doubt floor of 0.5.
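The fallback ladder can be sketched as follows. The exact normalization curves are not published, so the linear winsorized scale and the log-count cap below are assumptions for illustration only; the documented parts are the ladder order, the 10.0 winsorization, the 0.5 baseline, and the 18-month floor.

```typescript
interface CitationData {
  fwci?: number;       // OpenAlex field-weighted citation impact
  percentile?: number; // citation percentile, 0–100
  citations?: number;  // raw citation count
  monthsSincePublication: number;
}

function fSignal(p: CitationData): number {
  let f: number;
  if (p.fwci !== undefined) {
    f = Math.min(p.fwci, 10.0) / 10.0; // winsorize above 10.0 (linear scale assumed)
  } else if (p.percentile !== undefined) {
    f = p.percentile / 100;
  } else if (p.citations !== undefined) {
    f = Math.min(Math.log1p(p.citations) / Math.log1p(1000), 1); // log-raw-count (cap assumed)
  } else {
    f = 0.5; // documented baseline when no citation data exists
  }
  if (p.monthsSincePublication < 18) f = Math.max(f, 0.5); // benefit-of-doubt floor
  return f;
}
```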

M
Recency momentum

A time-decay and persistence signal. New papers (under 36 months) receive a benefit-of-doubt floor of 0.50 plus a boost for early citation velocity. Older papers are rewarded for having maintained citation velocity relative to their lifetime average — M captures lasting relevance, not raw age. A 2008 paper still being heavily cited in 2026 scores higher on M than a 2008 paper that peaked in 2012.

V
Venue tiebreaker

OpenAlex source 2yr_mean_citedness, normalized across the corpus. This is the least-weighted signal (δ = 0.05) and exists primarily to break ties between papers with otherwise similar R, F, and M scores. It accounts for the documented phenomenon that papers in higher-prestige venues within a tier receive more citations, controlling for paper quality.

J
Journal prestige — displayed, not scored

J is computed from the paper’s highest tier across all ranking lists (FT50/UTD24 → 1.00; ABDC A* → 0.80; ABDC A → 0.65; ABDC B → 0.45; ABDC C → 0.25). It appears on the paper detail page so researchers can see which list a journal belongs to. J is deliberately excluded from the composite score: every paper in the corpus already passed a journal-tier filter at ingest, so using J in scoring would double-count prestige and cause high-tier journal papers with marginal relevance to outrank directly-relevant lower-tier papers.

weight = ε·R + β·F + γ·M + δ·V,  clamped to [0, 1]
Mode                 ε (R)   β (F)   γ (M)   δ (V)   When to use
Balanced (default)   0.40    0.40    0.15    0.05    General business research queries
Prestige             0.25    0.55    0.15    0.05    Citation-weighted analysis; highly-cited work regardless of topic proximity
Impact               0.30    0.50    0.15    0.05    Identifying high-influence work regardless of venue

Coefficient values are v1.0. The formula structure — additive, normalized, four factors, no journal prestige in the composite, no author signals — is locked; only coefficients may change between versions. J (journal prestige) is computed and displayed on every paper; it is not an input to the composite score.

A worked example — Balanced mode

Malmendier, U. & Tate, G. (2008). “Who makes acquisitions? CEO overconfidence and the market’s reaction.” Journal of Financial Economics, 89(1), 20–43. Query: “CEO overconfidence acquisitions”

R = 0.94  (ts_rank normalized; title + abstract directly on-topic)
F = 0.92  (FWCI 8.4, normalized; winsorization cap: 10.0)
M = 0.78  (2008 vintage; strong lasting citation velocity)
V = 0.85  (JFE source 2yr_mean_citedness, 99th percentile)

weight = 0.40×0.94 + 0.40×0.92 + 0.15×0.78 + 0.05×0.85
       = 0.376 + 0.368 + 0.117 + 0.043 = 0.904

J = 1.00 (FT50 · Journal of Financial Economics) — displayed on the paper detail page; not used in the composite above.
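The formula and coefficient table translate directly into code. This sketch (type and function names are mine, not Arbiter's) reproduces the Balanced-mode worked example, yielding 0.9035 before rounding — 0.904 as displayed above.

```typescript
type Mode = "balanced" | "prestige" | "impact";

// [ε, β, γ, δ] per mode, from the coefficient table above.
const COEFFICIENTS: Record<Mode, [number, number, number, number]> = {
  balanced: [0.40, 0.40, 0.15, 0.05],
  prestige: [0.25, 0.55, 0.15, 0.05],
  impact:   [0.30, 0.50, 0.15, 0.05],
};

// weight = ε·R + β·F + γ·M + δ·V, clamped to [0, 1]
function compositeWeight(r: number, f: number, m: number, v: number, mode: Mode = "balanced"): number {
  const [e, b, g, d] = COEFFICIENTS[mode];
  const w = e * r + b * f + g * m + d * v;
  return Math.min(1, Math.max(0, w));
}
```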

iv.Synthesis

How Arbiter summarizes the evidence.

Arbiter generates a structured AI summary of the top 20 papers. The summary is not an opinion — it is a prose account of what the retrieved evidence says, organized by stance. Three reading levels let you choose how much research vocabulary the summary uses.

i.
Three levels

Arbiter generates synthesis at three reading levels selectable on the answer page. Essentials uses everyday language — no research jargon, just the core finding and what it means. Detailed introduces research terminology with brief explanations and discusses how studies relate to each other. Technical uses discipline-standard vocabulary, methodological detail, and direct engagement with conflicting evidence — suitable for inclusion in a dissertation literature review.

ii.
Model and output structure

Synthesis is generated by OpenAI’s gpt-4.1-mini model via Vercel AI Gateway. Output is structured using the Vercel AI SDK’s streamObject function with a Zod schema: { headline, groups[{ stance, narrative, papers[] }] }. The headline is one sentence summarizing the key finding and its main caveat. Groups are narrative paragraphs organized by stance — papers that support the query answer, papers that challenge it, and papers that show it depends on conditions. Each group cites specific papers inline.
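In production this shape is expressed as a Zod schema passed to streamObject; the equivalent structure can be sketched as plain TypeScript types. The element type of `papers` is an assumption — the document does not specify whether entries are identifiers or richer objects.

```typescript
type Stance = "supports" | "challenges" | "depends";

interface SynthesisGroup {
  stance: Stance;
  narrative: string; // prose paragraph citing specific papers inline
  papers: string[];  // paper identifiers cited by this group (shape assumed)
}

interface Synthesis {
  headline: string;  // one sentence: key finding plus its main caveat
  groups: SynthesisGroup[];
}

// Example instance, for shape only:
const example: Synthesis = {
  headline: "Evidence largely supports the claim, though effects depend on firm size.",
  groups: [{ stance: "supports", narrative: "…", papers: ["W2112233445"] }],
};
```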

iii.
Evidence type hierarchy

The synthesis prompt receives a work-type label with each paper: empirical study, literature review, or preprint. The model is instructed to weight these differently: meta-analyses and systematic reviews are highest weight (mention first, most authoritative for existence claims); empirical studies are weighted proportionally to replication count and sample scope; theoretical papers explain mechanisms and boundary conditions but are not treated as evidence that something happens. This prevents the synthesis from overstating single-study findings.

iv.
What the model does not do

The synthesis model sees only titles, abstracts, authors, years, and work-type labels — not full paper text. It does not access the internet, follow citations, or reason about papers it was not given. The synthesis is a structured summary of the abstracts of the top 20 papers by composite evidence score. It is a starting point for literature engagement, not a substitute for reading primary sources.

v.Stance classification

How Arbiter labels agreement and disagreement.

Every paper in the result set is classified into one of three stances relative to your query. Seeing papers that challenge your assumption is the product working as intended — not a sign that the evidence is weak.

i.
Three categories

Every paper is classified into one of three stances relative to the search query. “Supports”: the paper’s findings are consistent with or favor the claim. “Challenges”: the findings contradict or question the claim. “Depends on…”: the relationship is contingent on a moderating variable, context, or condition. Display labels are “Supports” / “Challenges” / “Depends on…”.

ii.
How classification works

Stance is classified by a separate model (OpenAI gpt-4.1-nano) in a dedicated request pipeline — it is not done by the synthesis model. Papers are batched 50 per request using generateObject with a Zod schema for structured output. The classifier receives each paper’s title, abstract, and the original search query. Papers without usable abstracts (missing, too short, or Cloudflare-blocked) may not receive a classification; this is why the stance counts sometimes show “N of M classified” when they differ.
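The 50-per-request batching is a plain chunking step; a sketch (helper name mine):

```typescript
// Split the gated result set into batches of 50 papers, one classifier
// request per batch.
function toBatches<T>(items: T[], size = 50): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

A full 1,000-paper result set therefore produces 20 classification requests; a 120-paper set produces batches of 50, 50, and 20.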

iii.
Why “Depends on…” is the default

Business research findings almost always come with conditions: industry, country, firm size, managerial context, time period. The classifier prompt instructs the model to favor conditional classification when findings are contingent, and the majority of papers in a result set will typically fall into this category. This is not a failure mode — it accurately reflects the nature of management research, which rarely produces unconditional universal claims.

iv.
Why Arbiter surfaces disagreement

Standard search engines return what you ask for. Arbiter’s synthesis explicitly instructs the model to find and name contradictory evidence. Seeing “6 Challenge” next to “119 Support” is the product working as intended, not a bug. Most novice literature reviews are unreliable because the researcher only found papers that agreed with their hypothesis. Arbiter fights this confirmation bias by making disagreement the same size and the same color as agreement.

vi.Versioning

How a 2026 query stays reproducible.

Ranking lists change. Arbiter preserves historical membership so that a query made today produces the same result when replicated in three years.

i.
Year-stamped list editions

Every ranking list edition is archived as a versioned data file with its effective date. Current editions: FT50 (2016), UTD24 (2024), ABDC (2022), AJG (2024). When a new edition is released, Arbiter ingests it as a new version; the prior version remains active until the user or institution migrates explicitly.

ii.
Historical membership preserved

A paper’s ranking list membership is recorded at ingest time. If a journal is removed from a list in a subsequent edition, papers already indexed retain their historical membership. A researcher who needs “FT50 as of 2023” can specify that corpus explicitly.

iii.
Version in the citation string

Every citation export includes the methodology version and the list editions effective at query time. This supports the emerging norm in management research of including database-search parameters in methods appendices and enables full query reproducibility in published work.

vii.Integrity

What Arbiter surfaces on every result.

Trust is a feature, not a filter. Every paper carries an integrity status. Problematic works are visible — never silently removed — but clearly marked and penalized in evidence weight.

i.
Retracted papers

Cross-checked against the Retraction Watch database. Retracted papers are hidden from default search results; a toggle in the Refine panel surfaces them for researchers who study retraction itself. When shown, retracted papers display a danger-red badge and are excluded from composite evidence scoring entirely.

ii.
Expressions of concern

Papers under an active Expression of Concern are visible by default but carry a red EoC badge and a −0.25 penalty applied to their composite evidence weight. The penalty is shown in the weight breakdown panel on the paper detail screen — not silently applied. Researchers can see exactly why a score is lower than expected.
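The two documented scoring adjustments — retracted papers excluded from scoring entirely, EoC papers penalized by 0.25 — can be sketched as follows (status labels and function name are mine):

```typescript
type IntegrityStatus = "clear" | "expression_of_concern" | "retracted";

// Returns the adjusted composite weight, or null for papers excluded
// from scoring entirely.
function applyIntegrityPenalty(weight: number, status: IntegrityStatus): number | null {
  switch (status) {
    case "retracted":
      return null;                       // hidden by default; never scored
    case "expression_of_concern":
      return Math.max(0, weight - 0.25); // visible, penalized, shown in the breakdown
    default:
      return weight;
  }
}
```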

iii.
Predatory publishers

Venues from publishers on a maintained exclusion list — derived from documented predatory publisher indices, updated quarterly — are excluded from T1 ranking eligibility. Once the T3 tier ships in v1.1, works from these venues may still appear as citation neighbors if cited sufficiently by T1 works, but will carry a visible venue-quality warning and receive the minimum V score (0.05).

viii.Flagging

How researchers correct the record.

Arbiter's methodology is not a black box, and it is not infallible. Users can flag any paper on four grounds. Flags are reviewed within five business days.

i.
Wrong attribution

Incorrect authors, journal, year, or DOI. These are metadata errors typically traceable to source databases (OpenAlex, Crossref) and are corrected upstream and re-ingested.

ii.
Retraction not reflected

The paper has been retracted but Arbiter has not yet received the Retraction Watch update. These flags are prioritized for same-day resolution.

iii.
Evidence weight dispute

A researcher believes the composite weight is materially incorrect — for example, due to a known citation-cartel inclusion or a venue reclassification not yet in the current list edition. These are logged for the quarterly methodology review.

iv.
Problematic venue

A user identifies a venue not yet on the exclusion list that shows characteristics of predatory publishing. These are reviewed by the methodology team and escalated to the publisher exclusion process if warranted.

ix.Limitations

What Arbiter does not do.

An honest account of the boundaries. Researchers should note these limitations alongside their use of Arbiter-retrieved evidence in published work.

i.
Full-text search not included

Arbiter indexes titles, abstracts, keywords, and author metadata. It does not search the full text of papers. For works where the abstract is absent — common for pre-2000 publications — a “no abstract available” indicator is shown. Researchers requiring full-text search should supplement with publisher databases.

ii.
Author-level signals not used

Arbiter does not use author h-index, institutional affiliation, or career-stage signals in evidence weighting. This is a deliberate methodological choice to avoid amplifying citation-cartel behavior. The composite weight is a paper signal, not an author signal.

iii.
Coverage before 1990 is partial

OpenAlex coverage for papers published before 1990 is substantially less complete than for recent work. Arbiter inherits this limitation. Researchers conducting historical literature reviews should supplement with JSTOR and publisher archives for pre-1990 coverage.

iv.
Synthesis is probabilistic, not definitive

The AI synthesis is a language model output based on retrieved abstracts. The same query may produce slightly different phrasing across runs because the model operates at non-zero temperature. The underlying paper set is fixed and deterministic; the prose summary is not. Synthesis is a starting point for literature engagement, not a substitute for reading primary sources. Citations are verifiable; the synthesis framing should be treated as one reasonable interpretation of the evidence.

v.
Citation neighbors not included in v1.0

T3 citation neighbors — works cited by five or more ranked-journal papers — are not in the v1.0 corpus. This means some important methodological papers (statistics papers in psychology, foundational economics works) may not appear even when directly relevant. Researchers should supplement with discipline-specific databases when foundational methodology coverage matters.

x.Coverage

What Arbiter can — and cannot — retrieve.

Arbiter provides complete bibliographic coverage across ranked journals from 1990. Abstract coverage is partial and varies substantially by publisher. A result of “not found” means not found in our indexed abstracts — not that no paper exists.

149,031 — FT50 papers indexed
63% — Abstract coverage (FT50)
23 / 49 — Journals with >80% coverage

Why abstract coverage is incomplete

Arbiter sources abstracts from OpenAlex, CrossRef, and Semantic Scholar. Several major publishers — including Elsevier, Springer Nature, and SAGE — restrict third-party abstract indexing or serve content behind Cloudflare protections that block automated retrieval. This affects 11 FT50 journals with coverage below 20%, including Accounting, Organizations and Society (4%), Journal of Accounting and Economics (5%), Organizational Behavior and Human Decision Processes (5%), and Journal of Financial Economics (7%).

Works where the abstract is absent are fully indexed by title, authors, venue, year, and citation count. They appear in search results but are matched on metadata only — not on abstract content. This may cause relevant papers to rank lower than their true relevance warrants.

FT50 per-journal coverage

All 49 FT50 journals currently indexed. Coverage varies by publisher — see explanation above. Live per-journal pages available in the journal directory. Data as of April 2026; updates automatically at Stage 4.

Journal | Papers | Abstract % | Range
Journal of Business Ethics | 9,936 | 10% | 1990–2026
Management Science | 9,538 | 88% | 1990–2026
The American Economic Review | 7,318 | 80% | 1990–2026
Journal of Applied Psychology | 5,235 | 68% | 1990–2026
The Journal of Finance | 5,062 | 67% | 1990–2026
Operations Research | 4,734 | 86% | 1990–2026
Research Policy | 4,592 | 14% | 1990–2026
Journal of Financial Economics | 4,179 | 7% | 1990–2026
Strategic Management Journal | 3,976 | 81% | 1990–2026
Production and Operations Management | 3,827 | 81% | 1992–2026
Organization Studies | 3,639 | 60% | 1990–2026
Academy of Management Journal | 3,574 | 71% | 1990–2026
The Review of Financial Studies | 3,199 | 96% | 1990–2026
Econometrica | 3,161 | 68% | 1990–2026
Journal of Marketing | 3,144 | 81% | 1990–2026
Academy of Management Review | 3,104 | 61% | 1990–2026
Human Relations | 3,085 | 79% | 1990–2026
Journal of Marketing Research | 3,038 | 72% | 1990–2026
The Accounting Review | 3,017 | 87% | 1990–2026
Harvard Business Review | 3,000 | 45% | 1990–2021
Journal of Management | 2,959 | 90% | 1990–2026
Journal of Management Studies | 2,898 | 80% | 1990–2026
Journal of Political Economy | 2,842 | 72% | 1990–2026
Organization Science | 2,725 | 88% | 1990–2026
Human Resource Management | 2,675 | 62% | 1990–2026
Journal of Consumer Research | 2,633 | 85% | 1990–2026
Journal of International Business Studies | 2,513 | 8% | 1990–2026
Journal of Financial and Quantitative Analysis | 2,398 | 98% | 1990–2026
Organizational Behavior and Human Decision Processes | 2,349 | 5% | 1990–2026
Marketing Science | 2,300 | 87% | 1990–2026
Contemporary Accounting Research | 2,267 | 71% | 1990–2026
MIS Quarterly | 2,257 | 83% | 1990–2026
Journal of the Academy of Marketing Science | 2,206 | 9% | 1990–2026
The Review of Economic Studies | 2,175 | 95% | 1990–2026
Administrative Science Quarterly | 2,145 | 42% | 1990–2026
Journal of Operations Management | 2,132 | 66% | 1990–2026
Information Systems Research | 1,925 | 87% | 1990–2026
Journal of Consumer Psychology | 1,852 | 82% | 1992–2026
The Quarterly Journal of Economics | 1,833 | 93% | 1990–2026
Entrepreneurship Theory and Practice | 1,807 | 83% | 1990–2026
Manufacturing and Service Operations Management | 1,717 | 93% | 1999–2026
Accounting, Organizations and Society | 1,699 | 4% | 1990–2026
Journal of Management Information Systems | 1,674 | 86% | 1990–2026
Journal of Accounting and Economics | 1,635 | 5% | 1990–2026
Journal of Business Venturing | 1,633 | 10% | 1990–2026
Journal of Accounting Research | 1,610 | 81% | 1990–2026
Review of Accounting Studies | 1,156 | 17% | 1996–2026
Strategic Entrepreneurship Journal | 649 | 77% | 2007–2026
Review of Finance | 92 | 2% | 2003–2021

Journals below 15% abstract coverage are primarily Elsevier, SAGE, or Springer titles that restrict third-party abstract indexing. Bibliographic metadata (title, authors, venue, year, citations) is complete for all journals.

xi.Citing

How to cite this document.

This document is a permalink. When citing Arbiter in a published methods section, cite this version specifically. If the methodology changes, a new version will be issued and this page will be updated with a forward link.

i.
APA (7th ed.)

Arbiter. (2026). Arbiter evidence methodology, v1.0. Opus Vita. https://arbiter.ac/methodology/v1.0

ii.
BibTeX
@techreport{arbiter2026methodology,
  author      = {Arbiter},
  title       = {Arbiter evidence methodology, v1.0},
  institution = {Opus Vita},
  year        = {2026},
  month       = {April},
  url         = {https://arbiter.ac/methodology/v1.0},
  note        = {Version 1.0. Accessed [date].}
}
iii.
Suggested methods-section phrasing

“Literature evidence was retrieved and synthesized using Arbiter (v1.0; Arbiter, 2026), which indexes journals on the FT50, UTD24, ABDC, and AJG ranking lists and applies a composite evidence weight combining query relevance (R), field-weighted citation impact (F), recency momentum (M), and venue quality (V). The Balanced mode (ε=0.40, β=0.40, γ=0.15, δ=0.05) was used. Retracted papers were excluded by default.”

“The methodology page is the trust contract. We update it the way an academic journal updates a corrigendum.”

— Arbiter, v1.0 documentation notes
PDF · Version 1.0 · April 2026

Download as PDF (versioned to current release)

A printed snapshot of this page, generated at time of release. The live page is always authoritative; the PDF is for archival and offline reference.

Download PDF →