How the review board evaluates statements and presentations
Version 1.0 — April 2026
The review board's primary job is to evaluate the statements about AGI that the event is built around — not to evaluate the presentations in isolation. Each event selects a subset of GEA's statements about AGI. Presenters are invited because their work bears on those statements. For each statement, the board publishes a verdict on where grounded science stands on it given what was presented at the event.
This is different from traditional peer review, which evaluates each paper as a standalone unit. Here, presentations are inputs to statement-level evaluation. The statement is what gets judged; presentations are evidence for or against it.
Each presenter makes a scientific case for, against, or orthogonal to one or more of the event's statements. "For" means their grounded work supports the statement. "Against" means their grounded work opposes it. "Orthogonal" means their work bears on the statement from a different angle: for example, by reframing the question, by pointing out that it presupposes something unproven, or by addressing a related but distinct claim.
Presenters must state their position clearly at the opening of their presentation. The audience and the review board both need to know what position is being argued before they can evaluate the argument. See Rule 1.2 and Rule 1.3.
The board checks whether each presentation meets a scientific grounding standard, which is defined by five criteria.
These five criteria apply across all tracks — the scientific tracks, the economics track, the governance track, the law track, the philosophy track, and so on. The specific application differs by discipline, but the underlying standard is the same: identify your foundations, engage with them accurately, acknowledge extensions, reason coherently, and state your claims so they can be shown wrong.
Evaluation happens at two levels.
At level one, each presentation is evaluated individually against the five criteria above. It receives one of three verdicts: meets the grounding standard, partially meets it, or does not meet it. Presentations that meet or partially meet the standard become inputs to level two. Presentations that do not meet the standard are noted in the Report with the board's reasoning, but do not count toward the statement-level verdict. See Rule 6.1.
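The level-one filter described above can be sketched in a few lines of Python. This is only an illustration of the routing rule, not the board's actual tooling: the `Presentation` record, its field names, and the `split_by_grounding` helper are all hypothetical. The three verdict values are the ones named in the text.

```python
from dataclasses import dataclass
from enum import Enum


class GroundingVerdict(Enum):
    """The three level-one verdicts named in the text."""
    MEETS = "meets the grounding standard"
    PARTIALLY_MEETS = "partially meets the grounding standard"
    DOES_NOT_MEET = "does not meet the grounding standard"


@dataclass(frozen=True)
class Presentation:
    # Hypothetical record; field names are illustrative, not taken from the rules.
    title: str
    verdict: GroundingVerdict
    board_reasoning: str = ""


def split_by_grounding(presentations):
    """Return (level_two_inputs, noted_only).

    Presentations that meet or partially meet the standard feed level two.
    The rest are only noted in the Report with the board's reasoning.
    """
    level_two_inputs = [
        p for p in presentations if p.verdict is not GroundingVerdict.DOES_NOT_MEET
    ]
    noted_only = [
        p for p in presentations if p.verdict is GroundingVerdict.DOES_NOT_MEET
    ]
    return level_two_inputs, noted_only
```

Note that a "partially meets" verdict is routed the same way as "meets" in this sketch, because the text says both become inputs to level two. How much weight each carries at level two is not specified here and is left out.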
At level two, the board synthesizes all the grounded presentations addressing each of the event's statements. The board asks: given the grounded work presented, what is the scientific state of this statement? Where two or more grounded presentations disagree, the board analyzes whether the disagreement is empirical, definitional, methodological, or foundational, and describes what would resolve it.
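The bookkeeping behind this synthesis step might look like the sketch below. It only groups grounded presentations by statement and position, and flags statements where a disagreement analysis is needed. It does not compute a verdict, which remains the board's judgment. The function names and the tuple layout are hypothetical. The rule that a disagreement exists when both "for" and "against" cases are present is a simplifying assumption, since presenters on the same side could also disagree.

```python
from collections import defaultdict
from enum import Enum


class Position(Enum):
    """The three positions a presenter can take on a statement."""
    FOR = "for"
    AGAINST = "against"
    ORTHOGONAL = "orthogonal"


def group_by_statement(cases):
    """Group grounded cases by statement, then by position.

    `cases` is an iterable of (statement_id, presenter, position) tuples
    (a hypothetical layout) covering only presentations that passed level one.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for statement_id, presenter, position in cases:
        grouped[statement_id][position].append(presenter)
    # Convert to plain dicts so later lookups do not create empty entries.
    return {sid: dict(by_position) for sid, by_position in grouped.items()}


def needs_disagreement_analysis(by_position):
    """Assumption: flag a statement when both FOR and AGAINST cases exist."""
    return bool(by_position.get(Position.FOR)) and bool(
        by_position.get(Position.AGAINST)
    )
```

For a statement flagged this way, the board would still need to classify the disagreement by hand as empirical, definitional, methodological, or foundational.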
The board publishes a scientific state verdict for each statement. The full definitions of the verdicts are in Rule 6.2.c.
The Report does not adjudicate truth. The verdicts describe the scientific state of each statement as addressed at a particular event. A statement judged "scientifically opposed at this event" may still be true — the opposition simply prevailed among the grounded contributions at that event. A statement judged "scientifically supported" may still be wrong; later events may surface opposition that wasn't presented.
This is why there is an Annual AGI Report. Each City AGI Report is one event's evaluation. The Annual Report tracks how the scientific state of each statement evolves across cities and across time — which statements hold up under repeated scrutiny, which flip from supported to contested as more work is presented, which remain unsupported because no one has addressed them with grounded science yet.
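As a rough illustration of what tracking across events involves, the sketch below takes one statement's verdict history and reports where the verdict changed. The `verdict_changes` helper, the event names, and the short verdict labels in the usage example are hypothetical placeholders. Verdict labels are treated as opaque strings because the official definitions live in Rule 6.2.c.

```python
def verdict_changes(history):
    """Return the points at which one statement's verdict changed.

    `history` is a chronologically ordered list of (event_name, verdict_label)
    pairs for a single statement. Each change is reported as
    (previous_event, next_event, previous_verdict, next_verdict).
    """
    changes = []
    # Pair each entry with its successor and keep only the pairs that differ.
    for (prev_event, prev_verdict), (next_event, next_verdict) in zip(
        history, history[1:]
    ):
        if prev_verdict != next_verdict:
            changes.append((prev_event, next_event, prev_verdict, next_verdict))
    return changes


# Usage with made-up events and labels:
history = [
    ("City A", "supported"),
    ("City B", "supported"),
    ("City C", "contested"),
]
flips = verdict_changes(history)
```

In this example, `flips` holds a single entry recording the change from "supported" to "contested" between City B and City C. A statement whose verdict never changes yields an empty list.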
The Report filters for scientific seriousness. It does not filter for ultimate correctness. No one has a theory of intelligence strong enough to make that call. See Rule 6.2.d and Rule 9.2.
This methodology is closer to scientific assessment than to traditional peer review. Scientific assessment is the model used by the IPCC for climate and the Cochrane Collaboration for medicine, and it was used by the NIH consensus development conferences, which ran until 2013. These bodies evaluate the state of scientific work on specific questions by synthesizing across the available evidence and issuing structured assessments.
Traditional peer review evaluates each paper as a standalone unit. It works well for incremental research in mature fields with established consensus on what counts as progress. AGI is not such a field. There is no consensus on what intelligence is, what would count as a step toward AGI, or which approaches are serious. In that environment, evaluating papers one at a time produces a pile of judgments rather than a picture of where the field stands.
Scientific assessment produces the picture. That is what the Reports are for — a structured, honest landscape map of where grounded scientific work on AGI actually is, one statement at a time, updated across cities and across years.