Methodology — Speaker Guide — This Is AGI. Prove Us Wrong.

01 · Statements are the unit of evaluation

The review board's primary job is to evaluate the statements about AGI that the event is built around — not to evaluate the presentations in isolation. Each event selects a subset of GEA's statements about AGI. Presenters are invited because their work bears on those statements. For each statement, the board publishes a verdict on where grounded science stands on it given what was presented at the event.

This is different from traditional peer review, which evaluates each paper as a standalone unit. Here, presentations are inputs to statement-level evaluation. The statement is what gets judged; presentations are evidence for or against it.

02 · How presenters contribute

Each presenter makes a scientific case for, against, or orthogonal to one or more of the event's statements. For means their grounded work supports the statement. Against means their grounded work opposes it. Orthogonal means their work bears on the statement from a different angle — for example, reframing the question, pointing out that it presupposes something unproven, or addressing a related but distinct claim.

Presenters must state their position clearly at the opening of their presentation. The audience and the review board both need to know what position is being argued before they can evaluate the argument. See Rule 1.2 and Rule 1.3.

03 · What the review board looks for

The board is checking whether each presentation meets a scientific grounding standard. Specifically:

Foundations. Does the presentation clearly identify the accepted science or established body of thought it builds on? Physics, neuroscience, biology, cognitive science, complexity science, information theory — whatever the work rests on has to be named and credited.
Correct engagement with foundations. Does the presentation use its foundations accurately? Misrepresenting what a cited field or theory claims fails this criterion even if the new argument is interesting.
Honest about extensions. Is the presentation clear about where the accepted science ends and its novel claims begin? Presenting speculation as if it were an established finding fails this criterion on grounds of intellectual honesty.
Coherent reasoning. Does the argument hold together? Do the conclusions follow from the foundations and premises?
Falsifiable in principle. Are the claims stated in a form that could be shown wrong? Where the claim can't be tested with today's technology, the presentation must describe what would falsify it in the future — what observation, experiment, or counter-argument would refute it. Claims that can't fail under any conditions are not evaluated as scientific claims.

These five criteria apply across all tracks — the scientific tracks, the economics track, the governance track, the law track, the philosophy track, and so on. The specific application differs by discipline, but the underlying standard is the same: identify your foundations, engage with them accurately, acknowledge extensions, reason coherently, and state your claims so they can be shown wrong.

In the Rules

The rule-text version of these five criteria is in Rule 4.2, with track-specific adaptations in Rules 4.3 through 4.11.

04 · The two-level evaluation

Evaluation happens at two levels.

Level one — per-presentation grounding

Each presentation is evaluated individually against the five criteria above. It receives one of three verdicts: meets the grounding standard, partially meets it, or does not meet it. Presentations that meet or partially meet the standard become inputs to level two. Presentations that do not meet the standard are noted in the Report with the board's reasoning, but do not count toward the statement-level verdict. See Rule 6.1.
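The level-one routing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Grounding`, `Presentation`, and `split_by_grounding` are hypothetical and do not come from the Rules, and the grounding verdict itself is taken as given, since it is a judgment the board makes and not something code can compute.

```python
from dataclasses import dataclass
from enum import Enum


class Grounding(Enum):
    """The three level-one verdicts named in the text."""
    MEETS = "meets the grounding standard"
    PARTIALLY_MEETS = "partially meets the grounding standard"
    DOES_NOT_MEET = "does not meet the grounding standard"


@dataclass(frozen=True)
class Presentation:
    # Hypothetical record; the field names are assumptions for illustration.
    title: str
    grounding: Grounding  # assigned by the review board, not computed here


def split_by_grounding(presentations):
    """Return (inputs_to_level_two, noted_only).

    Presentations that meet or partially meet the standard feed level two.
    The rest are noted in the Report but do not count toward the
    statement-level verdict.
    """
    passing = {Grounding.MEETS, Grounding.PARTIALLY_MEETS}
    inputs = [p for p in presentations if p.grounding in passing]
    noted = [p for p in presentations if p.grounding not in passing]
    return inputs, noted
```

The sketch only makes the routing explicit: a "partially meets" verdict is treated the same as "meets" for the purpose of reaching level two, as the paragraph above states.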

Level two — per-statement scientific state

For each of the event's statements, the board synthesizes all the grounded presentations addressing it. The board asks: given the grounded work presented, what is the scientific state of this statement? Where two or more grounded presentations disagree, the board analyzes the disagreement — is it empirical, definitional, methodological, or foundational — and describes what would resolve it.

The board publishes a scientific state verdict for each statement:

Scientifically supported: grounded work supports the statement and no grounded opposition was presented.
Scientifically opposed: grounded work opposes the statement and no grounded support was presented.
Scientifically contested: grounded work was presented on both sides, so the statement is a live scientific disagreement.
Scientifically unsupported at this event: no contribution met the grounding standard, or no contribution addressed the statement.
Outside scientific reach: the statement cannot be evaluated by scientific means, either because the relevant science doesn't exist or can't exist, or because the statement isn't framed in a way scientific evaluation can address.

The full definitions are in Rule 6.2.c.
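As a rough aid to reading the five verdicts, the mapping from grounded positions to a verdict can be sketched in Python. This is a simplification and not the board's procedure: the board synthesizes and analyzes the work, which no lookup can replace. The names `Position`, `StatementVerdict`, and `statement_verdict` are hypothetical, the `evaluable` flag stands in for the board's own judgment about scientific reach, and the case where only orthogonal grounded work was presented is not specified in the text, so the sketch returns `None` for it.

```python
from enum import Enum


class Position(Enum):
    """Positions a presenter can take on a statement."""
    FOR = "for"
    AGAINST = "against"
    ORTHOGONAL = "orthogonal"


class StatementVerdict(Enum):
    SUPPORTED = "Scientifically supported"
    OPPOSED = "Scientifically opposed"
    CONTESTED = "Scientifically contested"
    UNSUPPORTED = "Scientifically unsupported at this event"
    OUTSIDE_REACH = "Outside scientific reach"


def statement_verdict(grounded_positions, evaluable=True):
    """Map the positions of grounded presentations to a verdict.

    grounded_positions: positions taken by presentations that passed
        level one (met or partially met the grounding standard).
    evaluable: assumption, a flag for the board's judgment on whether
        the statement can be evaluated by scientific means at all.
    """
    if not evaluable:
        return StatementVerdict.OUTSIDE_REACH

    positions = set(grounded_positions)
    if not positions:
        # Nothing grounded addressed the statement.
        return StatementVerdict.UNSUPPORTED

    has_for = Position.FOR in positions
    has_against = Position.AGAINST in positions
    if has_for and has_against:
        return StatementVerdict.CONTESTED
    if has_for:
        return StatementVerdict.SUPPORTED
    if has_against:
        return StatementVerdict.OPPOSED

    # Only orthogonal grounded work: the text does not say which verdict
    # applies, so this sketch leaves the decision to the board.
    return None
```

For example, `statement_verdict([Position.FOR, Position.AGAINST])` yields the contested verdict, while an empty list yields "Scientifically unsupported at this event".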

05 · What the Report does not do

The Report does not adjudicate truth. The verdicts describe the scientific state of each statement as addressed at a particular event. A statement judged "scientifically opposed" at an event may still be true; the verdict means only that the grounded contributions at that event opposed it and none supported it. A statement judged "scientifically supported" may still be wrong; later events may surface opposition that wasn't presented.

This is why there is an Annual AGI Report. Each City AGI Report is one event's evaluation. The Annual Report tracks how the scientific state of each statement evolves across cities and across time — which statements hold up under repeated scrutiny, which flip from supported to contested as more work is presented, which remain unsupported because no one has addressed them with grounded science yet.

The Report filters for scientific seriousness. It does not filter for ultimate correctness. No one has a theory of intelligence strong enough to make that call. See Rule 6.2.d and Rule 9.2.

06 · Why this methodology

This methodology is closer to scientific assessment than to traditional peer review. Scientific assessment is the model used by the IPCC for climate, the Cochrane Collaboration for medicine, and NIH consensus development conferences. These bodies evaluate the state of scientific work on specific questions by synthesizing across the available evidence and issuing structured assessments.

Traditional peer review evaluates each paper as a standalone unit. It works well for incremental research in mature fields with established consensus on what counts as progress. AGI is not such a field. There is no consensus on what intelligence is, what would count as a step toward AGI, or which approaches are serious. In that environment, evaluating papers one at a time produces a pile of judgments rather than a picture of where the field stands.

Scientific assessment produces the picture. That is what the Reports are for — a structured, honest landscape map of where grounded scientific work on AGI actually is, one statement at a time, updated across cities and across years.
