The full research protocol

Evidence under adversarial review.

The end-to-end methodology behind every claim audit, peptide grade, and source breakdown. Written to be executable by a human editorial team or by an AI workflow with multiple independent reviewers, adversarial passes, and a final arbitration layer.

The shorter, reader-facing summary lives on /methodology. This page is the full protocol — what we run, in what order, with what gates.

§ 01

TL;DR

Peptigrade should not run on a single “smart model” reading a few papers and improvising a verdict. The protocol is designed around:

duplicate claim decomposition
duplicate and transparent search planning and screening
duplicate independent extraction
explicit risk-of-bias appraisal
adversarial challenge from multiple postures
arbitration with abstention when evidence is thin
full provenance from verdict back to source text

The default failure mode is not “confident answer from incomplete evidence.” The default failure mode is UNVALIDATED, SPECULATIVE, or Low confidence until the workflow earns a stronger conclusion.

§ 02

Scope — what this protocol is for

Use this protocol for:

Claim audits — a public claim or popular framing is adjudicated against the literature.
Peptide × outcome grades — a molecule is graded for a specific use, not as a blanket product.
Source breakdowns — a podcast, post, paper, or newsletter is decomposed and traced upstream.

Do not use this protocol to:

provide medical advice
recommend dosing to an individual
infer legality from incomplete regulatory information
validate claims from memory without source retrieval

§ 03

Scientific scaffolding

This protocol is stitched from established frameworks. We do not adopt any one of them wholesale; we borrow the parts that are useful and explicit.

Framework	What it contributes here
GRADE	Certainty-of-evidence logic: downgrade for risk of bias, inconsistency, indirectness, imprecision, and publication bias.
Cochrane RoB 2	Domain-based risk-of-bias assessment for randomized trials.
ROBINS-I style reasoning	Structured bias assessment for non-randomized human studies when RCTs do not exist.
PRISMA 2020	Transparent reporting of search, screening, inclusion, exclusion, and evidence flow.
Cochrane duplicate extraction	Independent screening and extraction by at least two reviewers, with predefined disagreement resolution.
SciFact / FEVER	Atomic claim decomposition, rationale-linked verification, and explicit support / contradiction / not-enough-information logic.
FEVER adversarial evaluation	Stress-testing verdicts against omitted evidence, perturbations, and brittle reasoning.
PubPeer / Retraction Watch / registries	Post-publication integrity checks, retraction status, and protocol / registration cross-checks.

Two principles matter most for AI execution:

Systematic-review discipline beats model eloquence.
Adversarial review beats single-pass confidence.

§ 04

Core design principles

These are non-negotiable if the protocol is executed by LLMs.

§ I
Evidence first, never memory first
No model may assert that a claim is supported, contradicted, or untested unless the underlying sources were actually retrieved and logged. A model may use prior knowledge to suggest search terms, but not to decide the verdict.
§ II
Separate retrieval, extraction, and judgment
The same pass that retrieves sources should not be trusted to synthesize them without review. Retrieval, extraction, bias appraisal, and final adjudication should be split into distinct steps or agents.
§ III
Require duplicate independent passes for subjective steps
Claim decomposition, search planning, screening, extraction, and final synthesis all contain judgment. They must be run independently by at least two reviewers or two isolated model contexts before arbitration.
§ IV
Force adversarial postures
At least one reviewer must make the strongest defensible case for the claim; at least one must make the strongest case against. A third reviewer looks for mismatches, omissions, and overgeneralization.
§ V
Do not let mechanism substitute for efficacy
Mechanistic plausibility can raise interest. It cannot validate a human efficacy claim on its own.
§ VI
Human claims require human evidence
Animal and in-vitro evidence can support mechanistic or preclinical sub-claims. They cannot validate a claim about efficacy, safety, or speed of effect in humans.
§ VII
Reviews summarize; primary studies decide
Systematic reviews and meta-analyses are high-value synthesis sources, but they do not replace reading the primary studies that drive the conclusion when a claim is contested or high stakes.
§ VIII
Abstention is a valid result
If the search is incomplete, sources conflict, route/dose/population mismatch is large, or evidence is sparse, the workflow should stop at UNVALIDATED, SPECULATIVE, CONTESTED, or Low confidence rather than force a stronger answer.
§ IX
Every public sentence must be traceable
Every sentence in a verdict summary must map to cited sources, specific evidence snippets or extracted fields, and a documented reasoning step in the audit record.

§ 05

Machine-operable artifacts

An AI workflow should emit these artifacts for every reviewed claim.

Artifact	Purpose	Minimum contents
Decomposition packets	Preserves independent framing before reconciliation	Decomposer A output, Decomposer B output, qualifier diffs, reconciliation notes
Claim manifest	Defines what is being tested	Original claim, normalized claim, population, intervention, comparator, outcome, route, time horizon, claim type, scope key
Search plans	Makes query coverage auditable before execution	Plan A, Plan B / contradiction-seeking plan, query families, required source classes, completeness checklist
Search ledger	Makes retrieval auditable	Databases, queries, dates, filters, PRISMA-style counts, missing full texts, citation-chasing log
Screening log	Shows inclusion / exclusion decisions	Candidate source, include/exclude decision, reason, reviewer IDs
Study inventory	Deduplicated source list	PMID / DOI / registry ID, title, design, year, status, duplicate-linking across reports
Evidence cards	Structured study extraction	Population, route, dose, comparator, endpoints, timepoint, effect size, adverse events, exact snippets
Bias cards	Structured quality appraisal	RoB 2 / ROBINS-I domains, integrity flags, funding, registration, protocol deviations
Contradiction map	Prevents one-sided synthesis	Which sources support, contradict, or merely mention each sub-claim, and why
Adversarial review log	Preserves disagreement	Steelman, skeptic, mismatch, omitted-evidence cases, unresolved disputes
Decision log	Shows how the verdict was reached	Status, confidence, decisive studies, unresolved uncertainty, escalation notes
Public claim sentences	Enforces sentence-level provenance	Sentence text, source evidence cards, snippet IDs, allowed strength
Protocol compliance report	Makes completeness machine-checkable	Stage, required artifact, gate result, failure response, sign-off
Publication packet	What ships	Final summary, citations, status, confidence, claim-review JSON-LD, last verified date, linter result

Note

If the workflow cannot produce these artifacts, it has not completed the protocol.

§ 06

Claim unit & normalization

Duplicate claim decomposition

Before any retrieval runs, the workflow produces two independent packets — Claim Decomposition A and Claim Decomposition B — splitting the source material into atomic sub-claims, assigning qualifiers, and stating what evidence would count as direct support or contradiction.

The reconciliation step is deterministic:

compare claim boundaries
compare population / route / outcome / time qualifiers
compare claim-type assignments
merge exact duplicates
escalate materially different framings instead of averaging them away

If the two decomposers disagree on what the claim is actually asserting, the workflow has not earned the right to search yet.
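
The reconciliation steps above can be sketched roughly as follows. This is a minimal illustration, not the production reconciler: `SubClaim`, `claimKey`, and `reconcile` are hypothetical names, the sub-claim shape is simplified, and it assumes each packet contains no internal duplicates. Exact duplicates (after trimming and lowercasing) are merged; anything only one decomposer produced is escalated rather than averaged away.

```typescript
// Hypothetical, simplified sub-claim shape for illustration only.
type SubClaim = {
  normalizedText: string;
  claimType: string;
  population?: string;
  route?: string;
  outcome?: string;
  timeHorizon?: string;
};

type Reconciliation = { merged: SubClaim[]; escalated: SubClaim[] };

// Build a comparison key from the claim text, claim type, and qualifiers.
function claimKey(c: SubClaim): string {
  return [c.normalizedText, c.claimType, c.population, c.route, c.outcome, c.timeHorizon]
    .map((part) => (part === undefined ? "?" : part.trim().toLowerCase()))
    .join("|");
}

// Merge exact duplicates; escalate anything only one decomposer produced.
function reconcile(a: SubClaim[], b: SubClaim[]): Reconciliation {
  const keysA = a.map(claimKey);
  const keysB = b.map(claimKey);
  const merged = a.filter((c) => keysB.indexOf(claimKey(c)) !== -1);
  const escalated = a
    .filter((c) => keysB.indexOf(claimKey(c)) === -1)
    .concat(b.filter((c) => keysA.indexOf(claimKey(c)) === -1));
  return { merged, escalated };
}
```

A non-empty `escalated` list is the signal that the workflow should not proceed to search planning yet.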

Atomic claim extraction

A claim is the smallest assertion that can be tested against evidence. Long sentences should be split until each sub-claim has a single testable burden.

Good atomic claims

“BPC-157 improves tendon healing in rodent Achilles tendon injury models.”
“BPC-157 improves tendon healing in humans with chronic tendinopathy.”
“BPC-157 acts through FAK-paxillin signaling in tendon fibroblasts.”

Bad atomic claims

“BPC-157 is the Wolverine peptide.”
“BPC-157 heals everything fast and safely.”

Every atomic claim is normalized into this frame:

type ClaimManifest = {
  originalText: string;
  normalizedText: string;
  claimType:
    | "empirical"
    | "mechanistic"
    | "comparative"
    | "quantitative"
    | "predictive"
    | "definitional";
  population?: string;
  intervention?: string;
  comparator?: string;
  outcome?: string;
  route?: string;
  timeHorizon?: string;
  scopeKey?: string;
  scopeNotes?: string[];
};

For internal evidence review, the meaningful unit is usually closer to peptide × outcome × population × route × time horizon. Public pages may collapse presentation for readability, but the protocol retains these qualifiers internally. If a qualifier is implicit or unknown, it is marked as such and certainty is capped.
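
One way to derive a scope key from those qualifiers is sketched here. The function name `buildScopeKey`, the `::` separator, the `unknown` placeholder, and the `certaintyCapped` flag are all assumptions made for illustration; the protocol itself only requires that missing qualifiers are marked and that certainty is capped.

```typescript
// Hypothetical subset of manifest fields used to build a scope key.
type ScopeQualifiers = {
  intervention?: string;
  outcome?: string;
  population?: string;
  route?: string;
  timeHorizon?: string;
};

type ScopeResult = {
  scopeKey: string;
  unknownQualifiers: string[];
  certaintyCapped: boolean;
};

// Join the qualifiers in a fixed order; missing ones are marked, never dropped.
function buildScopeKey(q: ScopeQualifiers): ScopeResult {
  const fields: Array<[string, string | undefined]> = [
    ["intervention", q.intervention],
    ["outcome", q.outcome],
    ["population", q.population],
    ["route", q.route],
    ["timeHorizon", q.timeHorizon],
  ];
  const unknownQualifiers = fields.filter((f) => !f[1]).map((f) => f[0]);
  const scopeKey = fields
    .map((f) => (f[1] ? f[1].trim().toLowerCase().replace(/\s+/g, "-") : "unknown"))
    .join("::");
  return { scopeKey, unknownQualifiers, certaintyCapped: unknownQualifiers.length > 0 };
}
```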

§ 07

Search & screening protocol

Duplicate and adversarial search planning

Search execution does not begin from a single planner’s instincts. Each finalized ClaimManifest produces:

Search Plan A — direct-support planner optimized to find the best on-point evidence.
Search Plan B / contradiction-seeking plan — planner optimized to find null, contradictory, safety, integrity, and scope-mismatch evidence.

These plans are reconciled deterministically. The reconciliation verifies that direct-support and contradiction queries exist for every central sub-claim, that safety / integrity / registry / regulatory checks are present where relevant, that synonyms and scope qualifiers are not missing, and that query coverage is logged before execution.

type SearchPlan = {
  subClaimId: string;
  queryFamilies: Array<{
    family:
      | "direct"
      | "contradiction"
      | "safety"
      | "integrity"
      | "registry"
      | "regulatory";
    query: string;
    rationale: string;
  }>;
  requiredSourceClasses: string[];
  coverageChecklist: string[];
};
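
The coverage part of plan reconciliation lends itself to a deterministic check. The sketch below assumes a trimmed-down plan shape (`SearchPlanSketch`) and two hypothetical helpers, `missingFamilies` and `passesCoverageGate`; it only verifies that non-empty direct and contradiction queries exist, not that they are good queries.

```typescript
type QueryFamily =
  | "direct"
  | "contradiction"
  | "safety"
  | "integrity"
  | "registry"
  | "regulatory";

type PlannedQuery = { family: QueryFamily; query: string; rationale: string };

// Simplified plan shape for illustration; the full SearchPlan has more fields.
type SearchPlanSketch = { subClaimId: string; queryFamilies: PlannedQuery[] };

// Return the required families for which the plan has no non-empty query.
function missingFamilies(plan: SearchPlanSketch, required: QueryFamily[]): QueryFamily[] {
  return required.filter(
    (family) => !plan.queryFamilies.some((q) => q.family === family && q.query.trim() !== "")
  );
}

// A central sub-claim passes only if both direct and contradiction queries exist.
function passesCoverageGate(plan: SearchPlanSketch): boolean {
  return missingFamilies(plan, ["direct", "contradiction"]).length === 0;
}
```

Safety, integrity, registry, and regulatory families are required only where relevant, so a caller would pass a longer `required` list for those sub-claims.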

Retrieval completeness checklist

For each sub-claim, the planner explicitly considers whether the query set covers:

canonical peptide name
synonyms, aliases, development codes
common misspellings and transliterations
route terms when route matters
outcome and endpoint synonyms
population / indication / disease terms
comparator or control terminology
dose or exposure terminology
human clinical terms
animal model terms for preclinical sub-claims
target / pathway terms for mechanistic sub-claims
negative, null, failed, no-effect terms
safety and adverse-event terms
retraction, correction, expression-of-concern terms
registry identifiers and trial terminology
regulatory body names and status terminology

“Not searched because irrelevant” is acceptable. Silent omission is not.

Databases and source classes

At minimum, the workflow searches or checks:

PubMed / MEDLINE
trial registries (ClinicalTrials.gov and regional equivalents)
Crossref or DOI resolution
PubPeer
Retraction Watch or equivalent retraction source
relevant regulatory bodies (FDA, EMA, WADA, etc.)

Preprints may be included only when methodology is inspectable, no peer-reviewed equivalent exists, and the verdict is explicitly capped for certainty.

Citation chasing and record linkage

For central claims and decisive studies, retrieval is not complete after keyword search alone. The workflow performs and logs:

backward citation review of decisive reviews and pivotal primary studies
forward citation review of decisive primary studies
registry-to-publication matching for registered trials
duplicate-report linking across abstracts, articles, supplements, and registry records
dead-end logging when cited primary sources cannot be retrieved in full text

PRISMA-style search ledger

type SearchDatabaseRecord = {
  database: string;
  queryFamily:
    | "direct"
    | "contradiction"
    | "safety"
    | "integrity"
    | "registry"
    | "regulatory";
  query: string;
  searchedAt: string;
  filters?: string[];
  retrieved: number;
};

type SearchLedger = {
  subClaimId: string;
  databases: SearchDatabaseRecord[];
  totalRetrieved: number;
  afterDeduplication: number;
  screened: number;
  excluded: number;
  included: number;
  fullTextUnavailable: number;
  decisiveSourcesMissingFullText: string[];
  citationChasing: {
    backwardFrom: string[];
    forwardFrom: string[];
    registryMatched: string[];
    duplicateReportsLinked: string[];
  };
};

Screening rules

Screening runs independently in two isolated contexts.

Include when a source

directly tests the normalized claim
informs a necessary upstream premise
is needed to resolve contradiction or integrity concerns

Exclude when a source

is clearly off-topic
only repeats another source without new data
is vendor marketing copy posing as research
is anecdotal context that cannot raise evidence quality

Every exclusion must have a reason code. Screening is not complete until A/B decisions reconcile, exclusion reason codes are finalized, PRISMA counts reconcile to the deduplicated inventory, and decisive full-text gaps are logged.
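
A count-reconciliation check might look like the following sketch. The arithmetic relationships are assumptions made for illustration (every deduplicated record is screened, and every screened record is either included or excluded); `LedgerCounts` and `ledgerProblems` are hypothetical names, not part of the ledger schema.

```typescript
// Hypothetical flat view of the counts held in a search ledger.
type LedgerCounts = {
  perDatabaseRetrieved: number[];
  totalRetrieved: number;
  afterDeduplication: number;
  screened: number;
  excluded: number;
  included: number;
};

// Collect human-readable problems; an empty array means the counts reconcile.
function ledgerProblems(c: LedgerCounts): string[] {
  const problems: string[] = [];
  const sum = c.perDatabaseRetrieved.reduce((acc, n) => acc + n, 0);
  if (sum !== c.totalRetrieved) {
    problems.push("totalRetrieved does not match per-database counts");
  }
  if (c.afterDeduplication > c.totalRetrieved) {
    problems.push("afterDeduplication exceeds totalRetrieved");
  }
  if (c.screened !== c.afterDeduplication) {
    problems.push("screened does not match afterDeduplication");
  }
  if (c.excluded + c.included !== c.screened) {
    problems.push("excluded + included does not match screened");
  }
  return problems;
}
```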

§ 08

Study extraction

Duplicate extraction

Study characteristics and outcome data are extracted independently by two reviewers. This is especially important for primary outcomes, adverse events, effect sizes, route and dose, population definitions, and follow-up duration.

For AI workflows, “independent” means the second extractor must not see the first extractor’s filled form before submitting its own.

Evidence-card fields

type EvidenceCard = {
  studyId: string;
  design:
    | "systematic_review"
    | "meta_analysis"
    | "rct"
    | "nonrandomized_human"
    | "case_series"
    | "animal"
    | "in_vitro"
    | "preprint"
    | "registry";
  population: string;
  n?: number;
  route?: string;
  dose?: string;
  comparator?: string;
  endpoint: string;
  endpointType: "clinical" | "surrogate" | "mechanistic" | "safety";
  timepoint?: string;
  resultSummary: string;
  effectSize?: string;
  adverseEvents?: string;
  exactQuotesOrSnippets: string[];
};

The extractor must label whether the endpoint is clinical, surrogate, mechanistic, or safety. This prevents mechanistic wins from being mistaken for clinical wins.
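
Duplicate extraction only helps if the two cards are actually compared. The sketch below diffs a hypothetical subset of critical fields between Extractor A and Extractor B; `CriticalFields`, `FieldDiscrepancy`, and `diffExtractions` are illustrative names, and plain string equality stands in for whatever normalization a real pipeline would apply.

```typescript
// Hypothetical subset of evidence-card fields treated as critical here.
type CriticalFields = {
  studyId: string;
  route?: string;
  dose?: string;
  endpoint: string;
  endpointType: "clinical" | "surrogate" | "mechanistic" | "safety";
  effectSize?: string;
  adverseEvents?: string;
};

type FieldDiscrepancy = {
  field: string;
  extractorA: string | undefined;
  extractorB: string | undefined;
};

// Compare two independent extractions of the same study, field by field.
function diffExtractions(a: CriticalFields, b: CriticalFields): FieldDiscrepancy[] {
  if (a.studyId !== b.studyId) throw new Error("cards describe different studies");
  const fields: Array<keyof CriticalFields> = [
    "route",
    "dose",
    "endpoint",
    "endpointType",
    "effectSize",
    "adverseEvents",
  ];
  return fields
    .filter((f) => a[f] !== b[f])
    .map((f) => ({ field: f, extractorA: a[f], extractorB: b[f] }));
}
```

A disagreement on `endpointType` is exactly the case the labeling rule is meant to surface: one extractor read a surrogate marker as a clinical win.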

§ 09

Bias & integrity assessment

Study-design ladder

When evidence conflicts, we prefer:

systematic reviews / meta-analyses of high-quality human trials
registered, peer-reviewed human RCTs
non-randomized human comparative studies
case series / uncontrolled pilots
animal in-vivo studies
in-vitro / cell / mechanistic studies
expert opinion without new data
anecdote / testimony

Domain-based bias appraisal

For randomized trials we use RoB 2 style domains:

randomization process
deviations from intended interventions
missing outcome data
outcome measurement
selective reporting

For non-randomized human studies we use ROBINS-I style reasoning:

confounding
participant selection
intervention classification
deviations from intended interventions
missing data
outcome measurement
selective reporting

For every included study, we also flag:

preregistration or registry record present / absent
funding source and sponsor
author conflicts
replication status
retraction status
PubPeer or major integrity concerns

Integrity override

Note

A source with serious integrity concerns is not silently averaged into the body of evidence. It is flagged, discussed explicitly, and down-weighted or excluded by rule.

§ 10

Adversarial review architecture

This is the part that makes the protocol AI-ready instead of AI-flavored.

Required roles

Role	Task
Decomposer A	Independently splits the source claim into atomic sub-claims and normalizes them
Decomposer B	Independently repeats decomposition without seeing A
Search planner A	Builds a direct-support query plan for each sub-claim
Search planner B / contradiction-seeking planner	Builds null, contradictory, safety, integrity, and scope-mismatch queries
Retriever / query executor	Runs the reconciled search plan and builds the search ledger
Screening reviewer A	Independently screens candidate records
Screening reviewer B	Independently screens the same records without seeing A
Extractor A	Independently extracts structured evidence cards
Extractor B	Independently extracts the same studies without seeing A
Bias auditor	Scores risk of bias and integrity concerns
Steelman reviewer	Makes the strongest defensible case that the claim is true
Skeptic reviewer	Makes the strongest defensible case that the claim is false or overstated
Mismatch reviewer	Looks for population / route / dose / endpoint / timeframe mismatches
Omitted-evidence reviewer	Searches specifically for null, contradictory, or inconvenient evidence
Arbitrator	Resolves disagreements, assigns status, assigns confidence, writes the decision log
Publication linter	Runs deterministic wording and provenance checks before anything ships

If a single model is used for all roles, contexts are isolated and prior outputs are not revealed until arbitration. Better still, use heterogeneous models, or at least heterogeneous prompting and retrieval contexts.

Required adversarial questions

Before a verdict is finalized, the workflow must answer:

What is the strongest evidence for the claim?
What is the strongest evidence against the claim?
Are the supportive studies actually testing the same population, route, dose, comparator, and endpoint as the claim?
Are supportive results clinical outcomes or only mechanistic / surrogate outcomes?
Are contradictory or null studies being omitted because they are harder to explain?
Are reviews being used to smuggle in unsupported primary-study conclusions?
Does the claim generalize beyond the studied tissue, route, timeframe, or population?
Is the verdict being driven by one lab, one paper, or one uncontrolled pilot?
Is there any integrity signal that should cap certainty regardless of apparent effect?

Overstatement check

Many public claims are not cleanly true or false. They are partly grounded, then overstated. For composite claims, the workflow computes the formal status of each sub-claim and whether the public framing overgeneralizes the evidence.

OVERSTATED exists precisely so a composite claim is not collapsed into VALIDATED or FALSIFIED when the fairer answer is “the underlying phenomenon exists in narrow scope, but the public framing outruns the evidence.”

§ 11

Unified workflow

Every claim audit runs through these stages in order.

Stage 01
Run duplicate claim decomposition
Input
source artifact · verbatim claim text · target population (if known)
Output
Decomposition A · Decomposition B · reconciled claim manifest · atomic sub-claims
Gate
every sub-claim is testable; route / outcome / population explicit; material framing differences reconciled or escalated
Stage 02
Duplicate and adversarial search planning
Input
reconciled claim manifest · atomic sub-claims
Output
Search Plan A · Search Plan B / contradiction-seeking plan · reconciled query set
Gate
direct-support and contradiction queries for every central sub-claim; coverage checklist logged; required source classes explicit
Stage 03
Transparent retrieval and screening
Input
normalized sub-claims · reconciled search plan
Output
search ledger · candidate sources · screening log
Gate
search terms saved; contradictory-evidence search explicit; A/B screening reconciled; PRISMA counts reconcile; full-text gaps logged
Stage 04
Trace provenance recursively
Input
screened source set · each claim
Output
classification per claim: Direct · Inherited · Speculative
Gate
primary source reached or dead end recorded; inherited claims not credited as direct evidence
Stage 05
Extract evidence in duplicate
Input
included studies
Output
reconciled evidence cards
Gate
discrepancies reconciled or escalated; multiple reports merged; exact snippets back every important extracted claim
Stage 06
Appraise bias and integrity
Input
evidence cards
Output
bias cards with design classification, domain-based assessment, integrity check, replication note
Gate
the workflow can explain why higher- and lower-quality studies were weighted differently
Stage 07
Build support and contradiction maps
Input
evidence cards · bias cards
Output
per sub-claim: supports · contradicts · mentions — plus directness and evidence-type annotations
Gate
every sub-claim has a contradiction map, even if empty
Stage 08
Run adversarial review
Input
all prior artifacts
Output
steelman, skeptic, mismatch, and omitted-evidence reviews
Gate
at least one adversarial pass challenged the favored interpretation; unresolved disagreements logged, not buried
Stage 09
Adjudicate status and confidence
Input
evidence cards · bias cards · maps · adversarial findings
Output
sub-claim statuses · overall status · confidence
Gate
final summary does not outrun the evidence cards; confidence capped by unresolved gaps, directness problems, or integrity concerns
Stage 10
Publish with provenance and linting
Input
decision log
Output
verdict summary · what you can / cannot say · citations · evidence chain · last verified date · structured data · sentence-level provenance · linter report
Gate
every public sentence maps to evidence cards and snippets; linter checks pass or are explicitly waived by policy; wording does not exceed allowed strength

§ 12

Protocol compliance matrix

The minimum machine-checkable completion spec for a run.

Stage	Required artifact	Acceptance gate	Failure response
Claim decomposition	DecompositionPacket[], ClaimManifest	Central claims have population, intervention, outcome, route, and time qualifiers where relevant	Re-enter decomposition
Search planning	SearchPlan[]	Direct and contradiction-seeking queries logged; coverage checklist complete	Re-enter search planning
Retrieval and screening	SearchLedger, ScreeningLog	A/B screening reconciled; PRISMA counts reconcile; full-text gaps logged	Re-enter retrieval or screening
Provenance tracing	Provenance trace records	Central claims reach primary source or dead end is logged	Re-enter provenance tracing
Extraction	EvidenceCard[]	Critical fields reconciled; exact snippets captured	Re-enter extraction reconciliation
Bias and integrity	BiasCard[]	Decisive sources receive bias and integrity review	Re-enter bias appraisal
Contradiction mapping	ContradictionMap	Every sub-claim mapped to support, contradiction, or mention	Re-enter contradiction mapping
Adversarial review	AdversarialReviewLog	Steelman, skeptic, mismatch, and omitted-evidence passes complete	Re-enter adversarial review
Arbitration	DecisionLog	Verdict traceable to evidence cards and capped by rule	Re-enter adjudication or escalate
Publication	PublicClaimSentence[], PublicationPacket, linter output	Every sentence has provenance; wording checks pass	Re-enter publication packet only
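
A compliance report built from this matrix can be evaluated mechanically. The following is a sketch under simple assumptions: gates are supplied in stage order, each already carries a pass/fail result, and the run halts at the first failure. `GateResult`, `ComplianceOutcome`, and `evaluateRun` are hypothetical names.

```typescript
type GateResult = { stage: string; passed: boolean; failureResponse: string };

type ComplianceOutcome = {
  complete: boolean;
  failedStage?: string;
  nextAction?: string;
};

// Walk the stages in order and stop at the first failed gate.
function evaluateRun(gates: GateResult[]): ComplianceOutcome {
  for (const gate of gates) {
    if (!gate.passed) {
      return { complete: false, failedStage: gate.stage, nextAction: gate.failureResponse };
    }
  }
  return { complete: true };
}
```

Stopping at the first failure keeps later stages from running on artifacts that an earlier gate has already rejected.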

§ 13

Publication & provenance controls

Sentence-level provenance is a first-class object, not a formatting afterthought.

type PublicClaimSentence = {
  sentenceId: string;
  text: string;
  sourceEvidenceCardIds: string[];
  sourceSnippetIds: string[];
  allowedStrength:
    | "directly_supported"
    | "qualified_support"
    | "context_only"
    | "not_allowed";
};

The publication packet refuses to ship sentences marked not_allowed or sentences whose wording exceeds the allowed strength of their linked evidence.

Publication linter

Before publication, deterministic wording checks run:

no dosing advice
no individualized medical advice
no unsupported safety reassurance
no “proven,” “clinically established,” or “safe” unless policy and evidence status allow it
no human efficacy wording from animal-only or in-vitro evidence
no review article cited as if it were primary evidence
no factual biomedical sentence without citation coverage
no sentence stronger than the weakest central sub-claim permits

Any failure blocks publication or requires an explicit, logged override.
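
A few of these checks are simple enough to sketch directly. The example below covers only three of them (sentences marked not_allowed, missing citation coverage, and restricted wording); `SentenceForLint`, `RESTRICTED_TERMS`, and `lintSentence` are hypothetical, and a real linter would consult policy and evidence status before rejecting a restricted term.

```typescript
type AllowedStrength =
  | "directly_supported"
  | "qualified_support"
  | "context_only"
  | "not_allowed";

// Simplified sentence shape for illustration.
type SentenceForLint = {
  sentenceId: string;
  text: string;
  sourceEvidenceCardIds: string[];
  allowedStrength: AllowedStrength;
};

// Terms the checklist restricts; policy may allow them for some statuses.
const RESTRICTED_TERMS = ["proven", "clinically established", "safe"];

// Return the list of failed checks; an empty array means the sentence passes.
function lintSentence(s: SentenceForLint): string[] {
  const failures: string[] = [];
  if (s.allowedStrength === "not_allowed") failures.push("sentence is marked not_allowed");
  if (s.sourceEvidenceCardIds.length === 0) failures.push("no citation coverage");
  const lower = s.text.toLowerCase();
  for (const term of RESTRICTED_TERMS) {
    // Word-boundary match so "safe" does not fire on "safety".
    const pattern = new RegExp("\\b" + term + "\\b");
    if (pattern.test(lower)) failures.push('restricted term: "' + term + '"');
  }
  return failures;
}
```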

§ 14

Status taxonomy

Eight formal statuses:

Status	Meaning
VALIDATED	Multiple independent, reasonably high-quality sources directly support the claim in the relevant scope.
CONTESTED	Comparable evidence points in different directions and the disagreement is material, not superficial.
UNVALIDATED	The claim has not been tested adequately enough to justify support or falsification.
OVERSTATED	Composite popular framing extends beyond what the underlying evidence supports. Parts may be validated in narrow scope; the framing as stated is not.
FALSIFIED	Better evidence directly contradicts the claim, or central sub-claims are directly false. Requires replicated, positively contradictory evidence.
WITHDRAWN	The supporting source base is retracted or fatally compromised.
DEPENDENT	The claim is only true if one or more upstream premises are true, and those premises remain unsettled.
SPECULATIVE	No traceable source adequately supports the exact proposition.

Decision rules for composite claims

Do not use a naive “worst sub-claim wins” rule. Instead:

identify the central necessary sub-claims
identify the supporting but non-central sub-claims
classify each sub-claim separately
assign the overall status based on the central necessary sub-claims plus any material overstatement

§ 15

Confidence taxonomy

Confidence is separate from status.

Confidence	When to assign
High	Search broad and explicit; evidence directly on point; disagreement limited; major bias concerns resolved; replication status reasonably clear.
Moderate	Core evidence present, but gaps in directness, search completeness, replication, or bias resolution.
Low	Retrieval gaps, dead ends, sparse studies, serious indirectness, or unresolved integrity problems materially weaken certainty.

Confidence is judged on five axes: search completeness, directness, quality / bias profile, consistency, integrity / provenance clarity. The final confidence does not exceed the weakest material axis.
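
The weakest-axis rule is easy to express in code. In this sketch every axis is treated as material, so the function returns the ceiling the rule implies; an arbitrator could still assign something lower. The names `ConfidenceAxes` and `overallConfidence` are illustrative assumptions.

```typescript
type ConfidenceLevel = "Low" | "Moderate" | "High";

type ConfidenceAxes = {
  searchCompleteness: ConfidenceLevel;
  directness: ConfidenceLevel;
  qualityBias: ConfidenceLevel;
  consistency: ConfidenceLevel;
  integrityProvenance: ConfidenceLevel;
};

// Ordered from weakest to strongest.
const LEVEL_ORDER: ConfidenceLevel[] = ["Low", "Moderate", "High"];

// The ceiling on final confidence is the lowest level found on any axis.
function overallConfidence(axes: ConfidenceAxes): ConfidenceLevel {
  const levels: ConfidenceLevel[] = [
    axes.searchCompleteness,
    axes.directness,
    axes.qualityBias,
    axes.consistency,
    axes.integrityProvenance,
  ];
  return levels.reduce((lowest, level) =>
    LEVEL_ORDER.indexOf(level) < LEVEL_ORDER.indexOf(lowest) ? level : lowest
  );
}
```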

Confidence caps and status ceilings

Condition	Max status	Max confidence	Required action
Human efficacy claim supported only by animal or in-vitro evidence	UNVALIDATED or OVERSTATED	Low	State preclinical scope explicitly
Decisive source lacks full text	Depends	Low	Log the gap and avoid strong verdicts
One uncontrolled human case series only	Usually UNVALIDATED	Low or Moderate	Explain directness and bias limits
Comparable direct support and contradiction exist	CONTESTED	Moderate	Map the contradiction explicitly
Serious unresolved integrity concern	WITHDRAWN, UNVALIDATED, or hold	Low	Down-weight, exclude, or escalate
Review-only support while readable primary studies remain unread	Block verdict	None	Read the primary studies first
Route, dose, population, or time qualifiers missing but central	UNVALIDATED	Low	Narrow the claim or rerun retrieval
Preprint-only decisive support	Depends	Low	Mark status as provisional and cap certainty
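
Part of this table can be applied as a rule lookup. The sketch below encodes only five of the rows and only the max-confidence column; the condition flag names and `maxConfidence` are hypothetical, and "None" stands for a blocked verdict as in the review-only row.

```typescript
type CapLevel = "None" | "Low" | "Moderate" | "High";

// Hypothetical flags, one per encoded table row.
type CapConditions = {
  humanEfficacyFromPreclinicalOnly: boolean;
  decisiveSourceLacksFullText: boolean;
  seriousUnresolvedIntegrityConcern: boolean;
  comparableSupportAndContradiction: boolean;
  reviewOnlyWithUnreadPrimaries: boolean;
};

// Ordered from most to least restrictive.
const CAP_ORDER: CapLevel[] = ["None", "Low", "Moderate", "High"];

// Map each triggered condition to its cap, then keep the most restrictive one.
function maxConfidence(c: CapConditions): CapLevel {
  const caps: CapLevel[] = ["High"];
  if (c.humanEfficacyFromPreclinicalOnly) caps.push("Low");
  if (c.decisiveSourceLacksFullText) caps.push("Low");
  if (c.seriousUnresolvedIntegrityConcern) caps.push("Low");
  if (c.comparableSupportAndContradiction) caps.push("Moderate");
  if (c.reviewOnlyWithUnreadPrimaries) caps.push("None");
  return caps.reduce((lowest, cap) =>
    CAP_ORDER.indexOf(cap) < CAP_ORDER.indexOf(lowest) ? cap : lowest
  );
}
```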

§ 16

Hard rules & abstention

Hard rules

No human efficacy claim may be marked VALIDATED on animal or in-vitro evidence alone.
No claim may be marked FALSIFIED merely because support is absent; absence of evidence is usually UNVALIDATED.
No review article may be the sole basis for a decisive status if the primary studies are available.
No final verdict may ignore contradictory evidence found in search.
No study may be counted twice across multiple reports.
No final summary may introduce claims absent from the evidence cards.

Abstention triggers

Cap the verdict at UNVALIDATED or Low confidence when:

full text is not available for decisive sources
the claim depends on route / dose / population details not reported in the literature
only uncontrolled pilots exist
only one lab drives the literature
the protocol / registry record materially conflicts with the paper and cannot be reconciled
serious integrity concerns remain unresolved

§ 17

Calibration & regression testing

Treat the protocol itself as something that must be measured. We track at least:

decisive-source recall
false inclusion rate
false exclusion rate
critical-field extraction accuracy
verdict agreement with expert benchmark sets
rate of overconfident wrong verdicts
disagreement rates across decomposition, search planning, screening, and extraction

Watch

Near-perfect agreement is not automatically a success condition. If supposedly independent reviewers agree almost all the time, verify the pipeline is not leaking context or collapsing into prompt coupling. Thresholds are calibrated empirically against benchmark claims — not treated as universal constants.
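
A disagreement metric can be as simple as raw agreement between paired reviewers, as in this sketch. `PairedDecision`, `agreementRate`, and `needsLeakCheck` are hypothetical names, and the suspicion threshold is a caller-supplied assumption rather than a protocol constant. Raw agreement ignores chance agreement, so a chance-corrected statistic may be a better fit for real calibration.

```typescript
// One record judged independently by two reviewers.
type PairedDecision = { recordId: string; reviewerA: string; reviewerB: string };

// Fraction of records on which the two reviewers made the same decision.
function agreementRate(pairs: PairedDecision[]): number {
  if (pairs.length === 0) throw new Error("no paired decisions to compare");
  const agreed = pairs.filter((p) => p.reviewerA === p.reviewerB).length;
  return agreed / pairs.length;
}

// Suspiciously high agreement triggers a context-leak check, not a celebration.
function needsLeakCheck(pairs: PairedDecision[], suspicionThreshold: number): boolean {
  return agreementRate(pairs) >= suspicionThreshold;
}
```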

§ 18

Operating modes

Both modes use the same machinery above.

Path 1

Forward · Claim → Evidence

Start with a public claim. Decompose it, retrieve evidence, challenge it adversarially, then publish a claim audit.

Entry: A claim such as 'BPC-157 has wolverine-like effects'
Output: /claims/[slug]

Path 2

Backward · Artifact → Recursive origins

Start with an artifact. Decompose every claim inside it, trace each upstream, and show which parts are direct, inherited, speculative, or overstated.

Entry: A podcast, newsletter, paper, or post
Output: /breakdowns/[slug]

§ 19

Practical implementation notes for LLM workflows

Independence

If you want fair adversarial review, do not let later reviewers inherit earlier reviewers’ conclusions. Shared context creates correlated errors.

Deterministic checks

Use non-LLM validation wherever possible for:

DOI / PMID resolution
deduplication
publication dates
registry status
retraction status
schema validation of outputs
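
As one example of a non-LLM check, DOI strings can be normalized and deduplicated with plain string handling. This sketch assumes DOIs arrive as bare identifiers, `doi:` prefixed strings, or doi.org URLs; the helper names are hypothetical and the pattern is a syntax check only, so it does not confirm that a DOI resolves.

```typescript
// Strip common resolver prefixes and lowercase; DOI names are case-insensitive.
function normalizeDoi(raw: string): string {
  return raw
    .trim()
    .toLowerCase()
    .replace(/^https?:\/\/(dx\.)?doi\.org\//, "")
    .replace(/^doi:\s*/, "");
}

// Syntax check only: a "10.", a 4-9 digit registrant code, a slash, a suffix.
function looksLikeDoi(doi: string): boolean {
  return /^10\.\d{4,9}\/\S+$/.test(doi);
}

// Keep the first occurrence of each normalized DOI, preserving order.
function dedupeByDoi(dois: string[]): string[] {
  const unique: string[] = [];
  for (const raw of dois) {
    const doi = normalizeDoi(raw);
    if (unique.indexOf(doi) === -1) unique.push(doi);
  }
  return unique;
}
```

Resolution itself (confirming the DOI exists and fetching its metadata) would still need a lookup against a service such as Crossref.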

Calibration

Prompt the arbitrator to justify why a stronger status was earned rather than defaulting to the weaker one. If that justification is thin, downgrade.

Escalation

Escalate to human review when:

the claim is legally or medically high stakes
the studies are highly technical and difficult to normalize
the adversarial reviewers disagree on central sub-claims
integrity concerns are non-trivial

§ 20

Reference anchors

This protocol is grounded in:

GRADE certainty-of-evidence logic
Cochrane RoB 2 and ROBINS-I style bias assessment
PRISMA 2020 reporting discipline
Cochrane duplicate independent extraction practice
SciFact scientific claim verification
FEVER claim verification and FEVER adversarial evaluation
post-publication integrity checks via PubPeer, retraction tracking, and trial registries

These anchors matter because they push the workflow toward transparency over vibes, duplication over single-pass confidence, adversarial challenge over confirmation bias, and abstention over overclaiming.

§ 21

Review cadence

The canonical cadence SLAs — per-trigger response times (same-day, 7-day, 30-day, quarterly, annual) — live in the internal grading-and-reassessment protocol. This is the short version; if the two disagree, the internal protocol wins.

Re-run a claim audit within 30 days when new peer-reviewed evidence materially affects a central sub-claim.
Re-run immediately (same day) when a decisive source is retracted, corrected, or flagged for serious integrity issues.
Re-run same day when a major regulatory change alters the real-world status of the molecule or use case.
Re-run stale claims at least annually even if no trigger fires, and run a quarterly housekeeping audit across the catalog.
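
The short version of the cadence can be turned into a due-date calculation, sketched here. The trigger names, the `RERUN_WINDOW_DAYS` table, and the use of 365 days for "annually" are assumptions for illustration; the internal grading-and-reassessment protocol remains the authority on actual SLAs.

```typescript
type ReviewTrigger = "new_evidence" | "integrity_event" | "regulatory_change" | "stale";

// Days allowed before the re-run, following the short list above.
const RERUN_WINDOW_DAYS: Record<ReviewTrigger, number> = {
  new_evidence: 30,
  integrity_event: 0,
  regulatory_change: 0,
  stale: 365, // approximation of "at least annually"
};

// Compute the latest acceptable re-run date, using UTC calendar days.
function rerunDueDate(trigger: ReviewTrigger, triggeredAt: Date): Date {
  const due = new Date(triggeredAt.getTime());
  due.setUTCDate(due.getUTCDate() + RERUN_WINDOW_DAYS[trigger]);
  return due;
}
```

For the `stale` trigger, `triggeredAt` would be the claim's last verified date.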

Last updated: 2026-04-20
