The grading layer
How letters get assigned and updated.
How every peptide letter is computed, when it gets re-evaluated, and who has to sign off. This is the operational doc — use it when a new paper, claim audit, safety signal, or regulatory event might change a published grade.
The full research-audit process that produces the underlying evidence lives on /methodology/research-protocol.
§ 01
The grading unit
Publicly we present grades as peptide × outcome because it reads cleanly. Internally the grading unit is narrower:
Internal grade key
One public grade, five editorial qualifiers
Behind each public label sits a tighter frame that tells us exactly what evidence belongs in the bucket and what does not.
| Field | Editorial Meaning | Why It Matters |
|---|---|---|
| Peptide | Which molecule is actually under review | Not the entire category. One peptide, one evidentiary object. |
| Outcome | The specific promise being scored | Tendon healing, glucose control, sleep quality, fat loss, and so on. |
| Population | Who the evidence is really about | Healthy adults, IBD patients, older adults, rodent tendon-injury model. |
| Route | How the material was given | Oral, intranasal, subcutaneous, local injection. Route changes the story. |
| Time horizon | The window in which the claim is supposed to hold | Acute response, 12-week treatment, 6-month follow-up, long-term maintenance. |
“BPC-157 for tendon healing” is not one evidentiary object if the underlying studies differ materially by population, route, or time window. Injectable BPC-157 in adults with musculoskeletal injury is not the same grade object as oral BPC-157 in a rodent gut-inflammation model.
Operational rules
- Always score against an InternalGradeKey, even if the public page collapses to a simpler peptide × outcome label.
- Only collapse multiple internal keys into one public row when evidence is directionally aligned and the collapse does not hide a weaker population, route, or time horizon behind a stronger one.
- If route, population, or time horizon changes the evidentiary story, maintain separate internal records and state the qualifier on the public page.
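A minimal TypeScript sketch of the grading unit described above. Only the name InternalGradeKey comes from this document; the field names and the two helpers (`publicLabel`, `sameGradeObject`) are assumptions made for illustration, not the site's actual schema.

```typescript
// Hypothetical shape of the internal grading unit. Only the name
// InternalGradeKey comes from this document; the field names are assumptions.
interface InternalGradeKey {
  peptide: string;
  outcome: string;
  population: string;
  route: string;
  timeHorizon: string;
}

// The public page collapses to peptide × outcome.
function publicLabel(key: InternalGradeKey): string {
  return key.peptide + " × " + key.outcome;
}

// Two records are the same evidentiary object only if all five fields match.
function sameGradeObject(a: InternalGradeKey, b: InternalGradeKey): boolean {
  return (
    a.peptide === b.peptide &&
    a.outcome === b.outcome &&
    a.population === b.population &&
    a.route === b.route &&
    a.timeHorizon === b.timeHorizon
  );
}

const injectableAdults: InternalGradeKey = {
  peptide: "BPC-157",
  outcome: "tendon healing",
  population: "adults with musculoskeletal injury",
  route: "subcutaneous",
  timeHorizon: "acute response",
};

const oralRodent: InternalGradeKey = {
  peptide: "BPC-157",
  outcome: "tendon healing",
  population: "rodent tendon-injury model",
  route: "oral",
  timeHorizon: "acute response",
};

// Same public label, but separate internal records.
const sharePublicLabel = publicLabel(injectableAdults) === publicLabel(oralRodent); // true
const sameObject = sameGradeObject(injectableAdults, oralRodent); // false
```

Whether two such records may share one public row is still an editorial call under the collapse rule above; the sketch only shows that they are stored and scored separately.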
Watch
Protocols are rated separately. See §16 — we deliberately do NOT borrow the peptide A–F letters for protocols.
§ 02
The letter grades
| Grade | Label | Meaning |
|---|---|---|
| A | Strong | Multiple independent high-quality controlled human trials converge in the same direction. Mechanism is well characterized. Longer-term safety is described in humans. Effect clearly exceeds placebo or comparator. A single positive Phase 2 trial is not enough. |
| B | Promising | At least one well-powered controlled human trial shows clinically meaningful benefit in the relevant internal grading unit. Mechanism is plausible to moderately established. Replication and/or longer-term safety remain limited. |
| C | Mixed / early signal | Some direct human signal exists, but the evidence is incomplete, conflicting, underpowered, uncontrolled, or still leaning heavily on animal/mechanistic support. |
| D | Weak | Evidence is identifiable but weak: animal-only, mechanistic-only, anecdotal, or sparse uncontrolled human signal without convincing controlled confirmation. |
| F | Disproven / unsafe | Decisive human evidence shows no clinically meaningful effect, harm, or unacceptable risk for the studied use. Safety- or efficacy-driven medical regulatory rejection or withdrawal can also support F. |
| Pending | Below threshold | The internal grading unit does not yet have enough peer-reviewed studies to produce a defensible letter. Editorial backfill is in progress. |
| Insufficient | — | Cannot yet be meaningfully graded — claim is underspecified, literature is effectively absent, or decisive sources cannot be verified. Use instead of D when we cannot honestly call the evidence weak because we cannot yet inspect the evidence set. |
These thresholds are bands, not formulas. The letter is the editor’s defensible synthesis of the six sub-scores, the hard caps below, and the sign-off rules.
§ 03
Minimum evidence threshold · Pending status
No peptide × outcome pair receives a letter grade unless the underlying evidence set clears a minimum count of qualifying studies. Below the threshold, the outcome carries Pending until editorial backfill completes.
Note
A letter grade requires at least 3 peer-reviewed studies for the internal grading unit. Fewer than 3 → Pending. At 3+ → eligible for A–F per the rubric.
Rationale: with fewer than 3 studies you cannot assess consistency, and consistency (replication, directional agreement, or direct contradiction) is the core evidence-grading judgment. Two studies can agree by chance; three is the smallest count where a pattern becomes visible. This is in keeping with GRADE and Cochrane practice, where consistency across studies is a core part of judging certainty.
What counts as a qualifying study
- Peer-reviewed and published in an indexed journal, or a registered clinical trial with posted results. Preprints, conference abstracts without full publication, vendor white papers, and marketing collateral do not count.
- About the relevant InternalGradeKey — the right peptide, outcome, population, route, and time horizon. A rodent oral study does not count toward a human injectable count.
- Independent of other counted studies when assessing replication. Follow-on papers from the same lab on the same cohort count as one unless the design is materially different.
- Not retracted or flagged with unresolved integrity concerns at the time of grading.
Systematic reviews and meta-analyses count as one study each toward the threshold but strengthen the grade through sub-score 02.
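The counting rules above can be sketched as a small filter. This is an illustration under stated assumptions: the `StudyRecord` shape, its field names, and the idea of a shared `cohortId` for same-lab, same-cohort papers are all hypothetical, and the judgment about whether a design is "materially different" remains editorial.

```typescript
// Hypothetical study record; every field name here is an assumption.
interface StudyRecord {
  id: string;
  peerReviewedOrRegisteredWithResults: boolean; // excludes preprints, abstracts, vendor papers
  matchesGradeKey: boolean; // right peptide, outcome, population, route, time horizon
  retractedOrFlagged: boolean; // retracted or unresolved integrity concern
  cohortId: string; // papers from the same lab on the same cohort share this id
  materiallyDifferentDesign?: boolean; // editor's call for follow-on papers
}

const MIN_QUALIFYING_STUDIES = 3;

function countQualifying(studies: StudyRecord[]): number {
  const seenCohorts: string[] = [];
  let count = 0;
  for (const s of studies) {
    if (!s.peerReviewedOrRegisteredWithResults) continue;
    if (!s.matchesGradeKey) continue;
    if (s.retractedOrFlagged) continue;
    // Follow-on papers on an already counted cohort count once,
    // unless the design is materially different.
    const alreadyCounted = seenCohorts.indexOf(s.cohortId) !== -1;
    if (alreadyCounted && !s.materiallyDifferentDesign) continue;
    seenCohorts.push(s.cohortId);
    count += 1;
  }
  return count;
}

// Fewer than 3 qualifying studies means Pending; 3 or more is eligible for A-F.
function meetsThreshold(studies: StudyRecord[]): boolean {
  return countQualifying(studies) >= MIN_QUALIFYING_STUDIES;
}
```

A systematic review or meta-analysis would enter this list as a single record, consistent with the rule that each counts as one study toward the threshold.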
Additional quality gates on top of the count
- A or B still require at least one well-powered controlled human trial. 50 animal studies and zero human trials cannot clear B.
- F still requires decisive evidence of null effect, harm, or medical regulatory rejection, and must be supported by at least 2 studies showing null/harmful effect — a single trial cannot falsify.
- Study-type weighting (RCT > cohort > case series > animal > in vitro) is handled by the sub-scores, not the threshold.
Pending vs Insufficient
| Status | Meaning | Expected outcome |
|---|---|---|
| Pending | Fewer than 3 qualifying studies exist and editorial backfill is in progress. Literature may or may not support a graded claim — we have not completed the review. | Becomes a letter (or Insufficient) once backfill completes. |
| Insufficient | Evidence set has been examined and the claim cannot be meaningfully graded — unit underspecified, literature structurally absent, or decisive sources unverifiable. | Remains Insufficient until the underlying problem is resolved. |
Operational rule: when the 3-study threshold is not met, default to Pending if the gap is expected to be fillable through ordinary literature search, and Insufficient if the claim itself is the problem (vague indication, unverifiable sourcing, absent primary literature by design).
Promotion out of Pending
- Threshold met + rubric supports a letter → assign the letter, update lastUpdated, log a grade-history entry with fromGrade: “Pending”.
- Threshold met but rubric caps low (e.g. qualifying studies are all animal-only for a human-outcome claim and sub-score 03 is capped at 1) → assign the letter grade the rubric supports; use Insufficient only if the grading unit itself is still unworkable.
- Editorial review confirms literature is structurally absent → convert Pending → Insufficient with a short rationale.
Promotions out of Pending follow the same sign-off rules as any other grade assignment (see §12). The initial Pending → letter transition is treated as a one-letter change for sign-off purposes.
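The Pending, Insufficient, and promotion rules in this section can be read as one small decision function. The sketch below is hypothetical: `BackfillReview` and its fields are invented for illustration, and the choice to stay Pending when the threshold is met but the rubric has not yet been run is an assumption, not a stated rule.

```typescript
type Letter = "A" | "B" | "C" | "D" | "F";
type GradeStatus = Letter | "Pending" | "Insufficient";

// Hypothetical summary of an editorial backfill review; names are assumptions.
interface BackfillReview {
  qualifyingStudyCount: number;
  gradingUnitWorkable: boolean; // claim is specified and sources are verifiable
  literatureStructurallyAbsent: boolean;
  rubricLetter: Letter | null; // what the rubric supports, or null if not yet run
}

function resolvePending(r: BackfillReview): GradeStatus {
  // The claim itself is the problem: Insufficient, regardless of count.
  if (!r.gradingUnitWorkable || r.literatureStructurallyAbsent) return "Insufficient";
  // Gap is expected to be fillable through ordinary literature search.
  if (r.qualifyingStudyCount < 3) return "Pending";
  // Threshold met: assign whatever letter the rubric supports, even a low one.
  // Assumption: if the rubric has not been run yet, the outcome stays Pending.
  return r.rubricLetter !== null ? r.rubricLetter : "Pending";
}
```

Note that a low-capped result (for example, animal-only studies for a human-outcome claim) still returns a letter here, matching the rule that Insufficient is reserved for an unworkable grading unit.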
Why Pending and not a lower letter
The tempting shortcut is to call a 0–2-study outcome a D (“weak evidence”). The problem: D is a grading judgment, and grading requires the evidence set to be large enough to judge. Calling something D on 1 study claims more confidence in the negative than the data supports — it says “we looked and the evidence is weak,” when the honest statement is “we have not completed the review.” Pending forces that honesty.
§ 04
Bridge from claim audits to grade reassessment
Claim audits and peptide grades are related but not identical. A claim audit evaluates a public assertion using the research protocol. The grading layer only acts on structured outputs from that process, not on free-form prose.
Every claim audit that touches a peptide grade must emit one or more EvidenceConclusion objects:
Claim-audit handoff
What the grading team needs from an audit
Think of this as the handoff sheet between the claim-review process and the grading layer: exact target, exact verdict, exact implication.
| Field | Editorial Meaning | Why It Matters |
|---|---|---|
| Target | Peptide, outcome, and internal grading unit | The handoff has to name the exact grade object being touched. |
| Sub-claim | The precise assertion that was tested | Not a vibe, not a paragraph. One checkable claim. |
| Status | Validated, contested, unvalidated, overstated, falsified, withdrawn, dependent, or speculative | This is the audit verdict on the claim itself. |
| Confidence | High, moderate, or low | How hard the audit is willing to lean on the conclusion. |
| Affected sub-scores | Mechanism, human studies, effect vs placebo, long-term safety, side effects, regulatory | Only the touched parts of the rubric should move first. |
| Grade impact | None, possible upgrade, possible downgrade, or mandatory reassessment | The audit can force reconsideration, but it never edits the letter directly. |
| Decisive evidence | Named study cards and confidence caps | The grade editor should be able to trace exactly what carried the decision. |
| Rationale | A short editorial explanation | Plain-language reasoning that survives outside the workflow. |
Operational rules
- A claim audit never changes a letter grade directly. It produces EvidenceConclusion records that may trigger reassessment.
- OVERSTATED / FALSIFIED / WITHDRAWN normally imply gradeImpact = “mandatory_reassessment” when they touch a central sub-claim or decisive cited source.
- CONTESTED implies mandatory_reassessment when comparable-quality direct support and contradiction exist for the same internal grading unit.
- VALIDATED may justify possible_upgrade, but never auto-upgrades. The rubric still has to be re-run.
- DEPENDENT / SPECULATIVE usually imply none, unless the peptide page is currently presenting them as direct support. In that case the grade may stay unchanged while the page wording is corrected.
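One way to picture the handoff record and the default mapping from audit status to grade impact is sketched below. The name EvidenceConclusion, the gradeImpact field, and the values "mandatory_reassessment", "possible_upgrade", and "none" appear in this document; every other field name, the "possible_downgrade" spelling, the `AuditContext` helper, and the fallbacks marked as assumptions are hypothetical.

```typescript
type AuditStatus =
  | "validated" | "contested" | "unvalidated" | "overstated"
  | "falsified" | "withdrawn" | "dependent" | "speculative";

type GradeImpact = "none" | "possible_upgrade" | "possible_downgrade" | "mandatory_reassessment";

// Hypothetical record shape; field names other than gradeImpact are assumptions.
interface EvidenceConclusion {
  target: string; // peptide, outcome, and internal grading unit
  subClaim: string;
  status: AuditStatus;
  confidence: "high" | "moderate" | "low";
  affectedSubScores: string[];
  gradeImpact: GradeImpact;
  decisiveEvidence: string[];
  rationale: string;
}

interface AuditContext {
  touchesCentralClaimOrDecisiveSource: boolean;
  comparableSupportAndContradiction: boolean;
}

// Default suggestion only; the rubric still has to be re-run by an editor.
function defaultGradeImpact(status: AuditStatus, ctx: AuditContext): GradeImpact {
  switch (status) {
    case "overstated":
    case "falsified":
    case "withdrawn":
      // Assumption: a peripheral hit falls back to "possible_downgrade".
      return ctx.touchesCentralClaimOrDecisiveSource ? "mandatory_reassessment" : "possible_downgrade";
    case "contested":
      // Assumption: without comparable-quality contradiction the default is "none".
      return ctx.comparableSupportAndContradiction ? "mandatory_reassessment" : "none";
    case "validated":
      return "possible_upgrade"; // never an automatic upgrade
    default:
      // dependent, speculative, and (by assumption) unvalidated
      return "none";
  }
}
```

The function returns a starting value for the gradeImpact field; it does not touch any letter, in line with the rule that an audit never edits the grade directly.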
§ 05
The six sub-scores
Every grade rolls up six weighted sub-scores, each rated 1–5 with a written justification visible on the peptide page.
| # | Sub-score | What it measures | Weight |
|---|---|---|---|
| 01 | Mechanism understood | Do we know how the molecule produces the claimed effect at a molecular and physiological level for the internal grading unit? "Plausible" is not the same as "demonstrated." | High |
| 02 | Human studies | How many controlled studies in humans exist for the relevant population, route, and time horizon? RCT vs observational vs case report, plus sample size and power. | Highest |
| 03 | Effect vs placebo | The controlled human effect signal. Tracks placebo- or comparator-adjusted human outcomes, not animal sham comparisons standing in for human efficacy. | Highest |
| 04 | Long-term safety | What is the longest published human exposure and follow-up window for the internal grading unit? Is there post-marketing or registry surveillance? | Medium |
| 05 | Side effect profile | Observed adverse events and tolerability, capped by how certain we are. A clean signal from tiny human exposure is not a well-characterized safety profile. | Medium |
| 06 | Regulatory status | Medical-review and safety-regulatory context for the studied use. Low-weight and must not conflate medical approval, safety warnings, legal availability, and sports prohibition. | Low |
The first three carry the most weight. Efficacy and human directness are what the grade is fundamentally about. Safety and regulatory context matter, but a perfectly safe molecule with no demonstrated effect still earns a low grade.
§ 06
Scoring scale per sub-score
| Score | Generic interpretation |
|---|---|
| 5 | Best-in-class evidence. Multiple high-quality, replicated, recent, directly relevant. |
| 4 | Strong. One pivotal trial or substantial consistent data. |
| 3 | Moderate. Reasonable evidence with notable gaps or limits. |
| 2 | Weak. Sparse, indirect, low-quality, or poorly replicated evidence. |
| 1 | Effectively absent, decisively negative, or too compromised to support the claim. |
§ 07
Hard caps & edge-case rules
These rules are not optional. They exist to stop the most common grading inflation errors.
Sub-score 03 · Effect vs placebo
- 03 ≥ 3 requires controlled human outcome data in the relevant InternalGradeKey.
- 03 = 2 is the ceiling when some direct human outcome signal exists, but it is uncontrolled, retrospective, open-label, or otherwise not comparator-based.
- 03 = 1 is the default when no direct controlled human efficacy evidence exists for the relevant grading unit.
- Animal-versus-sham findings may strengthen sub-score 01 and narrative context, but do not raise sub-score 03 above what human evidence allows.
Sub-score 05 · Side effect profile
- Cumulative direct human exposure under 50 people, or follow-up under 30 days → 05 cannot exceed 3.
- No direct human exposure data, safety case mostly animal toxicology or mechanistic inference → 05 cannot exceed 2.
- “No adverse events reported” in a tiny pilot is an early tolerability signal, not a mature safety profile.
- Any credible serious adverse event signal, unresolved integrity problem, or major uncertainty about the administered material caps the score at 1–2 pending review.
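The caps for sub-scores 03 and 05 lend themselves to a small clamp applied after the editor proposes a score. A sketch, assuming hypothetical input shapes (`HumanEvidenceSummary`, `SafetyExposure`) and reading "caps the score at 1–2" as a ceiling of 2:

```typescript
// Hypothetical inputs; names are assumptions made for illustration.
interface HumanEvidenceSummary {
  hasControlledHumanOutcomeData: boolean; // within the relevant InternalGradeKey
  hasUncontrolledHumanSignal: boolean; // open-label, retrospective, and so on
}

interface SafetyExposure {
  cumulativeHumansExposed: number;
  longestFollowUpDays: number;
  seriousSignalOrMaterialUncertainty: boolean;
}

// Sub-score 03: animal-versus-sham data never lifts this above the human ceiling.
function capEffectVsPlacebo(proposed: number, e: HumanEvidenceSummary): number {
  if (e.hasControlledHumanOutcomeData) return proposed;
  if (e.hasUncontrolledHumanSignal) return Math.min(proposed, 2);
  return 1;
}

// Sub-score 05: certainty caps based on exposure size, follow-up, and open signals.
function capSideEffectProfile(proposed: number, s: SafetyExposure): number {
  let cap = 5;
  if (s.cumulativeHumansExposed === 0) {
    cap = 2; // no direct human exposure data
  } else if (s.cumulativeHumansExposed < 50 || s.longestFollowUpDays < 30) {
    cap = 3; // tiny or short human exposure
  }
  if (s.seriousSignalOrMaterialUncertainty) {
    cap = Math.min(cap, 2); // held at 1-2 pending review
  }
  return Math.min(proposed, cap);
}
```

For example, a proposed 05 score of 5 based on a 20-person pilot with 90 days of follow-up comes back as 3, which is the "early tolerability signal, not a mature safety profile" rule in numeric form.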
Sub-score 06 · Regulatory status
Track these dimensions separately in page notes and internal data:
- medicalApprovalStatus — approval, non-approval, or review status for the studied use
- safetyWarningStatus — warning letters, withdrawals, safety alerts, refusals
- availabilityStatus — compounding, legal-access, or supply-chain status
- sportsProhibitedStatus — WADA or other sports-governing-body status
Scoring rules for sub-score 06:
- Sub-score 06 primarily reflects medical-review and safety-regulatory context.
- WADA status is compliance information for athletes. It does not imply efficacy or clinical danger by itself.
- Legal availability or compounding restrictions alone do not push a grade to F.
- Only safety- or efficacy-driven medical regulatory action can materially support a downgrade toward F.
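Keeping the four regulatory dimensions as separate fields makes it harder to conflate them. The four field names below come from this document; the `RegulatoryEvent` shape and the `canSupportDowngradeTowardF` helper are hypothetical, and treating only the medical-approval and safety-warning dimensions as "medical regulatory action" is an assumption of this sketch.

```typescript
// The four field names are the ones listed above; string values are free-form here.
interface RegulatoryContext {
  medicalApprovalStatus: string;
  safetyWarningStatus: string;
  availabilityStatus: string;
  sportsProhibitedStatus: string;
}

type RegulatoryDimension = keyof RegulatoryContext;

// Hypothetical event record for a single regulatory change.
interface RegulatoryEvent {
  dimension: RegulatoryDimension;
  safetyOrEfficacyDriven: boolean;
}

// Availability and sports status alone never support a move toward F.
function canSupportDowngradeTowardF(e: RegulatoryEvent): boolean {
  const isMedicalAction =
    e.dimension === "medicalApprovalStatus" || e.dimension === "safetyWarningStatus";
  return isMedicalAction && e.safetyOrEfficacyDriven;
}
```

A WADA listing would be recorded under sportsProhibitedStatus and return false here, which matches the rule that sports prohibition is compliance information rather than evidence of inefficacy or danger.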
§ 08
From sub-scores to letter
There is no rigid formula. The editorial heuristic:
| Letter | Requirements |
|---|---|
| A | At least two independent high-quality controlled human trials, or one pivotal trial plus an independent confirmatory controlled study of comparable directness. Sub-scores 02 and 03 both ≥ 4, 01 ≥ 4, 04 and 05 both ≥ 3. A single positive Phase 2 trial cannot produce A. |
| B | At least one well-powered controlled human trial showing clinically meaningful benefit in the relevant grading unit. Sub-scores 02 and 03 both ≥ 3, 01 ≥ 3. |
| C | Default when some direct human signal exists but meaningful gaps remain, or when strong indirect evidence still carries too much of the case. Common C profile: 01 ≥ 3, 02 = 2–3, 03 = 1–2. |
| D | Evidence is weak but identifiable — animal-only, mechanistic-only, anecdotal, or sparse uncontrolled human evidence. Requires ≥ 3 qualifying studies (not Pending). |
| F | Actively negative human evidence, unacceptable risk, or safety-/efficacy-driven medical regulatory rejection or withdrawal for the studied use. Sports prohibition or legal-access constraints alone are not enough. |
| Insufficient | Internal grading unit cannot be specified cleanly, literature is effectively absent, or decisive sources cannot be verified. |
Note
When in doubt, grade DOWN rather than up. Credibility is built on under-promising.
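Because the heuristic is explicitly not a formula, any code version can only produce a starting suggestion for the editor. The sketch below encodes the numeric floors from the table and nothing more; the `SubScores` and `TrialEvidence` shapes are hypothetical, the C versus D boundary is deliberately simplified to a single flag, and the function assumes the 3-study threshold is already met and the grading unit is workable.

```typescript
type Letter = "A" | "B" | "C" | "D" | "F";

// Hypothetical field names for sub-scores 01-06, each rated 1-5.
interface SubScores {
  mechanism: number; // 01
  humanStudies: number; // 02
  effectVsPlacebo: number; // 03
  longTermSafety: number; // 04
  sideEffects: number; // 05
  regulatory: number; // 06
}

// Hypothetical editorial flags that the numbers alone cannot express.
interface TrialEvidence {
  independentControlledHumanTrials: number; // high-quality and independent
  hasWellPoweredControlledTrialWithBenefit: boolean;
  decisiveNegativeOrUnsafe: boolean;
  hasDirectHumanSignal: boolean;
}

// Returns a suggestion only; when in doubt the editor grades down.
function suggestLetter(s: SubScores, t: TrialEvidence): Letter {
  if (t.decisiveNegativeOrUnsafe) return "F";
  const meetsA =
    t.independentControlledHumanTrials >= 2 &&
    s.humanStudies >= 4 && s.effectVsPlacebo >= 4 && s.mechanism >= 4 &&
    s.longTermSafety >= 3 && s.sideEffects >= 3;
  if (meetsA) return "A";
  const meetsB =
    t.hasWellPoweredControlledTrialWithBenefit &&
    s.humanStudies >= 3 && s.effectVsPlacebo >= 3 && s.mechanism >= 3;
  if (meetsB) return "B";
  // Simplification: any direct human signal defaults to C, otherwise D.
  return t.hasDirectHumanSignal ? "C" : "D";
}
```

The ordering matters: A is checked before B so that a single strong trial, however good its sub-scores, falls through to B rather than A.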
§ 09
Pending vs Insufficient vs D
This boundary must be applied consistently:
- D
We looked, and what exists is weak
At least 3 qualifying studies and the rubric supports a weak-evidence letter. Animal data, mechanistic papers, uncontrolled human case series, or anecdotal evidence that is inspectable and clearly limited.
- Pending
We have not finished looking
Evidence set has fewer than 3 qualifying studies and editorial backfill is reasonably expected to close the gap.
- Insufficient
Evidence set is structurally unworkable
Target cannot yet be meaningfully assessed — claim underspecified, route/population/time horizon unclear, literature effectively absent by design, or decisive sources unverifiable.
§ 10
Reassessment triggers
An internal peptide grade is re-evaluated when ANY of the following occurs:
| Trigger | Detection | SLA |
|---|---|---|
| New peer-reviewed RCT for that internal grade key | PubMed alerts on peptide name + outcome + route/population terms | Within 30 days |
| New systematic review or meta-analysis | PubMed alerts | Within 30 days |
| Claim audit emits an EvidenceConclusion with gradeImpact ≠ "none" | Claim-review workflow handoff | Within 7 days |
| Retraction of any cited paper | Retraction Watch + manual quarterly check | Same day |
| Major medical regulatory action (approval, refusal, withdrawal, warning, safety-grounded compounding action) | Regulatory bulletins + monthly check | Same day |
| Sports-prohibited status change | WADA or governing-body bulletins | Within 30 days (immediate for athlete-safety copy) |
| PubPeer integrity flag on a cited paper | PubPeer monitoring | Within 7 days |
| Need to split or narrow the internal grade key (route, population, or time horizon change) | Editorial review or claim audit handoff | Within 14 days |
| Quarterly housekeeping audit (no specific trigger) | Calendar | At least every 90 days |
| Reader-submitted evidence challenge via corrections@peptigrade.io | Inbox | Acknowledged in 7 days · resolved in 30 |
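The SLA column can be turned into due dates with a simple lookup. In this sketch the trigger identifiers are invented labels for the rows above, same-day SLAs are encoded as 0 days, and the reader-challenge entry uses the 30-day resolution window rather than the 7-day acknowledgement.

```typescript
// Hypothetical identifiers for the trigger rows in the table above.
type Trigger =
  | "new_rct" | "new_systematic_review" | "claim_audit_handoff" | "retraction"
  | "major_regulatory_action" | "sports_status_change" | "pubpeer_flag"
  | "grade_key_split" | "quarterly_housekeeping" | "reader_challenge";

// SLA in calendar days; 0 means same day.
const SLA_DAYS: Record<Trigger, number> = {
  new_rct: 30,
  new_systematic_review: 30,
  claim_audit_handoff: 7,
  retraction: 0,
  major_regulatory_action: 0,
  sports_status_change: 30, // athlete-safety copy is updated immediately
  pubpeer_flag: 7,
  grade_key_split: 14,
  quarterly_housekeeping: 90,
  reader_challenge: 30, // acknowledged within 7 days, resolved within 30
};

// Returns the latest acceptable completion date, computed in UTC.
function dueDate(trigger: Trigger, detectedOn: Date): Date {
  const due = new Date(detectedOn.getTime());
  due.setUTCDate(due.getUTCDate() + SLA_DAYS[trigger]);
  return due;
}

const flagDue = dueDate("pubpeer_flag", new Date("2025-01-01T00:00:00Z"));
// flagDue.toISOString() is "2025-01-08T00:00:00.000Z"
```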
§ 11
The reassessment workflow
When a trigger fires, walk through this:
- Step 1
Confirm the trigger is in scope
Read the new evidence, claim-audit handoff, retraction notice, or regulatory action. Confirm it concerns the peptide, outcome, population, route, and time horizon under consideration. A new oral rodent study does not automatically change an injectable human grade.
- Step 2
Import the EvidenceConclusion if the trigger came from a claim audit
Do not translate prose by hand when a structured claim-audit output exists. Identify the sub-claim, affected internal grade key, touched sub-scores, and whether the outcome is wording-only, possible band move, or mandatory reassessment. If fields are missing, send it back for completion before changing the grade.
- Step 3
Re-score only the affected sub-scores, applying the hard caps
Identify which sub-score(s) the new evidence actually touches. Re-score only those first. Apply the caps in §07; do not let narrative enthusiasm override them. Write updated justifications in plain language. Examples: a placebo-controlled human RCT typically touches 02 and 03; longer follow-up touches 04 and maybe 05; a safety warning affects 05 and 06.
- Step 4
Re-roll the letter grade
Apply the heuristic in §08. The letter may change up, down, or stay the same. A single positive Phase 2 trial can plausibly move C → B if well-powered and clinically meaningful; it does not move C → A on its own. If the internal grade key needs to be split by route/population/time horizon first, do that, then re-roll each resulting grade separately.
- Step 5
Determine whether the publication packet changes
Four common outcomes:
  - Wording correction only: update copy, citations, lastReviewed; no grade-history entry.
  - Grade unchanged with updated sub-scores: update notes and lastReviewed; no editorial note required.
  - Grade up or down by one letter: add Grade history entry; editor + second-editor review required.
  - Grade change of two+ letters or any move to/from A or F: Grade history entry, full editorial-board review, publish a CHANGE NOTE.
- Step 6
Update related artifacts
Grade changes ripple. Update the peptide's topGrade if this was the top outcome and the grade moved. Update any /protocols/[slug] page that includes the peptide as a component. Update any /claims/[slug] page that depends on the affected sub-claim. Update the home-page carousel and featured-peptides section if featured. Regenerate sitemap.xml on the next build.
- Step 7
Editorial sign-off
  - Wording correction only → author editor.
  - No grade change with sub-score updates → author editor.
  - One-letter change → author editor + one second editor.
  - Two+ letter change or any move to/from A or F → author editor + second editor + clinical advisor.
  - Retraction-, integrity-, or safety-driven change → same chain plus same-day publication once verified.
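The sign-off routing in the final step can be expressed as a lookup on the size and direction of the change. This is a sketch under assumptions: the role identifiers are invented, any transition involving Pending or Insufficient is treated as a one-letter change (the document states this only for Pending → letter), and a safety- or integrity-driven change is routed to the full chain.

```typescript
type GradeStatus = "A" | "B" | "C" | "D" | "F" | "Pending" | "Insufficient";
type Role = "author_editor" | "second_editor" | "clinical_advisor";

const LETTERS: GradeStatus[] = ["A", "B", "C", "D", "F"];

function requiredSignOff(
  from: GradeStatus,
  to: GradeStatus,
  safetyOrIntegrityDriven: boolean = false
): Role[] {
  // Wording corrections and sub-score refreshes leave the letter unchanged.
  if (from === to) return ["author_editor"];
  const touchesAorF = from === "A" || from === "F" || to === "A" || to === "F";
  const i = LETTERS.indexOf(from);
  const j = LETTERS.indexOf(to);
  // Assumption: non-letter statuses count as a one-letter change.
  const distance = i === -1 || j === -1 ? 1 : Math.abs(i - j);
  if (touchesAorF || distance >= 2 || safetyOrIntegrityDriven) {
    return ["author_editor", "second_editor", "clinical_advisor"];
  }
  return ["author_editor", "second_editor"];
}

const cToB = requiredSignOff("C", "B"); // author editor + second editor
const cToA = requiredSignOff("C", "A"); // full chain, since the move lands on A
```

The function covers who must approve, not when to publish; same-day publication for verified retraction-, integrity-, or safety-driven changes remains a release decision.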
§ 12
Ownership & authority
The grading layer needs clear authority boundaries:
| Decision | Owner |
|---|---|
| Routine sub-score refresh with no letter change | Primary evidence editor for that peptide |
| Any letter-grade movement | Primary evidence editor proposes · second editor approves |
| Any move to/from A or F, or any safety-/integrity-driven downgrade | Primary evidence editor + second editor + clinical advisor |
| Override of a claim-audit arbitrator, or override of gradeImpact = "mandatory_reassessment" | Editorial lead + clinical advisor · rationale logged in Grade history |
| Publication of a medical- or safety-related grade change | Editorial lead owns release and timing |
| Quarterly stale-review sweep | Managing editor or designated evidence-ops owner |
Note
No single person should author the triggering claim audit, arbitrate the claim audit, and approve the resulting grade change alone.
§ 13
Versioning & audit trail
Every grade change must leave a trail. The pattern:
Grade history entry
What a clean change log should capture
Not software for software's sake. Just the minimum record needed so a future reader can understand what changed, why it changed, and who approved it.
| Field | Editorial Meaning | Why It Matters |
|---|---|---|
| When | The date the grade changed | An ISO-stamped moment in the public record. |
| What moved | The exact internal grading unit | Not just the peptide page broadly, but the specific row that changed. |
| From → to | The prior letter and the new letter | Readers should be able to see direction, not just the latest state. |
| Why now | The trigger: paper, regulatory event, or claim-audit run | Every shift needs a traceable cause. |
| What changed underneath | The affected sub-scores and linked audit outputs | The movement should be reconstructible, not just asserted. |
| Editorial rationale | A one- or two-sentence explanation of the move | Short enough to scan, precise enough to defend. |
| Approval trail | Who signed off | Especially important for safety-led downgrades or moves into A or F. |
We do not yet store this in the codebase — currently each peptide outcome has only lastUpdated. Next step: extend the OutcomeGrade type with an optional history: GradeHistoryEntry[] field, render the history on the peptide page as a small Grade history panel under the grade matrix, and surface it in the Drug JSON-LD.
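A possible shape for that extension is sketched below. The names OutcomeGrade, GradeHistoryEntry, history, lastUpdated, and fromGrade come from this document; every other field name, the `grade` field on OutcomeGrade, and the `recordGradeChange` helper are assumptions about a codebase this document does not show.

```typescript
type GradeStatus = "A" | "B" | "C" | "D" | "F" | "Pending" | "Insufficient";

// One row of the audit trail; fields mirror the table above.
interface GradeHistoryEntry {
  changedOn: string; // ISO 8601 date
  gradeKey: string; // the exact internal grading unit
  fromGrade: GradeStatus;
  toGrade: GradeStatus;
  trigger: string; // paper, regulatory event, or claim-audit run
  affectedSubScores: string[];
  linkedAuditOutputs?: string[];
  rationale: string; // one or two sentences
  approvedBy: string[];
}

// Hypothetical minimal view of the existing type plus the optional history field.
interface OutcomeGrade {
  grade: GradeStatus;
  lastUpdated: string;
  history?: GradeHistoryEntry[];
}

// Returns a new record; the previous state is left untouched.
function recordGradeChange(outcome: OutcomeGrade, entry: GradeHistoryEntry): OutcomeGrade {
  return {
    ...outcome,
    grade: entry.toGrade,
    lastUpdated: entry.changedOn,
    history: (outcome.history || []).concat([entry]),
  };
}
```

Keeping history optional means existing outcome records without a trail stay valid, and appending rather than overwriting preserves the direction of every past move.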
§ 14
Errata vs reassessment vs claim audit
Three distinct operations, three different workflows:
| Operation | When | Workflow |
|---|---|---|
| Errata | A factual error is found on a published page: wrong PMID, dose number, author name, label text. | Fix the error, append a dated correction note, mention in the next weekly dispatch. No grade change. Owner: any editor. |
| Reassessment | New evidence or a structured claim-audit handoff triggers re-evaluation of a published internal peptide grade. | The reassessment workflow in §11 of this document, Steps 1–7. Owner: editor responsible for the peptide. |
| Claim audit | A popular framing or specific assertion needs adjudication. | New /claims/[slug] page following the research protocol. If grade-relevant, it emits EvidenceConclusion objects. Owner: editor + clinical advisor for the relevant peptide. |
§ 15
What never raises a grade
These never count as supporting evidence regardless of how persuasive they sound:
- Influencer testimonials
- Vendor marketing copy
- Forum or social-media anecdotes
- Vendor-funded white papers without peer review
- “It worked for me” reports
- Mechanistic plausibility absent direct evidence
- Review articles cited as if they were decisive primary efficacy evidence
These can be cited as context but they cannot move sub-scores 02, 03, 04, or 05 in a positive direction.
§ 16
Protocol-level evidence labels
A multi-compound protocol is not a peptide × outcome pair. Borrowing the A–F peptide letters for a protocol would imply direct regimen-level human-outcome evidence that almost never exists.
Protocols use a distinct label set (see /protocols):
| Label | Meaning |
|---|---|
| Exploratory synthesis | Combination of research-backed hypotheses; the regimen as assembled has not been tested as a unit in a controlled human trial. |
| Mechanistically plausible | Components act on characterized pathways; protocol-level human outcome evidence is absent. |
| Emerging clinical evidence | Some protocol-level human outcome data exists, but it is underpowered or preliminary. |
| Established protocol | Validated protocol-level human RCT evidence. |
Watch
Compound-level peptide grades and the protocol evidence label are scored separately. A protocol may contain one C-grade peptide, two D-grade peptides, and still carry an “Exploratory synthesis” label because the full stack has not been tested together. Do not aggregate letter grades across compounds into a protocol “top grade.”
§ 17
Worked example: BPC-157 × tendon healing
Public label: BPC-157 × tendon healing
Internal grade key, simplified: BPC-157 × tendon healing × adults with musculoskeletal injury × injectable/systemic exposure × acute/subacute healing window
Current state: Grade C, sub-scores 4 / 2 / 1 / 2 / 2 / 1, top outcome.
| Sub-score | Value | Justification |
|---|---|---|
| 01 Mechanism | 4 | FAK-paxillin, VEGFR2-Akt-eNOS, and GH-receptor-related pathways are characterized in animal and in-vitro systems, but human PK and a confirmed receptor story remain absent. |
| 02 Human studies | 2 | Sparse direct human data exists, but no well-powered controlled human trial for this grading unit. |
| 03 Effect vs placebo | 1 | No placebo- or comparator-controlled human efficacy trial exists for the relevant grading unit. Animal-versus-sham consistency helps context, not this score. |
| 04 Long-term safety | 2 | Human exposure windows are short, with no robust longer-term follow-up. |
| 05 Side effect profile | 2 | Small human exposure and short follow-up mean the observed tolerability signal is still low-certainty, even if animal toxicology looks favorable. |
| 06 Regulatory status | 1 | Medical approval is absent. Safety/access/sports notes are tracked separately, and WADA status does not by itself imply inefficacy or danger. |
Why C and not B
The B rubric requires at least one well-powered controlled human trial plus sub-scores 02 and 03 both ≥ 3. BPC-157 × tendon healing does not meet that bar. Strong and replicated animal evidence plus sparse uncontrolled human signal is a canonical C, not a B.
What would move it to B
A positive, well-powered controlled Phase 2 or Phase 3 human trial showing clinically meaningful benefit in the relevant grading unit could plausibly move 02 to 3–4 and 03 to 3, producing a likely C → B reassessment if safety remains acceptable.
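The gap between the current state and the B bar can be checked mechanically. The sketch below uses the sub-scores from the table above and the B floors from §08; the `SubScores` field names and the `unmetBRequirements` helper are hypothetical, and the "after" scores take the low end of the 3–4 range described for the hypothetical trial.

```typescript
// Hypothetical field names for sub-scores 01-06.
interface SubScores {
  mechanism: number;
  humanStudies: number;
  effectVsPlacebo: number;
  longTermSafety: number;
  sideEffects: number;
  regulatory: number;
}

// Lists which B requirements are not yet satisfied.
function unmetBRequirements(s: SubScores, hasWellPoweredControlledTrial: boolean): string[] {
  const unmet: string[] = [];
  if (!hasWellPoweredControlledTrial) unmet.push("well-powered controlled human trial");
  if (s.mechanism < 3) unmet.push("01 Mechanism >= 3");
  if (s.humanStudies < 3) unmet.push("02 Human studies >= 3");
  if (s.effectVsPlacebo < 3) unmet.push("03 Effect vs placebo >= 3");
  return unmet;
}

// Current state from the worked example: 4 / 2 / 1 / 2 / 2 / 1.
const current: SubScores = {
  mechanism: 4, humanStudies: 2, effectVsPlacebo: 1,
  longTermSafety: 2, sideEffects: 2, regulatory: 1,
};

// Hypothetical state after a positive, well-powered controlled trial.
const afterTrial: SubScores = { ...current, humanStudies: 3, effectVsPlacebo: 3 };

const gapsNow = unmetBRequirements(current, false); // three unmet requirements
const gapsAfter = unmetBRequirements(afterTrial, true); // none remain
```

An empty list only means the numeric floors for B are met; the move itself still goes through reassessment and sign-off, and depends on safety remaining acceptable.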
What would move it to A
At least one additional independent high-quality controlled human trial pointing the same way, plus stronger longer-term human safety data. A single positive Phase 2 trial would not be enough.
What would move it down
A decisive high-quality null RCT, a credible serious safety signal, or a safety-/efficacy-driven medical regulatory action could force reassessment toward D or F depending on how conclusive the new evidence is.