Live mirror of CHANGELOG.md from the public MCP server repository. Refreshed hourly.

Changelog

All notable changes to HEORAgent MCP Server.

v1.13.0 (2026-05-28) — Feature: AI Transparency Disclosure (ISPOR ELEVATE-GenAI aligned)

Adds a structured AI-assistance disclosure block to tool outputs, aligned with the ISPOR ELEVATE-GenAI reporting guidelines (Fleurence RL et al., Value Health 2025;28(11):1611–1625).

New

Wiring by tool tier

| Tier | Tools | Default level |
| --- | --- | --- |
| Standard | riskOfBias, screenAbstracts, itcFeasibility, populationAdjustedComparison, survivalFitting, costEffectivenessModel, budgetImpactModel | "standard" |
| Submission | htaDossierPrep, htaWorkflow, utilityValueSet, maicWorkflow, jcaPicoScope, pvClassify, pvSignalWorkflow, irbReview, icfReadabilityCheck | "submission" |
| Excluded | knowledge.*, project.create, utils.validate_links | no change |

ISPOR citation

Fleurence RL, Dawoud D, Bian J, Higashi MK, Wang X, Xu H, Chhatwal J, Ayer T; ISPOR Working Group on Generative AI. ELEVATE-GenAI: Reporting Guidelines for the Use of Large Language Models in Health Economics and Outcomes Research: An ISPOR Working Group Report. Value Health. 2025;28(11):1611–1625. doi:10.1016/j.jval.2025.06.018


v1.11.3 (2026-05-22) — Fix: expose run_owsa and study_types in MCP schemas

run_owsa was accepted by the models.cost_effectiveness handler (if (params.run_owsa !== false)) but absent from the JSON inputSchema — MCP clients couldn't discover or disable one-way sensitivity analysis. study_types was defined as a Zod enum in literature.search but similarly missing from the JSON schema.

Both fields are now published in their respective inputSchema objects with full type and description. tests/schemas/mcpToolSchemas.test.ts extended with 2 new drift-guard assertions (total: 13).

v1.11.2 (2026-05-22) — Fix: expose 6 hidden hta_dossier fields in MCP schema

heterogeneity_per_outcome, upgrading_per_outcome, severity_modifier, health_inequalities, pv_classification, and regulatory_landscape were all accepted and used by the hta.dossier handler but absent from the published MCP JSON inputSchema. External MCP clients (Claude Desktop, Smithery, etc.) could not discover or pass these fields, silently breaking the pipe workflows that pv_classify and regulatory_status_check were designed to feed into hta.dossier.

All 6 fields are now fully documented in the inputSchema with types and descriptions matching the Zod schemas. tests/schemas/mcpToolSchemas.test.ts extended with 6 new drift-guard assertions.

v1.11.1 (2026-05-22) — Bug fixes: MFN schema exposure, PartSA MFN runner, telemetry

Fixed: MFN fields missing from MCP-published tool schemas

mfn_sensitivity was implemented in the models.cost_effectiveness Zod schema and handler but absent from the exported costEffectivenessModelToolSchema JSON — external MCP clients (Claude Desktop, Smithery, etc.) could not discover the field. Likewise mfn_context was missing from the hta.dossier MCP inputSchema. Both fields are now present in the published schemas with full descriptions and required-field lists.

Added tests/schemas/mcpToolSchemas.test.ts as a permanent drift guard so Zod and MCP schemas can't diverge silently again.
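At its core, a drift guard like this compares two sets of field names. A minimal sketch, assuming the Zod side has already been reduced to a list of keys (for a Zod v3 object schema, Object.keys(schema.shape)) and the MCP side is a JSON Schema object with a properties map. schemaDrift and DriftReport are hypothetical names, not the contents of the actual test file.

```typescript
// Hypothetical helper: compares the fields a Zod schema accepts with the
// fields published in an MCP tool's JSON inputSchema.
interface JsonSchemaObject {
  properties?: Record<string, unknown>;
  required?: string[];
}

interface DriftReport {
  missingFromJson: string[]; // accepted by the handler, invisible to MCP clients
  extraInJson: string[]; // advertised to clients, unknown to the handler
}

function schemaDrift(zodFields: string[], jsonSchema: JsonSchemaObject): DriftReport {
  const published = new Set(Object.keys(jsonSchema.properties ?? {}));
  const declared = new Set(zodFields);
  return {
    missingFromJson: zodFields.filter((f) => !published.has(f)),
    extraInJson: Array.from(published).filter((f) => !declared.has(f)),
  };
}

// A run_owsa-style gap: the handler accepts the field, the JSON schema omits it.
const report = schemaDrift(["model_type", "run_owsa"], {
  properties: { model_type: { type: "string" } },
});
// report.missingFromJson is ["run_owsa"]; report.extraInJson is []
```

In a test, asserting that both arrays are empty for every tool is what turns the comparison into a guard.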

Fixed: MFN sensitivity always used Markov runner even for PartSA models

runMfnSensitivity was called with runMarkovAndComputeICER regardless of model_type. When the base model was partsa, the MFN curve was computed from a Markov run, producing mixed-method output (PartSA base case + Markov MFN curve). The callback now dispatches to runPartSA when model_type="partsa", producing a consistent single-method result.

Fixed (web): hta_body enum in web/lib/tools.ts missing "gvd"

The web-tier tool definition exposed ["nice", "ema", "fda", "iqwig", "has", "jca"] without "gvd": the value was present in the MCP server schema but missing from the Claude web UI tool definition, so GVD dossiers were silently inaccessible from the web UI.

Fixed (web): MCP tool errors tracked as status=ok in PostHog

McpSession.dispatch() catches all errors and returns "Error: ..." strings (by design — so Claude receives the error text). The chat route's trackToolCall call sat immediately after dispatch() and always emitted status: "ok". The route now checks for the "Error: " prefix and emits status: "error" with error_class: "McpError" and the message body.

Also fixed: the PostHog distinctId was hardcoded to "chatgpt_adapter" for all surfaces. Claude web UI calls now use distinctId: "anon_claude_web" so the two surfaces are distinguishable in analytics.
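Because dispatch() never throws, the status has to be derived from the returned text. A minimal sketch of that prefix check, assuming a simplified event object; ToolCallEvent and toToolCallEvent are illustrative names, not the chat route's actual code or the real trackToolCall payload.

```typescript
// Hypothetical event shape; the real trackToolCall payload may differ.
interface ToolCallEvent {
  tool: string;
  status: "ok" | "error";
  error_class?: string;
  error_message?: string;
}

const ERROR_PREFIX = "Error: ";

// dispatch() returns failures as text starting with "Error: " instead of
// throwing, so success vs failure is decided by inspecting that text.
function toToolCallEvent(tool: string, dispatchResult: string): ToolCallEvent {
  if (dispatchResult.startsWith(ERROR_PREFIX)) {
    return {
      tool,
      status: "error",
      error_class: "McpError",
      error_message: dispatchResult.slice(ERROR_PREFIX.length),
    };
  }
  return { tool, status: "ok" };
}
```

One trade-off of string-prefix detection: a successful result whose text happens to begin with "Error: " would be counted as a failure.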

v1.11.0 (2026-05-09) — MFN-aware tooling: basket data, dossier section, CE price sweep

Implements the Most-Favored-Nation pricing layer across three tool surfaces. Triggered by CMS proposed GUARD (Part D) and GLOBE (Part B) payment models, which anchor US drug prices to a 19-country OECD basket minimum — a structural shift that makes the gap between US net price and the MFN ceiling a first-order market-access input. Design log #27.

New: src/data/mfnBasket.ts — 19-country basket data + ceiling math

New: src/models/mfnSensitivity.ts — deterministic ICER price sweep

runMfnSensitivity(baseParams, inputs, runModel) sweeps drug price from min_basket to current_us_price at N discrete points (default 11) and returns:

Why deterministic sweep instead of another PSA? MFN is an exogenous price shock, not statistical uncertainty. 11 Markov runs vs 1000+ PSA runs; output is payer-readable ("ICER drops from $X to $Y; WTP crossover at price $Z").
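The sweep itself is a short loop over an evenly spaced price grid. In this sketch the model runner is reduced to a hypothetical price-to-ICER callback (the real runMfnSensitivity takes baseParams, inputs and a model runner), and the WTP crossover is found by linear interpolation between the two sweep points that bracket the threshold, which is an assumption about how that figure is derived.

```typescript
// Hypothetical, simplified stand-in for one full model run at a given drug price.
type IcerAtPrice = (price: number) => number;

interface SweepPoint {
  price: number;
  icer: number;
}

// Evenly spaced prices from the basket minimum up to the current US price.
function priceGrid(minBasket: number, currentUsPrice: number, nPoints = 11): number[] {
  if (nPoints < 2) return [minBasket];
  const step = (currentUsPrice - minBasket) / (nPoints - 1);
  return Array.from({ length: nPoints }, (_, i) => minBasket + i * step);
}

// One deterministic model run per grid price.
function mfnSweep(minBasket: number, currentUsPrice: number, runModel: IcerAtPrice, nPoints = 11): SweepPoint[] {
  return priceGrid(minBasket, currentUsPrice, nPoints).map((price) => ({ price, icer: runModel(price) }));
}

// Price at which the ICER crosses a WTP threshold, interpolated linearly
// between the bracketing sweep points; undefined if the sweep never crosses it.
function wtpCrossoverPrice(points: SweepPoint[], wtp: number): number | undefined {
  for (let i = 1; i < points.length; i++) {
    const lo = points[i - 1];
    const hi = points[i];
    const brackets = (lo.icer - wtp) * (hi.icer - wtp) <= 0;
    if (brackets && hi.icer !== lo.icer) {
      return lo.price + ((wtp - lo.icer) / (hi.icer - lo.icer)) * (hi.price - lo.price);
    }
  }
  return undefined;
}
```

For example, with a toy model whose ICER is 1000 × price − 20000, sweeping from 50 to 150 gives 11 points and a crossover at a price of 70 for a WTP of 50000.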

13 unit tests in tests/models/mfnSensitivity.test.ts.

Extended: models.cost_effectiveness (new mfn_sensitivity input field)

When caller supplies mfn_sensitivity: { min_basket, current_us_price, n_points?, wtp_thresholds? }:

6 integration tests in tests/tools/costEffectivenessModelMfn.test.ts.

Extended: hta.dossier (new mfn_context input field)

When caller supplies mfn_context: { basket_prices, us_current_net_price?, basket_revision?, excluded_countries? } and the HTA body is NICE / EMA / FDA / IQWiG / HAS / GVD:

15 integration tests in tests/tools/htaDossierMfn.test.ts.

Extended: src/server.ts — MFN telemetry flags

trackToolCall on success now includes:

Enables PostHog HogQL queries to measure MFN feature adoption without schema changes.

Extended: web tier — SYSTEM_PROMPT + tool schema

12 web tests in web/__tests__/mfnPhase4.test.ts.

Full test suite

1133 tests passing (1121 MCP + 12 new web). 0 failures.


v1.10.2 (2026-05-12) — stop reusing the 500-char telemetry cap as the client response

A 0-CRITICAL hygiene release that fixes a quiet bug discovered while debugging a real ChatGPT failure on 2026-05-12 09:15:24 (user 1be263 called hta.dossier with no payload).

The bug

classifyToolError in src/analytics.ts returned a single error_message field, truncated to 500 chars. That truncation was intended for telemetry hygiene — PostHog event properties have a size limit, and a multi-issue ZodError dump on a heavy schema can run 1-2KB.

src/server.ts:475 then used that same truncated string as the client-facing response content (the text of the text content block returned to whoever called the tool). Multi-issue ZodErrors therefore arrived at clients (ChatGPT Custom GPT especially) with the JSON cut mid-key, ending in fragments such as "received": "string", "rece, and were unparseable. ChatGPT bounced instead of retrying with the missing fields. Five real-user errors followed this pattern in the 14 days before discovery.

The fix

Split the field. classifyToolError now returns:

```ts
{
  error_class: string;       // unchanged: Error subclass name for dashboards
  error_message: string;     // FULL text, for the client response
  telemetry_message: string; // capped at 500 chars, for PostHog
}
```

server.ts:472 now passes telemetry_message to trackToolCall (PostHog hygiene preserved) and uses error_message for the text content (full message reaches the client).

Why it isn't strictly redundant with the web-tier fix (deployed 2026-05-12)

The web tier (web/lib/zodErrorFormatter.ts) already reformats raw ZodError JSON arriving from MCP into one-line "field.path: Required" text, AND salvages complete issues from truncated arrays. So clients calling via the web/ChatGPT adapter are protected today even on v1.10.1.

But:

  1. Direct MCP clients (Claude Desktop, Cursor, and users running the npm package via npx) don't go through the web tier. They see the raw truncated ZodError straight from Railway. v1.10.2 fixes their experience.
  2. The web-tier fix is defensive masking; v1.10.2 is the root-cause fix. Both layers help — defense in depth.
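The web-tier reformatting can be sketched as a salvage step plus a one-line-per-issue formatter. This is an illustration under assumptions, not the code in web/lib/zodErrorFormatter.ts: issues are assumed to carry Zod's path and message fields, and salvage works by cutting the truncated JSON array back to the last "}" at which it parses again.

```typescript
// Minimal view of a Zod issue: a path into the input and a human-readable message.
interface IssueLike {
  path: (string | number)[];
  message: string;
}

// Recover the complete issues from a JSON array that may have been cut off
// mid-object: walk backwards over every "}" and try closing the array there.
function salvageIssues(raw: string): IssueLike[] {
  try {
    const parsed = JSON.parse(raw);
    if (Array.isArray(parsed)) return parsed as IssueLike[];
  } catch {
    // Truncated or otherwise malformed: fall through to the salvage scan.
  }
  for (let i = raw.lastIndexOf("}"); i > 0; i = raw.lastIndexOf("}", i - 1)) {
    try {
      const parsed = JSON.parse(raw.slice(0, i + 1) + "]");
      if (Array.isArray(parsed)) return parsed as IssueLike[];
    } catch {
      // The cut landed inside a nested object or string; keep scanning.
    }
  }
  return [];
}

// One line per issue, e.g. "mfn_context.basket_prices: Required".
function formatIssues(issues: IssueLike[]): string {
  return issues
    .map((issue) => `${issue.path.join(".") || "(root)"}: ${issue.message}`)
    .join("\n");
}
```

Trying every candidate cut is quadratic in the worst case, which is harmless at the 500-character scale of the truncated payloads described here.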

Tests

tests/analytics/errorClassification.test.ts updated to cover the split:

Full suite: 8/8 in errorClassification, no other tests touched.

Non-breaking for consumers

error_class and error_message are still present and PostHog dashboards keep working unchanged (telemetry is now slightly more selective about which message it stores). The only behavioral change visible to a client of the MCP server is: error responses are no longer truncated mid-message. That's a strict improvement.


v1.10.1 (2026-05-10) — auto-wire regulatory.status_check (the "make the right thing easy" follow-up to v1.10.0)

v1.10.0 shipped the primary-source regulatory lookup tool. v1.10.1 closes the loop: the tool now fires automatically inside evidence.unmet_need and a new hta_workflow Phase 3.6, so the model can no longer fabricate a "no approved option" claim by simply forgetting to call it. Design log #26.

evidence.unmet_need — default-on regulatory fan-out

When treatment_landscape.current_soc[] is supplied, the handler now fans out to regulatory.status_check for each molecule across the user-supplied jurisdictions[]. Results are injected as a structured regulatory_context[] array AND rendered as inline label-quote attributions in the treatment-landscape paragraph ("Per FDA/OpenFDA label retrieved 2026-05-10: fremanezumab is approved for the preventive treatment of migraine in adults and in pediatric patients 6 years of age and older [citation N].").

hta_workflow Phase 3.6 — new "regulatory_landscape" phase

Inserted between Phase 3.5 (evidence.unmet_need) and Phase 4 (CE model). Fans out across comparators surfaced in earlier phases, pipes results into hta_dossier as a new regulatory_landscape[] parameter. Always runs when comparators are present, regardless of hta_body. Adds ~5-10s to total workflow time on a typical 4-comparator dossier.

hta_dossier — new "Regulatory Landscape" section

Renders for nice / jca / gvd / amcp bodies. Table format: comparator × region × current approved indication × label-revision date × source URL. Provides auditable provenance for the regulatory claims that downstream payers verify line-by-line.

Graceful degradation — non-negotiable

api_error or current_status: "unknown" from regulatory.status_check never blocks dossier rendering. Instead the failure is appended to gaps[]:

The dossier proceeds with whatever regulatory context is available. This is the same design philosophy as the literature-search degradation: surface gaps explicitly, don't fail the workflow.
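The fan-out and the gap rule can be combined in one small function. A minimal sketch, assuming a simplified result shape and a hypothetical checker callback standing in for regulatory.status_check; the gap wording is illustrative, not the dossier's actual text.

```typescript
// Hypothetical shapes; the real regulatory.status_check output has more fields.
interface StatusResult {
  drug: string;
  region: string;
  current_status: string;
  api_error?: string;
}

type StatusChecker = (drug: string, region: string) => Promise<StatusResult>;

interface FanOutResult {
  regulatory_context: StatusResult[];
  gaps: string[];
}

type Outcome = { drug: string; region: string; result?: StatusResult };

// One lookup per molecule x jurisdiction. Rejections, api_error results and
// "unknown" statuses become gaps instead of failing the whole batch.
async function fanOutStatusChecks(
  molecules: string[],
  regions: string[],
  check: StatusChecker,
): Promise<FanOutResult> {
  const pairs: [string, string][] = [];
  for (const drug of molecules) for (const region of regions) pairs.push([drug, region]);

  const outcomes: Outcome[] = await Promise.all(
    pairs.map(([drug, region]) =>
      check(drug, region).then(
        (result): Outcome => ({ drug, region, result }),
        (): Outcome => ({ drug, region }), // rejected lookup: no result
      ),
    ),
  );

  const regulatory_context: StatusResult[] = [];
  const gaps: string[] = [];
  for (const { drug, region, result } of outcomes) {
    if (!result) gaps.push(`${drug} (${region}): regulatory lookup failed`);
    else if (result.api_error || result.current_status === "unknown") gaps.push(`${drug} (${region}): status could not be verified`);
    else regulatory_context.push(result);
  }
  return { regulatory_context, gaps };
}
```

Because every rejection is converted into an outcome before Promise.all sees it, the batch itself never rejects, and the results keep the molecule-by-region order regardless of which lookup finishes first.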

Cycle safety

regulatory.status_check's handler is statically prevented from importing evidence.unmet_need (per design log #26 Q10). Tests assert this so a future contributor doesn't create an A→B→A loop.

Rate-limit headroom

OPENFDA_API_KEY env var is now respected by the OpenFDA client (v1.10.0 accepted the variable but did not pass it through; the wiring ships here). Anonymous OpenFDA access is capped at 240 requests/min and 1,000 requests/day per IP; with a key the cap is 240 requests/min and 120,000 requests/day. Production should set the env var.
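openFDA takes the key as an api_key query parameter. A minimal sketch of building a drug-label query with and without it; the search expression and limit are assumptions about how the client might query, and buildLabelQueryUrl is a hypothetical name. The key is passed in as an argument so the function stays environment-independent.

```typescript
// Builds an openFDA drug-label query URL. The search field and limit are
// illustrative choices; only the endpoint and api_key parameter are openFDA's.
function buildLabelQueryUrl(drug: string, apiKey?: string): string {
  const params: [string, string][] = [
    ["search", `openfda.generic_name:"${drug}"`],
    ["limit", "1"],
  ];
  if (apiKey) params.push(["api_key", apiKey]); // omit entirely for anonymous access
  const query = params.map(([k, v]) => `${k}=${encodeURIComponent(v)}`).join("&");
  return `https://api.fda.gov/drug/label.json?${query}`;
}
```

In Node the caller would pass process.env.OPENFDA_API_KEY; when the variable is unset the URL simply falls back to the anonymous quota.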

Tests

29 new regression tests across autoCheck (9), evidence.unmet_need integration (8), hta_workflow Phase 3.6 (6), hta_dossier regulatory-landscape rendering (8). Full suite: 111 suites / 1069 tests (up from 110 / 1037).

Compatibility


v1.10.0 (2026-05-10) — regulatory.status_check tool (#28) — primary-source label lookup

New tool that closes a real-user incident category. Design log #25.

The trigger — fremanezumab/pediatric-migraine, 2026-05-07

Michael's colleagues at work asked evidence.unmet_need for a fremanezumab/pediatric-migraine dossier. The output asserted "CGRP mAbs have no approved pediatric indication" — true at LLM training cutoff, false since FDA approval of AJOVY (fremanezumab-vfrm) for pediatric episodic migraine in August 2025 (sBLA 761089/s031). Same staleness trap is waiting on every drug with recent label changes (Aimovig, lecanemab, donanemab, biosimilars, withdrawals…). Pointing the LLM at orange_book via literature_search returns product index entries, not the current Indications and Usage section. HEORAgent had no canonical regulatory-status lookup. v1.10.0 ships one.

What the tool does

regulatory.status_check({ drug: "fremanezumab", region: "us", indication?: "migraine" })

Returns:

The CRITICAL invariant

current_status never equals "not_approved". Primary-source absence ≠ proof of non-approval — that's the exact fremanezumab failure inverted. Database miss → unknown + did_you_mean[]. Documented in the tool description, asserted in tests.
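One way to make this invariant structural rather than conventional is to leave "not_approved" out of the status type altogether. A minimal sketch; the status vocabulary, the in-memory label index, and the shared-prefix suggestion rule are all hypothetical simplifications.

```typescript
// Hypothetical status vocabulary. "not_approved" is deliberately absent, so
// the compiler rejects any code path that tries to return it.
type CurrentStatus = "approved" | "withdrawn" | "unknown";

interface StatusCheckResult {
  drug: string;
  current_status: CurrentStatus;
  did_you_mean: string[];
}

function resolveStatus(drug: string, labelIndex: Map<string, CurrentStatus>): StatusCheckResult {
  const key = drug.trim().toLowerCase();
  const hit = labelIndex.get(key);
  if (hit !== undefined) {
    return { drug, current_status: hit, did_you_mean: [] };
  }
  // A database miss is not evidence of non-approval: report "unknown" and
  // offer near-miss names (here: a naive shared-prefix match) instead.
  const prefix = key.slice(0, 4);
  const did_you_mean = Array.from(labelIndex.keys()).filter((name) => name.startsWith(prefix));
  return { drug, current_status: "unknown", did_you_mean };
}
```

A misspelled query such as "fremanezmab" then comes back as unknown with "fremanezumab" suggested, rather than as a false negative.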

Sources

Caching

24h TTL, in-memory, shared across MCP sessions. force_refresh: true bypasses. Cache key includes drug-name normalisation so Fremanezumab / fremanezumab-vfrm / AJOVY hit the same entry.
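A minimal sketch of a normalised, TTL-bounded lookup with a force-refresh bypass. The alias table and the suffix rule are hypothetical: stripping a trailing four-letter biologic suffix maps "fremanezumab-vfrm" to "fremanezumab", but it would also truncate any other name that happens to end in a hyphen plus four letters, so the real normalisation rules may differ.

```typescript
// Illustrative alias table; not the project's actual mapping.
const BRAND_TO_GENERIC: Record<string, string> = { ajovy: "fremanezumab" };

// Lower-case, trim, drop a four-letter biologic suffix ("fremanezumab-vfrm"
// -> "fremanezumab"), then map known brand names to the generic.
function normaliseDrugName(name: string): string {
  const base = name.trim().toLowerCase().replace(/-[a-z]{4}$/, "");
  return BRAND_TO_GENERIC[base] ?? base;
}

const DAY_MS = 24 * 60 * 60 * 1000;

class TtlCache<V> {
  private readonly store = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number = DAY_MS, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, so expiry can be tested
  }

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt <= this.now()) return undefined;
    return entry.value;
  }

  set(key: string, value: V): void {
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Cache key = normalised drug name + region; forceRefresh skips the read.
async function cachedStatus<V>(
  cache: TtlCache<V>,
  drug: string,
  region: string,
  fetchStatus: () => Promise<V>,
  forceRefresh = false,
): Promise<V> {
  const key = `${normaliseDrugName(drug)}|${region}`;
  const hit = forceRefresh ? undefined : cache.get(key);
  if (hit !== undefined) return hit;
  const fresh = await fetchStatus();
  cache.set(key, fresh);
  return fresh;
}
```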

Tests

Live OpenFDA smoke test confirmed primary-source retrieval on real label queries. Full suite 110 suites / 1037 tests at v1.10.0 ship.

Companion fix bundled in this release: Codex review P1+P2+P3

Three correctness fixes that shipped alongside the new tool:

7 new regression tests for these (3 PartSA, 2 utility, 2 schema-exposure).


v1.6.3 (2026-05-07) — code-review polish for v1.6.2 + Slack-digest hardening

Two parallel reviewers audited v1.6.2 and the new Slack weekly-digest feature within hours of ship. Combined: 0 CRITICAL, 0 HIGH, 6 MEDIUM, 6 LOW. All real findings addressed; cosmetic items deferred. Total tests 833 → 838.

Fixed (schema, MCP server)

Fixed (Slack weekly digest)

Added — pinning tests

Skipped (cosmetic)

Tests

833 → 838 MCP tests passing (+4 trim regression + 1 studies:{} pinning + 1 instrument case-insensitive). 154/154 web tests still passing (Slack stats fixes are network-dependent paths; covered by type-check + manual audit, no fetch-mocking integration test added in this patch).

Non-breaking

All changes are silent failure-mode hardening + LLM ergonomics. No API surface changes; no migration needed.

v1.9.2 (2026-05-09) — polish: Nelder-Mead early exit + 6 review nits

Six small quality improvements deferred from the v1.9.1 review. None are correctness fixes; these are polish on the v1.7-1.9 work. Two have a measurable performance impact:

Performance — full test suite 215s → 19s (11×)

Two of the changes turn out to dominate overall test runtime:

Correctness / hygiene

Tests

909/909 still passing. No new tests added — all changes either tighten existing assertions or are pure refactors of correct code.

Performance impact in production

Negligible. The Nelder-Mead early exit speeds up survival_fitting IPD calls by ~3-4× in the typical case (faster convergence) but wall-clock for a 500-patient fit was already <100ms before, so the user-visible difference is "fast" → "very fast". The log floor change is functionally invisible at production parameter values.
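The early exit can be illustrated with a compact Nelder-Mead minimiser that stops as soon as the function values across the simplex agree within a tolerance, instead of always running to its iteration cap. This is a generic textbook variant (standard reflection, expansion, contraction and shrink coefficients), not the survival_fitting optimiser; the relative tolerance rule and the defaults are assumptions.

```typescript
type Objective = (x: number[]) => number;

interface Vertex {
  x: number[];
  fx: number;
}

function nelderMead(f: Objective, x0: number[], maxIter = 2000, fTol = 1e-10, step = 0.5) {
  const n = x0.length;
  // Initial simplex: x0 plus one vertex displaced along each coordinate axis.
  const simplex: Vertex[] = [{ x: x0.slice(), fx: f(x0) }];
  for (let i = 0; i < n; i++) {
    const x = x0.slice();
    x[i] += step;
    simplex.push({ x, fx: f(x) });
  }

  let iter = 0;
  for (; iter < maxIter; iter++) {
    simplex.sort((a, b) => a.fx - b.fx);
    const best = simplex[0];
    const worst = simplex[n];
    // Early exit: the simplex has collapsed onto a flat region of f.
    if (worst.fx - best.fx <= fTol * (1 + Math.abs(best.fx))) break;

    // Centroid of every vertex except the worst.
    const centroid = new Array<number>(n).fill(0);
    for (let i = 0; i < n; i++) {
      for (let j = 0; j < n; j++) centroid[j] += simplex[i].x[j] / n;
    }
    // Point on the line through the centroid and the worst vertex.
    const along = (t: number): number[] => centroid.map((c, j) => c + t * (worst.x[j] - c));

    const xr = along(-1); // reflection
    const fr = f(xr);
    if (fr < best.fx) {
      const xe = along(-2); // expansion
      const fe = f(xe);
      simplex[n] = fe < fr ? { x: xe, fx: fe } : { x: xr, fx: fr };
    } else if (fr < simplex[n - 1].fx) {
      simplex[n] = { x: xr, fx: fr };
    } else {
      // Outside contraction if the reflection beat the worst point, else inside.
      const xc = along(fr < worst.fx ? -0.5 : 0.5);
      const fc = f(xc);
      if (fc < Math.min(fr, worst.fx)) {
        simplex[n] = { x: xc, fx: fc };
      } else {
        // Shrink every vertex halfway towards the best one.
        for (let i = 1; i <= n; i++) {
          const x = simplex[i].x.map((xi, j) => best.x[j] + 0.5 * (xi - best.x[j]));
          simplex[i] = { x, fx: f(x) };
        }
      }
    }
  }
  simplex.sort((a, b) => a.fx - b.fx);
  return { x: simplex[0].x, fx: simplex[0].fx, iterations: iter };
}
```

On a well-behaved log-likelihood surface the spread check fires long before the cap, which is where an early-exit speed-up of the kind reported above would come from.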

Non-breaking

All changes are pure quality improvements. No API surface changes; no observable behavior change at production parameter ranges.

v1.9.1 (2026-05-09) — code-review fixes for v1.7.0 / v1.8.0 / v1.9.0

Two parallel reviewers (math/statistics + ICF formula correctness) audited the v1.7-1.9 work within hours of ship. 0 CRITICAL, 4 HIGH, 8 MEDIUM, 7 LOW. All 4 HIGHs were real correctness bugs in code that produces HTA / CMS / IRB-grade outputs. All addressed in this patch.

Fixed (HIGH)

Fixed (MEDIUM)

Skipped (cosmetic)

Tests

10 new regression tests:

899 → 909 MCP tests passing. Web tests still 177/177.

Non-breaking

All changes are bug fixes. No API surface changes. The mean_survival_restricted field semantics are now correct (RMST instead of unrestricted mean) — callers that already used it as RMST per its documented meaning get more accurate values; callers that were treating it as unrestricted mean were getting the wrong field anyway.
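RMST up to a horizon τ is the area under the survival curve from 0 to τ, whereas the unrestricted mean integrates to infinity. A minimal numerical sketch, assuming the fitted curve is available as a function S(t); trapezoidal integration and the default step count are illustrative choices, not necessarily what survival_fitting does.

```typescript
type SurvivalFn = (t: number) => number;

// Restricted mean survival time: integral of S(t) over [0, tau], trapezoid rule.
function rmst(survival: SurvivalFn, tau: number, steps = 10000): number {
  const h = tau / steps;
  let area = 0.5 * (survival(0) + survival(tau));
  for (let i = 1; i < steps; i++) area += survival(i * h);
  return area * h;
}

// Exponential model: the unrestricted mean is 1/rate, while RMST has the
// closed form (1 - exp(-rate * tau)) / rate.
const rate = 0.1;
const restricted = rmst((t) => Math.exp(-rate * t), 10); // about 6.32, below the unrestricted mean of 10
```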

v1.9.0 (2026-05-09) — survival_fitting patient-level MLE path (no longer ⚠️ EXPERIMENTAL on the IPD input)

The tool now accepts patient-level event-time data (event_data: Array<{time, event: 0 | 1}>) alongside the legacy km_data step-summary path. Caller picks one (Zod refine enforces). When event_data is supplied, the fit is true right-censored maximum likelihood per Collett (2015) and NICE DSU TSD 14 (Latimer 2013) — no approximation warning. The KM-table path remains supported for back-compat with literature-digitization workflows but emits an explicit "approximation, less reliable" warning and points the caller at event_data.
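With patient-level data, a patient with an observed event contributes log h(t) + log S(t) to the log-likelihood and a censored patient contributes log S(t). The sketch below writes this out for a Weibull model with shape k and scale λ, where S(t) = exp(-(t/λ)^k), and adds the closed-form exponential MLE as a sanity check. The parameterisation and function names are assumptions, not the survival_fitting internals.

```typescript
interface EventRecord {
  time: number;
  event: 0 | 1; // 1 = event observed, 0 = right-censored
}

// Right-censored Weibull log-likelihood with shape k and scale lambda.
function weibullLogLik(data: EventRecord[], shape: number, scale: number): number {
  let ll = 0;
  for (const { time, event } of data) {
    const z = time / scale;
    if (event === 1) ll += Math.log(shape / scale) + (shape - 1) * Math.log(z); // log h(t)
    ll -= Math.pow(z, shape); // log S(t) = -(t/lambda)^k
  }
  return ll;
}

// Exponential special case (k = 1): the rate MLE has a closed form,
// number of events divided by total follow-up time.
function exponentialRateMle(data: EventRecord[]): number {
  const events = data.reduce((sum, r) => sum + r.event, 0);
  const exposure = data.reduce((sum, r) => sum + r.time, 0);
  return events / exposure;
}
```

For three patients followed for 5 (event), 3 (censored) and 2 (event) time units, the exponential MLE is 2 / 10 = 0.2, and weibullLogLik with shape 1 peaks at scale 1 / 0.2 = 5.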

Added

Changed

Tests

14 new IPD-path tests in tests/models/survivalFitting.test.ts:

5 additional tool-level tests for the new schema paths (event_data path methodology, KM-path "approximation" warning unchanged but no longer says "EXPERIMENTAL", mutual-exclusivity validation).

882 → 901 MCP tests passing.

Non-breaking

v1.8.0 (2026-05-09) — icf_readability_check tool (paired with irb_review)

New tool. Closes the v2 deferral from design log #21: paired ICF readability analyzer that was promised when irb_review v1 shipped.

Added — icf_readability_check (icf.readability_check)

Takes ICF text, returns:

Pure logic, no external API. <300ms on a 50-sentence ICF.

References baked into the methodology

Tests

34 new tests across schema validation, syllable counting (heuristic ±1 of CMU dict), sentence splitting (handles abbreviations: Dr. / Mr. / e.g. / i.e.), word tokenization, FKGL/FRE formula correctness on known reference texts, per-sentence breakdown, jargon detection (case-insensitive, whole-word, capped at 5 occurrences), verdict logic, output structure, performance.
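The two readability formulas are fixed linear combinations of words per sentence and syllables per word. A minimal sketch of the formulas plus deliberately simple syllable and sentence heuristics; the abbreviation list, the silent-"e" rule and the function names are assumptions, not icf.readability_check's implementation.

```typescript
// Flesch-Kincaid Grade Level and Flesch Reading Ease from raw counts.
function fkgl(words: number, sentences: number, syllables: number): number {
  return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59;
}

function fre(words: number, sentences: number, syllables: number): number {
  return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words);
}

// Heuristic syllable count: vowel groups, minus a trailing silent "e".
// Expect occasional off-by-one errors (e.g. "table" counts as 1).
function countSyllables(word: string): number {
  const w = word.toLowerCase().replace(/[^a-z]/g, "");
  if (w.length === 0) return 0;
  const groups = (w.match(/[aeiouy]+/g) ?? []).length;
  const silentE = groups > 1 && /[^aeiouy]e$/.test(w) ? 1 : 0;
  return Math.max(1, groups - silentE);
}

const ABBREVIATIONS = ["Dr.", "Mr.", "Mrs.", "e.g.", "i.e."];

// Split after ., ! or ? followed by whitespace, masking known abbreviations
// first so their periods do not end a sentence.
function splitSentences(text: string): string[] {
  let masked = text;
  ABBREVIATIONS.forEach((abbr, i) => {
    masked = masked.split(abbr).join(`\u0000${i}\u0000`);
  });
  return masked
    .replace(/([.!?])\s+/g, "$1\u0001")
    .split("\u0001")
    .map((s) => s.replace(/\u0000(\d+)\u0000/g, (_, i: string) => ABBREVIATIONS[Number(i)]).trim())
    .filter((s) => s.length > 0);
}
```

For 100 words in 5 sentences with 150 syllables, FKGL = 0.39 × 20 + 11.8 × 1.5 − 15.59 = 9.91 and FRE = 206.835 − 20.3 − 126.9 = 59.635.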

848 → 882 MCP tests passing. Web tool count assertions bumped 26 → 27 across 4 test files.

Tool count

26 → 27. Full tool list: literature.search, literature.screen, evidence.network, evidence.indirect, evidence.population_adjusted, evidence.survival, evidence.risk_of_bias, evidence.itc, evidence.clinical_scale, evidence.unmet_need, models.cost_effectiveness, models.budget_impact, hta.dossier, hta.utility, hta.workflow, utils.validate_links, project.create, knowledge.search, knowledge.read, knowledge.write, examples, workflow.maic, pv.classify, pv.signal_workflow, jca.pico_scope, irb.review, icf.readability_check ← new.

v1.7.0 (2026-05-09) — EVPPI promoted out of ⚠️ EXPERIMENTAL

Three quality fixes to the Strong-2014 binning estimator in cost_effectiveness_model's EVPPI path. Removes the long-standing CLAUDE.md caveat ("non-parametric binning, noisy when total EVPI ~0").
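For orientation, a generic binning estimator sorts PSA iterations by the parameter of interest, splits them into equal-count bins, and treats the best strategy's mean net benefit within each bin as the value of knowing that parameter. This is a plain sketch under stated assumptions (net-benefit matrix layout, equal-count bins, clamping to [0, EVPI]); it omits the bootstrap CI and noise-floor logic v1.7.0 describes.

```typescript
// nb[s][d]: net benefit of strategy d in PSA iteration s.
// theta[s]: sampled value of the parameter of interest in iteration s.
function evppiByBinning(theta: number[], nb: number[][], nBins = 20): { evppi: number; evpi: number } {
  const nSim = theta.length;
  const nStrategies = nb[0].length;
  const meanNb = (rows: number[][], d: number): number => rows.reduce((sum, r) => sum + r[d], 0) / rows.length;
  const bestMean = (rows: number[][]): number =>
    Math.max(...Array.from({ length: nStrategies }, (_, d) => meanNb(rows, d)));

  const currentInfo = bestMean(nb); // decide now: best strategy on average
  const perfectInfo = nb.reduce((sum, r) => sum + Math.max(...r), 0) / nSim;
  const evpi = perfectInfo - currentInfo;

  // Equal-count bins over the sorted parameter values.
  const order = theta.map((_, i) => i).sort((i, j) => theta[i] - theta[j]);
  let partialInfo = 0;
  for (let b = 0; b < nBins; b++) {
    const members = order.slice(Math.floor((b * nSim) / nBins), Math.floor(((b + 1) * nSim) / nBins));
    if (members.length === 0) continue;
    partialInfo += (members.length / nSim) * bestMean(members.map((i) => nb[i]));
  }

  // Clamp to the theoretical bounds 0 <= EVPPI <= EVPI (guards against rounding).
  const evppi = Math.min(Math.max(partialInfo - currentInfo, 0), evpi);
  return { evppi, evpi };
}
```

Small bins inflate the estimate, because the maximum of noisy bin means is biased upwards; this is one source of the false positives that a noise-floor flag such as below_noise_floor is positioned to catch.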

Fixed

Added

Tests

13 new EVPPI tests across 6 describe blocks:

848/848 tests passing (was 835).

Non-breaking

EVPPIResult adds 4 optional fields (evppi_ci_lower, evppi_ci_upper, evppi_se, below_noise_floor). The pre-existing evppi, evppi_proportion, parameter fields are unchanged. No migration needed.

Open methodology gaps (deferred)

The binning estimator still doesn't match the gold-standard methods (GAM regression per Strong 2014, Gaussian-process regression per Heath-Manolopoulou-Baio 2018) for accuracy in challenging cases. Adding a real GAM smoother is ~2 weeks of work and is a candidate for a future v1.x patch when there's appetite. v1.7.0 makes the binning estimator HONEST (no fake positives, proper uncertainty quantification, mathematical bounds), which is appropriate for the current default use case where EVPPI is one of many sensitivity outputs, not the primary model output.

v1.6.2 (2026-05-07) — schema hardening for LLM input shapes

Two LLM-input-shape fixes surfaced by a PostHog audit of project.create and evidence.risk_of_bias errors.

Fixed

Tests

+9 helper tests (tests/util/caseInsensitive.test.ts) + 7 regression tests across project_create and risk_of_bias. Total 822 → 829 passing.

Non-breaking

Canonical lowercase still works exactly as before. New code only adds tolerance for upper/mixed case. No API surface changes; no migration needed.
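The tolerance amounts to matching input against the canonical members after lower-casing both sides. A minimal sketch; the enum values are illustrative, and in a Zod schema this kind of normaliser would typically run in a preprocessing step (z.preprocess) ahead of the strict enum check, which is an assumption about the wiring rather than the project's code.

```typescript
// Returns the canonical (lowercase) enum member matching `value` regardless
// of case and surrounding whitespace, or undefined if none matches.
function matchEnumCaseInsensitive<T extends string>(allowed: readonly T[], value: string): T | undefined {
  const needle = value.trim().toLowerCase();
  return allowed.find((member) => member.toLowerCase() === needle);
}

const STUDY_DESIGNS = ["rct", "cohort", "case_control"] as const; // illustrative values

const design = matchEnumCaseInsensitive(STUDY_DESIGNS, "RCT"); // "rct"
```

Returning the canonical member, rather than the caller's spelling, keeps everything downstream of validation working with lowercase values only.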

v1.6.1 (2026-05-07) — hta_workflow GVD routing + Phase 3.5 unmet-need integration

Wires the new evidence.unmet_need tool from v1.6.0 into the hta_workflow orchestrator as Phase 3.5, between risk-of-bias and cost-effectiveness. Also extends hta_workflow to route GVD-specific section generators when hta_body: "gvd".

Added

Fixed

v1.6.0 (2026-05-07) — evidence.unmet_need tool + Global Value Dossier section generators

Two design-log items shipped together. Tool count 25 → 26.

Added — evidence.unmet_need (design log #23)

New tool: structured 4-dimension unmet-need framework. Inputs: indication + jurisdiction + optional literature_evidence (output from literature_search). Output: markdown report + structured unmet_need_summary JSON object that pipes into hta_dossier({hta_body:"gvd"}) Section 4 and hta_dossier({hta_body:"nice"}) for the NICE Severity & Inequalities section.

Four dimensions:

  1. Disease burden — incidence/prevalence, mortality, morbidity, demographics
  2. Treatment landscape gap — current SoC limitations, response rates, AE profiles, off-label patterns
  3. QoL impact — EQ-5D / disease-specific instruments, work productivity, caregiver burden
  4. Economic burden — direct medical, indirect costs, productivity loss, healthcare utilisation

Per-jurisdiction depth (light v1): adds country-specific epidemiology and SoC where the user supplies a jurisdiction code. Citations carry URLs with pre-validation. 12+ tests.

Added — Global Value Dossier section generators (design log #22)

Existing hta_dossier({hta_body:"gvd"}) was a 13-section skeleton emitting generic boilerplate. v1.6.0 ships actual section generators that consume literature_search / risk_of_bias / evidence_indirect / cost_effectiveness_model / budget_impact_model / evidence.unmet_need outputs and produce GVD-specific prose:

Plus a gvd_evidence_pack pipe interface so GVD output can pre-fill country-specific dossiers (NICE / JCA / AMCP). DOCX table styling. AMCP Format 4.1 deliberately deferred to v1.7. 15+ tests.

v1.5.2 (2026-05-07) — live-formula XLSX + neurology clinical scales

Two more design-log items in a single release. Both surfaced gaps from the v1.4.x management benchmark vs Claude.ai (slide 6 "❌ today" → ✅).

Added — evidence.clinical_scale (design log #19)

New umbrella tool covering 6 neurology and cognitive scales:

Per-scale total + subscale scoring, MCID-based responder analysis (Krismer 2017 / Horváth 2015 / Andrews 2019 thresholds), trajectory comparison vs natural-history reference cohorts (NNIPPS / EMSA-SG / PPMI / ADNI summary-level v1). Time-to-milestone integration via survival_fitting.

Three new JCA indication sub-classes added to jca_pico_scope:

Per-country comparator universes:

17 tests. Tool count 24 → 25.

Changed — live-formula XLSX upgrade (design log #20)

Refactored formatters/xlsx.ts so the XLSX output for cost_effectiveness_model and budget_impact_model emits live Excel formulas instead of pre-computed values:

PSA per-iteration values are kept as static numbers (audit reproducibility — re-running PSA stochasticity inside Excel would break determinism).

Same treatment for budget_impact_model XLSX (year-by-year SUM formulas referencing the inputs sheet).

15 tests. Closes the v1.4.x management benchmark "partial" rating on Slide 6. Customers can now genuinely edit any input → trace recomputes → ICER updates → CEAC curve shifts.
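Live formulas mean a computed cell stores formula text rather than a number. A small sketch of generating such text for year-by-year totals over an inputs sheet; the sheet name, the layout (year 1 in column B) and the helper names are hypothetical, and writing the strings into a workbook via an XLSX library is left out.

```typescript
// 1-based column index to Excel column letters: 1 -> "A", 26 -> "Z", 27 -> "AA".
function columnLetters(index: number): string {
  let letters = "";
  for (let n = index; n > 0; n = Math.floor((n - 1) / 26)) {
    letters = String.fromCharCode(65 + ((n - 1) % 26)) + letters;
  }
  return letters;
}

// SUM over one column of an inputs sheet, e.g. =SUM('Inputs'!C2:C6).
// Sheet names containing a single quote would need it doubled; not handled here.
function columnSumFormula(sheet: string, column: number, firstRow: number, lastRow: number): string {
  const col = columnLetters(column);
  return `=SUM('${sheet}'!${col}${firstRow}:${col}${lastRow})`;
}

// One formula per budget year, assuming year 1's line items sit in column B.
function yearlyTotals(sheet: string, years: number, firstRow: number, lastRow: number): string[] {
  return Array.from({ length: years }, (_, k) => columnSumFormula(sheet, k + 2, firstRow, lastRow));
}
```

Editing any line item on the inputs sheet then changes the year totals on recalculation, which is the property that pre-computed values lack.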

v1.5.1 (2026-05-06) — irb_review code-review fixes

Three parallel reviewers (regulatory accuracy with WebFetch verification, decision-tree correctness, test-gap analysis) audited v1.5.0 within hours of ship. 3 HIGH regulatory citation errors + 4 HIGH correctness bugs + 6 untested branches identified. All real findings verified against primary sources (eCFR via govinfo.gov / Cornell LII; EU CTR 536/2014 via legislation.gov.uk + European Commission) and patched. Total tests 683 → 708.

Fixed (HIGH — regulatory accuracy)

Fixed (HIGH — decision-tree correctness)

Fixed (MEDIUM)

Fixed (LOW)

Tests

+25 new regression tests across:

683 → 708 tests, 100% pass rate, no regressions.

Process learning

The reviewer-hallucination memory (saved 2026-05-05 after 3 incidents in 48h) saved this patch from introducing fabricated regulatory citations. All 4 regulatory findings were verified against official sources before applying any patch — eCFR via govinfo.gov for §46.204, Cornell LII for §46.306 and §46.406, legislation.gov.uk for CTR 536/2014 Article 14 and Annex I. Each verification quote was checked verbatim against the reviewer's claim. The pattern of "spawn 3 reviewers in parallel, fan out by audit angle, verify regulatory claims via WebFetch before patching" is now the default for every release with regulatory output.

v1.5.0 (2026-05-06) — irb_review tool

New IRB / Ethics Committee submission classifier (design log #21). Pure decision-tree logic, <300ms, no external I/O. Tool count 22 → 23.

Added — irb_review (irb.review)

Classifies a planned study under 45 CFR 46 (US Common Rule) + EU CTR 536/2014 to produce an IRB submission scaffold. Inputs: study_design (7-enum), data_handling (5-enum), risk_level, funding_source, jurisdictions (us_irb / eu_cec), 4 vulnerable-population yes/no flags, optional pv_classification, optional expedited_category_claim, plus 9 hint flags that disambiguate exempt/expedited categories.

Outputs:

Sign-off ambiguities resolved (v1)

Tests

57 new tests (683 total). Each Common Rule exempt category 1-8 reachable; each expedited category 1-7 reachable; full-board path; Subpart B/C/D layered correctly; GDPR Art. 9 fires only on EU + non-anon; HIPAA §164.514 fires only on US + identifiable; CTR/FDA/PSUR SAE frameworks; PASS_imposed override; cover-letter content + word count; irb_ruleset stamp; <300ms perf; A1/A2/A3/A4 regression coverage.

v2 deferrals (committed in design log #21)

v1.4.2 (2026-05-06) — code-review fixes for v1.3.2 / v1.4.0 / v1.4.1

Three releases (v1.3.2 NICE TA precedents + JCA scope eligibility, v1.4.0 hta_workflow orchestrator, v1.4.1 HFrEF per-country comparator depth) shipped to production without independent review. Three parallel code reviews surfaced 10 HIGH and 8 MEDIUM findings of regulatory consequence. The headline "CRITICAL" turned out to be a reviewer hallucination (TA773 → TA849 swap that would have introduced a real fabrication; verified via webfetch against nice.org.uk that TA773 is in fact correct for empagliflozin HFrEF). All real findings addressed.

Fixed (HIGH)

Fixed (MEDIUM)

Tests

Process learning

The HFrEF reviewer's "CRITICAL" — claiming TA773 is ivosidenib for AML and that empagliflozin HFrEF is TA849 — was a confident regulatory hallucination. WebFetch against nice.org.uk confirmed: TA773 IS empagliflozin HFrEF (9 March 2022); TA849 is cabozantinib for HCC. Without verifying we would have introduced a real fabrication while "fixing" a false alarm. Memory note saved: always verify subagent regulatory ID claims against the official public database before editing code.


v1.3.1 (2026-05-05) — pv_classify + pv_signal_workflow code-review fixes

Independent code reviews of both PV tools surfaced 1 CRITICAL + 6 HIGH findings of regulatory consequence. All addressed before redeployment.

pv_signal_workflow fixes

pv_classify fixes

Tests

Why this is a patch release

Pure correctness + transparency fixes. No API breaking changes (the encepp_protocol_template field rename is a transparency improvement; the previous IDs were not real ENCePP references, so callers depending on them were depending on a fiction). No new tools.


v1.3.0 (2026-05-05) — pv_signal_workflow tool (EMA GVP Module IX rev 2)

Added

Why this release

EMA GVP Module IX rev 2 (effective 2026) makes EVDAS integration mandatory for all EU MAHs from 12 February 2026, ending the EudraVigilance signal-detection pilot. EMA's accompanying message: "AI-powered pharmacovigilance is now expected, not optional." This tool absorbs the disproportionality-statistics + workflow-recommendation step into HEORAgent so PV teams stop maintaining ad-hoc Excel signal sheets.
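Disproportionality statistics are usually computed from a 2×2 table of spontaneous reports. A minimal sketch of two standard measures, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR), with log-scale 95% confidence intervals. Which measures and signal thresholds pv_signal_workflow actually applies is not stated here, so treat the choice of statistics and the function names as assumptions.

```typescript
// 2x2 table of spontaneous reports:
//                    event of interest   all other events
//   drug of interest         a                   b
//   all other drugs          c                   d
interface ReportCounts {
  a: number;
  b: number;
  c: number;
  d: number;
}

interface RatioEstimate {
  value: number;
  lower95: number;
  upper95: number;
}

// Wald-type 95% interval computed on the log scale.
function withLogCi(value: number, seLog: number): RatioEstimate {
  return {
    value,
    lower95: Math.exp(Math.log(value) - 1.96 * seLog),
    upper95: Math.exp(Math.log(value) + 1.96 * seLog),
  };
}

// Proportional reporting ratio: [a / (a + b)] / [c / (c + d)].
function prr({ a, b, c, d }: ReportCounts): RatioEstimate {
  const value = a / (a + b) / (c / (c + d));
  return withLogCi(value, Math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d)));
}

// Reporting odds ratio: (a * d) / (b * c).
function ror({ a, b, c, d }: ReportCounts): RatioEstimate {
  return withLogCi((a * d) / (b * c), Math.sqrt(1 / a + 1 / b + 1 / c + 1 / d));
}
```

With a = 10, b = 90, c = 20, d = 880, PRR = 0.1 / (20 / 900) = 4.5 and ROR = 8800 / 1800 ≈ 4.89. A commonly cited screening rule (Evans et al. 2001) flags PRR ≥ 2 with χ² ≥ 4 and at least 3 cases.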

Roadmap committed (not in v1.3.0)

Tests

References


v1.2.2 (2026-05-05) — error telemetry + permissive input validation

Fixed

Tests

Why this is a patch release

Pure telemetry + UX improvements. No API changes; no breaking changes; no new features.


v1.2.1 (2026-05-04) — jca_pico_scope code-review fixes

Fixed

Tests

Why this is a patch release

All v1.2.0 functionality is unchanged for correct inputs. Fixes only affect (a) edge-case indication strings that were silently misclassified, (b) dead-field traps for future contributors, (c) error/warning surfaces for previously silent failure modes. No API changes; no breaking changes.


v1.2.0 (2026-05-04) — EU JCA PICO matrix analyzer

Added

Why now

EU JCA has been in force since 12 January 2025 for oncology / ATMPs. 2026 brings high-risk medical devices into scope; orphan drugs join in 2028; all medicines by 2030. Manufacturers have 100 days from the consolidated PICO list to dossier submission — and no tool to scope it. This tool absorbs the 3-week consultancy step into a 200ms call.

Tests

References


v1.1.1 (2026-05-04) — NICE PMG36 update: severity modifier + health inequalities

Added

Why now

NICE published a refreshed PMG36 manual on 31 March 2026 (covering devices/diagnostics/digital alongside medicines per the NHS 10-Year Plan). The May 2025 modular inequalities update is now part of every NICE submission. Both changes were under-reflected in our NICE STA template.

Tests

References


v1.1.0 (2026-05-04) — Pharmacovigilance study classification + HTA dossier PV section

Added

Tests

References


v1.0.6 (2026-05-04) — MAIC workflow orchestration tool

Added

Tests


v1.0.5 (2026-05-04) — ChatGPT MAIC workflow recipe

Added

Tests


v1.0.4 (2026-05-02) — Bucher consistency, GRADE upgrading, EQ-5D baseline-utility, ChatGPT support

Added

Fixed (code review)

Tests

References

Bucher HC et al. J Clin Epidemiol. 1997;50(6):683-691; Cochrane Handbook Ch. 11.4.3; NICE DSU TSD 18; Guyatt GH et al. J Clin Epidemiol. 2011;64(12):1311-1316; Biz, Hernández Alava, Wailoo (2026) Value in Health forthcoming.


v1.0.3 (2026-04-29) — Senior HEOR methodology fixes

Fixed

Added

References

Cochrane Handbook for Systematic Reviews of Interventions Ch. 10.10, 11.4.3; GRADE Handbook 5.1; Guyatt GH et al. J Clin Epidemiol. 2011;64(12):1311-1316; Higgins JPT, Thompson SG. Stat Med. 2002;21(11):1539-1558; Bucher HC et al. J Clin Epidemiol. 1997;50(6):683-691; NICE DSU TSD 18; Biz, Hernández Alava, Wailoo (2026) Value in Health forthcoming.

v1.0.1 (2026-04-28) — Risk of Bias assessment tool

Added

Source

Implements design log 07 — based on Cochrane RoB 2 (Sterne et al. 2019), ROBINS-I (Sterne et al. 2016), AMSTAR-2 (Shea et al. 2017).

v0.9.8 (2026-04-22) — ITC methods, evLYG, CMS IRA context

Added

Security

v0.9.7 (2026-04-22) — UK EQ-5D-5L transition

Added

Source

Implements design log 09 — based on public OHE / EuroQol materials + Biz, Hernández Alava, Wailoo (2026). Switching from EQ-5D-3L to EQ-5D-5L in England: the impact in NICE technology appraisals. Value in Health (forthcoming).

v0.9.6 (2026-04-19)

Added

v0.9.5 (2026-04-16)

Added

v0.9.4 (2026-04-16)

Added

v0.9.3 (2026-04-16)

Fixed (from code review)

Changed

Added

v0.9.1 (2026-04-16)

Added

v0.9.0 (2026-04-16)

Added

Fixed

v0.8.0 (2026-04-16)

Added

v0.7.0 (2026-04-16)

Added

v0.6.0 (2026-04-15)

Added

v0.5.0 (2026-04-15)

Added

v0.4.0 (2026-04-15)

Added

Fixed

v0.3.0 (2026-04-14)

Added

v0.2.0 (2026-04-14)

Added

Fixed

v0.1.4 (2026-04-12)

Added

v0.1.2 (2026-04-10)

Added

Changed

v0.1.0 (2026-04-06)

Added

