Core principle

An agent earns its keep when the task involves multiple steps that benefit from being orchestrated by AI — typically because the steps include verification, decision-making, or adaptation based on intermediate results. Single-step tasks rarely need an agent. Multi-step verification loops often do. The decision is never “agents are better” or “agents are dangerous”. It is: does the loop add value for this task, against the cost, latency, and audit complexity it creates?

Why this matters now

Agentic AI moved from research curiosity to production readiness through 2024–2025. RefCheckr is one example: a verify-fix-recheck loop that closes itself on a claim. The pattern generalises to systematic literature review conduct, compliance pre-screening with rewrite, evidence synthesis across studies, and more. Tool vendors and pharma innovation teams are increasingly pitching “agentic” solutions for systematic literature review, MLR triage, manuscript drafting, and evidence synthesis. Some are genuinely agentic; some are scripted pipelines marketed under the agent label. Medical writers are often the people who have to assess these pitches, so knowing what an agent should deliver, and what red flags to look for, is now part of the job.

What “an agent” means here

An agent, in the sense relevant to medical writing, is an AI system that:
  • Plans a sequence of steps for a goal
  • Executes each step (often by calling tools — search APIs, document retrievers, verification services)
  • Observes the result of each step
  • Adapts based on what it observed (re-tries, re-plans, escalates)
  • Loops until a goal condition is met or a limit is hit
This is different from:
  • A single-shot LLM prompt (one input, one output, no adaptation)
  • A scripted pipeline (fixed steps, no decision-making between them)
  • A human-orchestrated workflow (the human chooses what happens between steps)
The defining feature is that the AI is making decisions about what to do next, not just generating text in response to a single prompt.
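The defining feature can be made concrete in code. The sketch below is a minimal illustration, assuming plain Python functions as stand-ins for model and tool calls; `fixed_pipeline`, `agent_loop` and the example actions are hypothetical names, not any product's API. The pipeline runs its steps blindly, while the agent checks the state and chooses its next action each time round, under a hard step limit.

```python
# Minimal sketch: a fixed pipeline versus an agent loop.
# All step functions are hypothetical stand-ins for model or tool calls.


def fixed_pipeline(text, steps):
    """Run every step in the same order, whatever the intermediate result."""
    for step in steps:
        text = step(text)
    return text


def agent_loop(state, choose_next_action, goal_met, max_steps=5):
    """Observe the state, decide the next action, act; stop at the goal or the step limit."""
    history = []
    for _ in range(max_steps):
        if goal_met(state):
            break
        action = choose_next_action(state)  # the decision a fixed pipeline never makes
        state = action(state)
        history.append(action.__name__)     # record each action for the audit trail
    reason = "goal met" if goal_met(state) else "step limit hit"
    return state, history, reason


# Hypothetical actions and goal for a one-sentence draft.
def remove_todo(text):
    return text.replace("TODO", "").strip()


def add_full_stop(text):
    return text + "."


def choose(text):
    return remove_todo if "TODO" in text else add_full_stop


def done(text):
    return "TODO" not in text and text.endswith(".")


print(fixed_pipeline("Efficacy was shown.", [add_full_stop]))  # Efficacy was shown..
print(agent_loop("Efficacy was shown.", choose, done))          # ('Efficacy was shown.', [], 'goal met')
print(agent_loop("Efficacy was shown TODO", choose, done))
```

The pipeline adds a second full stop because it never looks at its input; the agent stops at once because the goal is already met, and in the third call it picks a different action at each step.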

Agent patterns worth knowing

Verify-fix loop: The agent generates an output, checks it against a constraint, fixes the output if the check fails, then re-checks, looping until the output passes or the iteration limit is hit. RefCheckr’s verify → fix → re-check is the canonical example. Useful anywhere there is a clear pass/fail signal: claim-vs-source verification, compliance signal pre-screening, style-rule conformance.
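A minimal sketch of the verify-fix loop with an explicit iteration cap, assuming a deterministic check in place of a real claim or compliance verifier. The banned-term list and the `check`/`fix` functions are hypothetical; in a real system `fix` would be a model rewrite and `check` might be a source verifier.

```python
import re

# Hypothetical compliance rule: banned promotional terms and approved replacements.
# Illustrative only; not a real MLR rule set.
BANNED = {"cure": "treatment", "safe": "well tolerated", "best-in-class": "effective"}


def check(text):
    """Pass/fail signal: return the banned terms still present (empty list = pass)."""
    return [term for term in BANNED
            if re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE)]


def fix(text, violations):
    """Stand-in for a model rewrite: fixes only the first violation on each pass."""
    term = violations[0]
    return re.sub(rf"\b{re.escape(term)}\b", BANNED[term], text, flags=re.IGNORECASE)


def verify_fix_loop(text, max_fixes=3):
    """Check, fix, re-check until the check passes or the fix budget is spent."""
    fixes = 0
    while True:
        violations = check(text)
        if not violations:
            return text, fixes, True    # loop closed: the check passed
        if fixes == max_fixes:
            return text, fixes, False   # cap hit: escalate to a human reviewer
        text = fix(text, violations)
        fixes += 1


print(verify_fix_loop("A safe cure for migraine."))
print(verify_fix_loop("A safe cure for migraine.", max_fixes=1))
```

The second call shows the cap doing its job: the loop stops with a failing result instead of retrying indefinitely, and the `False` flag is the signal to route the text to a person.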
Plan-execute-verify: The agent decomposes a task into steps, executes each, then verifies the whole. Useful for systematic literature review conduct, manuscript outline generation, and any task with clear sub-deliverables.
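A plan-execute-verify run can be sketched the same way. This assumes a hard-coded planner and a dictionary of notes in place of model calls (`plan`, `execute` and `verify_whole` are hypothetical names); the point is that the final check looks at the whole output against the plan, not at any single step.

```python
# Minimal plan-execute-verify sketch. The planner, executor and notes are hypothetical
# stand-ins; in practice each would be a model call with its own prompt.


def plan(task):
    """Decompose the task into sub-deliverables (fixed here for illustration)."""
    return ["Introduction", "Methods", "Results", "Discussion"]


def execute(section, notes):
    """Draft one sub-deliverable from the notes available for it."""
    return notes.get(section, "")


def verify_whole(steps, outputs):
    """Whole-output check: list planned sections that are missing or empty."""
    return [s for s in steps if not outputs.get(s, "").strip()]


def plan_execute_verify(task, notes):
    steps = plan(task)
    outputs = {section: execute(section, notes) for section in steps}
    missing = verify_whole(steps, outputs)
    return outputs, missing  # a non-empty `missing` list means re-plan, re-run, or escalate


notes = {
    "Introduction": "Background on disease burden.",
    "Methods": "Randomised, double-blind design.",
    "Results": "Primary endpoint met.",
}
outputs, missing = plan_execute_verify("Outline a phase 3 manuscript", notes)
print(missing)  # ['Discussion']
```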
Tool use: The agent calls external services to do parts of its work: a search API for evidence retrieval, a calculator for statistics, a document store for source grounding. The reasoning is in the model; the answers come from the tools. PubCrawl, when called via MCP, is a tool an agent can use.
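Tool use largely comes down to the runtime dispatching calls the model proposes and rejecting calls it cannot honour. The sketch below assumes a hypothetical in-memory tool registry with two illustrative tools; it is not MCP and not any real PubCrawl interface. Returning an error instead of crashing lets the agent observe the failure and re-plan, which is also where tool misuse becomes visible in the log.

```python
import math

# Hypothetical document store standing in for a real source-grounding service.
DOCUMENT_STORE = {
    "doc-001": "Events occurred in 12% of placebo and 8% of treated patients.",
}


def lookup_source(doc_id):
    """Return source text for grounding (KeyError if the document is unknown)."""
    return DOCUMENT_STORE[doc_id]


def number_needed_to_treat(control_rate, treatment_rate):
    """NNT = 1 / absolute risk reduction, rounded up to a whole patient."""
    arr = control_rate - treatment_rate
    if arr <= 0:
        raise ValueError("no absolute risk reduction")
    # Round away floating-point noise (0.12 - 0.08 is not exactly 0.04) before rounding up.
    return math.ceil(round(1 / arr, 6))


TOOLS = {"lookup_source": lookup_source, "number_needed_to_treat": number_needed_to_treat}


def run_tool_call(call):
    """Dispatch a model-proposed call of the form {"tool": name, "args": {...}}."""
    name = call.get("tool")
    if name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": TOOLS[name](**call.get("args", {}))}
    except (TypeError, ValueError, KeyError) as exc:  # wrong parameters or bad input
        return {"ok": False, "error": repr(exc)}


print(run_tool_call({"tool": "number_needed_to_treat",
                     "args": {"control_rate": 0.12, "treatment_rate": 0.08}}))
print(run_tool_call({"tool": "literature_search", "args": {"query": "NNT"}}))
```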
Iterative refinement: The agent drafts, critiques its own draft, then refines. Useful for prose polishing under explicit constraints (style guide, audience adaptation, plain-language rewriting).
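Iterative refinement differs from the verify-fix loop mainly in that the critique produces notes for the next rewrite rather than a single pass/fail flag. A minimal sketch, assuming a hypothetical glossary and sentence-length limit as the explicit constraints and a deterministic stand-in for the model's rewrite:

```python
import re

# Hypothetical plain-language constraints; real ones would come from your style guide.
GLOSSARY = {
    "hypertension": "high blood pressure",
    "myocardial infarction": "heart attack",
    "adverse events": "side effects",
}
MAX_WORDS = 20
JARGON_NOTE = "replace jargon: "


def critique(draft):
    """Self-critique step: return notes on where the draft breaks the constraints."""
    notes = [JARGON_NOTE + term for term in GLOSSARY if term in draft.lower()]
    for sentence in re.split(r"(?<=[.!?])\s+", draft.strip()):
        if len(sentence.split()) > MAX_WORDS:
            notes.append("sentence too long: " + sentence[:30])
    return notes


def refine(draft, notes):
    """Stand-in for a model rewrite that acts only on the notes it can handle."""
    for note in notes:
        if note.startswith(JARGON_NOTE):
            term = note[len(JARGON_NOTE):]
            draft = re.sub(re.escape(term), GLOSSARY[term], draft, flags=re.IGNORECASE)
    return draft


def draft_critique_refine(draft, max_rounds=3):
    """Critique and refine until no notes remain, progress stalls, or rounds run out."""
    for _ in range(max_rounds):
        notes = critique(draft)
        if not notes:
            break
        refined = refine(draft, notes)
        if refined == draft:  # no progress: hand the open notes to a human
            break
        draft = refined
    return draft, critique(draft)


print(draft_critique_refine("Patients with hypertension had fewer adverse events."))
```

The no-progress check matters: a critique the rewrite cannot address (here, an over-long sentence) would otherwise burn rounds until the cap.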
Multi-agent: Multiple specialised agents hand off to one another: a “drafter” passes output to a “verifier”, which passes to a “compliance reviewer”. Useful when sub-tasks need different prompts, models, or tools. More complex to audit.
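A multi-agent handoff can be sketched as a chain of functions that each return their output and a pass flag, with every handoff logged. The drafter, verifier and compliance reviewer below are hypothetical stand-ins with deliberately simple rules; the log is what makes the chain auditable.

```python
import re

# Minimal multi-agent handoff sketch. Each "agent" is a hypothetical stand-in that
# would, in practice, have its own prompt, model, or tools. Each returns (text, ok).


def drafter(brief):
    return f"{brief} was observed in the treatment arm.", True


def make_verifier(source):
    def verifier(text):
        # Simplification: substring match, so "4%" would also match "34%".
        numbers = re.findall(r"\d+(?:\.\d+)?%?", text)
        return text, all(n in source for n in numbers)
    return verifier


def compliance_reviewer(text):
    # Hypothetical rule: reject absolute claims.
    return text, not re.search(r"\b(always|never|guaranteed)\b", text, re.IGNORECASE)


def run_handoffs(brief, agents):
    """Pass output from agent to agent, logging every handoff; stop at the first failure."""
    log, text = [], brief
    for name, agent in agents:
        output, ok = agent(text)
        log.append({"agent": name, "input": text, "output": output, "ok": ok})
        if not ok:
            return None, log
        text = output
    return text, log


SOURCE = "Exacerbations fell by 34% versus placebo."
AGENTS = [("drafter", drafter), ("verifier", make_verifier(SOURCE)),
          ("compliance_reviewer", compliance_reviewer)]

final, log = run_handoffs("A 34% reduction in exacerbations", AGENTS)
print(final)
print([entry["agent"] for entry in log])
```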

When an agent earns its keep

| Signal that an agent fits | Why |
|---|---|
| The task has a natural verification gate | The loop has a place to close |
| Errors compound across steps | Multi-step thinking outperforms single-shot |
| The task is high-volume | Automation pays back the build cost |
| Intermediate steps would benefit from tool use | The agent can route to the right tool per step |
| You can audit each step | The agent’s value is auditable, not just visible in the final output |
Concrete examples in medical writing:
| Application | Pattern | Status (mid-2026) |
|---|---|---|
| Claim verification (RefCheckr) | Verify-fix loop | Production |
| Compliance pre-screen with rewrite | Verify-fix loop | Emerging |
| Systematic literature review conduct | Plan-execute-verify with tool use | Emerging |
| Evidence synthesis across multiple studies | Plan-execute-verify | Emerging |
| Manuscript drafting with reference self-check | Iterative refinement | Emerging |
| Plain-language summary with source verification | Verify-fix loop | Emerging |
| Pharmacovigilance signal triage (high-risk) | Plan-execute-verify | Emerging; flag as high-risk under the EU AI Act |

When an agent is the wrong tool

“Draft a discussion paragraph from this finding” is a single-shot task. Wrapping it in an agent adds latency and cost without adding value.
A board-bound regulatory document drafted once and reviewed by humans does not benefit from agentic decomposition. The cost of the loop exceeds its value when volume is low and stakes are high enough that humans must verify everything anyway.
If checking the agent’s output is more work than doing the task by hand, the agent is creating work, not saving it. This is common when the “verification” requires the same expert judgement as the original task.
An agent whose decisions are opaque is an audit liability under Review and Accountability and the EU AI Act. If you cannot show what the agent did at each step, the loop is faster but also harder to defend.
Agents take longer than single-shot prompts. For interactive use (turn-by-turn editing, live brainstorming), the latency makes them painful. Save the loop for back-end work.

Failure modes specific to agents

The AI failure modes page covers the core risks. Agents add several patterns on top:
  • Cascading errors: One wrong intermediate step poisons every downstream step. The final output is fluent and confidently wrong.
  • Reasoning chain fabrication: The agent’s “thinking” can include invented citations or fabricated checks that pass its own verification but fail real verification. Covered in Choosing Your Model.
  • Tool misuse: Agents calling the wrong tool, or the right tool with wrong parameters. A claim sent to a literature search instead of a source-paper verifier looks like work but produces noise.
  • Cost and latency runaway: Loops that don’t converge, or that keep retrying. Cap iterations explicitly.
  • Prompt injection at any step: Each step that ingests external content (a paper, a web result, a user upload) is a potential injection point. Defend at each.
  • Audit-trail loss: Multi-step agents are harder to log meaningfully than single-shot prompts. Plan logging upfront, not after the fact.

Practical assessment for medical writing teams

Before commissioning or using an agentic workflow:
A fixed pipeline (always step 1 → step 2 → step 3, no branching) is not an agent and does not need agent infrastructure. Calling a fixed pipeline an “agent” is marketing, not architecture. Save the agent vocabulary for systems that adapt based on intermediate observations.
Agents earn their value at verification gates. If you can’t point to the moment “the agent checks itself”, the loop is open and the system is just a longer pipeline. Ask: what is the pass/fail signal that ends the loop?
Total cost = tokens per step × steps per run × runs. Reasoning models in agentic loops can be 10–100× the cost of equivalent single-shot prompts. Budget for it, and cap iterations.
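The cost formula is simple arithmetic, which makes it easy to check before commissioning. A sketch assuming a single flat price per million tokens (real pricing usually differs for input and output tokens) and entirely hypothetical volumes; `worst_case_steps` shows how an iteration cap turns an open-ended loop into a budgetable upper bound, under the assumption that each iteration costs one fix plus one re-check.

```python
def run_cost(tokens_per_step, steps_per_run, runs, usd_per_million_tokens):
    """Total cost = tokens per step x steps per run x runs, at a flat per-token price."""
    total_tokens = tokens_per_step * steps_per_run * runs
    return total_tokens * usd_per_million_tokens / 1_000_000


def worst_case_steps(planned_steps, max_iterations, steps_per_iteration=2):
    """Upper bound on steps per run when the loop is capped (assumed: fix + re-check)."""
    return planned_steps + max_iterations * steps_per_iteration


# All figures are hypothetical, chosen only to show the arithmetic.
single_shot = run_cost(tokens_per_step=4_000, steps_per_run=1, runs=500, usd_per_million_tokens=3.0)
agentic = run_cost(tokens_per_step=8_000, steps_per_run=6, runs=500, usd_per_million_tokens=15.0)
print(single_shot, agentic, agentic / single_shot)  # 6.0 360.0 60.0

# Budget ceiling for the same loop with 2 planned steps and a 5-iteration cap.
print(run_cost(8_000, worst_case_steps(2, 5), 500, 15.0))  # 720.0
```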
For each step, can you produce a record of: input, model used, tool called (if any), output, decision rule that led to the next step? If not, you don’t have an audit trail; you have a black box. This affects Review and Accountability and EU AI Act documentation.
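One way to make that record concrete is a typed entry per step, serialised as JSON Lines so each step can be appended as it happens. This is a minimal sketch: the fields follow the list of input, model, tool, output and decision rule, while `StepRecord`, the model and tool names, and the example values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class StepRecord:
    """One audit entry per agent step: input, model, tool, output, decision rule."""
    step: int
    input: str
    model: str             # model name and version used for this step
    tool: Optional[str]    # None when the step called no tool
    output: str
    decision_rule: str     # why the agent moved on (or stopped)


def is_complete(record):
    """True if every field except `tool` is filled in."""
    return all(getattr(record, f.name) not in (None, "")
               for f in fields(record) if f.name != "tool")


def to_jsonl(records):
    """Serialise records as JSON Lines: one line per step, easy to append and diff."""
    return "".join(json.dumps(asdict(r)) + "\n" for r in records)


def from_jsonl(text):
    return [StepRecord(**json.loads(line)) for line in text.splitlines() if line.strip()]


trail = [
    StepRecord(1, "Claim: PFS 11.2 months", "model-a v1", "source_verifier",
               "fail: 11.8 in source", "check failed, so fix"),
    StepRecord(2, "Claim: PFS 11.8 months", "model-a v1", None, "pass", "check passed, so stop"),
]
print(to_jsonl(trail))
print(all(is_complete(r) for r in trail))  # True
```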
The playbook’s risk tier framework applies to agentic workflows. High-risk and critical work requires more verification at every step, not less, regardless of how clever the agent is.

Common mistakes

“Drafting a paragraph” does not need an agent. Adding plan-execute-verify around it makes it slower, costlier, and harder to audit, with no quality benefit.
A loop without a check at the end is just a pipeline that runs more steps. The closing check is the entire point. If there isn’t one, redesign — or use a single-shot prompt.
An agent saying “I verified this against the source” is not verification. Either run independent verification (a second model, a deterministic check) or treat the self-report as a hint, not an assurance.
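A deterministic independent check can be as simple as confirming that the evidence the agent quotes actually exists in the source. This sketch assumes the agent returns a report dictionary with a `supporting_quote` field (a hypothetical format). It only confirms the quote is present; whether the quote supports the claim still needs a second check or a human.

```python
import re


def normalise(text):
    """Lower-case and collapse whitespace, so line breaks in an extracted PDF don't cause misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def independently_verified(agent_report, source_text):
    """Ignore the agent's own 'verified' flag; re-check its quoted evidence against the source."""
    quote = agent_report.get("supporting_quote", "")
    return bool(quote.strip()) and normalise(quote) in normalise(source_text)


SOURCE = "Median progression-free survival was 11.2 months\nwith the study drug."

honest = {"verified": True,
          "supporting_quote": "progression-free survival was 11.2 months with the study drug"}
fabricated = {"verified": True,
              "supporting_quote": "progression-free survival was 11.8 months"}

print(independently_verified(honest, SOURCE))      # True
print(independently_verified(fabricated, SOURCE))  # False
```

Both reports claim `"verified": True`; only the independent re-check tells them apart.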
Always cap iterations. Three to five passes is typical. Beyond that, the loop is unlikely to converge and is burning budget.
“We’re using agents” is not an architecture statement. The pattern (verify-fix vs. plan-execute vs. multi-agent) determines the failure modes, the audit requirements, and the cost profile. Specify the pattern.

How this connects to other playbook principles

  • Closed-loop AI: The verify-fix loop is the foundational agentic pattern. RefCheckr is the worked example.
  • Choosing Your Model: Agents typically use reasoning models for the planning and verification steps; standard models for generation. Mixing classes is the cost-efficient default.
  • Source grounding: Agents do not exempt content from source grounding. Each generation step still needs verifiable source attribution.
  • Review and accountability: The audit-trail requirements apply per agent step, not per overall agent run. Plan logging upfront.
  • AI Regulation in Pharma: Some agentic uses (PV signal triage, clinical decision support) cross into high-risk territory under the EU AI Act. Map the use, not the technology.
  • AI failure modes: Agents inherit single-shot failure modes and add new ones (cascading errors, tool misuse, audit-trail loss).

The bottom line

Agents are not better or worse than single-shot prompts; they are different tools for different tasks. The pattern earns its keep on multi-step verification, high-volume work, and tasks with natural verification gates. It is the wrong tool for single-step generation, low-volume one-shots, and anywhere the loop cannot close auditably. Specify the pattern, cap the iterations, log every step, and verify the agent the same way you would verify any other AI output.
Last reviewed: 4 May 2026