Four Agents, One Governance Layer: What the April 2026 DMCommunity Challenge Taught Us

The April 2026 DMCommunity Challenge asked participants to build an agentic medical services system for acute sinusitis treatment. The scenario was specific: a patient presents with sinus symptoms, and an AI agent pipeline must work through renal function assessment, drug interaction checking, medication selection, and documentation, all in sequence, each step depending on the last.

It is the kind of problem that looks deceptively straightforward until you try to build it for a production context. And production context is where the interesting questions live.

What We Built

Our submission used four standalone FastAPI microagents, each scoped to a single clinical function:

Creatinine Clearance Agent calculates renal function using the Cockcroft-Gault formula and applies dosing adjustments based on the result. Renal impairment slows the clearance of renally eliminated antibiotics, which changes the appropriate dose; this agent runs first because everything downstream depends on it.
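
For readers who have not seen the formula, here is a minimal sketch of the standard Cockcroft-Gault estimate. The function name and parameter names are ours for illustration and are not taken from the submission, and the dosing adjustments the agent applies afterwards are left out because the post does not list them.

```python
def cockcroft_gault(age_years: float, weight_kg: float,
                    serum_creatinine_mg_dl: float, is_female: bool) -> float:
    """Estimate creatinine clearance in mL/min (Cockcroft-Gault)."""
    if serum_creatinine_mg_dl <= 0:
        raise ValueError("serum creatinine must be positive")
    # Base estimate: ((140 - age) * weight) / (72 * serum creatinine)
    crcl = ((140 - age_years) * weight_kg) / (72 * serum_creatinine_mg_dl)
    # The formula scales the estimate by 0.85 for female patients.
    return crcl * 0.85 if is_female else crcl


if __name__ == "__main__":
    # Illustrative inputs only, not patient data.
    print(round(cockcroft_gault(60, 72, 1.0, is_female=False), 1))
```

Keeping the calculation as a pure function with no I/O is what makes it easy to test on its own before wrapping it in a service endpoint.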

Drug Interaction Agent checks the proposed treatment against a patient’s existing medications using a structured CSV-backed interaction database. It does not hallucinate interactions. It looks them up.
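
A lookup of this kind might be sketched as follows. The column names (`drug_a`, `drug_b`, `severity`) and the placeholder drug names are assumptions made for the example; the post does not describe the actual CSV schema or its contents.

```python
import csv
import io

# Hypothetical CSV layout with placeholder drug names (not clinical data).
SAMPLE_CSV = (
    "drug_a,drug_b,severity\n"
    "drug_x,drug_y,moderate\n"
    "drug_x,drug_w,severe\n"
)


def load_interactions(csv_text: str) -> dict:
    """Index rows by an unordered drug pair so lookup order does not matter."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = frozenset((row["drug_a"].strip().lower(),
                         row["drug_b"].strip().lower()))
        table[key] = row
    return table


def check_interactions(proposed: str, current_meds: list, table: dict) -> list:
    """Return the stored rows for every current medication that pairs with the proposed drug."""
    hits = []
    for med in current_meds:
        key = frozenset((proposed.strip().lower(), med.strip().lower()))
        row = table.get(key)
        if row is not None:
            hits.append(row)
    return hits


if __name__ == "__main__":
    table = load_interactions(SAMPLE_CSV)
    for hit in check_interactions("Drug_X", ["drug_y", "drug_z"], table):
        print(hit["drug_a"], hit["drug_b"], hit["severity"])
```

The point of the design is that a pair absent from the table returns nothing: the agent can only report what the data contains.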

Sinusitis Medication Agent applies six deterministic treatment rules to select the appropriate antibiotic, dose, and duration based on severity, allergy profile, and the renal adjustment from step one. The rules are explicit and auditable.
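
To show what "explicit and auditable" can look like in code, here is a sketch of an ordered, first-match rule table. The three rules, their conditions, and the `regimen_*` labels are hypothetical placeholders, not the six rules from the submission and not clinical guidance.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    rule_id: str
    description: str
    applies: Callable[[dict], bool]
    regimen: str  # placeholder label, not a real prescription


# Hypothetical rules, evaluated top to bottom; the first match wins.
RULES = [
    Rule("R1", "severe symptoms with penicillin allergy",
         lambda p: p["severity"] == "severe" and p["penicillin_allergy"],
         "regimen_C"),
    Rule("R2", "penicillin allergy",
         lambda p: p["penicillin_allergy"],
         "regimen_B"),
    Rule("R3", "default first-line choice",
         lambda p: True,
         "regimen_A"),
]


def select_regimen(patient: dict) -> dict:
    """Return the chosen regimen together with the rule that produced it."""
    for rule in RULES:
        if rule.applies(patient):
            return {"regimen": rule.regimen,
                    "fired_rule": rule.rule_id,
                    "reason": rule.description}
    raise LookupError("no rule matched")


if __name__ == "__main__":
    print(select_regimen({"severity": "mild", "penicillin_allergy": True}))
```

Because every result carries the identifier of the rule that fired, the output can be handed straight to a reviewer or an export step.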

Rules Export Agent produces the decision rationale in multiple formats (Markdown, HTML, plain text) so a business analyst or clinician can read exactly which rules fired and why.
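
As a rough illustration of multi-format export, the sketch below renders a list of fired rules three ways. The function name, the `(rule_id, reason)` input shape, and the exact output layout are assumptions; the submission's real templates are not shown in this post.

```python
from html import escape


def export_rationale(fired: list, fmt: str) -> str:
    """Render (rule_id, reason) pairs as Markdown, HTML, or plain text."""
    if fmt == "markdown":
        return "\n".join(f"- **{rid}**: {reason}" for rid, reason in fired)
    if fmt == "html":
        # Escape values so rule text cannot inject markup into the report.
        items = "".join(
            f"<li><b>{escape(rid)}</b>: {escape(reason)}</li>"
            for rid, reason in fired
        )
        return f"<ul>{items}</ul>"
    if fmt == "text":
        return "\n".join(f"{rid}: {reason}" for rid, reason in fired)
    raise ValueError(f"unsupported format: {fmt}")


if __name__ == "__main__":
    fired = [("R2", "penicillin allergy")]
    for fmt in ("markdown", "html", "text"):
        print(export_rationale(fired, fmt))
```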

Four agents. Each one small enough to be tested independently. Each one doing exactly one thing.

Two Paths Through the Pipeline

The submission exposed two execution paths.

Path A runs the agents directly via five MCP tools. Input flows in, results flow out. It is fast, clean, and works well in a sandbox.

Path B routes every agent call through Novus Forge before execution. This is where things get interesting.

When a call goes through Novus Forge, it passes through the governance interceptor chain: PII scanning, policy rule evaluation, audit logging, and the Decision Integrity check via Novus Inspector. Every agent invocation gets a telemetry record. Every output is stamped with the evaluation model version, the token count, and a confidence signal.
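
The general shape of an interceptor chain can be sketched in a few lines. This is not the Novus Forge API, which this post does not document; every name below (`pii_scan`, `policy_check`, `audit_log`, `run_governed`, the context keys, and the single SSN-style pattern) is a hypothetical stand-in chosen for the example.

```python
import re
from typing import Callable

Interceptor = Callable[[dict], dict]

# Deliberately naive pattern; a real PII scanner would cover far more cases.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def pii_scan(ctx: dict) -> dict:
    if SSN_LIKE.search(ctx["payload"]):
        raise PermissionError("possible PII in payload")
    return ctx


def policy_check(ctx: dict) -> dict:
    if ctx["agent"] not in ctx["allowed_agents"]:
        raise PermissionError(f"agent not permitted: {ctx['agent']}")
    return ctx


def audit_log(ctx: dict) -> dict:
    ctx["audit"].append({"agent": ctx["agent"], "stage": "pre-execution"})
    return ctx


def run_governed(ctx: dict, interceptors: list, agent_call: Callable[[str], str]) -> dict:
    """Run each interceptor in order; the agent executes only if all of them pass."""
    for interceptor in interceptors:
        ctx = interceptor(ctx)
    ctx["result"] = agent_call(ctx["payload"])
    return ctx


if __name__ == "__main__":
    ctx = {"agent": "drug_interaction", "allowed_agents": {"drug_interaction"},
           "payload": "check drug_x", "audit": []}
    out = run_governed(ctx, [pii_scan, policy_check, audit_log], lambda p: p.upper())
    print(out["result"], out["audit"])
```

Any interceptor that raises stops the call before the agent runs, which is the property that separates a governed path from a direct one.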

Path A answers the clinical question. Path B answers the clinical question and creates a verifiable audit trail of how that answer was reached.

Why Governance Changes the Conversation

Most discussions about AI agents in healthcare focus on accuracy. Can the model identify the right drug? Does it know the dosing guidelines? These are legitimate questions. They are also the easy questions.

The harder question is: when something goes wrong, can you reconstruct exactly what happened and why?

In a regulated environment (and healthcare is one of the most regulated environments in the world), the answer to that question has to be yes. Not “probably yes.” Not “we could reconstruct it if we still have the logs.” Yes, with cryptographic evidence: a hash chain, and timestamps that cannot be altered after the fact without the change being detected.

That is what the governance path demonstrates. The Agentic Interaction Ledger (our AIL component) records each interaction with a hash-chained audit entry. The ReasoningInterceptor captures the model’s chain-of-thought where enabled. The Decision Integrity interceptor runs a faithfulness and hallucination check against the output before it leaves the pipeline.
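
A hash-chained log is simple to illustrate. The sketch below is a generic version of the technique, assuming SHA-256 over a canonical JSON encoding; the field names and functions are ours and do not describe the AIL's actual record format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # sort_keys gives a stable encoding so the same body always hashes the same.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(ledger: list, payload: dict, timestamp: str) -> dict:
    """Append an entry whose hash covers its content and the previous entry's hash."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    body = {"timestamp": timestamp, "payload": payload, "prev_hash": prev_hash}
    entry = {**body, "hash": _digest(body)}
    ledger.append(entry)
    return entry


def verify(ledger: list) -> bool:
    """Recompute every hash and link; any edit to an earlier entry breaks the chain."""
    prev = GENESIS
    for entry in ledger:
        body = {k: entry[k] for k in ("timestamp", "payload", "prev_hash")}
        if entry["prev_hash"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    ledger = []
    append_entry(ledger, {"agent": "creatinine_clearance"}, "2026-04-01T10:00:00Z")
    append_entry(ledger, {"agent": "drug_interaction"}, "2026-04-01T10:00:01Z")
    print(verify(ledger))
    ledger[0]["timestamp"] = "2026-04-02T00:00:00Z"  # tamper with an old entry
    print(verify(ledger))
```

Note what the chain does and does not give you: it makes tampering detectable by anyone who re-verifies the log, but it does not by itself prevent someone from rewriting the whole log, which is why the latest hash is usually anchored somewhere outside the system.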

A Path A answer and a Path B answer might contain the same clinical recommendation. The Path B answer also carries a record showing that the recommendation was evaluated, that it passed the faithfulness and hallucination check, that the PII scan ran on each step, and that the business rules were applied as written.

In a legal dispute, a regulatory audit, or a malpractice review, that proof is not optional.

What the Challenge Reveals About Agent Architecture

Building for a challenge is different from building for production. Challenges reward correctness. Production rewards correctness plus observability, auditability, and failure recovery.

A few things stood out during this build:

Specialization is a feature, not a limitation. Each of our four agents does one thing. This is sometimes framed as an architectural weakness: more moving parts, more failure surfaces. We see it as the opposite. A single-purpose agent is testable in isolation. Its failure modes are predictable. When the drug interaction agent fails, you know exactly what failed. With a monolithic agent that does everything, failure attribution is guesswork.

Deterministic rules belong in deterministic systems. The medication selection logic uses six explicit rules. We did not use an LLM to select the medication. We used an LLM to structure the intake data, then applied rules. This is the correct division of labor. LLMs are good at language. Rules engines are good at rules. Conflating them produces systems that are hard to explain and impossible to audit.

The governance layer is not overhead. It is common to frame compliance tooling as friction: something you bolt on to satisfy a requirement. The architecture we used here shows a different model. Governance runs inline. Every agent call that goes through Novus Forge produces a richer output than one that does not: more metadata, more traceability, more defensibility. The overhead is real (a few hundred milliseconds per call). The value is also real, and it compounds over time as the audit log grows.

Where This Points

The DMCommunity Challenge is a community learning exercise. The problems are constrained, the scenarios are clean, and there is no real patient at the end of the pipeline. But the architectural questions it surfaces are the same ones that healthcare systems, insurance carriers, and financial institutions are grappling with in production.

How do you build AI agent systems that can be audited? How do you separate the decision logic from the language model? How do you prove, after the fact, that a recommendation was grounded in the data you provided rather than something the model invented?

These are not hypothetical questions. They are the questions that determine whether an AI deployment survives regulatory scrutiny.

The challenge submission is a working reference implementation of one answer. The full submission details are available in the DMCommunity review post.