Harper's AI Stack: Models API, Agent Loop, and Built-in Agent

Harper 5.1 ships three layered AI capabilities: a provider-agnostic Models API supporting OpenAI, Anthropic, Bedrock, and Ollama; a built-in agentic loop via toolMode: 'auto' with hard budget controls; and an opt-in Harper Agent component. Switching providers is a config change, not a code change.

Kris Zyp, SVP of Engineering at Harper
June 18, 2026

Harper 5.1 ships three related AI capabilities that build on each other: a provider-agnostic Models API, a built-in agent loop on scope.models.generate(), and an opt-in Harper Agent component. Each can be used independently, but together they make AI a first-class runtime concern of the application platform rather than an external service you bolt on.

The Models API

scope.models is a provider-agnostic LLM interface available in Harper resource handlers and component code. You configure providers in YAML and call them by logical name:

models:
  main:
    provider: anthropic
    model: claude-sonnet-4-6
  embedder:
    provider: openai
    model: text-embedding-3-small
  local:
    provider: ollama
    model: llama3

In a resource handler:

export class ArticleSummary extends Resource {
  async get(query) {
    // Load the source record to summarize
    const article = await Article.get(query.id);
    // 'main' is the logical name from the models config, not a provider model ID
    const result = await scope.models.generate({
      model: 'main',
      messages: [{ role: 'user', content: `Summarize: ${article.body}` }],
      maxTokens: 256 // per-call token limit
    });
    return { summary: result.content };
  }
}

Supported providers in 5.1: OpenAI, Anthropic, AWS Bedrock (via the AWS SDK, an optional peer dependency), and Ollama. All four share the same generate / generateStream / embed interface, so switching providers is a config change, not a code change.
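To make the "config change, not a code change" point concrete, here is a minimal sketch of the indirection between a logical model name and a provider. It is not Harper's implementation: createModels, stubProvider, and the config objects are hypothetical stand-ins, and only the generate({ model, messages }) call shape is taken from the handler above.

```javascript
// Hypothetical stand-in for a provider client; a real one would call a remote API.
const stubProvider = (name) => ({
  generate: ({ model, messages }) => ({
    content: `[${name}/${model}] ${messages.length} message(s)`,
  }),
});

const providers = {
  anthropic: stubProvider('anthropic'),
  ollama: stubProvider('ollama'),
};

// Builds a facade that resolves a logical model name to a provider at call time.
function createModels(config) {
  return {
    generate({ model, ...request }) {
      const entry = config[model];
      if (!entry) throw new Error(`Unknown model name: ${model}`);
      const provider = providers[entry.provider];
      if (!provider) throw new Error(`Unknown provider: ${entry.provider}`);
      return provider.generate({ ...request, model: entry.model });
    },
  };
}

// Application code only ever references the logical name 'main'.
function summarize(models, text) {
  const messages = [{ role: 'user', content: `Summarize: ${text}` }];
  return models.generate({ model: 'main', messages }).content;
}

// Two configs, one unchanged call site.
const onAnthropic = createModels({ main: { provider: 'anthropic', model: 'claude-sonnet-4-6' } });
const onOllama = createModels({ main: { provider: 'ollama', model: 'llama3' } });

console.log(summarize(onAnthropic, 'hello'));
console.log(summarize(onOllama, 'hello'));
```

The summarize function never changes between the two runs; only the config object passed to the hypothetical createModels does, which mirrors editing the YAML block.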

Model usage is recorded in Harper's analytics system, so embed and generate call counts and token volumes show up in the standard analytics tables alongside your other operational metrics.

toolMode: 'auto' — the agent loop

Before 5.1, scope.models.generate() was single-shot: one request, one response. In 5.1, passing toolMode: 'auto' enables a built-in agentic loop. The model receives the tool definitions, the loop dispatches any tool calls the model requests, appends the results to the conversation, and repeats until the model stops requesting tools or a budget is exhausted.

const result = await scope.models.generate({
  model: 'main',
  messages,
  toolMode: 'auto',
  tools: myTools,
  maxToolIterations: 10,
  maxCostUsd: 0.50,
  includeToolTrace: true
});

if (result.trace) {
  // Array of { tool, input, output } for each call in the loop
  console.log(result.trace);
}

Budget parameters act as hard stops:

  • maxToolIterations — maximum number of tool-call rounds
  • maxToolTokens — cumulative token budget across the loop (distinct from per-call maxTokens)
  • maxCostUsd — estimated cost ceiling

When a budget is exceeded, generate throws a BudgetExceededError that includes the partial tool trace — useful for logging what the model was doing when it ran out of budget.
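The loop and its hard stop can be sketched in a few lines. This is an illustration under stated assumptions, not Harper's internals: runToolLoop, the callModel signature, the message shapes, and the trace property on the error are all hypothetical, and only the iteration budget is modeled here.

```javascript
// Hypothetical error type mirroring the described behavior: it carries the partial trace.
class BudgetExceededError extends Error {
  constructor(message, trace) {
    super(message);
    this.name = 'BudgetExceededError';
    this.trace = trace;
  }
}

// Minimal tool loop. callModel is an assumed async function returning either
// { content } for a final answer or { toolCalls: [{ tool, input }] }.
async function runToolLoop({ callModel, tools, messages, maxToolIterations }) {
  const conversation = [...messages];
  const trace = [];
  for (let round = 0; ; round++) {
    const reply = await callModel(conversation);
    if (!reply.toolCalls || reply.toolCalls.length === 0) {
      return { content: reply.content, trace };
    }
    // The model wants another round of tools: enforce the budget before dispatching.
    if (round >= maxToolIterations) {
      throw new BudgetExceededError(`Exceeded ${maxToolIterations} tool-call rounds`, trace);
    }
    for (const { tool, input } of reply.toolCalls) {
      const output = await tools[tool](input);
      trace.push({ tool, input, output });
      conversation.push({ role: 'tool', content: JSON.stringify({ tool, output }) });
    }
  }
}

// Scripted stand-in for a model: requests the `add` tool until it has seen two results.
const scriptedModel = async (conversation) => {
  const toolResults = conversation.filter((m) => m.role === 'tool').length;
  return toolResults < 2
    ? { toolCalls: [{ tool: 'add', input: { a: toolResults, b: 1 } }] }
    : { content: `done after ${toolResults} tool calls` };
};
const demoTools = { add: async ({ a, b }) => a + b };
const start = [{ role: 'user', content: 'Add some numbers' }];

// With a generous budget the loop finishes; with a budget of 1 it throws.
const completed = runToolLoop({ callModel: scriptedModel, tools: demoTools, messages: start, maxToolIterations: 10 });
const exhausted = runToolLoop({ callModel: scriptedModel, tools: demoTools, messages: start, maxToolIterations: 1 })
  .catch((err) => err);

completed.then((r) => console.log(r.content, r.trace.length));
exhausted.then((err) => console.log(err.name, err.trace.length));
```

A token or cost budget would slot into the same place as the iteration check, accumulating usage per model call instead of counting rounds.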

toolParallelism: 'parallel' lets the loop dispatch independent tool calls concurrently within each iteration. For tool-heavy prompts where calls don't depend on each other, this can meaningfully reduce end-to-end latency.
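The latency difference comes from how the tool calls within one iteration are awaited. The sketch below contrasts the two strategies with a simulated tool; slowLookup and both dispatch functions are hypothetical and say nothing about how Harper schedules calls internally.

```javascript
// Tracks how many tool calls are in flight at once, to make the difference observable.
let inFlight = 0;
let peak = 0;

// Hypothetical tool: simulates I/O latency with a short timer.
async function slowLookup(id) {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 20));
  inFlight--;
  return `result-${id}`;
}

// Sequential dispatch: each call waits for the previous one to finish.
async function dispatchSequential(ids) {
  const outputs = [];
  for (const id of ids) outputs.push(await slowLookup(id));
  return outputs;
}

// Parallel dispatch: all calls start together; Promise.all preserves input order.
function dispatchParallel(ids) {
  return Promise.all(ids.map((id) => slowLookup(id)));
}

// Runs both strategies and records the peak concurrency each one reached.
async function compare() {
  peak = 0;
  const sequential = await dispatchSequential([1, 2, 3]);
  const sequentialPeak = peak;
  peak = 0;
  const parallel = await dispatchParallel([1, 2, 3]);
  const parallelPeak = peak;
  return { sequential, sequentialPeak, parallel, parallelPeak };
}

const comparison = compare();
comparison.then((r) => console.log(r));
```

Both strategies return the same outputs in the same order. The parallel version only helps when the calls are truly independent, because nothing here lets one call consume another's result.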

The built-in Harper Agent

For cases where you want a general-purpose agent without writing your own loop or tool dispatch, 5.1 ships a built-in harper-agent component. It's disabled by default:

agent:
  enabled: true
  autoApprove: false # require explicit approval for destructive tools

Once enabled, six operations are available:

  • agent_prompt — submit a prompt and receive a session ID
  • get_agent_session — fetch the full transcript for a session
  • list_agent_sessions — list recent sessions
  • cancel_agent_run — stop a running session
  • approve_agent_action — approve or deny a pending destructive tool call
  • set_agent_config — update runtime config
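As a rough sketch of how an operator script might call these, assuming they follow the same JSON shape as Harper's other operations (a POST body with an operation field). Every parameter name other than operation (prompt, session_id) is a hypothetical placeholder, as are the endpoint URL and credentials; check the operation reference for the real fields.

```javascript
// Builds the JSON body for an operation call. Field names other than `operation`
// are hypothetical placeholders, not documented parameters.
function buildOperation(operation, params = {}) {
  return JSON.stringify({ ...params, operation });
}

// Sends an operation to an assumed operations endpoint using Basic auth.
async function callOperation(baseUrl, credentials, operation, params) {
  const response = await fetch(baseUrl, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Basic ${Buffer.from(credentials).toString('base64')}`,
    },
    body: buildOperation(operation, params),
  });
  if (!response.ok) throw new Error(`${operation} failed with HTTP ${response.status}`);
  return response.json();
}

// Example flow (not executed here; URL, credentials, and field names are assumptions):
// const started = await callOperation(url, 'user:pass', 'agent_prompt', { prompt: 'Why is the log growing?' });
// const session = await callOperation(url, 'user:pass', 'get_agent_session', { session_id: started.session_id });

console.log(buildOperation('list_agent_sessions'));
```

Keeping the body builder separate from the network call makes it easy to log or test exactly what would be sent before pointing the script at a live instance.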

Transcripts are persisted in system.hdb_agent_session, so sessions survive restarts and are queryable like any other Harper table.

The built-in tools available to the agent include filesystem access (scoped to componentsRoot, logDir, and configDir), schedule_followup, and http_fetch. Destructive filesystem operations require explicit approval via approve_agent_action when autoApprove is false, which is the default. While a destructive tool call is pending, the agent loop halts and waits for that decision.
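The approval gate described above behaves like a small state machine. The following sketch shows one way to model it; the tool names delete_file and write_file, the status strings, and both functions are hypothetical, since the source does not list which tools count as destructive or how the agent tracks pending calls.

```javascript
// Hypothetical set of destructive tools; the real list is not given in this article.
const DESTRUCTIVE_TOOLS = new Set(['delete_file', 'write_file']);

// Decides what happens to a requested tool call under the current autoApprove setting.
function gateToolCall(call, { autoApprove }) {
  if (!DESTRUCTIVE_TOOLS.has(call.tool) || autoApprove) {
    return { status: 'run', call };
  }
  // The loop would halt here until an operator responds.
  return { status: 'pending_approval', call };
}

// Resolves a pending call once an operator approves or denies it
// (the role approve_agent_action plays in the article).
function resolvePending(pending, approved) {
  if (pending.status !== 'pending_approval') {
    throw new Error('No pending action to resolve');
  }
  return { status: approved ? 'run' : 'denied', call: pending.call };
}

const readCall = gateToolCall({ tool: 'http_fetch' }, { autoApprove: false });
const deleteCall = gateToolCall({ tool: 'delete_file' }, { autoApprove: false });

console.log(readCall.status);                          // run
console.log(deleteCall.status);                        // pending_approval
console.log(resolvePending(deleteCall, false).status); // denied
```

Treating denial as an explicit state, rather than silently dropping the call, keeps the decision visible in whatever transcript records the session.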

What this is and isn't for

The built-in agent is primarily useful as an operator tool: deploying components, reading logs, diagnosing issues, adjusting configuration. It runs with operator-level access, so it should not be exposed directly to end users without careful consideration of the trust boundary.

For application-facing AI features — summarization, classification, RAG — the scope.models API and toolMode: 'auto' are the right primitives. You write the resource handler and control exactly what tools are available.

The two paths are expected to converge over time: the agent component will eventually use the same MCP tool registry as the server-side MCP implementation, so any tools you expose over MCP will also be available to the built-in agent. That integration is ongoing work.
