Blog

Noetic Caching: The Key to Smarter, Faster Chatbots

Since late 2022, Large Language Model (LLM) chatbots have revolutionized AI-driven conversations using Retrieval-Augmented Generation (RAG) to pull in relevant data for context. However, latency remains a challenge in retrieving information quickly. Noetic Caching addresses this by caching contextual data rather than responses, leveraging locality principles to store frequently used or regionally relevant data closer to users. This approach, integrated by Harper, optimizes retrieval speeds, reduces reliance on distant databases, and enhances chatbot performance, offering a balance of speed, accuracy, and cost-effectiveness for better AI experiences.
Vince Berk
Board Member
at Harper
February 6, 2025
Since November of 2022, the Large Language Model Chatbot has taken center stage.  A steady march of progress in neural network research – which aims to duplicate the functioning of the human brain through mathematical models of neurons – had reached a zenith where a convincing conversation could be had with a computer.  It appears that the various Chatbots are practically unbeatable in their knowledge of virtually any topic.  Ask any question, and some very good answers come back.

But much like anything, lifting the veil on how this works under the hood reveals a complex system with many moving parts.  In fact, the actual “chatbot” part turns out to know very little, other than how to converse convincingly.


How Chatbots “Know”

The magic behind the “knowing” of specific details is in large part based on a technique called RAG – retrieval augmented generation.

It works like this:  in its simplest form, the Chatbot predicts a sequence of words that would best follow the words it has recently seen.  This history is what we call a “context.”  The bigger and more specific the context, the better the conversation will go and the more details the Chatbot seems to have.  The trick to getting a Chatbot to talk about specific details of any kind, therefore, is to fill this context window with as many relevant details as possible.

So when I ask a Chatbot what the frost depth is in New Hampshire, the RAG backend will quickly try to retrieve articles on “frost depth,” on “New Hampshire,” perhaps on building codes, foundation footings, etc.  These articles are then entered into the context of the Chatbot as if they were part of my question.  Only then is my question appended, and the Chatbot can now assertively answer with the details that were given to it right before.
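The retrieval-then-append step above can be sketched in a few lines.  This is purely illustrative: the buildPrompt function and the sample articles are hypothetical, not part of Harper or any real RAG framework.

```javascript
// Toy sketch of RAG prompt assembly (illustrative only; names are
// hypothetical, not Harper APIs).
function buildPrompt(question, retrievedArticles) {
  // Retrieved articles are placed first, so the model treats them as
  // established context for the conversation.
  const context = retrievedArticles
    .map((article, i) => `[Source ${i + 1}] ${article}`)
    .join('\n\n');
  // The user's question is appended only after the context.
  return `${context}\n\nQuestion: ${question}`;
}

const prompt = buildPrompt('What is the frost depth in New Hampshire?', [
  'New Hampshire building codes require footings below the frost line.',
  'Frost depth in northern New England can exceed 48 inches.',
]);
```

The ordering is the whole trick: by the time the model reads the question, the details it needs are already "in its head."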

This works the same way if you’d like a Chatbot on your site that describes, for instance, what people think of a particular product you are selling, or answers a question hidden deep inside your product's documentation.  And the quicker you can feed the relevant articles to the Chatbot, the more convincing and natural the answer will appear to your user.

RAG and the Latency Dilemma

Speed is of the essence, but speed is elusive on websites with tens of millions of possible pieces of data.

Retrieval Augmented Generation relies on mapping words to these data pieces.  It allows the quick matching of all your data pieces to whatever the conversation is currently about.  Vector databases have made great advances here, making this matching fast and easy.  However, if the data to be retrieved (the actual articles, reviews, and other text snippets that must be integrated into the chatbot's context) is stored far away, retrieval slows down.
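The matching itself can be sketched with plain cosine similarity.  In practice an embedding model and a vector database do this at scale; the hand-written three-dimensional vectors and document ids below are purely illustrative.

```javascript
// Toy nearest-neighbor search over pre-embedded documents.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topMatches(queryVec, docs, k) {
  // Score every document against the query, highest similarity first.
  return docs
    .map((d) => ({ id: d.id, score: cosine(queryVec, d.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const docs = [
  { id: 'frost-depth', vec: [0.9, 0.1, 0.0] },
  { id: 'beach-resorts', vec: [0.0, 0.2, 0.9] },
  { id: 'building-codes', vec: [0.7, 0.4, 0.1] },
];
const best = topMatches([1, 0, 0], docs, 2);
// best ranks 'frost-depth' first and 'building-codes' second
```

The vector query returns ids and scores quickly; the slow part is fetching the full text of those matched documents when they live in a distant origin database.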

At first glance, caching does not seem to work.  The answer that the Chatbot gives is unique to each interaction.  Likewise, the summary of reviews is unique to the product the customer is looking at, and the synopsis of the documentation or tech support answer is unique as well.  For large, distributed sites this looks like an Achilles' heel, and in the traditional sense of caching, it is.

Noetic Caching: Storing Knowledge Beyond the Origin

However, if we change our perspective and take the view of a fully integrated database and caching system, it is possible to see a world where the relevant contextual data is cached, while the conversations or answers produced by the Chatbot remain fully unique.

Here’s how it works:  much like caching of traditional website content, most of the context that goes into a Chatbot will be similar for most conversations or questions.  Consider this “base knowledge.”  These data pages can be cached instead.  Similarly, caching of more specific data pages, articles, or reviews has a certain principle of locality to it: either in time or in location.  Put another way:  because topics ebb and flow with what is on people's minds, problems to solve, events that happen, or simply because the same people tend to connect to the same local edge web servers, there is tremendous value in caching the data before it goes into the Chatbot.
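The idea of caching the data rather than the response can be sketched as a small document cache with LRU eviction.  The ContextCache class and fetchFromOrigin function here are hypothetical stand-ins, not Harper APIs: documents are cached by id, so repeated or locally popular topics skip the trip to the origin.

```javascript
// Hedged sketch: cache contextual documents (not chatbot responses) by id,
// with simple least-recently-used eviction.
class ContextCache {
  constructor(maxEntries) {
    this.max = maxEntries;
    this.map = new Map(); // Map preserves insertion order, giving easy LRU
  }
  get(docId, fetchFromOrigin) {
    if (this.map.has(docId)) {
      const doc = this.map.get(docId);
      this.map.delete(docId); // refresh recency on a hit
      this.map.set(docId, doc);
      return { doc, hit: true };
    }
    const doc = fetchFromOrigin(docId); // slow path: far-away origin
    this.map.set(docId, doc);
    if (this.map.size > this.max) {
      // evict the least recently used entry
      this.map.delete(this.map.keys().next().value);
    }
    return { doc, hit: false };
  }
}

const cache = new ContextCache(100);
const origin = (id) => `full text of ${id}`; // simulated origin lookup
const first = cache.get('frost-depth', origin);  // miss: fetched from origin
const second = cache.get('frost-depth', origin); // hit: served locally
```

Because locality works in the cache's favor, the second and every subsequent request for a popular document is served from nearby storage, and the Chatbot's answer built on top of it can still be entirely unique.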

We call this Noetic Caching.  The deeply integrated design of Harper allows the caching of LLM content for RAG to happen after the vector query is resolved.  This means the data the Chatbot needs to carry a meaningful conversation, quickly and with the right details, is immediately available from Harper, with no need to retrieve it from a far-away origin database.  Nor does all the content need to be kept at the edge.

This gives an almost magical mixture of speed, accuracy, and cost that cannot be achieved in more traditional architectures.

Get Started with Noetic Caching

If you're looking to reduce the latency of generative AI, Noetic Caching could be a game-changer. Understanding how caching can optimize retrieval speeds while keeping responses relevant is key to building better AI experiences. If you’d like to explore how this works in practice, our team is happy to share insights and best practices.

