What is Polyglot Persistence and Why is it Awful?

This post examines the concept of polyglot persistence—using multiple data storage technologies for one application—and argues that mixing databases often adds needless complexity. It suggests that choosing a single versatile database can simplify development and improve performance.
Stephen Goldberg
CEO & Co-Founder
at Harper
April 23, 2019

According to Wikipedia, “Polyglot persistence is the concept of using different data storage technologies to handle different data storage needs within a given software application.” James Serra writes in his blog, “Polyglot Persistence is a fancy term to mean that when storing data, it is best to use multiple data storage technologies, chosen based upon the way data is being used by individual applications or components of a single application. Different kinds of data are best dealt with different data stores.”

The logic behind this methodology, according to Wikipedia and most other sources, is: “There are numerous databases available to solve different problems. Using a single database to satisfy all of a program's requirements can result in a non-performant, "jack of all trades, master of none" solution. Relational databases, for example, are good at enforcing relationships that exist between various data tables. To discover a relationship or to find data from different tables that belong to the same object, a SQL join operation can be used. This might work when the data is smaller in size, but becomes problematic when the data involved grows larger. A graph database might solve the problem of relationships in the case of Big Data, but it might not solve the problem of database transactions, which are provided by RDBMSs. Instead, a NoSQL document database might be used to store unstructured data for that particular part of the problem. Thus, different problems are solved by different database systems, all within the same application.”

As James Serra notes, this is a “fancy term”, and it certainly sounds smart. It has clearly become the reigning paradigm for most large-scale data management implementations; however, I would argue that it is a terrible idea. Why is it such a bad idea? The answer is pretty simple: consistency, cost, and complexity.

The Three Cs - Consistency, Cost, and Complexity

The idea that you should adopt the right tool for the job seems sound, and in the software world it often is. Windows and macOS are great operating systems for end-user interfaces on laptops, mobile devices, and desktops, but far from ideal for server environments. Likewise, I wouldn’t want to support my sales team on Linux; I’ve been there and done that during my days at Red Hat, and it was awful.

I think at this point it’s pretty clear that data is the lifeblood of any organization, regardless of size or industry. One could argue this belief has gone too far. I recently listened to a podcast where the CEO of one of the world’s largest auto manufacturers claimed they weren’t a car company anymore, but a “data platform company”. That made me roll my eyes. Still, while I firmly believe car companies should build cars, data is a vital asset to any organization, and we all know it.

So let’s go back to my three Cs, consistency, cost, and complexity, and examine how the pervasive concept of polyglot persistence is a major threat to those areas of an organization’s data management strategy.

While my career has taken many twists and turns, I have spent basically the entirety of it trying to achieve one single goal for organizations like Red Hat, Nissan North America, the Charlotte Hornets, Parkwood Entertainment, and many others: get a single view of their data. I have learned an enormous amount on the hundreds of projects I have worked on, trying to provide a single view of the truth, and I have fought one battle time and time again: consistency.

By introducing many different databases into their technology ecosystems, as the Wikipedia article suggests, companies inevitably create a situation where their data is inconsistent. Add to that the fact that we are consuming data at a frequency never before seen, and it becomes nearly impossible to keep the data in sync. The very nature of polyglot persistence ensures this, as it dictates that certain types of data should live in certain types of systems. That would be fine if there were an easy way to access the data holistically across these systems, but that simply doesn’t exist. A year or two ago, many folks argued with me that the solution to this problem was data lakes like Hadoop and other technologies, but I haven’t heard that argument very often in the last 6 to 12 months. Why? Because data goes to data lakes to die. They are slow, expensive, difficult to maintain, and make it challenging to get a near real-time view of your data.

The issue is that this model requires a significant reliance on memory and CPU for each data silo to perform on-read transformations and calculations of data. These systems are being asked to do double duty: the function they were designated for in the polyglot model, plus serving as a worker for a data lake. This overtaxes these systems, adds latency and data corruption, and creates a lot of complexity.

I fully agree that RDBMSs are ideal for relationships and transactions but fail at massive scale. That said, what you end up with in a polyglot persistence paradigm is an inability to get a consistent view of your data across your entire organization.

A Database for IoT and the convergence of OT and IT

All data is valuable because of its relationships. To truly achieve the promise of Industry 4.0, it will be essential to drive a convergence of OT and IT, combining operational technology (OT) data with IT data. OT data comes at a very high frequency: a single electrical relay can put out 2,000 records a second. One project we are working on, which is smaller in scale, has 40 electrical relays - that’s 80,000 records a second. To be valuable, this power consumption data needs to be combined with production data in other systems like ERPs. These relationships will drive the value of that data. For example: what is the real-time power cost of producing one unit of inventory? Answering that requires a database for IoT as well as a database that can functionally handle IT data.
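To make the arithmetic and the OT/IT join concrete, here is a minimal sketch. The per-reading energy figure and the production count are invented for illustration; only the relay count and per-relay record rate come from the text above.

```python
# Hypothetical sketch: combining high-frequency OT power readings with an
# ERP-side production record to estimate power use per unit of inventory.
# Record shapes, the kWh-per-reading value, and units_produced are made up.

RELAYS = 40
RECORDS_PER_RELAY_PER_SEC = 2000

# 40 relays x 2,000 records/sec = 80,000 records/sec
records_per_sec = RELAYS * RECORDS_PER_RELAY_PER_SEC

# One second of simplified relay readings: energy consumed per reading.
power_readings = [
    {"relay": r, "kwh": 0.0005}
    for r in range(RELAYS)
    for _ in range(RECORDS_PER_RELAY_PER_SEC)
]

# ERP-side production record for the same window (hypothetical).
production = {"units_produced": 120}

total_kwh = sum(reading["kwh"] for reading in power_readings)
kwh_per_unit = total_kwh / production["units_produced"]

print(records_per_sec, "records/sec;", round(kwh_per_unit, 4), "kWh per unit")
```

The point of the example is that the answer only exists once both datasets are visible in one place, at the same moment, which is exactly what a polyglot split makes hard.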

Most folks would use a polyglot persistence model to achieve this.  They would use a highly scalable streaming data solution, or an edge database, to consume the OT power data.  They would then use an RDBMS to consume the inventory data.  How then do we correlate those in real-time?  Most likely by sending them to a third system.  Things will get lost in transit, integrations will break, and consistency is lost.   
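A toy simulation of that three-system pipeline shows how the correlation layer drifts from its sources. The loss rate and record shapes are invented; this is an illustration of the failure mode, not a model of any real integration.

```python
import random

# Toy simulation: an OT stream store and an RDBMS each forward records to a
# third correlation system over a lossy integration. The 2% drop rate is an
# arbitrary illustrative choice.

random.seed(7)
DROP_RATE = 0.02

ot_store = [{"id": i, "kind": "power"} for i in range(1000)]
rdbms = [{"id": i, "kind": "inventory"} for i in range(1000)]

# Each forwarded record independently has a small chance of being lost.
correlator = [r for r in ot_store + rdbms if random.random() > DROP_RATE]

lost = len(ot_store) + len(rdbms) - len(correlator)
print(lost, "records lost in transit; correlator no longer matches either source")
```

Even a small per-record loss rate means the "single view" system disagrees with both systems of record, and nothing in the architecture tells you which copy is right.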

The True Cost of Polyglot Persistence 

Furthermore, this is highly complex. As we add additional systems for each of these data types, we need additional specialized resources, in both people and hardware, to maintain them. We also need multiple integration layers that often lack a dynamic nature and ultimately become the failure points in these architectures. The more layers we add, the more challenging it becomes to verify consistency and to manage this complexity. Housing the same data in multiple places also adds significant storage costs, along with increased compute costs.

We are paying in lost productivity more than anywhere else. How long does it take to triage an issue in your operational data pipeline when you have five to seven different sources of the truth? How do you determine what is causing data corruption?

There is also a major risk in terms of compliance. If we look at the data breaches across social media companies, credit bureaus, financial institutions, and so on, how much time has it taken for them to diagnose the real effect of those breaches? Why is that? The answer is pretty simple: they have neither a holistic picture of their data nor a unified audit trail on that data. This is becoming more and more dangerous as breaches affect more personal data and more things become connected.

What is the solution?

I am not suggesting we go back to the days of monolithic RDBMS environments. I think it’s clear that paradigm is over. Nor am I suggesting that we abandon many of the products we currently use. Many developer tools are awesome for different uses. Search tools like Elasticsearch have become vital parts of the technology ecosystem, and in-memory databases play important roles where very high-speed search on relational data is needed.

What I am suggesting is that we need to look at data architectures that provide a single persistence model across all these tools, providing those tools with the stability and consistency that they require.  Creating data planes with persistence that can function as middleware layers as well as stream processing engines will be key to reducing complexity.    

If we stop relying on each of these individual tools for long-term persistence, and instead view the data inside them as transient, we can accept that their version of the data might be out of sync and that they might crash. If we put persistence in a stream-processing layer with ACID compliance and very high stability, we can then rely on that layer to provide a holistic view of our data. Stop overtaxing these systems with data lakes, where storage algorithms make performant transformation and aggregation impossible; instead, allow these endpoint data stores to do their jobs, and provide that functionality in a layer that can be used as an operational data pipeline.
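The idea of treating endpoint stores as transient views over one durable layer can be sketched in a few lines. This is a minimal in-memory illustration, not a real stream processor; all class and method names are invented.

```python
# Sketch: one durable, ordered log is the source of truth; each downstream
# store is a transient view that can crash and be rebuilt by replaying the
# log. All names here are invented for illustration.

class DurableLog:
    """Stand-in for a stable, ACID persistence/stream-processing layer."""
    def __init__(self):
        self.entries = []  # append-only; a real layer would persist this

    def append(self, record):
        self.entries.append(record)

    def replay(self):
        yield from self.entries

class TransientStore:
    """A downstream database (search index, cache, RDBMS...) that may fail."""
    def __init__(self):
        self.data = {}

    def apply(self, record):
        self.data[record["id"]] = record

    def rebuild_from(self, log):
        self.data = {}
        for record in log.replay():
            self.apply(record)

log = DurableLog()
store = TransientStore()

for i in range(3):
    record = {"id": i, "value": i * 10}
    log.append(record)   # the persistence layer sees every write first
    store.apply(record)  # the downstream copy is just a derived view

store.data = {}          # simulate the downstream store losing its data
store.rebuild_from(log)  # recover the view by replaying the durable log
```

Because the log, not the endpoint store, is the system of record, losing a downstream copy costs a replay rather than a data-reconciliation project.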

