What is Database Clustering?

Learn about shared nothing and shared disk database clustering architectures as well as the advantages of utilizing them.

By
Mostafa Ibrahim
October 24, 2022
Mostafa Ibrahim
Community Collaborator

What is a database cluster?

Database clustering is the practice of connecting more than one database instance or server to your system. In many common cluster designs, the instances are managed by a single coordinating server called the master. In systems design, this approach may be necessary for large systems such as web or mobile applications, where a single database server cannot handle all of the customers’ requests. The solution is to introduce multiple database servers that work in parallel.

This technique brings clear benefits, such as handling more users and surviving server failures. Its main disadvantage is the additional complexity it introduces. To manage that complexity, the database servers are typically coordinated by a higher-level server that monitors the flow of data throughout the system.

Example of multiple database servers in a single system (Source)

As shown in the image above, multiple database servers are connected through a SAN. A SAN, short for storage area network, is a dedicated network that provides access to consolidated, block-level data storage. SANs are primarily used to make storage devices, such as disk arrays and tape libraries, accessible to servers so that the devices appear to the operating system as direct-attached storage. While you can still build your own database cluster, many companies now offer cloud database storage as a third-party service. By using such services, customers can save the cost of maintaining and monitoring their own database servers or clusters.

In this article, we explain the two most common cluster architectures and then cover some advantages of database clustering.

Database Cluster Architecture

Shared-Nothing Architecture

In a shared-nothing architecture, each node is independent of all the others: it has its own database server and storage, from which it stores and accesses its own data. No single database server acts as the master, so there is no central node that monitors and controls data access across the system. Because no resources are shared between nodes, a shared-nothing architecture scales horizontally very well: you add capacity by adding nodes.

An image of a shared-nothing architecture (Source)
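The article does not say how a shared-nothing cluster decides which node stores which record. A common approach is hash partitioning (sharding), sketched below in Python under that assumption: each hypothetical `Node` keeps its own private storage, and the hypothetical `SharedNothingCluster` routes every key to exactly one node, with no shared state and no master.

```python
import hashlib
from typing import Optional


class Node:
    """An independent node: it owns its storage and shares nothing with other nodes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.storage: dict[str, str] = {}  # private to this node


class SharedNothingCluster:
    """Routes every key to exactly one node by hashing it (hash partitioning)."""

    def __init__(self, node_names: list[str]) -> None:
        self.nodes = [Node(name) for name in node_names]

    def _owner(self, key: str) -> Node:
        # sha256 is stable across processes, unlike Python's salted built-in hash().
        digest = hashlib.sha256(key.encode()).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key: str, value: str) -> None:
        self._owner(key).storage[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._owner(key).storage.get(key)


if __name__ == "__main__":
    cluster = SharedNothingCluster(["node-a", "node-b", "node-c"])
    for i in range(9):
        cluster.put(f"user:{i}", f"name-{i}")
    print(cluster.get("user:4"))  # name-4
    # How many keys each node ended up owning.
    print({node.name: len(node.storage) for node in cluster.nodes})
```

Simple modulo hashing remaps most keys when a node is added or removed; production systems often use consistent hashing or range partitioning to limit how much data moves when the cluster scales out.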

Shared-Disk Architecture

The shared-disk architecture takes the opposite approach. Every node (CPU) shares access to all of the available storage, and therefore to all of the system’s data. Unlike in the shared-nothing architecture, the interconnection network sits between the CPUs and the storage, which lets any node reach any disk. A shared-disk cluster generally scales less well than a shared-nothing one: because all nodes access the same data, a controlling node is required to coordinate the flow of data in the system, and once the number of nodes passes a certain point, that master node can no longer monitor and control all of the other nodes efficiently.

A shared disk architecture (Source)
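To make the coordination cost concrete, here is a minimal Python sketch that assumes a single central lock manager (the hypothetical `LockManager`) plays the role of the controlling node. All hypothetical `DiskNode` objects read and write the same `SharedDisk`, and every write must first obtain a lock from the coordinator, so the coordinator handles one request per write from every node, which is the load that grows as nodes are added.

```python
import threading
from collections import Counter


class SharedDisk:
    """Storage that every node can read and write: the shared disk."""

    def __init__(self) -> None:
        self.blocks: dict[str, int] = {}


class LockManager:
    """Central coordinator: nodes must get a lock from it before changing a block."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the lock table itself
        self._locks: dict[str, threading.Lock] = {}
        self.requests = Counter()  # lock requests received, per node

    def lock_for(self, node: str, block: str) -> threading.Lock:
        with self._guard:
            self.requests[node] += 1
            return self._locks.setdefault(block, threading.Lock())


class DiskNode:
    """A compute node with no storage of its own; all data lives on the shared disk."""

    def __init__(self, name: str, disk: SharedDisk, coordinator: LockManager) -> None:
        self.name = name
        self.disk = disk
        self.coordinator = coordinator

    def increment(self, block: str) -> None:
        # Read-modify-write on shared data is only safe while holding the coordinator's lock.
        with self.coordinator.lock_for(self.name, block):
            self.disk.blocks[block] = self.disk.blocks.get(block, 0) + 1


if __name__ == "__main__":
    disk, coordinator = SharedDisk(), LockManager()
    nodes = [DiskNode(f"node-{i}", disk, coordinator) for i in range(4)]

    def work(node: DiskNode) -> None:
        for _ in range(1000):
            node.increment("inventory")

    threads = [threading.Thread(target=work, args=(n,)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(disk.blocks["inventory"])  # 4000
    print(sum(coordinator.requests.values()))  # 4000, all through one coordinator
```

Real shared-disk systems use distributed lock managers and cache-coherence protocols rather than one in-process lock table; the sketch only shows why every node ends up depending on the coordinator.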

Advantages of database clustering

  1. Load balancing the system

Load balancing is the process of distributing a set of tasks across multiple resources. Its aim is to make the system’s overall processing more efficient and, above all, to prevent any single component from becoming overloaded and failing suddenly.

A small application might not need multiple databases, but as it grows, it will eventually need more servers. Companies can replace a database server with a more powerful one, but there is a limit to how many requests a single server can handle. The alternative is to add multiple database servers to the system, along with a master node that distributes user requests evenly among them. The goal is to avoid overloading one server while others sit idle.

  2. Reaching more customers

One of the main reasons companies invest in database clustering is scalability. By adding more database servers, companies can serve far more users in different parts of the world.

Placing database servers in different geographical locations also speeds up customer interactions, because each user can be served by a server that is physically closer to them. This matters for applications used worldwide, such as Facebook, YouTube, and Google.
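Serving each user from the closest server can be sketched by comparing great-circle (haversine) distances. The region names and coordinates below are hypothetical, and in practice this routing is often handled at the DNS or load-balancer layer rather than in application code.

```python
import math

# Hypothetical regions and approximate (latitude, longitude) of their data centres.
REGIONS = {
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "ap-southeast": (1.35, 103.8),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_region(user_location: tuple[float, float]) -> str:
    """Pick the region whose data centre is closest to the user."""
    return min(REGIONS, key=lambda region: haversine_km(user_location, REGIONS[region]))


if __name__ == "__main__":
    print(nearest_region((48.86, 2.35)))    # Paris -> eu-west
    print(nearest_region((35.68, 139.69)))  # Tokyo -> ap-southeast
```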

  3. Data redundancy across the system

Data redundancy is the practice of storing the same data in two or more separate storage locations. In a replicated database cluster, although there are multiple database servers, they hold identical copies of the same data. Data redundancy matters because if one database server is corrupted (its data is lost or altered), a copy of the data is still stored on another database server. In cases where an issue occurs in a given system’s database, data redundancy can be the
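A minimal Python sketch of redundancy, assuming full replication: the hypothetical `ReplicatedStore` writes every record to each healthy hypothetical `Replica`, so a read still succeeds after one replica loses its data.

```python
class Replica:
    """One database server holding a full copy of the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}
        self.healthy = True


class ReplicatedStore:
    """Writes each record to every healthy replica; reads work while any copy survives."""

    def __init__(self, replicas: list[Replica]) -> None:
        self.replicas = replicas

    def put(self, key: str, value: str) -> None:
        for replica in self.replicas:
            if replica.healthy:
                replica.data[key] = value

    def get(self, key: str) -> str:
        # Fall through to the next replica if one is down or has lost the record.
        for replica in self.replicas:
            if replica.healthy and key in replica.data:
                return replica.data[key]
        raise KeyError(key)


if __name__ == "__main__":
    store = ReplicatedStore([Replica("db-1"), Replica("db-2"), Replica("db-3")])
    store.put("order:42", "paid")
    store.replicas[0].data.clear()  # simulate db-1 losing its data
    print(store.get("order:42"))  # paid, served by db-2
```

The sketch skips how a failed replica catches up once it returns and how altered copies are detected (for example, with checksums); a real system needs both.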

  4. Overcoming the risk of application failure

With multiple database servers working in parallel, database engineers can eliminate the single point of failure. If an application has only one database server and that server fails, the whole system effectively halts. To prevent this, other database servers should be kept on standby: you never know when a database server will go down, so it is always better to have others available. Note the difference between the two problems. Data redundancy protects against data being permanently lost. A single point of failure, by contrast, causes downtime: the database server is down for a limited time and is expected to come back, so the rest of the system may keep running, but it cannot store or retrieve data until the server is back up.
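Failing over to a standby server, as described above, can be sketched as a client that tries the primary first and falls back to standbys in order. `DatabaseServer`, `FailoverClient`, and `ServerDown` are hypothetical stand-ins for a real database driver and its connection errors.

```python
class ServerDown(Exception):
    """Raised by a (simulated) database server that is unavailable."""


class DatabaseServer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True

    def query(self, sql: str) -> str:
        if not self.up:
            raise ServerDown(self.name)
        return f"{self.name} handled: {sql}"


class FailoverClient:
    """Sends each query to the primary, falling back to standbys in order if it is down."""

    def __init__(self, primary: DatabaseServer, standbys: list[DatabaseServer]) -> None:
        self.servers = [primary, *standbys]

    def query(self, sql: str) -> str:
        for server in self.servers:
            try:
                return server.query(sql)
            except ServerDown:
                continue  # try the next server instead of halting the application
        raise ServerDown("no database server available")


if __name__ == "__main__":
    primary, standby = DatabaseServer("primary"), DatabaseServer("standby-1")
    client = FailoverClient(primary, [standby])
    primary.up = False  # simulate the primary going down
    print(client.query("SELECT 1"))  # standby-1 handled: SELECT 1
```

Failover only helps if the standby already holds the data, which is why it is usually combined with the data redundancy described in the previous section.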

Clustering with Harper

Harper supports database clustering through its clustering engine, which replicates data between multiple instances of the database. Replication follows a high-performance, bidirectional pub/sub model on a per-table basis. As with most database clustering solutions, Harper offers load balancing, data redundancy, and high availability.
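The following Python sketch illustrates the general idea of bidirectional, per-table pub/sub replication. It is not Harper’s implementation or API: `Broker` and `ClusterNode` are hypothetical, the broker runs in-process, and each node subscribes only to the tables it replicates and ignores its own writes when they are echoed back.

```python
from collections import defaultdict
from typing import Callable

# A subscriber receives (table, origin node name, record).
Handler = Callable[[str, str, dict], None]


class Broker:
    """Minimal in-process pub/sub: subscribers register per topic (here, per table)."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, origin: str, record: dict) -> None:
        for handler in self._subs[topic]:
            handler(topic, origin, record)


class ClusterNode:
    def __init__(self, name: str, broker: Broker, tables: list[str]) -> None:
        self.name = name
        self.broker = broker
        self.tables: dict[str, dict] = {t: {} for t in tables}
        for table in tables:  # subscribe only to the tables this node replicates
            broker.subscribe(table, self._on_remote_write)

    def write(self, table: str, record: dict) -> None:
        self.tables[table][record["id"]] = record
        self.broker.publish(table, self.name, record)  # share the change with peers

    def _on_remote_write(self, table: str, origin: str, record: dict) -> None:
        if origin != self.name:  # skip our own writes echoed back by the broker
            self.tables[table][record["id"]] = record


if __name__ == "__main__":
    broker = Broker()
    node_a = ClusterNode("a", broker, ["dog", "breed"])
    node_b = ClusterNode("b", broker, ["dog"])
    node_a.write("dog", {"id": 1, "name": "Rex"})
    node_b.write("dog", {"id": 2, "name": "Fido"})
    print(sorted(node_a.tables["dog"]), sorted(node_b.tables["dog"]))  # [1, 2] [1, 2]
```

The sketch leaves out conflict resolution: if two nodes update the same record at the same time, a real system needs a rule, such as comparing timestamps, to decide which version wins.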

With Harper, data is replicated asynchronously across all of the database servers in the cluster. Individual transactions are sent in the same order in which they occurred and are processed in an ACID-compliant manner. ACID (atomicity, consistency, isolation, durability) compliance guarantees that transactions preserve data validity despite errors, power failures, and other mishaps. Learn more about Harper clustering in the docs here.
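Asynchronous, in-order replication can be sketched with a FIFO queue and a background worker. Again, this is a generic illustration rather than Harper’s code: the hypothetical `AsyncReplica` only demonstrates ordering and all-or-nothing application of each transaction, and it does not provide durability or the other ACID guarantees.

```python
import queue
import threading


class AsyncReplica:
    """Applies replicated transactions on a background thread, strictly in send order."""

    def __init__(self) -> None:
        self.data: dict[str, int] = {}
        self.applied: list[int] = []  # transaction ids, in the order they were applied
        self._inbox: queue.Queue = queue.Queue()  # FIFO preserves the sender's order
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def send(self, txn_id: int, writes: dict[str, int]) -> None:
        # Returns immediately: the sender does not wait for the replica (asynchronous).
        self._inbox.put((txn_id, writes))

    def _run(self) -> None:
        while True:
            txn_id, writes = self._inbox.get()
            # Build the new state first, then swap it in with one assignment, so a
            # reader sees either all of this transaction's writes or none of them.
            self.data = {**self.data, **writes}
            self.applied.append(txn_id)
            self._inbox.task_done()

    def wait_until_caught_up(self) -> None:
        self._inbox.join()  # blocks until every queued transaction has been applied


if __name__ == "__main__":
    replica = AsyncReplica()
    replica.send(1, {"alice": 100})
    replica.send(2, {"alice": 70, "bob": 30})  # a transfer: both writes land together
    replica.wait_until_caught_up()
    print(replica.applied, replica.data)  # [1, 2] {'alice': 70, 'bob': 30}
```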