As Earth Day approaches, we’re reminded to examine how our choices impact the planet. In tech, those choices aren’t just about using renewable power or recycling hardware – they extend to the energy our software design and delivery choices consume. Modern applications have brought remarkable convenience and scalability, but beneath the sleek user experiences lies a growing energy appetite. As cloud infrastructure expands and workloads become more compute-intensive, the strain on global power grids continues to rise. For environmentally conscious developers and tech executives, the message is clear: it’s time to treat energy efficiency as a first-class goal in software architecture. This editorial explores how modern application delivery became an energy hog, and how innovations like Harper’s fully fused stack offer a compelling, energy-efficient alternative.
Modern Application Delivery’s Energy Problem
Today’s cloud applications are more complex than ever. In the quest for scalability and modularity, we’ve embraced microservices, distributed systems, and multi-layered tech stacks. A typical web application might involve separate services for the UI, API gateway, business logic, database, cache, and message queue – each potentially running on different servers. While this distributed architecture brings flexibility, it also introduces significant overhead that isn’t immediately obvious. Every time one service calls another, data must be packaged (serialized into JSON or another format), sent over a network, and then unpacked on the other side. The service tier, I/O, and serialization combined can have a notable negative impact on system performance – and what chips away at performance also chips away at efficiency.
Consider a simple user request in a microservices architecture (a code sketch of this flow follows the list below). It might:
- Pass through an API gateway to a front-end service.
- Trigger calls to several back-end microservices in sequence.
- Serialize and deserialize data for each internal call (converting objects to a transferable format and back again), paying for a network hop each time.
- Fetch data from a database and pass it through a cache layer, adding more overhead.
- Create multiple copies or transformations of the same data along the way.
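To make that overhead concrete, here’s a minimal TypeScript sketch of such a request path. The service names and endpoints are hypothetical, not drawn from any real system; the point is that every `fetch` implies an encode, a network round-trip, and a decode.

```typescript
// Hypothetical microservices request path: every hop pays for
// serialization, a network round-trip, and deserialization.
type Order = { id: string; items: string[] };

async function handleUserRequest(orderId: string): Promise<Order> {
  // Hop 1: gateway -> orders service
  const orderRes = await fetch(`http://orders.internal/orders/${orderId}`);
  const order = (await orderRes.json()) as Order; // decode cost #1

  // Hop 2: orders -> inventory service, re-encoding data we already hold
  const invRes = await fetch("http://inventory.internal/check", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(order.items), // encode cost #2
  });
  await invRes.json(); // decode cost #3

  // Hop 3: a read-through cache in front of the database - another copy
  const cachedRes = await fetch(`http://cache.internal/orders/${orderId}`);
  return (await cachedRes.json()) as Order; // decode cost #4
}
```

None of the `JSON.stringify`/`json()` calls above do anything the user asked for; they exist purely to move data between processes.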
All these extra steps make the system more CPU-intensive than it needs to be. Research confirms this intuition: one controlled experiment found that a fine-grained microservice design (many small services) consumed ~13% more energy and added latency compared to a more consolidated design. The latency cost of microservices – every additional 5 ms here, 10 ms there – is also an energy cost, because CPU cycles spent on overhead still draw power. In microservice-heavy systems, “often, a significant chunk of a request’s latency is spent on serialization/deserialization”, especially as each inter-service communication adds that cost repeatedly. In other words, our modern architectural patterns can inadvertently create a lot of digital friction, where servers are busy doing work that isn’t core to the application’s purpose but rather just moving data around and coordinating between components.
This complexity doesn’t only affect a few servers – at scale, it multiplies. Large applications might spawn hundreds of services across thousands of machines. Many of those machines run at low utilization, waiting on network calls or handling repetitive data conversion tasks. The result is an industry where compute cycles – and the energy that powers them – are often wasted on overhead.
But there’s another hidden layer to this energy problem: how we try to mask the latency created by modular architectures. Rather than eliminating the root cause of overhead, teams often throw more hardware at the issue – adding redundancy, spinning up extra instances, or distributing services across geographies. This practice is rooted in the idea of pipelining: you improve perceived latency by increasing bandwidth so more work can happen in parallel. But there’s an asymptotic limit to what hardware can solve. Over time, this strategy becomes a game of diminishing returns – one where every millisecond saved costs disproportionately more energy. Geographic proximity is then used as a patch, compensating for performance penalties that were introduced by the system’s own modular design. Instead of continuing to scale infrastructure outward, we can rethink our approach inward: by reducing unnecessary serialization/deserialization and minimizing inter-service chatter, we can target latency where it starts. That’s a far cheaper and more sustainable fix than endlessly scaling up compute.
No wonder some teams are rethinking the approach; even Amazon Prime Video engineers found that a microservices approach for a critical monitoring service became “too expensive to operate at scale,” leading them to consolidate components back into a simpler monolithic architecture. That move cut out cross-service overhead and slashed their operational costs – a proxy for energy savings as well. It’s a powerful reminder: simpler architectures mean less overhead, which ultimately means less energy consumed – and, when architected well, even better performance.
When Compute Cycles Equal Carbon Footprint
Why does shaving a few milliseconds or a few CPU instructions matter for sustainability? Because at the end of the day, every CPU cycle burns energy. Multiply tiny inefficiencies by millions of users and hours of operation, and you get a very real impact on the electricity grid. The connection between compute and carbon is direct: most electricity is still generated from fossil fuels, so the more power our servers draw, the more carbon dioxide is emitted. On average, the global power grid emits on the order of 0.5 kg of CO₂ for every kilowatt-hour of electricity produced. That means if a piece of software causes a server to use an extra 1 kWh through inefficient code or unnecessary processing, it’s roughly like putting an additional half-kilogram of CO₂ into the atmosphere. Now consider the scale of a modern cloud deployment: a fleet of servers consuming, say, 1 MW continuously (common for a large data center) will emit thousands of tonnes of CO₂ over a year – roughly 4,400 tonnes at the average grid intensity above – unless that energy is fully offset by renewables.
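A quick back-of-the-envelope check on that figure (the only inputs are the 1 MW draw and the ~0.5 kg CO₂/kWh average cited above):

```typescript
// Back-of-the-envelope CO2 estimate for a 1 MW fleet running all year.
const powerKw = 1_000;          // 1 MW continuous draw
const hoursPerYear = 24 * 365;  // 8,760 hours
const kgCo2PerKwh = 0.5;        // approximate average grid carbon intensity

const annualKwh = powerKw * hoursPerYear;                  // 8,760,000 kWh
const annualTonnesCo2 = (annualKwh * kgCo2PerKwh) / 1_000; // kg -> tonnes

console.log(annualTonnesCo2); // ~4,380 tonnes of CO2 per year
```

Trim that fleet’s draw by even 10% through leaner software and you avoid hundreds of tonnes of CO₂ annually – without touching the power source.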
The tech industry’s aggregate footprint is far from trivial. In 2022, data centers used an estimated 240–340 terawatt-hours of electricity, about 1–1.5% of global demand. By some estimates, the carbon emissions of our digital infrastructure are already comparable to those of the aviation industry. And if current trends continue, the sector’s share could approach 14% of worldwide carbon emissions by 2040 – more than half the current contribution of the entire transportation sector. These startling numbers are driven by exploding demand for data and compute. Yes, big cloud providers are pledging green energy and improved cooling efficiency, and indeed hyperscale data centers are more efficient than smaller ones in raw power usage terms. But efficiency gains at the infrastructure level can easily be outpaced by inefficiency at the software level if we continue to layer on complexity without regard to energy impact. Simply put, the greenest energy is the energy we don’t use. Eliminating wasteful compute cycles is just as important as deploying solar panels on the roof of the data center.
A Fully Fused Stack: Doing More with Less
How can we break the cycle of ever-increasing infrastructure complexity and energy use? One promising approach is to simplify the stack itself – to fuse the layers of technology so tightly that much of the overhead disappears. This is exactly the idea behind Harper’s fully fused stack. Harper is a next-generation platform that combines the database, caching, application logic, and even real-time messaging into a single unified process. Instead of running, say, a separate Node.js server, Redis cache, Kafka queue, and MongoDB database (and having them chat with each other over networks), you can have one integrated system that provides all those capabilities internally. Harper’s design literally embeds application runtimes (like Node.js and Next.js) directly inside the data layer process. In doing so, it removes the overhead between systems, significantly reducing total compute requirements for running an application.
Think of what this means in practice. With a fused stack, when your code needs to read some data, it calls a function in-memory and gets the data – no serialization to JSON, no TCP/IP round-trip, no context-switch to a separate database server process. The data is fetched and delivered within the same process space. Similarly, if you publish a message or query some cached result, it’s handled by the same running engine, not handed off to a separate broker or cache service. By deploying data, application, and messaging functions together as a single package, you eliminate the multiple hand-offs that plague a traditional multi-tier architecture. Harper essentially internalizes what would otherwise be network calls or cross-system calls. The result is a dramatic drop in overhead: less CPU time wasted on packing/unpacking data, fewer context switches, and far fewer network operations per user request.
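The contrast is easy to see in code. The sketch below is illustrative – it is not Harper’s actual API – but it captures the difference: in the fused case, the “database call” is a plain function call against an in-process store, with no encoding and no socket.

```typescript
// Traditional tier: the data lives behind a separate network service.
async function getUserNetworked(id: string): Promise<unknown> {
  const res = await fetch(`http://user-service.internal/users/${id}`);
  return res.json(); // network hop + JSON parse on every single read
}

// Fused stack (illustrative): the data layer shares the process,
// so a read is an in-memory lookup - no serialization, no TCP round-trip.
const users = new Map<string, { id: string; name: string }>();

function getUserFused(id: string) {
  return users.get(id); // a function call, not a network call
}
```

The fused read costs nanoseconds of in-memory lookup; the networked read costs microseconds to milliseconds of encoding, kernel transitions, and wire time – and it burns CPU on both ends.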
This approach yields concrete efficiency gains. Fewer independent moving parts mean fewer system layers to manage or secure, which “minimize[s] development, infrastructure, and security overhead.” In other words, there’s less redundant work being done and less hardware needed overall to support a given workload. Harper’s team has seen this firsthand across various industries. In a real-world streaming media application, switching to a fused architecture led to a 70% reduction in compute spend (a good proxy for compute usage and energy) while also cutting latency by 69%. Another case study in the digital advertising space showed a 75% reduction in infrastructure costs accompanied by 250× faster response times. And a global gaming network was able to slash 90% of its infrastructure footprint (with a 92% drop in latency) after consolidating and simplifying its stack. These are massive improvements – imagine doing the same job with one-tenth the servers – and they highlight how closely performance efficiency and energy efficiency can align. When you remove needless layers, the system not only runs faster for users, it also runs leaner in terms of resources.
Cutting Out the Fat: Why Fused Stacks Save Energy
To truly appreciate the sustainability impact of a fused stack, let’s break down the specific efficiencies gained:
- Fewer Network Hops: Every time you avoid a network call between services, you save the energy that would have been used to transmit and receive data, as well as the idle wait time on both ends. A unified stack responds to requests with single-touch processing, often taking a request from API call to data retrieval in one step. Harper, for instance, reports server response times under 1 millisecond, since the API and data access are one operation. This means a dramatic reduction in network traffic and in the CPU interrupts that inter-service communication would otherwise require. It also cuts out the added latency and network overhead that are known drawbacks of microservices.
- Minimal Serialization/Deserialization: In a fused system, data can stay in its native, binary form as it moves through different parts of the stack. Traditional setups may serialize data to JSON or protocol buffers at one layer and parse it at the next, sometimes multiple times per request. All that encoding/decoding work is pure overhead. By consolidating your API with your database, you largely eliminate those repetitive conversions, directly reducing CPU usage and even memory overhead. It’s like removing a translator between two people who actually speak the same language – suddenly, the conversation is faster and less error-prone. One industry expert noted that in microservice architectures, the overhead of service integration (including serialization) “inherently chip[s] away at the system”; a fused stack avoids that pitfall (see the benchmark sketch after this list).
- Higher Resource Utilization: A single, multi-functional process can often run at a higher average utilization than many separate, smaller processes. In the cloud, it’s common to see dozens of microservices, each running on separate VMs or containers with plenty of headroom – meaning a lot of wasted capacity. By contrast, an integrated service can fill that headroom with useful work. This aligns with what we see in large-scale cloud operations: running servers at higher utilization is more energy-efficient. With a fused stack, you need fewer total servers, and you can keep each one doing useful work more of the time. The outcome is a lower overall power draw for the same throughput.
- Reduced Duplicative Storage and Caching: In many architectures, we maintain caches to speed up data access, but those caches are essentially copies of data that already exists in a database. Maintaining them (and keeping them in sync) costs both extra memory and compute cycles. In a fused architecture, the distinction between “cache” and “database” can blur – the system can serve reads from memory when appropriate, without a separate caching layer. This not only simplifies design but also means less redundant data storage (saving energy on memory and disk IO) and fewer background processes to invalidate or update caches.
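To put a number on the serialization point flagged above, here’s a small Node.js benchmark sketch; the payload shape is arbitrary, and real services would add network and framing costs on top of this pure CPU work.

```typescript
// Rough measurement of the pure encode/decode overhead a fused stack
// avoids on every internal hop (run with Node.js).
const payload = {
  user: "u-123",
  items: Array.from({ length: 100 }, (_, i) => ({ sku: `sku-${i}`, qty: 1 })),
};

const iterations = 100_000;
const start = process.hrtime.bigint();

for (let i = 0; i < iterations; i++) {
  const wire = JSON.stringify(payload); // what the caller pays to send
  JSON.parse(wire);                     // what the receiver pays to read
}

const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
console.log(`${((elapsedMs / iterations) * 1000).toFixed(2)} us per encode/decode pair`);
```

Multiply the per-hop cost by the hops in each request and by requests per second, and the CPU time – and energy – a fused design hands back becomes tangible.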
All of these factors contribute to leaner execution per transaction. The bottom line: if a traditional stack required N joules of energy for one user request, a well-optimized fused stack might cut that dramatically – and over millions of requests, those savings add up. Importantly, these efficiency gains come with no downside to the user; in fact, users benefit from faster responses. It’s a scenario where the environmentally friendly solution is also a win for performance and even cost savings. For example, after adopting Harper’s efficient stack, one e-commerce site not only reduced infrastructure needs but also saw a meaningful lift in conversion rates due to improved speed. Efficiency isn’t just good for the planet – it’s good for business and users, too.
Innovating for a Sustainable Tech Future
As we celebrate Earth Day, it’s worth reflecting on how we, as developers and tech leaders, can contribute to a more sustainable future through the choices we make in our systems and code. Modern application delivery doesn’t have to be an energy sink. Yes, the demand on our infrastructure is growing, but so is our ability to innovate and optimize. By rethinking entrenched patterns – questioning whether we really need five different services to do one job, or whether our data needs to traverse the globe and back for a simple query – we can find solutions that dramatically cut waste. Harper’s fully fused stack exemplifies the kind of out-of-the-box thinking that can lead to order-of-magnitude improvements in efficiency. It shows that we can maintain the scalability and functionality we need while stripping out the excess baggage of over-engineered architectures.
The broader point is that sustainability in tech isn’t just about hardware or energy sourcing; it’s about software architectures as well. Each layer we add, each abstraction and service, should justify itself not only in terms of developer convenience or theoretical scalability but also in terms of real-world resource impact. The good news is that optimizing for efficiency often aligns with other desirable outcomes: lower cloud bills, faster response times, easier maintenance, and a smaller attack surface for security. In other words, efficient design is sustainable design.
On this Earth Day, let’s challenge ourselves to build software with the planet in mind. That means measuring and understanding the energy impact of our engineering decisions and embracing tools and platforms that help reduce that impact. It could mean diving deeper into performance profiling, consolidating services, or exploring new architectures like the fused stack approach. The next time you start a project or review an architecture, ask: Can this be simpler? Faster? More energy-efficient? Chances are, with some creativity, it can be.
Ultimately, the greenest cloud is the one doing the same work with less electricity. We owe it to ourselves and future generations to pursue that ideal. By reducing serialization, network hops, and excessive compute overhead, we trim not just technical fat but also carbon fat. The technology to do this is emerging now. Harper’s example shows that reinventing the stack can lead to substantial gains in efficiency – a beacon of what’s possible.
As a subtle call to action, we invite you to explore these new ideas further. Whether it’s Harper’s fully fused platform or a similar paradigm, consider how you might apply a “do more with less” mindset in your own tech environment. Every bit of optimization, every layer fused or function streamlined, is a step toward a sustainable digital ecosystem. This Earth Day, let’s commit to writing code and architecting systems that not only serve our users but also respect our planet’s limited resources. Small changes at the micro level can translate into massive impacts at the macro level. The future of computing can be both high-performance and green – and it’s up to us to make it happen.