Introducing Structon: Random-Access Binary Encoding for JavaScript

When Harper evaluates query conditions, it frequently needs to access individual fields from potentially large numbers of stored records. For single-condition queries on an indexed field, Harper can scan the secondary index directly and never touch most records. But for multi-condition AND queries—say, age > 25 AND city == 'Denver'—only the first condition can be satisfied by an index scan; every candidate record still needs the city field checked. And for unindexed fields, there's no choice but to read records directly. Either way, the question becomes: how cheaply can you read one field from a stored record?

The naïve approach (deserialize the entire record into a JavaScript object, read the property, discard the object) works fine at small scale. At large scale it becomes the bottleneck. Every allocation, every field you decode but don't need, and every garbage-collection pause adds up.

Random-access encoding is the solution: store records in a binary format where any individual field can be read by jumping directly to its byte offset, without touching the rest of the record. We've been using this approach inside Harper for years, as a somewhat hidden capability within msgpackr. With the release of structon, we're making it a standalone, reusable package.

How It Works

The core idea is that objects with the same set of keys—the same "shape"—share a structure definition that describes the byte layout. The encoder builds up these structure definitions incrementally as it sees new shapes. Once a shape is known, every subsequent record of that shape writes only values, not field names. The decoder can identify the structure from a short header in the bytes, look up the field positions, and provide lazy getter properties that read directly from the raw buffer on demand.
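The shape-sharing idea can be sketched in a few lines. This is a toy illustration, not structon's internal data structures: the ShapeRegistry class, its method names, and the array-based "encoding" are all hypothetical, chosen only to show field names being stored once per shape while each record carries a short structure id plus its values.

```javascript
// Toy shape registry: one structure definition per distinct key list.
// Hypothetical names throughout; this is not structon's implementation.
class ShapeRegistry {
  constructor() {
    this.structures = [];        // structure id -> array of field names
    this.idsByShape = new Map(); // joined key list -> structure id
  }

  // Return the id for this object's shape, registering it on first sight.
  // Key order matters here: { a, b } and { b, a } count as different shapes.
  idFor(obj) {
    const keys = Object.keys(obj);
    const shapeKey = keys.join('\0');
    let id = this.idsByShape.get(shapeKey);
    if (id === undefined) {
      id = this.structures.length;
      this.structures.push(keys);
      this.idsByShape.set(shapeKey, id);
    }
    return id;
  }

  // "Encode" a record as [structure id, ...values]: no field names.
  encode(obj) {
    return [this.idFor(obj), ...Object.values(obj)];
  }

  // Rebuild an object by pairing stored values with the shared field names.
  decode(record) {
    const [id, ...values] = record;
    const keys = this.structures[id];
    return Object.fromEntries(keys.map((key, i) => [key, values[i]]));
  }
}

const registry = new ShapeRegistry();
const a = registry.encode({ name: 'Alice', age: 30 });     // [0, 'Alice', 30]
const b = registry.encode({ name: 'Bob', age: 41 });       // [0, 'Bob', 41]
const c = registry.encode({ name: 'Cara', city: 'Denver' }); // [1, 'Cara', 'Denver']
```

Here the first two records share structure 0, so the names "name" and "age" exist only once in the registry. The third record has a different key set and gets its own definition.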

The binary layout for a struct record looks like:

[header: 1–4 bytes][fixed-width field section][variable ref section]

The fixed-width section has one slot per field:

Numbers get 1, 4, or 8 bytes depending on type
Dates get 8 bytes (stored as a float64)
Strings and nested objects store a small integer offset/length pair pointing into the ref section

The ref section follows: first the raw bytes of all strings, then the encoded bytes of any nested objects.

When you access a property on the decoded object, the getter reads from those fixed offsets in the original buffer. You're not allocating a new string unless you actually access the property. You're not decoding a nested object unless you access it. For a query checking record.city == 'Denver', that's a short string read from the buffer—nothing else.
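To make the layout and the lazy getters concrete, here is a minimal sketch under stated assumptions. It is not structon's real wire format: the one-byte header, the slot sizes, the uint16 offset/length pair, and the names FIELDS, encodeRecord, and decodeLazy are all invented for illustration. It only demonstrates the general technique of fixed slots followed by a ref section, read on demand through getters.

```javascript
// Toy layout (NOT structon's actual format):
// [1-byte structure id][fixed-width slots][ref section holding string bytes]
const FIELDS = [
  { name: 'age', type: 'uint8', size: 1 },
  { name: 'score', type: 'float64', size: 8 },
  { name: 'city', type: 'string', size: 4 }, // uint16 offset + uint16 length
];
const HEADER_SIZE = 1;
const FIXED_SIZE = FIELDS.reduce((sum, f) => sum + f.size, 0);
const textEncoder = new TextEncoder();
const textDecoder = new TextDecoder();

function encodeRecord(obj) {
  // Encode strings first so the size of the ref section is known.
  const strings = new Map();
  let refSize = 0;
  for (const f of FIELDS) {
    if (f.type !== 'string') continue;
    const encodedString = textEncoder.encode(obj[f.name]);
    strings.set(f.name, encodedString);
    refSize += encodedString.length;
  }
  const buffer = new Uint8Array(HEADER_SIZE + FIXED_SIZE + refSize);
  const view = new DataView(buffer.buffer);
  buffer[0] = 0; // structure id (only one shape in this sketch)
  let slot = HEADER_SIZE;
  let ref = HEADER_SIZE + FIXED_SIZE;
  for (const f of FIELDS) {
    if (f.type === 'uint8') view.setUint8(slot, obj[f.name]);
    else if (f.type === 'float64') view.setFloat64(slot, obj[f.name]);
    else {
      const encodedString = strings.get(f.name);
      view.setUint16(slot, ref);                      // where the bytes start
      view.setUint16(slot + 2, encodedString.length); // how many bytes
      buffer.set(encodedString, ref);
      ref += encodedString.length;
    }
    slot += f.size;
  }
  return buffer;
}

function decodeLazy(buffer) {
  const view = new DataView(buffer.buffer, buffer.byteOffset, buffer.byteLength);
  const record = {};
  let slot = HEADER_SIZE;
  for (const f of FIELDS) {
    const at = slot; // fixed offset of this field's slot
    Object.defineProperty(record, f.name, {
      enumerable: true,
      get() {
        // Nothing is read or allocated until the property is accessed.
        if (f.type === 'uint8') return view.getUint8(at);
        if (f.type === 'float64') return view.getFloat64(at);
        const start = view.getUint16(at);
        const length = view.getUint16(at + 2);
        return textDecoder.decode(buffer.subarray(start, start + length));
      },
    });
    slot += f.size;
  }
  return record;
}

const bytes = encodeRecord({ age: 30, score: 98.6, city: 'Denver' });
const record = decodeLazy(bytes);
console.log(record.city === 'Denver'); // true; age and score were never read
```

A production decoder would more likely define the getters once per structure, on a shared prototype, rather than once per record as this sketch does.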

Why Not JSON, MessagePack, or BSON?

JSON requires full text parsing to reach any field. There's no notion of byte offsets; every character must be scanned. Random access to a field deep in a large JSON object isn't possible without parsing everything before it.

Plain MessagePack encodes fields as key-value pairs, sequentially. To read a specific field, you have to walk through the byte stream until you find it. This is faster than JSON text parsing, but it's still sequential—no fixed offsets, no direct jumps.
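The sequential walk can be seen in a small hand-rolled reader. This sketch handles only three MessagePack types (fixmap, fixstr, and positive fixint) with ASCII strings, and the helper names readValue and findField are hypothetical; a real decoder such as msgpackr covers the full format.

```javascript
// Hand-built MessagePack bytes for { age: 30, city: "Denver" }.
// Type tags used: fixmap 0x80-0x8f, fixstr 0xa0-0xbf, positive fixint 0x00-0x7f.
const ascii = (s) => [...s].map((ch) => ch.charCodeAt(0));
const packed = Uint8Array.from([
  0x82,                                             // fixmap with 2 entries
  0xa3, ...ascii('age'), 0x1e,                      // "age": 30
  0xa4, ...ascii('city'), 0xa6, ...ascii('Denver'), // "city": "Denver"
]);

// Read one value at `pos`; return it with the position just after it.
function readValue(bytes, pos) {
  const tag = bytes[pos];
  if (tag <= 0x7f) return { value: tag, next: pos + 1 };
  if (tag >= 0xa0 && tag <= 0xbf) {
    const length = tag & 0x1f;
    const start = pos + 1;
    const value = String.fromCharCode(...bytes.subarray(start, start + length));
    return { value, next: start + length };
  }
  throw new Error(`unsupported tag 0x${tag.toString(16)} in this sketch`);
}

// Find a field by walking every key/value pair that precedes it.
function findField(bytes, wanted) {
  if ((bytes[0] & 0xf0) !== 0x80) throw new Error('expected a fixmap');
  const count = bytes[0] & 0x0f;
  let pos = 1;
  let steps = 0; // number of keys and values visited so far
  for (let i = 0; i < count; i++) {
    const key = readValue(bytes, pos);
    const val = readValue(bytes, key.next);
    steps += 2;
    if (key.value === wanted) return { value: val.value, steps };
    pos = val.next;
  }
  return { value: undefined, steps };
}

const city = findField(packed, 'city');
console.log(city.value, city.steps); // Denver 4
```

Reaching the second field required visiting the first key and value as well. A smarter reader could skip values without decoding them, but it would still have to read each length header in order.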

BSON (MongoDB's format) does enable random-ish access via a length-prefix-then-skip traversal, but it stores field names inside every document. For a table of 100,000 records that all have the fields { id, name, age, email, createdAt }, BSON repeats those five field name strings in every single stored document. That's a lot of redundant bytes, both on disk and in memory, and you still have to traverse them sequentially to find what you want.
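A rough back-of-the-envelope calculation shows the scale of that repetition. It counts only the element names, which BSON stores as NUL-terminated strings, and ignores type bytes, values, and document framing; the variable names are just for illustration.

```javascript
// Bytes spent on field names alone when every document repeats them.
const fieldNames = ['id', 'name', 'age', 'email', 'createdAt'];
const recordCount = 100000;

// Each BSON element name is a cstring: the name bytes plus a trailing 0x00.
// These names are ASCII, so string length equals byte length.
const nameBytesPerDocument = fieldNames.reduce(
  (sum, name) => sum + name.length + 1,
  0,
);
const repeatedNameBytes = nameBytesPerDocument * recordCount;

console.log(nameBytesPerDocument); // 28
console.log(repeatedNameBytes);    // 2800000
```

That is about 2.8 MB of field names for this small table, before any values are stored.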

Structon shares structure definitions across all records with the same shape. The field names are stored once; individual records store only values, laid out at fixed offsets. The result is records that are more compact than BSON and faster to read than anything requiring sequential traversal.

Standalone Package, Previously Hidden

This encoding has been part of msgpackr since early on, implemented in the struct.js module and enabled with the randomAccessStructure: true option. It wasn't designed as a standalone API: it registered global hooks into msgpackr's encode/decode path and wasn't intended for independent use.

With msgpackr 2.0, we've removed struct.js from msgpackr entirely. structon is now where this functionality lives. It's also more general: instead of being coupled to msgpackr's internals, structon provides a createStructon(BaseClass) factory that wraps any compatible encoder class—msgpackr's Packr or cbor-x's Encoder—and adds struct encoding on top.

Usage

Install the package:

npm install structon

Then wrap your base encoder:

import { Packr } from 'msgpackr';
import { createStructon } from 'structon';
const Structon = createStructon(Packr);
// Pass structures: [] to enable shared structure persistence
const codec = new Structon({ structures: [] });
const encoded = codec.encode({ name: 'Alice', age: 30, score: 98.6 });
const record = codec.decode(encoded);
// Property access is lazy: reads directly from the binary buffer
console.log(record.age); // 30
console.log(record.name); // 'Alice'

The same factory works with cbor-x:

import { Encoder } from 'cbor-x';
import { createStructon } from 'structon';
const Structon = createStructon(Encoder);
const codec = new Structon({ structures: [] });

If you were previously using msgpackr with randomAccessStructure: true, migrating to structon is the path forward with msgpackr 2.0.

Persistence

Structure definitions need to persist across process restarts to remain useful—otherwise every new process starts with no known shapes and can't read previously encoded data. structon integrates with msgpackr's and cbor-x's standard persistence hooks (getStructures/saveStructures for msgpackr, getShared/saveShared for cbor-x). How you persist them—RocksDB, LMDB, a local file—is up to you.
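As a sketch of what such hooks might look like, the helper below builds a getStructures/saveStructures pair on top of any key-value store exposing get and set. The helper name createStructureHooks and the storage key are hypothetical, a Map stands in for a real database, and JSON is assumed to be an adequate serialization for the structure definitions, which may not hold for every codec. It also ignores concurrency and transactions entirely.

```javascript
// Hypothetical helper: msgpackr-style persistence hooks backed by any
// key-value store with get(key) and set(key, value).
function createStructureHooks(store, key = 'shared-structures') {
  return {
    // Load previously saved structure definitions (empty on first run).
    getStructures() {
      const saved = store.get(key);
      return saved ? JSON.parse(saved) : [];
    },
    // Persist the definitions whenever a new shape has been learned.
    // Assumes the structures value survives a JSON round trip.
    saveStructures(structures) {
      store.set(key, JSON.stringify(structures));
    },
  };
}

// A Map stands in for LMDB, RocksDB, or a file-backed store.
const store = new Map();
const hooks = createStructureHooks(store);
hooks.saveStructures([['name', 'age', 'score']]);

// A "restarted process" builds fresh hooks over the same store.
const reloaded = createStructureHooks(store).getStructures();
console.log(reloaded); // [ [ 'name', 'age', 'score' ] ]
```

With msgpackr as the base class, hooks like these would be passed in the codec's constructor options; check the structon and msgpackr documentation for the exact option handling before relying on this shape.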

In Harper, structure definitions are stored in table metadata in the database. When a record with a new shape is first encoded, Harper saves that structure definition so all future records with the same shape can reuse it without any overhead. This means the encoding path is effectively the same cost as plain MessagePack after the first encounter of a given shape. You can also change record schemas and mix heterogeneous record shapes without affecting existing data, because each record retains a reference to the structure it needs; this makes migrating to new structures easy. If you use structon or msgpackr directly, you need to track and persist these structures yourself. That can get complicated, especially when intertwined with transactions and concurrency, but Harper handles it for you.

(The only scenario where random-access struct encoding is likely a net negative is highly dynamic data where nearly every record has a different set of property names—which is generally a sign the data would be better modeled as a Map anyway.)

Performance

The performance advantage shows up most clearly in filtered query workloads: multi-condition queries that need to check one or two fields from each candidate record, and full scans of large tables where most fields are never needed. With random-access struct encoding, those access patterns skip deserialization of everything they don't read. The GC pressure from allocating intermediate objects for fields that are immediately discarded disappears.

What's Next

Structon is published on npm. The source is at github.com/HarperFast/structon. The binary format is identical to what msgpackr's struct.js produced, so data encoded with earlier versions of Harper or msgpackr with randomAccessStructure: true is fully readable by structon—and vice versa.
