BEJSON Intro Documentation
What is BEJSON?
BEJSON is a strict, self-describing tabular data format built on the JSON standard. It was designed to address specific challenges in data serialization, particularly consistency, validation, and performance, by adding a fixed set of structural rules and guarantees.
At its core, BEJSON is engineered for scenarios where data integrity and efficient processing are paramount. Unlike many flexible JSON applications, BEJSON imposes a clear, predictable structure that facilitates automated parsing and validation processes without the need for external schema definitions.
Core Principles of BEJSON
The fundamental design of BEJSON revolves around several key principles that distinguish it from conventional JSON usage:
- Positional Integrity: This is perhaps the most defining characteristic of BEJSON. It mandates that the order of field definitions within the top-level "Fields" array must exactly match the order of corresponding values in every record within the "Values" array. This strict positional mapping ensures that each data point can be unequivocally identified by its index, rather than by a textual key.
- Index-Based Access: A direct consequence of positional integrity is the ability to access data fields by their numerical index. This eliminates the overhead of string-based key lookups, leading to faster parsing and more efficient data retrieval. Applications can read and process BEJSON records with high throughput, as they know precisely where each piece of data resides within a record.
- Embedded Schema Validation: Every BEJSON document is self-describing because its schema is an intrinsic part of the document itself. The "Fields" array defines not only the names of the data points but also their expected "type" (e.g., "string", "integer", "number", "boolean", "array", "object"). This embedded schema allows for robust and immediate validation of the data without requiring separate schema files or external definitions, simplifying data exchange and ensuring data quality at the point of consumption.
- Strong Typing: The explicit declaration of data types for each field in the "Fields" array enforces strong typing. While null is always a valid value for any type (to denote optional or missing data), all other values must conform to their declared type. This predictability helps prevent common data parsing errors and ensures consistency across datasets.
By adhering to these principles, BEJSON provides a robust and efficient framework for managing tabular data, making it particularly well-suited for applications demanding high reliability, performance, and built-in data validation.
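As a minimal sketch of positional integrity and index-based access, the Python snippet below resolves a field name to its index once and then reads every record by position. The sample document and the helper name `field_index` are illustrative assumptions, not part of the BEJSON specification.

```python
import json

# Hypothetical sample document, used only for illustration.
DOC = json.loads("""
{
  "Format": "BEJSON",
  "Format_Version": "104",
  "Format_Creator": "Elton Boehnen",
  "Records_Type": ["SensorReading"],
  "Fields": [
    {"name": "sensor_id", "type": "string"},
    {"name": "temperature", "type": "number"}
  ],
  "Values": [
    ["S001", 23.5],
    ["S002", null]
  ]
}
""")


def field_index(doc, name):
    """Return the position of the field called `name` in doc["Fields"]."""
    for i, field in enumerate(doc["Fields"]):
        if field["name"] == name:
            return i
    raise KeyError(name)


# Resolve the name once, then read every record by position.
temp_idx = field_index(DOC, "temperature")
temperatures = [record[temp_idx] for record in DOC["Values"]]
print(temperatures)  # [23.5, None]
```

The JSON null in the second record is read as Python's `None`, and it occupies the same position as any other temperature value.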
Core Components and Common Rules
Every BEJSON document, regardless of its specific version, adheres to a foundational structure defined by six mandatory top-level keys. These keys provide the essential metadata and data containers that make BEJSON a self-describing and strictly organized format. Understanding these core components and the common rules that govern them is crucial for both generating and parsing BEJSON effectively.
Mandatory Top-Level Keys
All BEJSON documents must include the following six keys at their root level:
- "Format": This key must always have the string value "BEJSON". It serves as an immediate identifier, signifying that the JSON document conforms to the BEJSON specification.
- "Format_Version": Specifies the exact version of the BEJSON format being used. Its value must be one of the permitted version strings: "104", "104a", or "104db". This key is critical for parsers to apply the correct version-specific rules and validations.
- "Format_Creator": Identifies the original creator of the BEJSON specification. This key must always hold the string value "Elton Boehnen".
- "Records_Type": An array of strings that declares the entity or entities represented by the records within the "Values" array. For single-entity versions (104 and 104a), this array contains exactly one string. For the multi-entity version (104db), it will contain two or more unique strings.
- "Fields": This key holds an array of objects, where each object describes a single data field (column) present in the records. It defines the name and data type for each field, effectively serving as the document's embedded schema.
- "Values": This key contains an array of arrays. Each inner array represents a single record (row) of data, with its elements corresponding positionally to the field definitions in the "Fields" array.
A typical BEJSON document will therefore begin with a structure similar to this:
{
  "Format": "BEJSON",
  "Format_Version": "104" | "104a" | "104db",
  "Format_Creator": "Elton Boehnen",
  "Records_Type": [ ... ],
  "Fields": [ ... ],
  "Values": [ [ ... ], [ ... ], ... ]
}
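The top-level rules can be checked mechanically. The following Python function is a rough sketch of such a check, assuming the document has already been parsed into a dict; the function name `check_top_level` and the wording of the messages are hypothetical.

```python
MANDATORY_KEYS = ("Format", "Format_Version", "Format_Creator",
                  "Records_Type", "Fields", "Values")
SINGLE_ENTITY_VERSIONS = ("104", "104a")
ALL_VERSIONS = SINGLE_ENTITY_VERSIONS + ("104db",)


def check_top_level(doc):
    """Return a list of problems with the mandatory top-level keys."""
    problems = [f"missing key: {key}" for key in MANDATORY_KEYS if key not in doc]
    if problems:
        return problems  # value checks only make sense when all keys exist

    if doc["Format"] != "BEJSON":
        problems.append('"Format" must be "BEJSON"')
    version = doc["Format_Version"]
    if version not in ALL_VERSIONS:
        problems.append('"Format_Version" must be "104", "104a", or "104db"')
    if doc["Format_Creator"] != "Elton Boehnen":
        problems.append('"Format_Creator" must be "Elton Boehnen"')

    types = doc["Records_Type"]
    if not isinstance(types, list) or not all(isinstance(t, str) for t in types):
        problems.append('"Records_Type" must be an array of strings')
    elif version in SINGLE_ENTITY_VERSIONS and len(types) != 1:
        problems.append(f'{version} requires exactly one entry in "Records_Type"')
    elif version == "104db" and (len(types) < 2 or len(set(types)) != len(types)):
        problems.append('104db requires two or more unique entries in "Records_Type"')
    return problems


doc = {
    "Format": "BEJSON",
    "Format_Version": "104db",
    "Format_Creator": "Elton Boehnen",
    "Records_Type": ["User"],
    "Fields": [],
    "Values": [],
}
print(check_top_level(doc))
# ['104db requires two or more unique entries in "Records_Type"']
```

Returning a list of messages rather than raising on the first problem is a design choice of this sketch; it lets a caller report every violation at once.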
Common Rules for the "Fields" Array
The "Fields" array is central to BEJSON's self-describing nature. It is an array where each element is an object that defines a data field. The common rules applicable to all BEJSON versions for this array are:
- Each field object must have a "name" property (a string) and a "type" property (also a string). The "type" specifies the expected data type for values in that field, chosen from: "string", "integer", "number", "boolean", "array", or "object".
- Field names must be unique within the "Fields" array. Duplicate field names are not permitted.
- In BEJSON 104db, every field object other than the leading discriminator field also includes a "Record_Type_Parent" property, which assigns the field to a specific entity type declared in "Records_Type". This property is not used in versions 104 or 104a.
An example field definition might look like:
{
  "name": "timestamp",
  "type": "string"
}
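A sketch of how the common "Fields" rules might be checked in Python is shown below. The function name `check_fields` and its message format are assumptions made for illustration.

```python
ALLOWED_TYPES = ("string", "integer", "number", "boolean", "array", "object")


def check_fields(fields):
    """Return a list of problems found in a "Fields" array."""
    problems = []
    seen = set()
    for i, field in enumerate(fields):
        if not isinstance(field, dict):
            problems.append(f"field {i}: not an object")
            continue
        name = field.get("name")
        field_type = field.get("type")
        if not isinstance(name, str):
            problems.append(f"field {i}: missing or non-string name")
        elif name in seen:
            problems.append(f"field {i}: duplicate name {name!r}")
        else:
            seen.add(name)
        if not isinstance(field_type, str) or field_type not in ALLOWED_TYPES:
            problems.append(f"field {i}: unknown type {field_type!r}")
    return problems


fields = [
    {"name": "timestamp", "type": "string"},
    {"name": "timestamp", "type": "datetime"},
]
print(check_fields(fields))
# ["field 1: duplicate name 'timestamp'", "field 1: unknown type 'datetime'"]
```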
Common Rules for the "Values" Array
The "Values" array is where the actual tabular data resides. It is an array of arrays, where each inner array represents a single record. The strict rules governing this array ensure positional integrity:
- Positional Correspondence: The most critical rule is that each inner array (record) in "Values" must contain exactly as many elements as there are field definitions in the "Fields" array. The position of a value within a record directly corresponds to the position of its field definition in the "Fields" array.
- Type Matching: Each value within a record must conform to the "type" declared for its corresponding field in the "Fields" array. For instance, if a field is declared as "type": "integer", its value in the "Values" array must be an integer.
- Handling Missing Data with Null: If a particular data point is optional or genuinely missing for a record, its corresponding position in the "Values" array must be filled with the JSON null value. This is crucial for maintaining the required fixed length and positional integrity of each record. The null value is considered valid for any declared field type.
For example, if "Fields" defines three fields, every record in "Values" must have exactly three elements, with null filling the position of a missing value (as in the second record):
"Values": [
  ["value1_field1", "value1_field2", "value1_field3"],
  ["value2_field1", null, "value2_field3"]
]
These common rules form the bedrock of BEJSON's structure, ensuring that all variants maintain a high degree of predictability and consistency, which are vital for its core benefits of fast parsing and strong embedded validation.
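The record-length and type-matching rules can be sketched in Python as follows. Two details are assumptions of this sketch rather than statements from the specification: "number" is taken to accept integers as well as floats, and a float such as 1.0 is not accepted for "integer". Python's `bool` is a subclass of `int`, so booleans are excluded explicitly from the numeric types.

```python
def matches_type(value, declared):
    """Return True if a parsed JSON value conforms to a declared BEJSON type."""
    if value is None:  # JSON null is valid for every declared type
        return True
    if declared == "string":
        return isinstance(value, str)
    if declared == "boolean":
        return isinstance(value, bool)
    if declared == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if declared == "number":  # assumption: integers also count as numbers
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if declared == "array":
        return isinstance(value, list)
    if declared == "object":
        return isinstance(value, dict)
    return False


def check_values(fields, values):
    """Return a list of problems found in a "Values" array."""
    problems = []
    for row, record in enumerate(values):
        if not isinstance(record, list):
            problems.append(f"record {row}: not an array")
            continue
        if len(record) != len(fields):
            problems.append(
                f"record {row}: expected {len(fields)} values, got {len(record)}")
            continue
        for field, value in zip(fields, record):
            if not matches_type(value, field["type"]):
                problems.append(
                    f"record {row}: {field['name']} is not {field['type']}")
    return problems


fields = [{"name": "id", "type": "integer"}, {"name": "ok", "type": "boolean"}]
values = [[1, True], [None, None], [True, False], [1]]
print(check_values(fields, values))
# ['record 2: id is not integer', 'record 3: expected 2 values, got 1']
```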
BEJSON 104: Single-Entity Data Streams
BEJSON 104 represents the foundational variant of the BEJSON specification, specifically designed for handling homogeneous, high-throughput data streams. It is optimized for scenarios where all records within a document represent the same type of entity, making it ideal for applications such as logging, collecting metrics, or archiving consistent data points.
Primary Characteristics and Use Cases
The core philosophy of BEJSON 104 is simplicity and efficiency for single-entity data. Key characteristics include:
- Single Record Type: A BEJSON 104 document is strictly limited to one type of record. This is reflected in the "Records_Type" array, which must contain exactly one string, representing the name of the entity being described (e.g., "SensorReading", "WebServerLogEntry"). This singular focus ensures that every record in the "Values" array adheres to the exact same schema defined in "Fields".
- Homogeneous Data: This version excels when dealing with large volumes of data where each entry has an identical structure. For example, a continuous stream of sensor readings, system events, or financial transactions can be efficiently stored and processed using BEJSON 104.
- High-Throughput Environments: The strict structure and single-entity nature of BEJSON 104 enable very fast parsing and serialization. This makes it particularly suitable for performance-critical applications that generate or consume data at high rates, such as real-time analytics pipelines or data archiving systems.
Support for Complex Data Types
One significant feature of BEJSON 104 is its comprehensive support for complex data types. Fields can be declared not only as primitive types like "string", "integer", "number", or "boolean", but also as "array" or "object". This flexibility allows for rich, hierarchical data within each record, without compromising the overall tabular structure. For instance, a log entry might include an array of tags or an object containing detailed error information.
Restriction on Custom Top-Level Keys
To maintain its strict and streamlined nature, BEJSON 104 imposes a significant restriction: no custom top-level keys are allowed beyond the six mandatory keys (Format, Format_Version, Format_Creator, Records_Type, Fields, Values). This rule ensures that the document's structure remains predictable and free from application-specific metadata that could complicate universal parsing. Any additional metadata relevant to the dataset must typically be embedded within the records themselves (e.g., as part of an 'object' type field) or managed externally.
The Parent_Hierarchy Exception
There is one specific, built-in exception to the custom top-level key restriction in BEJSON 104: the optional "Parent_Hierarchy" key. This key is intentionally ambiguous in its definition, allowing applications to use it for various organizational purposes, such as signaling a folder path, an index hierarchy, or any other logical grouping. Its presence does not violate the BEJSON 104 specification, but its interpretation is left to the consuming application.
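A small Python sketch of the top-level key restriction, including the "Parent_Hierarchy" exception, might look like this. The function name `disallowed_keys_104` and the sample "Parent_Hierarchy" value are hypothetical; the text above does not prescribe a value format for that key.

```python
MANDATORY_KEYS = {"Format", "Format_Version", "Format_Creator",
                  "Records_Type", "Fields", "Values"}
BUILT_IN_OPTIONAL_KEYS = {"Parent_Hierarchy"}


def disallowed_keys_104(doc):
    """Return the top-level keys that a BEJSON 104 document may not contain."""
    return sorted(set(doc) - MANDATORY_KEYS - BUILT_IN_OPTIONAL_KEYS)


doc = {
    "Format": "BEJSON",
    "Format_Version": "104",
    "Format_Creator": "Elton Boehnen",
    "Parent_Hierarchy": "sensors/building-a",  # hypothetical value
    "Owner": "ops-team",                       # custom key, not allowed in 104
    "Records_Type": ["SensorReading"],
    "Fields": [],
    "Values": [],
}
print(disallowed_keys_104(doc))  # ['Owner']
```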
Example of BEJSON 104
Consider an example of sensor readings, a classic use case for BEJSON 104:
{
  "Format": "BEJSON",
  "Format_Version": "104",
  "Format_Creator": "Elton Boehnen",
  "Records_Type": ["SensorReading"],
  "Fields": [
    {"name": "sensor_id", "type": "string"},
    {"name": "timestamp", "type": "string"},
    {"name": "temperature", "type": "number"},
    {"name": "tags", "type": "array"}
  ],
  "Values": [
    ["S001", "2026-01-10T12:00:00Z", 23.5, ["indoor","ground"]],
    ["S002", "2026-01-10T12:00:00Z", 19.8, null]
  ]
}
In this example, the "Records_Type" clearly identifies the data as "SensorReading". The "Fields" array defines four columns, including a "tags" field of "array" type. The "Values" array then provides the actual data, with null correctly used for a missing "tags" value in the second record, maintaining positional integrity.
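When a consumer prefers keyed records, the positional layout can be expanded on demand. The sketch below pairs each field name with the value at the same position; `records_as_dicts` is a hypothetical helper name, and the data is the "Fields" and "Values" content of the example above.

```python
FIELDS = [
    {"name": "sensor_id", "type": "string"},
    {"name": "timestamp", "type": "string"},
    {"name": "temperature", "type": "number"},
    {"name": "tags", "type": "array"},
]
VALUES = [
    ["S001", "2026-01-10T12:00:00Z", 23.5, ["indoor", "ground"]],
    ["S002", "2026-01-10T12:00:00Z", 19.8, None],  # JSON null parses to None
]


def records_as_dicts(fields, values):
    """Pair each field name with the value at the same position."""
    names = [field["name"] for field in fields]
    return [dict(zip(names, record)) for record in values]


rows = records_as_dicts(FIELDS, VALUES)
print(rows[1]["temperature"])  # 19.8
print(rows[1]["tags"])         # None
```

This conversion trades the compactness of positional records for convenience, so it is probably best applied only at the edge of an application.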
BEJSON 104a: Metadata and Primitive Configurations
BEJSON 104a is a specialized variant of the BEJSON format tailored for scenarios requiring file-level metadata alongside data primarily composed of primitive types. While still maintaining the core BEJSON principles of positional integrity and embedded schema, 104a introduces specific features that make it highly suitable for configuration files, health check reports, and simple log entries where overarching context is as important as the individual data records.
Allowance for Custom Top-Level Metadata
A key distinguishing feature of BEJSON 104a is its permission to include custom top-level keys. Unlike BEJSON 104, which strictly limits top-level keys to the six mandatory ones (plus the optional Parent_Hierarchy), BEJSON 104a allows users to embed additional metadata directly at the root of the document. This metadata provides contextual information about the entire file, rather than individual records.
- Custom keys must be named using PascalCase; in this document's examples the capitalized words are joined with underscores (e.g., "Schema_Version", "Server_ID"), the same style used by the mandatory keys. This convention keeps them visually distinct from snake_case field names and promotes consistency.
- Custom keys must not collide with any of the six mandatory BEJSON keys (Format, Format_Version, Format_Creator, Records_Type, Fields, Values). This prevents ambiguity and ensures the integrity of the core BEJSON structure.
- These custom keys are intended for file-level metadata only; they should never be used for per-record data, which must always reside within the "Values" array.
Examples of such custom metadata could include "Schema_Version", "Application_Name", "Deployment_Environment", or "Retention_Days", providing valuable context for the data contained within the file.
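The following Python sketch separates custom metadata from the mandatory keys and flags names that do not follow the naming convention. The regular expression is an assumption inferred from examples such as "Schema_Version" and "Server_ID" (capitalized alphanumeric words, optionally joined by underscores); it is not a pattern given by the specification, and both helper names are hypothetical.

```python
import re

MANDATORY_KEYS = {"Format", "Format_Version", "Format_Creator",
                  "Records_Type", "Fields", "Values"}

# Assumed naming pattern, inferred from the examples in this document.
NAME_PATTERN = re.compile(r"[A-Z][A-Za-z0-9]*(?:_[A-Z0-9][A-Za-z0-9]*)*")


def custom_metadata(doc):
    """Return the file-level metadata: every top-level key that is not mandatory."""
    return {key: value for key, value in doc.items() if key not in MANDATORY_KEYS}


def badly_named_keys(doc):
    """Return custom keys that do not match the assumed naming pattern."""
    return sorted(key for key in custom_metadata(doc)
                  if not NAME_PATTERN.fullmatch(key))


doc = {
    "Format": "BEJSON",
    "Format_Version": "104a",
    "Format_Creator": "Elton Boehnen",
    "Server_ID": "WEB-01",
    "retention_days": 90,  # snake_case, so it is flagged
    "Records_Type": ["ConfigParam"],
    "Fields": [],
    "Values": [],
}
print(custom_metadata(doc))   # {'Server_ID': 'WEB-01', 'retention_days': 90}
print(badly_named_keys(doc))  # ['retention_days']
```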
Restriction to Primitive Data Types
In contrast to BEJSON 104's support for complex types, BEJSON 104a imposes a restriction on the data types allowed within its "Fields" array. Fields in 104a documents are limited to primitive types only:
- "string"
- "integer"
- "number"
- "boolean"
This means that fields cannot be declared as "array" or "object". This simplification makes BEJSON 104a particularly straightforward to parse and validate, as there are no nested structures to navigate within the records themselves. While null is still a valid value for any field, the values must otherwise adhere to these primitive types.
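The primitive-only restriction is straightforward to check. A minimal sketch, with a hypothetical helper name:

```python
PRIMITIVE_TYPES = ("string", "integer", "number", "boolean")


def non_primitive_fields(fields):
    """Return the names of fields whose type is not allowed in BEJSON 104a."""
    return [field["name"] for field in fields
            if field["type"] not in PRIMITIVE_TYPES]


fields = [
    {"name": "key", "type": "string"},
    {"name": "sensitive", "type": "boolean"},
    {"name": "tags", "type": "array"},
]
print(non_primitive_fields(fields))  # ['tags']
```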
Suitability for Configuration Files and Health Checks
The combination of custom top-level metadata and primitive data type restriction makes BEJSON 104a exceptionally well-suited for:
- Configuration Files: Applications often require configuration parameters that are simple key-value pairs, potentially with descriptions or sensitivity flags. BEJSON 104a allows these parameters to be stored in a structured, self-describing format, with file-level metadata providing context about the configuration set (e.g., which server it applies to, its environment).
- Health Checks and Simple Metrics: For reporting the status of services or collecting basic operational metrics, BEJSON 104a can provide a clear, concise format. The custom headers can denote the service name, timestamp of the check, or system identifier, while the records themselves detail individual health parameters (e.g., "CPU_Load", "Disk_Usage", "Service_Status").
- Simple Logs: When log entries consist of straightforward, non-nested data points, 104a can be a lightweight alternative, especially if file-level metadata is desired to describe the log source or retention policy.
Example of BEJSON 104a
Here is an example demonstrating BEJSON 104a for configuration parameters:
{
  "Format": "BEJSON",
  "Format_Version": "104a",
  "Format_Creator": "Elton Boehnen",
  "Server_ID": "WEB-01",
  "Environment": "Production",
  "Retention_Days": 90,
  "Records_Type": ["ConfigParam"],
  "Fields": [
    {"name": "key", "type": "string"},
    {"name": "value", "type": "string"},
    {"name": "sensitive", "type": "boolean"}
  ],
  "Values": [
    ["db_host", "prod-db-01", true],
    ["max_threads", "32", false]
  ]
}
In this example, "Server_ID", "Environment", and "Retention_Days" are custom top-level metadata keys, providing context for the configuration parameters. The "Fields" array defines three primitive type fields, and the "Values" array contains the corresponding configuration data.
BEJSON 104db: Multi-Entity Tabular Data
BEJSON 104db (database) is the most sophisticated variant of the BEJSON specification, designed to serve as a lightweight, multi-entity tabular database within a single JSON file. It extends the core BEJSON principles to manage heterogeneous records, enabling the representation of relationships between different types of data entities. This version is particularly useful for small, self-contained datasets where a full-fledged database system might be overkill, but structured relationships are still necessary.
Key Features for Multi-Entity Management
- Multiple Record Types: Unlike BEJSON 104 and 104a, BEJSON 104db explicitly supports two or more unique entity names in its "Records_Type" array. Each string in this array represents a distinct type of record (e.g., "User", "Order", "Product") that can coexist within the same "Values" array.
- No Custom Top-Level Keys: Similar to BEJSON 104, BEJSON 104db strictly prohibits custom top-level keys. All data, including metadata for individual entities, must be represented within the structured "Fields" and "Values" arrays.
- Support for Complex Types: Like BEJSON 104, this version fully supports complex data types such as "array" and "object" within its fields, allowing for rich, nested data structures within each entity's records.
The Critical Role of Record_Type_Parent
The defining characteristic of BEJSON 104db is the introduction and critical role of the Record_Type_Parent field and property. This mechanism allows the document to discriminate between different record types stored within a single, unified "Values" array.
The Record_Type_Parent Field
In BEJSON 104db, the first field in the "Fields" array MUST be a discriminator field defined as:
{"name": "Record_Type_Parent", "type": "string"}
The value at position 0 in every record within the "Values" array must then exactly match one of the entity names declared in the top-level "Records_Type" array. This value acts as the discriminator, indicating which entity type the current record represents.
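The two discriminator rules can be sketched in Python as below. The function name `check_discriminator` and the message wording are assumptions, and the comparison requires the first field object to equal the definition above exactly.

```python
DISCRIMINATOR = {"name": "Record_Type_Parent", "type": "string"}


def check_discriminator(doc):
    """Check the first field and the value at position 0 of every record."""
    problems = []
    fields = doc["Fields"]
    if not fields or fields[0] != DISCRIMINATOR:
        problems.append("first field must be the Record_Type_Parent discriminator")
    entities = list(doc["Records_Type"])
    for row, record in enumerate(doc["Values"]):
        if not record or record[0] not in entities:
            problems.append(f"record {row}: unknown entity at position 0")
    return problems


doc = {
    "Records_Type": ["User", "Item"],
    "Fields": [
        {"name": "Record_Type_Parent", "type": "string"},
        {"name": "user_id", "type": "string", "Record_Type_Parent": "User"},
    ],
    "Values": [
        ["User", "U01"],
        ["Admin", "U02"],  # "Admin" is not declared in "Records_Type"
    ],
}
print(check_discriminator(doc))  # ['record 1: unknown entity at position 0']
```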
The Record_Type_Parent Property in Fields
Beyond the discriminator field itself, every other field object in the "Fields" array must include a "Record_Type_Parent" property. This property assigns the field to a specific entity type (e.g., "Record_Type_Parent": "User"). This is a crucial difference from other BEJSON versions, as it means there are no "common fields" shared across all entities; each field belongs exclusively to one declared record type.
Handling Non-Applicable Fields
Due to the single, shared "Fields" array and the multi-entity nature, records of one entity type will have fields defined for other entity types that are not applicable to them. For such non-applicable fields, the corresponding value in the record's "Values" entry must be null. This strict requirement maintains positional integrity and ensures that every record still has the same total number of elements as there are fields defined in "Fields".
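The null requirement for non-applicable fields can be checked with a short loop. The sketch below assumes the discriminator and record-length rules have already been validated; `null_violations` is a hypothetical name.

```python
def null_violations(doc):
    """Return (row, field name) pairs where another entity's field is not null."""
    violations = []
    owned_fields = doc["Fields"][1:]  # skip the discriminator at position 0
    for row, record in enumerate(doc["Values"]):
        entity = record[0]
        for field, value in zip(owned_fields, record[1:]):
            if field["Record_Type_Parent"] != entity and value is not None:
                violations.append((row, field["name"]))
    return violations


doc = {
    "Records_Type": ["User", "Item"],
    "Fields": [
        {"name": "Record_Type_Parent", "type": "string"},
        {"name": "user_id", "type": "string", "Record_Type_Parent": "User"},
        {"name": "item_id", "type": "string", "Record_Type_Parent": "Item"},
    ],
    "Values": [
        ["User", "U01", None],
        ["Item", "U01", "I01"],  # user_id belongs to "User" and should be null
    ],
}
print(null_violations(doc))  # [(1, 'user_id')]
```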
Enabling Relationships and Foreign Keys
BEJSON 104db facilitates the representation of relationships between entities. By including shared ID fields (e.g., a "user_id" in a "User" entity and an "owner_user_id_fk" in an "Item" entity), logical primary key/foreign key relationships can be established. While the BEJSON specification does not enforce these relationships, it recommends a convention:
- Use the _fk suffix on foreign key fields (e.g., owner_user_id_fk). This convention helps automated mapping tools and human readers understand the intended relationships within the data.
Example of BEJSON 104db
Consider a simple multi-entity dataset for users and items:
{
  "Format": "BEJSON",
  "Format_Version": "104db",
  "Format_Creator": "Elton Boehnen",
  "Records_Type": ["User", "Item"],
  "Fields": [
    {"name": "Record_Type_Parent", "type": "string"},
    {"name": "created", "type": "string", "Record_Type_Parent": "User"},
    {"name": "user_id", "type": "string", "Record_Type_Parent": "User"},
    {"name": "username", "type": "string", "Record_Type_Parent": "User"},
    {"name": "created_at", "type": "string", "Record_Type_Parent": "Item"},
    {"name": "item_id", "type": "string", "Record_Type_Parent": "Item"},
    {"name": "name", "type": "string", "Record_Type_Parent": "Item"},
    {"name": "owner_user_id_fk", "type": "string", "Record_Type_Parent": "Item"}
  ],
  "Values": [
    ["User", "2026-01-01", "U01", "alice", null, null, null, null],
    ["User", "2026-01-02", "U02", "bob", null, null, null, null],
    ["Item", null, null, null, "2026-01-10", "I01", "Report A", "U01"],
    ["Item", null, null, null, "2026-01-10", "I02", "Report B", "U02"]
  ]
}
In this example:
- "Records_Type" defines "User" and "Item" entities.
- The first field is "Record_Type_Parent", acting as the discriminator.
- Fields like "created", "user_id", and "username" are assigned to the "User" entity via their "Record_Type_Parent" property.
- Fields like "created_at", "item_id", "name", and "owner_user_id_fk" are assigned to the "Item" entity.
- Notice how "User" records use null for all "Item"-specific fields, and "Item" records use null for all "User"-specific fields, maintaining the consistent record length. The "owner_user_id_fk" field in "Item" records logically links back to the "user_id" in "User" records.
BEJSON 104db provides a powerful yet constrained way to manage complex, related tabular data within a single, self-describing file, making it an excellent choice for lightweight database implementations or structured data exchange involving multiple entity types.
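To illustrate how a consumer might work with a 104db document, the Python sketch below splits the shared "Values" array into one table per entity and then follows the foreign key from items to users. It uses a trimmed version of the example (the two date fields are omitted for brevity). The helper name `split_entities` and the join itself are illustrative assumptions; as noted earlier, the specification does not enforce relationships.

```python
DOC = {
    "Format": "BEJSON",
    "Format_Version": "104db",
    "Format_Creator": "Elton Boehnen",
    "Records_Type": ["User", "Item"],
    "Fields": [
        {"name": "Record_Type_Parent", "type": "string"},
        {"name": "user_id", "type": "string", "Record_Type_Parent": "User"},
        {"name": "username", "type": "string", "Record_Type_Parent": "User"},
        {"name": "item_id", "type": "string", "Record_Type_Parent": "Item"},
        {"name": "name", "type": "string", "Record_Type_Parent": "Item"},
        {"name": "owner_user_id_fk", "type": "string", "Record_Type_Parent": "Item"},
    ],
    "Values": [
        ["User", "U01", "alice", None, None, None],
        ["User", "U02", "bob", None, None, None],
        ["Item", None, None, "I01", "Report A", "U01"],
        ["Item", None, None, "I02", "Report B", "U02"],
    ],
}


def split_entities(doc):
    """Group records by entity, keeping only the fields each entity owns."""
    owned_fields = doc["Fields"][1:]  # skip the discriminator
    tables = {entity: [] for entity in doc["Records_Type"]}
    for record in doc["Values"]:
        entity = record[0]
        row = {field["name"]: value
               for field, value in zip(owned_fields, record[1:])
               if field["Record_Type_Parent"] == entity}
        tables[entity].append(row)
    return tables


tables = split_entities(DOC)

# Follow the logical foreign key from each item to its owning user.
users_by_id = {user["user_id"]: user for user in tables["User"]}
owners = [(item["name"], users_by_id[item["owner_user_id_fk"]]["username"])
          for item in tables["Item"]]
print(owners)  # [('Report A', 'alice'), ('Report B', 'bob')]
```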
Ensuring Integrity: Validation and Best Practices
The strength of BEJSON lies in its strict adherence to a defined structure, which inherently enables robust validation and promotes data integrity. Understanding the validation rules and adopting best practices are crucial for anyone working with BEJSON to ensure data quality, interoperability, and long-term maintainability.
Validation Summary
A BEJSON document is considered valid if it meets the following criteria, encompassing both general JSON validity and BEJSON-specific rules:
General BEJSON Validation Rules (Applicable to All Versions)
- Valid JSON Syntax: The entire document must be syntactically correct JSON.
- Mandatory Top-Level Keys: All six core keys must be present at the document's root: "Format", "Format_Version", "Format_Creator", "Records_Type", "Fields", and "Values".
- Correct Key Values:
  - "Format" must be exactly "BEJSON".
  - "Format_Version" must be one of "104", "104a", or "104db".
  - "Format_Creator" must be exactly "Elton Boehnen".
- "Fields" Array Structure:
  - "Fields" must be an array of objects.
  - Each field object must have a "name" (string) and a "type" (string) property.
  - There must be no duplicate field names within the "Fields" array.
- "Values" Array Structure and Positional Integrity:
  - "Values" must be an array of arrays.
  - Every inner array (record) in "Values" must have exactly the same number of elements as there are entries in the "Fields" array.
  - The type of each value in a record must match its corresponding declared "type" in the "Fields" array. The JSON null value is always permitted for any type.
Version-Specific Validation Rules
- BEJSON 104:
  - "Records_Type" array must contain exactly one string.
  - No custom top-level keys are allowed, with the sole exception of the optional built-in "Parent_Hierarchy" key.
  - Supports complex types ("array", "object") in field definitions.
- BEJSON 104a:
  - "Records_Type" array must contain exactly one string.
  - Custom top-level keys are allowed for file-level metadata, provided they are PascalCase and do not conflict with the six mandatory BEJSON keys.
  - Fields are restricted to primitive types only: "string", "integer", "number", "boolean".
- BEJSON 104db:
  - "Records_Type" array must contain two or more unique strings.
  - No custom top-level keys are allowed.
  - The first field in the "Fields" array must be {"name": "Record_Type_Parent", "type": "string"}.
  - Every other field object in "Fields" (i.e., all fields except the discriminator itself) must have a "Record_Type_Parent" property assigning it to one of the entities declared in "Records_Type".
  - The value at position 0 in every record in "Values" must exactly match one of the strings in the top-level "Records_Type" array.
  - For fields not applicable to a given record's Record_Type_Parent, the corresponding value in the "Values" array must be null.
  - Supports complex types ("array", "object") in field definitions.
Best Practices for BEJSON Design and Maintenance
Adhering to best practices enhances the readability, maintainability, and forward compatibility of BEJSON documents:
- Naming Conventions:
  - Use snake_case for field names (e.g., sensor_id, last_modified_timestamp).
  - Use PascalCase for custom top-level headers in BEJSON 104a (e.g., Schema_Version, Application_Name).
- Schema Evolution - Adding New Fields:
  - To maintain backward compatibility and positional integrity for existing parsers, always add new fields only at the end of the "Fields" array.
  - When adding new fields, ensure that existing records are updated (if applicable) or that null values are appropriately used to fill the new positions, preserving the fixed record length.
- Schema Evolution - Breaking Changes:
  - Removing a field or retyping an existing field (e.g., changing from "string" to "integer") constitutes a major breaking change. Such changes will invalidate existing parsers and should be handled with extreme caution, often requiring a new application schema version or careful migration.
- Handling Missing Data:
  - Always use the JSON literal null to represent truly missing or non-applicable data.
  - Avoid using empty strings (""), empty arrays ([]), or empty objects ({}) unless an empty value has a specific semantic meaning within your application's schema. This distinction helps maintain data clarity and consistency.
- Application Schema Versioning:
  - It is good practice to version your application's specific schema independently of the BEJSON "Format_Version".
  - For BEJSON 104a, use a custom top-level header like "Schema_Version": "v1.0".
  - For BEJSON 104 and 104db (where custom headers are forbidden), embed the schema version within the "Records_Type" strings (e.g., ["SensorReading_v1_0"]) or manage it externally.
- Managing Large Datasets:
  - For very large datasets, consider splitting the data into multiple complete BEJSON files. This can improve manageability, parsing performance, and reduce memory footprint.
- Security Considerations:
  - When dealing with sensitive information, ensure that BEJSON files are encrypted at rest and in transit, as the format itself does not provide intrinsic encryption.
- 104db Specific Conventions:
  - Utilize the _fk suffix on foreign key fields (e.g., owner_user_id_fk) in BEJSON 104db to clearly signal relationships between entities. While not enforced by the spec, this is a widely recommended convention for clarity.
  - Consider the Event/Audit entity pattern in BEJSON 104db: define a dedicated "Event" entity in "Records_Type" to implement audit trails. Link it to other entities using fields like related_entity_id_fk, and store before/after state or change details in an "object" type field (e.g., change_details).
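Two of the practices above, appending a field at the end and splitting a large dataset into several complete files, can be sketched in Python as follows. Both helper names are hypothetical. `append_field` targets 104 and 104a documents only; a 104db field would also need a "Record_Type_Parent" property.

```python
import copy


def append_field(doc, name, field_type):
    """Return a copy of a 104/104a document with a new field added at the end."""
    if any(field["name"] == name for field in doc["Fields"]):
        raise ValueError(f"duplicate field name: {name}")
    new_doc = copy.deepcopy(doc)
    new_doc["Fields"].append({"name": name, "type": field_type})
    for record in new_doc["Values"]:
        record.append(None)  # null keeps every record at the new fixed length
    return new_doc


def split_document(doc, max_records):
    """Split a document into complete documents of at most max_records records."""
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    header = {key: value for key, value in doc.items() if key != "Values"}
    values = doc["Values"]
    parts = []
    # max(..., 1) makes an empty document yield one part with empty "Values".
    for start in range(0, max(len(values), 1), max_records):
        parts.append({**header, "Values": values[start:start + max_records]})
    return parts


doc = {
    "Format": "BEJSON",
    "Format_Version": "104",
    "Format_Creator": "Elton Boehnen",
    "Records_Type": ["SensorReading"],
    "Fields": [{"name": "sensor_id", "type": "string"}],
    "Values": [["S001"], ["S002"], ["S003"]],
}
wider = append_field(doc, "humidity", "number")
parts = split_document(wider, 2)
print(wider["Values"][0])                  # ['S001', None]
print([len(p["Values"]) for p in parts])   # [2, 1]
```

Each part repeats the full header and "Fields" array, so every file remains a complete, independently valid BEJSON document.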
By diligently applying these validation checks and best practices, developers and data architects can fully leverage BEJSON's capabilities to create highly reliable, efficient, and well-structured tabular data solutions.

