Taking Another Look at DocumentDB

Updated: This article has been revised for accuracy and includes additional context about the Linux Foundation DocumentDB.

My Experience

I joined a large government healthcare migration project several years ago, which taught me a great deal about database architecture, migration, and how to select and build the right infrastructure. When migrating health-related data, security decisions are critical for protecting sensitive patient information and maintaining compliance with regulations such as HIPAA, GDPR, and other healthcare data protection standards. A security breach during migration can result in private medical information being compromised. The organization I was working with as a consultant encountered a typical problem that affects many large businesses. Their data existed in separate legacy systems that required consolidation, modernization and security for new application access.

The Project

This project was layered with complexities. Although I cannot disclose details of the security decisions we made, security was a priority throughout the project. The data migration required a full “lift and shift” operation. I wrote this blog to highlight some of the learnings and challenges I experienced, starting with the technical ones.

Technical Challenges

Transferring terabytes of government data between different systems was our biggest challenge. The difficulties stemmed from incompatibility with modern platforms, complex and poorly documented structures, and inconsistent data quality. We also needed to minimize the risk of data loss. The large volume of data amplified each of these issues, all of which are common in government ecosystems.
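One common way to reduce the risk of silent data loss during a lift and shift is to fingerprint each record on both sides and compare. The sketch below is illustrative only and is not the procedure we used on the project; the `record_id` key and the sample records are hypothetical.

```python
import hashlib
import json


def fingerprint(doc: dict) -> str:
    """Return a stable SHA-256 digest of a document's content."""
    # Sorting keys makes the digest independent of field order.
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_batches(source_docs, target_docs, key="record_id"):
    """Report records that are missing from, or differ in, the target batch."""
    source = {d[key]: fingerprint(d) for d in source_docs}
    target = {d[key]: fingerprint(d) for d in target_docs}
    missing = sorted(k for k in source if k not in target)
    changed = sorted(k for k in source if k in target and source[k] != target[k])
    return {"missing": missing, "changed": changed}


if __name__ == "__main__":
    # Hypothetical sample data: record 2 was altered and record 3 was dropped.
    legacy = [
        {"record_id": 1, "name": "A"},
        {"record_id": 2, "name": "B"},
        {"record_id": 3, "name": "C"},
    ]
    migrated = [
        {"record_id": 1, "name": "A"},
        {"record_id": 2, "name": "b"},
    ]
    print(compare_batches(legacy, migrated))
```

Running a check like this per batch, rather than once at the end, keeps any discrepancy close to the step that caused it.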

An essential part of this project was the development of two new applications, which relied on Puppet for configuration management and RabbitMQ for message queuing. The scope was larger than just data: we also needed these tools to help manage the applications, which added to the complexity. Fortunately, I worked with a great team of people who created these customized experiences. We also deployed Azure infrastructure using ARM templates for Infrastructure as Code, which enabled the modernization of our systems. This was key, because every component had to meet the requirements specified for the project.

Our primary goal was to migrate a portion of their database operations to MongoDB by replacing the legacy Azure DocumentDB as their primary document storage system. At the time, the decision to migrate from Azure DocumentDB to MongoDB was the right call. The standardization we needed emerged from MongoDB, which provided us with specialized tools tailored to our needs and operational procedures that matched our team’s skill set. The migration operation achieved its goal, and the new infrastructure operated as required by the government.

The Evolution of DocumentDB

The new DocumentDB, established as an open-source project in January 2025 and transferred to Linux Foundation governance in August 2025, takes a different approach from the historical Azure DocumentDB (which became Azure Cosmos DB in May 2017). The recent announcements and architectural improvements invite a closer look at this new DocumentDB as a potentially better fit for projects with similar requirements.

It is an open-source solution, supported by Microsoft, Amazon, Google, and the open-source community, that aims to combine NoSQL flexibility with PostgreSQL’s reliability and performance. Here are some of the features that checked the box for me:

  • Connects document-based and relational database management systems.
  • Enables open innovation through vendor-agnostic methods.
  • Gives organizations of all sizes a way to reduce their reliance on a single cloud vendor’s services. HUGE for the open source community.

The governance model addresses many of the vendor concerns that influenced our original migration decision. DocumentDB’s new position creates opportunities for government projects that require vendor independence and long-term sustainability.

The Design of DocumentDB

DocumentDB draws its operational power from PostgreSQL extensions. You could call this a power play in open source databases.

The architecture of DocumentDB has always been interesting, but new developments have made it even more attractive.

I am always impressed when a database opens its doors to do more than it is expected to do. Here are a few of the highlights I discovered looking through the window of features:

  • Stores documents in PostgreSQL using a BSON data type supplied by the extension, which underpins its compatibility with MongoDB’s API.
  • High-performance CRUD operations, with single-key, multi-key, text, and compound indexing, plus advanced features including geospatial and vector search.
  • MongoDB API compatibility through its gateway protocol layer.
  • Runs as a set of extensions inside PostgreSQL rather than as a separate engine.
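To make the indexing bullet concrete, here is a minimal sketch of how single-key, compound, text, and geospatial indexes can be described and applied through a MongoDB-style driver. The field names (`user_id`, `department`, `notes`, `location`) are hypothetical, and the `RecordingCollection` class is only a stand-in so the sketch runs without a server. With a real connection you would pass a pymongo collection instead, and whether each index type is available depends on the DocumentDB version you install.

```python
# Index specifications written as (field, kind) pairs. 1 means ascending;
# "text" and "2dsphere" are the MongoDB API's text and geospatial index kinds.
INDEX_SPECS = {
    "by_user": [("user_id", 1)],
    "by_department_and_name": [("department", 1), ("name", 1)],
    "notes_text": [("notes", "text")],
    "office_location": [("location", "2dsphere")],
}


def apply_indexes(collection, specs=INDEX_SPECS):
    """Create each index on a pymongo-style collection; return the names used."""
    created = []
    for name, keys in specs.items():
        collection.create_index(keys, name=name)
        created.append(name)
    return created


class RecordingCollection:
    """Stand-in for a real collection so the sketch runs without a server."""

    def __init__(self):
        self.calls = []

    def create_index(self, keys, **kwargs):
        # Record the request instead of sending it to a database.
        self.calls.append((keys, kwargs))
        return kwargs.get("name")


if __name__ == "__main__":
    fake = RecordingCollection()
    print(apply_indexes(fake))
    for keys, options in fake.calls:
        print(options["name"], keys)
```

Keeping the specifications in one dictionary makes it easy to review the full set of indexes before anything is applied to a live system.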

In short, DocumentDB offers the flexibility of schemaless NoSQL together with the full ecosystem and tooling of PostgreSQL for hybrid workloads. This hybrid approach offers significant benefits to government projects that involve complex data relationships and must comply with stringent regulatory requirements.

Data Migrations in Government

I have firsthand experience with the process of major government infrastructure change. Several considerations are made regarding database selection, but there is more to it than a technical evaluation shortlist.

Government data requires multiple security controls and audit trails for its protection. PostgreSQL’s mature security framework, combined with DocumentDB’s flexible document structure, helps meet those compliance needs. Some of the lessons from our project remain applicable today and are valuable for anyone working with government systems.

Furthermore, projects need to avoid vendor lock-in. The Linux Foundation addresses this issue through its stewardship model, which provides enterprise-level support.

Large government systems require integration at multiple points, which is a major source of their complexity. The dual API of DocumentDB lets teams work with MongoDB drivers and PostgreSQL tools on a single platform.

IT teams within government organizations often have years of experience managing PostgreSQL. DocumentDB builds on that existing knowledge, eliminating the need to retrain the team from scratch.

DocumentDB Performance: Sharding in DocumentDB

Elastic scaling is a significant advancement over what was available during our initial project deployment. Sharding in DocumentDB now:

  • Distributes data using a shard key, such as user_id or department_id.
  • Divides data between shards with hash functions, so each shard handles its own read and write operations.
  • Works best with a high-cardinality shard key, which yields an even distribution.
  • Performs partitioning, routing, balancing, and scaling operations without requiring user intervention.
  • Allows horizontal scaling through the addition of new shards, which expands throughput without service interruptions.
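The effect of shard key cardinality is easier to see with a small simulation. The sketch below hashes a key value and maps it to a shard with a modulo, which is a generic illustration of hash-based distribution and not DocumentDB’s actual routing algorithm; the shard count and the sample key values are assumptions.

```python
import hashlib
from collections import Counter


def shard_for(shard_key_value, shard_count: int) -> int:
    """Map a shard key value to a shard number using a stable hash."""
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribution(values, shard_count: int) -> Counter:
    """Count how many key values land on each shard."""
    return Counter(shard_for(v, shard_count) for v in values)


if __name__ == "__main__":
    shards = 4
    # High cardinality: 10,000 distinct user_id values spread across shards.
    user_ids = range(1, 10_001)
    print("user_id:", sorted(distribution(user_ids, shards).items()))
    # Low cardinality: 10,000 rows but only two distinct department_id values,
    # so at most two shards can ever receive data.
    department_ids = [i % 2 for i in range(10_000)]
    print("department_id:", sorted(distribution(department_ids, shards).items()))
```

With only a handful of distinct key values, adding shards does not help, because every row with the same key value always hashes to the same shard.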

Where are we now?

Linux Foundation DocumentDB is an open-source document database built on PostgreSQL that offers MongoDB API compatibility, combining document flexibility with PostgreSQL’s reliability while allowing developers to use standard MongoDB drivers. Those are the nuts and bolts of it. I am exploring many types of APIs with the help of AI, which lets me quickly see an API’s syntax and kick-start my builds, so now I am adding this one to the list. This truly makes my open-source techie heart glad. The adoption by the Linux Foundation opens the doors for engineers, developers, explorers, entrepreneurs, students, and hobbyists to build and learn. The timing is ideal as we move through the AI landscape, building applications and agents. DocumentDB could be a viable backend for those applications.

Future Considerations from the Past

Knowing what we know now, I can’t help but wonder: if we revisited those choices with the current capabilities of the Linux Foundation DocumentDB, would we make the same ones?

Figure: DocumentDB Evolution Timeline. This timeline shows the progression from Microsoft’s internal Project Florence (2010) through Azure DocumentDB (2014-15) and Azure Cosmos DB (2017) to the new open-source DocumentDB (2024-25) and its adoption by the Linux Foundation (August 2025). The key distinction is that Azure Cosmos DB (proprietary) and the Linux Foundation DocumentDB (MIT license) are two completely different projects serving different needs.

Try DocumentDB Yourself

Want to see how DocumentDB works? You can get hands-on experience by setting up DocumentDB locally on WSL (Windows Subsystem for Linux) or any Linux server. Please refer to the official Linux Foundation DocumentDB installation guide at github.com/documentdb/documentdb for current installation instructions.

Quickstart: First-Time DocumentDB Setup

Step 1: Install DocumentDB on WSL/Linux

Please refer to the official installation documentation at github.com/documentdb/documentdb for the most current setup instructions.

Step 2: Install Python Dependencies

# Install Python and pip if not already installed
sudo apt update
sudo apt install python3 python3-pip
# Install MongoDB Python driver
pip3 install pymongo

Step 3: Connect and Create Your First Database

import pymongo
from datetime import datetime

# Connect to DocumentDB using MongoDB drivers
client = pymongo.MongoClient(
    'mongodb://admin:password123@localhost:10260/'
)

# Create your first database and collection
db = client['myFirstDB']
collection = db['testCollection']

# Insert a simple document
test_doc = {
    "name": "DocumentDB Test",
    "created_at": datetime.now(),
    "status": "learning"
}
result = collection.insert_one(test_doc)
print(f"Inserted document with ID: {result.inserted_id}")

# Query the document back
found_doc = collection.find_one({"name": "DocumentDB Test"})
print(f"Found document: {found_doc}")
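The connection string above hardcodes a sample password, which is fine for a local test but not for anything shared. As a hedged alternative, the sketch below builds the same kind of URI from environment variables and percent-encodes the credentials so special characters do not break the string. The variable names `DOCDB_USER` and `DOCDB_PASSWORD` are hypothetical, and the port simply mirrors the example above. If your local gateway is configured for TLS, standard MongoDB URI options such as `tls` can be passed through `options`.

```python
import os
from urllib.parse import quote_plus, urlencode


def build_uri(user, password, host="localhost", port=10260, options=None):
    """Build a MongoDB-style URI, percent-encoding the credentials."""
    uri = f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"
    if options:
        # Options become a query string, for example ?tls=true
        uri += "?" + urlencode(options)
    return uri


if __name__ == "__main__":
    # Hypothetical environment variable names; fall back to the sample values.
    user = os.environ.get("DOCDB_USER", "admin")
    password = os.environ.get("DOCDB_PASSWORD", "password123")
    uri = build_uri(user, password, options={"tls": "true"})
    # Print only the part after the credentials to avoid leaking the password.
    print(uri.split("@", 1)[1])
```

The resulting string can be passed to `pymongo.MongoClient` in place of the literal used in Step 3.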

Step 4: Basic Operations

# Insert multiple documents
sample_data = [
    {"user_id": 1, "name": "Alice", "department": "Engineering"},
    {"user_id": 2, "name": "Bob", "department": "Sales"},
    {"user_id": 3, "name": "Carol", "department": "Marketing"}
]
collection.insert_many(sample_data)

# Create an index for better performance
collection.create_index([('user_id', pymongo.ASCENDING)])

# Query with filters
engineers = collection.find({"department": "Engineering"})
for engineer in engineers:
    print(f"Engineer: {engineer['name']}")

Application Integration: Puppet, RabbitMQ, and DocumentDB

Let’s do a little housekeeping, as I am still speaking about the stack I worked with. These terms were new to me when I started, so this is my summary definition of both. RabbitMQ is a message broker that handles communication between different applications. Puppet automates the deployment, configuration, and management of servers using code.

In a MongoDB environment, applications would use RabbitMQ to send messages about database events (like new document insertions) between microservices, while Puppet would automatically configure MongoDB instances, manage replica sets, deploy application code, and ensure consistent server configurations across the entire infrastructure. Automation of these services was key to our implementation.

Our migration project needed Puppet for configuration management and RabbitMQ for messaging to achieve complex application integration. The dual API support of the modern DocumentDB would have made these integration patterns easier to implement.
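To illustrate the messaging pattern described above, here is a minimal sketch of a “document inserted” event being published and consumed. An in-process `queue.Queue` stands in for RabbitMQ so the example runs anywhere; in a real deployment the publish and consume steps would go through a RabbitMQ client library such as pika. The event field names are assumptions, not a format we used on the project.

```python
import json
import queue
from datetime import datetime, timezone


def make_insert_event(database, collection, document_id):
    """Describe a newly inserted document as a small JSON-friendly event."""
    return {
        "event": "document.inserted",
        "database": database,
        "collection": collection,
        "document_id": str(document_id),
        "occurred_at": datetime.now(timezone.utc).isoformat(),
    }


def publish(broker, event):
    """Serialize the event and hand it to the broker stand-in."""
    broker.put(json.dumps(event).encode("utf-8"))


def consume(broker):
    """Take the next message off the broker stand-in and decode it."""
    return json.loads(broker.get_nowait().decode("utf-8"))


if __name__ == "__main__":
    broker = queue.Queue()  # stand-in for a RabbitMQ queue
    publish(broker, make_insert_event("myFirstDB", "testCollection", "abc123"))
    received = consume(broker)
    print(received["event"], received["collection"], received["document_id"])
```

Sending only the document identifier, rather than the whole document, keeps messages small and lets each consuming service read the current version from the database.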

Revisiting the Migration Decision

Looking back at our government migration project, the decision to move from Azure DocumentDB to MongoDB was appropriate given the technology landscape and requirements at the time. MongoDB’s standard tools and proven operational approaches contributed to the success of our project.

The Linux Foundation DocumentDB (established in 2025) now offers some significant capabilities:

  • Leverages existing IT infrastructure and tools through its PostgreSQL integration.
  • Elastic scaling helps applications keep up with growing workloads.
  • PostgreSQL’s security model supports most compliance standards.
  • The Linux Foundation governance model provides vendor neutrality.
  • Minimal training is needed for teams who already work with PostgreSQL.

Speaking of leveraging PostgreSQL, building on it brings:

  • Scalability that handles varied workload patterns effectively.
  • Row-level security and multiple authentication methods.
  • Audit trails through built-in logging and monitoring functionality.
  • Mature backup solutions with configurable retention policies.
  • High availability through automatic failover, including deployments across multiple regions.
  • A foundation for meeting government compliance frameworks.

My experience with migration projects has shown me that technology selection is based on availability and business needs. Our Azure DocumentDB-to-MongoDB migration was the right choice for our project’s requirements and timeline several years ago. However, new technology and its availability can change what makes sense for a project. Technology selection itself has also changed: we can now use AI models, with carefully engineered prompts and analysis, to review several scenarios. Open source has always been a consistent space for direct testing of systems, giving us the opportunity to explore system limitations and flexibility across various operational environments. Ultimately, the Linux Foundation DocumentDB delivers operational simplicity, vendor independence, and robust technical functionality.

Related Article

Database Selection for Engineering Projects



 
