Tag Archives: #Postgresql

A Good Database BUG

People generally do not think of cockroaches positively, but I have nothing but good feelings about CockroachDB. At its core, CockroachDB is resilient and reliable.

Cockroach Labs, a software company known for its cloud-native SQL databases, has found a home in Bengaluru, India. With a rapidly growing team of over 55 engineers specializing in database and cloud engineering, the company’s journey in India is as much about emotional ties as it is about strategic growth.

Bengaluru’s choice is strategic. It offers unparalleled time zone advantages and access to a rich talent pool. With a population of 1.4 billion and a digitizing economy, India is ideal for testing CockroachDB’s resilience and scalability.

The company plans to expand its Bengaluru office into a first-class R&D hub. Teams are working on innovations like vector data integration for AI, enabling operational databases to evolve into systems capable of real-time intelligence.

Building Blocks of CockroachDB

The founders’ lack of a transactional distributed database forced them to use DynamoDB, leading to inefficiencies in their early startup years. This frustration led to the birth of Cockroach Labs in 2014, with a vision to create an open-source, cloud-native distributed database.


I am a HUGE advocate of open-source databases, so this journey is intriguing. Not sitting with inefficiencies but finding a way to grow beyond them is a significant step for any startup.

True to its name, CockroachDB has built a reputation for resilience. It can run seamlessly across cloud providers, private data centers, and hybrid setups, making it a standout choice. Cockroach Labs focuses on eliminating vendor lock-in and ensuring businesses can operate uninterrupted, even during cloud or data center outages. I can’t say enough how important it is not to be locked into one cloud provider. This is a serious flex for an open-source database NOT to be “vendor dependent.” Staying in the driver’s seat and not becoming a passenger or going along for a ride with a service provider is ideal. Retaining the power of “choice” as a customer is priceless. This adaptability has made Cockroach Labs the operational backbone for global giants like Netflix and ambitious startups like Fi.

Sharing some notes on my explorer experience:

Getting Started

Install CockroachDB on Ubuntu (using Bash Shell):

1. Update Your System: First, update your system packages to the latest version:
   
   sudo apt update -y
   sudo apt upgrade -y
   

2. Install the required dependencies:
   
   sudo apt install -y apt-transport-https ca-certificates curl software-properties-common
  

3. Download the latest version of CockroachDB:

$ curl https://binaries.cockroachdb.com/cockroach-v24.3.1.linux-amd64.tgz | tar -xvz

or
     https://binaries.cockroachdb.com/cockroach-latest.linux-amd64.tgz | tar xvz
   

4. Move the binary to a directory in your PATH:

      sudo cp -i cockroach-latest.linux-amd64/cockroach /usr/local/bin/
   

5. Verify the installation by checking the CockroachDB version:
   
   cockroach version
   

6. Initialize CockroachDB Cluster: Create a directory for CockroachDB data and initialize the cluster:
   
   sudo mkdir -p /var/lib/cockroach
   sudo chown $(whoami) /var/lib/cockroach
   cockroach start-single-node --insecure --store=/var/lib/cockroach --listen-addr=localhost:26257 --http-addr=localhost:8080
   

7. Connect to CockroachDB SQL Shell: Connect to the CockroachDB SQL shell:
   
   cockroach sql --insecure --host=localhost:26257
   

8. Run CockroachDB as a Background Service: Create a systemd service file to run CockroachDB as a background service:
   
   sudo nano /etc/systemd/system/cockroach.service
   
   Add the following configuration:
   ini
   [Unit]
   Description=CockroachDB
   Documentation=https://www.cockroachlabs.com/docs/

   [Service]
   Type=notify
   ExecStart=/usr/local/bin/cockroach start-single-node --insecure --store=/var/lib/cockroach --listen-addr=localhost:26257 --http-addr=localhost:8080
   TimeoutStartSec=0
   Restart=always
   RestartSec=10

   [Install]
   WantedBy=multi-user.target
   

9. Enable and Start the Service: Reload the systemd manager configuration, start the CockroachDB service, and enable it to run on system startup:
   
   sudo systemctl daemon-reload
   sudo systemctl start cockroach
   sudo systemctl enable cockroach
   sudo systemctl status cockroach
   
CockroachDB is now installed and running on your Ubuntu system. 

Cockroach Labs is continuing to invests heavily in AI-specific features, including support for vector similarity searches and operationalizing AI workflows.

Here's an example of how you can use CockroachDB with AI, specifically leveraging vector search for similarity searches:

1. Install CockroachDB: Follow the steps I provided earlier to install CockroachDB on your system.

2. Connect to CockroachDB and create a database and table to store your data:
 
   cockroach sql --insecure --host=localhost:26257
   CREATE DATABASE ai_example;
   USE ai_example;
   CREATE TABLE vectors (id INT PRIMARY KEY, vector FLOAT[] NOT NULL);
 

3. Insert some sample data into the table:(at sql prompt) Steps 3 -5

   INSERT INTO vectors (id, vector) VALUES (1, ARRAY[1.0, 2.0, 3.0]), (2, ARRAY[4.0, 5.0, 6.0]);


4. Enable the `pgvector` extension for vector similarity searches:
sql>
   CREATE EXTENSION IF NOT EXISTS pgvector;


5. Use the `pgvector` extension to perform a similarity search:
sql>
   SELECT id, vector, similarity(vector, ARRAY[2.0, 3.0, 4.0]) AS similarity_score
   FROM vectors
   ORDER BY similarity_score DESC;
 

Create a table to store vectors, and perform similarity searches using the `pgvector` extension.

 "pgvector" enables similarity searches by comparing high-dimensional vectors, making it useful for tasks like finding similar items in recommendation systems, which is an AI tool. 

Yes. CockroachDB is compatible with PostgreSQL, which means you can use many PostgreSQL tools, libraries, and client applications. This can be a bridge in learning about this database, which is also a plus.

pgvector" enables similarity searches by comparing high-dimensional vectors, making it useful for tasks like finding similar items in recommendation systems, which is an AI tool.

Yes. CockroachDB is compatible with PostgreSQL, which means you can use many PostgreSQL tools, libraries, and client applications. This can be a bridge in learning about this database, which is also a plus.

I am looking forward to testing these new developments from Cockroach Labs. There is a wealth information contained in their repository (linked-below) as well as number of repos from the open-source database community. Their investment in AI is key to the company’ sustainable growth.

https://github.com/cockroachlabs

https://www.cockroachlabs.com

Learn more about pgvector in this repo

Take Control Over Your Database Solution

As an engineer, choosing the right database solution is crucial for the success of any project. Let’s compare four popular databases: SQL IaaS, Azure SQL DB, Cosmos DB, and PostgreSQL HDInsight/Hadoop. We will explore their key similarities and advantages to help you make an informed decision when selecting a database for your engineering needs.

SQL IaaS
SQL IaaS (Infrastructure as a Service) is a traditional SQL Server database hosted on a virtual machine in the cloud. It offers a familiar SQL Server environment with control over the underlying infrastructure. Some key advantages of SQL IaaS include the following:

  • Complete control over the operating system and database configurations.
  • Easy migration of existing SQL Server databases to the cloud.
  • Flexibility to scale resources up or down based on workload demands.

Azure SQL DB
Azure SQL DB is a fully managed, intelligent, and scalable relational database service provided by Microsoft Azure. It is built on the SQL Server engine, designed for cloud environments. Critical advantages of Azure SQL DB include:

  • Automatic scaling and performance tuning, minimizing the need for manual management.
  • High availability with automatic backups and built-in disaster recovery options.
  • Integration with other Azure services for seamless application development and deployment.

Cosmos DB
Cosmos DB is a globally distributed, multi-model database service provided by Azure. It supports NoSQL document, key-value, graph, and columnar data models. Critical advantages of Cosmos DB include:

  • With low latency, global distribution allows users to replicate data across multiple regions.
  • Multiple data models for flexible schema design and diverse application requirements.
  • Guaranteed low latency and high throughput for mission-critical workloads.

PostgreSQL

PostgreSQL, an open-source relational database management system (RDBMS), has gained significant popularity among engineers due to its feature-rich nature and strong emphasis on standards compliance. Here are some critical advantages of PostgreSQL:

  • Relational Model: PostgreSQL follows the relational model, making it an excellent choice for structured data storage and complex queries. It supports SQL, allowing engineers to leverage their existing SQL knowledge.
  • ACID Compliance: PostgreSQL guarantees ACID (Atomicity, Consistency, Isolation, Durability) properties, ensuring data integrity and reliability. This makes it suitable for transactional applications that rely on data consistency.
  • Extensibility and Flexibility: PostgreSQL offers a wide range of extensions, allowing engineers to customize and extend its functionality according to specific requirements. It supports various data types, including JSON, arrays, and geospatial data, making it versatile for diverse use cases.

HDInsight/Hadoop
HDInsight is a fully managed, open-source Apache Hadoop service offered by Microsoft Azure. It provides a scalable and reliable platform for processing and analyzing large datasets. Key advantages of HDInsight/Hadoop include:

  • Support for big data processing using Hadoop’s distributed computing framework.
  • Seamless integration with various data sources, including structured, semi-structured, and unstructured data.
  • Advanced analytics capabilities with the integration of popular tools like Apache Spark and Apache Hive.

Key Similarities:
While each of these databases has unique features, they also share some similarities:

  • Integration with Azure: All five databases are part of the Microsoft Azure ecosystem, enabling seamless integration with other Azure services.
  • Scalability: Each database provides scalability options to handle increasing workloads effectively.
  • Security: All databases offer robust security features to protect data, including encryption at rest and in transit.


Choosing the right database solution is crucial for engineering projects. SQL IaaS, Azure SQL DB, Cosmos DB, PostgreSQL, and HDInsight/Hadoop offer various advantages depending on your specific requirements. SQL IaaS provides control and flexibility, Azure SQL DB offers managed scalability, and Cosmos DB excels in global distribution and multi-model capabilities. PostgreSQL is a robust relational database offering data integrity, flexibility, and extensibility. On the other hand, HDInsight/Hadoop provides scalability, fault tolerance, and a rich ecosystem for big data processing and analytics. Consider your project needs, scalability requirements, and data model preferences to make an informed decision. Remember, each database has its own strengths, so choose wisely to ensure optimal performance and efficiency in your engineering endeavors.