Why SQL Will Last

Andrew Meredith
May 31, 2023

❝

The reports of my death are greatly exaggerated.

Mark Twain

As in this popular Mark Twain misquote¹, the reports of SQL’s death are greatly exaggerated. Some of the latest prophecies place SQL at the losing end of a battle with AI, just waiting for the death blow. Most of the “SQL is dead” hype is pure, delicious clickbait. However, there are a handful of folks out there who honestly believe that SQL is on its last legs. So will it happen? Will SQL finally die?

We are in the middle of a hype cycle surrounding AI. There are alarmists who believe that AI will take all of our jobs and naysayers who hold that the current hype will fizzle out without making a lasting impression. I don’t pretend to know what impacts this AI spring will have five years down the road, but let’s just say that I’m not training for a new career of manual labor just yet.

📄 Remember when Mongo replaced SQL? Me either.

I don’t have many gray hairs in my beard yet, but I have been in the industry long enough to remember when NoSQL was going to antiquate SQL. I was once on a team that decided to use MongoDB for a new feature instead of the MySQL database that the rest of the app was using. Why? Because MongoDB is webscale and, you know, SQL was dying. Long story short, we recreated a relational model in MongoDB, moving all the joins into a PHP app. We also lost data. Our reporting couldn’t be done in SQL anymore, so we had to turn to the mind-bending aggregation pipeline. Obscenities were hurled. Tears were shed. We delivered the project 6 months late. MongoDB is not a bad system, but unless you have a very simple data model, it requires much more work than a good ol’ SQL db. Eventually, the SQL 2016 standard adopted first-class JSON support, effectively letting you store your semi-structured data alongside your relational data.
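
To make "semi-structured data alongside relational data" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables, names, and values are made up for illustration, and `json_extract` is SQLite's own JSON function rather than the exact syntax from the SQL standard, so treat this as one engine's flavor of the idea. It also assumes your SQLite build ships with JSON support, which most recent ones do.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Ordinary relational tables; orders.details holds a JSON document as text.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " details TEXT NOT NULL)"
)

conn.execute("INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace')")

# Hypothetical orders: the JSON documents do not all share the same keys.
orders = [
    (1, 1, {"sku": "kb-01", "qty": 2, "gift": True}),
    (2, 2, {"sku": "ms-07", "qty": 1}),
    (3, 1, {"sku": "ms-07", "qty": 3}),
]
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(order_id, customer_id, json.dumps(doc)) for order_id, customer_id, doc in orders],
)

# One query joins the relational tables and reaches into the JSON document.
rows = conn.execute(
    """
    SELECT c.name, SUM(json_extract(o.details, '$.qty')) AS total_qty
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

print(rows)  # [('Ada', 5), ('Grace', 1)]
```

The join stays in the database instead of moving into application code, which is exactly the part we got wrong on that MongoDB project.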

SQL survived.

🕸️ What’s old is new again: graph databases

After the NoSQL movement was rebranded as “Not Only SQL,” the next potential SQL killer to appear was the mighty new graph database. Graphs are more general than the relational model, so shouldn’t graph databases be more powerful than SQL ones? Yes and no. SQL with recursive CTEs can do everything that graph databases do, but it kinda sucks. No, it really sucks. Graph databases are much simpler to use than SQL for so many problems…
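
For a taste of what graph traversal with a recursive CTE looks like, here is a small sketch run through Python's sqlite3 module. The `follows` table and the names in it are invented for the example. The query finds everyone reachable from a starting person, and it relies on `UNION` (rather than `UNION ALL`) to drop rows it has already seen so that the cycle in the data does not loop forever.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A hypothetical edge list: src follows dst. Note the cycle carol -> alice.
conn.execute("CREATE TABLE follows (src TEXT NOT NULL, dst TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO follows VALUES (?, ?)",
    [
        ("alice", "bob"),
        ("bob", "carol"),
        ("carol", "dave"),
        ("carol", "alice"),
        ("erin", "alice"),
    ],
)

# Seed the CTE with direct neighbors, then keep following edges outward.
reachable = conn.execute(
    """
    WITH RECURSIVE reach(person) AS (
        SELECT dst FROM follows WHERE src = ?
        UNION
        SELECT f.dst
        FROM follows AS f
        JOIN reach AS r ON f.src = r.person
    )
    SELECT person FROM reach ORDER BY person
    """,
    ("alice",),
).fetchall()

names = [row[0] for row in reachable]
print(names)  # ['alice', 'bob', 'carol', 'dave']
```

It works, but compare that to a one-line pattern match in a graph query language and you can see why people reach for graph databases.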

So why don’t I think that graph DBs will kill SQL? Because SQL killed graph DBs. That’s right - the idea of being able to traverse a graph of objects in your database originated in the 1960s - only back then it was called the “network database model.” Network databases maintained pointers between records that the programmer could traverse, and it allowed for very flexible data storage and retrieval. However, when relational databases gained traction in the 1980s, network databases began fading away because compared to relational databases, they were slow and difficult to work with for typical business applications.

The graph databases of today are much more performant than the old network databases, but optimizing graph queries is just. plain. hard. That doesn’t mean that fast graph databases are impossible, but they are likely to remain niche while SQL continues to thrive. Also, as happened with JSON support, graph query support is probably coming to a SQL database near you soon. The SQL 2023 standard includes a new section on property graph queries that are inspired by Neo4j’s Cypher language. In short, I think that in 10 years, those of us who are using graph data models will be using them in Postgres and SQL Server, not a dedicated graph DB. SQL will still be around, and it’ll be enriched by all of the capabilities it has absorbed from other systems.

SQL survived.

🤖 AI: All your SQL are belong to us

Coming back to the present day, I see two contenders put forward as SQL replacements: AI and vector databases. Take this Tweet for example:

SQL is going to die at the hands of an AI. I’m serious.
@mayowaoshin is already doing this. Takes your company’s data and ingests it into ChatGPT. Then, you can create a chatbot for the data and just ask it questions using natural language.
This video demoes the output.
🤯
— Gagan Biyani 🏛 (@gaganbiyani)
2:30 PM • May 18, 2023

Don’t get me wrong - ChatGPT is an amazing technology with so many potential uses in the data world. Replacing a database engine is just not one of those things. People use databases to get exact answers to precise questions with a high degree of confidence and predictable performance. AI is nowhere close to being able to deliver this. I use Copilot and ChatGPT to help me write queries all the time, but the number of times they give me a suggestion that is just straight up wrong is borderline terrifying. No, AI is not going to replace SQL as a language or relational databases as a technology any time soon.

📈 Vector databases - nice, but niche

Let’s look at the other contender: vector databases. Now these are interesting because they are actually databases, and even though many of their use cases support AI applications, they are not intrinsically tied to AI. The innovation of vector DBs is being able to index high-dimensional vectors and find the most similar records using something like cosine similarity. In other words, if you have data that lends itself to being represented as a vector - so basically anything you would feed into a neural network - you can quickly find things that are “close.” For example, if you were creating a tech support chatbot based on ChatGPT, you could take your company’s knowledge base and use an ML model to create a vector embedding of each article. When a message comes into your bot, you could create an embedding from the incoming message, look up the closest article in your vector database, and then send the article and the user’s message to ChatGPT to get a context-aware response. Sure, there are ways to do something similar in existing SQL databases, but vector dbs can still blow them out of the water in terms of storage requirements and performance. But most importantly, at least one person on Twitter says that they will replace SQL databases:
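
The lookup step of that chatbot can be sketched in a few lines of plain Python. Everything here is illustrative: the article titles are invented, and the three-dimensional vectors are stand-ins for real embeddings, which would come from an ML model and have hundreds or thousands of dimensions. This version also scans every article, whereas the whole point of a vector database is an index that avoids the full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical knowledge base: title -> made-up embedding.
articles = {
    "Resetting your password": [0.9, 0.1, 0.0],
    "Updating billing details": [0.1, 0.9, 0.1],
    "Exporting reports": [0.0, 0.2, 0.9],
}


def closest_article(query_embedding):
    """Brute-force nearest neighbor: score every article, keep the best."""
    return max(
        articles,
        key=lambda title: cosine_similarity(articles[title], query_embedding),
    )


# Pretend this is the embedding of "I forgot my login".
query = [0.8, 0.2, 0.1]
best = closest_article(query)
print(best)  # Resetting your password
```

The article returned here is what you would hand to ChatGPT along with the user's message.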

Tensor and vector databases will replace most legacy databases in this decade. A disruption fueled by natural language interfaces and deep neural representations. In other words:
Natural query languages (NQL) replace the structured query language (SQL).
— Jo Kristian Bergum (@jobergum)
6:35 AM • Apr 27, 2023

…except that they won’t. Vector databases are great at returning inexact results from unstructured data that can be vectorized. They are not going to power your CRM, your accounting application, your ERP, or your spreadsheets. Just like SQL did with document databases and graph databases, SQL will probably absorb the best parts of vector databases, and in 5 years, you’ll be storing embeddings in your Postgres database along with your relational, JSON, and graph data.

🎂 SQL for the next fifty years

SQL is not going away, but it is not going stale either. SQL’s path forward is to support new paradigms, as it has done with JSON/document data and with graph data. Before JSON support was standardized, several databases (most notably, Postgres) experimented with JSON in the form of extensions or first-class features. When people used it in anger, they optimized it, and eventually JSON support was standardized. I believe that this will continue to happen, even in light of the current AI hype cycle. The new vector databases are fascinating and - I believe - genuinely novel, but they will bring value to the SQL ecosystem, not kill it.

[GIF: Jean-Luc Picard as Locutus of Borg, with the caption "Assimilate"]

Jean-Luc’s Borg implants in the year 2366 are probably still running SQLite.

SQL is a wheel. Don’t reinvent it. Don’t discard it. Build better vehicles on top of it.

¹ This quote, although popularly attributed to American author Mark Twain, is not something he ever said. It still makes for a hell of a quote.