SQL’s EXISTS and NOT EXISTS: A Comprehensive Guide

Lasha Dolenjashvili
5 min readJun 7, 2023

--

Photo by George Dolgikh: https://www.pexels.com/photo/opened-white-and-red-gift-box-1303084/

Today, we’re going to dive deep into two powerful SQL constructs: EXISTS and NOT EXISTS. Both are incredibly useful when dealing with subqueries and relational data.

What are EXISTS and NOT EXISTS?

The EXISTS operator is used to test for the existence of any record in a subquery. If the subquery returns at least one record, the EXISTS condition is true; if the subquery returns no records, the EXISTS condition is false.

The NOT EXISTS operator is the exact opposite. It checks if a subquery returns any records. If the subquery returns no records, the NOT EXISTS condition is true; if the subquery returns records, the NOT EXISTS condition is false.

Syntax

The syntax for EXISTS and NOT EXISTS is:

-- EXISTS
SELECT column_name(s)
FROM table_name
WHERE EXISTS (subquery);
-- NOT EXISTS
SELECT column_name(Ss)
FROM table_name
WHERE NOT EXISTS (subquery);

EXISTS in Action

Suppose we have two tables: Customers and Orders. We want to find customers who have placed at least one order. We could use the EXISTS operator as follows:

SELECT customer_id, customer_name 
FROM Customers
WHERE EXISTS (
SELECT 1 FROM Orders
WHERE Customers.customer_id = Orders.customer_id
);

In this query, we’re selecting customer_id and customer_name from the Customers table where there are corresponding records in the Orders table for that customer_id.

NOT EXISTS in Action

On the other hand, if we want to find customers who have not placed any orders, we could use the NOT EXISTS operator:

SELECT customer_id, customer_name 
FROM Customers
WHERE NOT EXISTS (
SELECT 1 FROM Orders
WHERE Customers.customer_id = Orders.customer_id
);

This time, we’re selecting customer_id and customer_name from the Customers table where there are no corresponding records in the Orders table for that customer_id.

Use Cases

Identifying active users: Using EXISTS, you can find users who have logged in or interacted with your website.

SELECT user_id FROM Users 
WHERE EXISTS (
SELECT 1 FROM UserActivity
WHERE Users.user_id = UserActivity.user_id
);

Identifying inactive users: Using NOT EXISTS, you can identify users who have never logged in or interacted with your website.

SELECT user_id FROM Users 
WHERE NOT EXISTS (
SELECT 1 FROM UserActivity
WHERE Users.user_id = UserActivity.user_id
);

Finding products that have been sold: With EXISTS, you can find products that have sales records.

SELECT product_id FROM Products 
WHERE EXISTS (
SELECT 1 FROM Sales
WHERE Products.product_id = Sales.product_id
);

Finding products with no sales: Using NOT EXISTS, you can identify products that have never been sold.

SELECT product_id FROM Products 
WHERE NOT EXISTS (
SELECT 1 FROM Sales
WHERE Products.product_id = Sales.product_id
);

NOT EXISTS vs NOT IN

Similar to EXISTS and IN, NOT EXISTS and NOT IN are often comparable in functionality. However, NOT IN can yield unexpected results when NULL values are involved. NOT IN will return an empty set if any NULL values are part of the subquery result, regardless of other values. NOT EXISTS doesn’t get tripped up by NULL values and behaves as expected.

In SQL, a NULL value essentially means “unknown.” It’s not equivalent to zero or an empty string; it’s a special marker used to indicate that a data value doesn’t exist in the database.

When we use the NOT IN operator in SQL, it returns TRUE if a value is not in the list of values provided or returned by a subquery. However, when NULLs come into play, things get a bit more complicated.

Consider the following query:

SELECT customer_id FROM Customers 
WHERE customer_id NOT IN (SELECT customer_id FROM Orders);

In this query, if the subquery (SELECT customer_id FROM Orders) returns a list that includes some NULLs, then the NOT IN operation gets tricky. Why? Because when SQL compares a value to NULL using a comparison operator like “=”, “<>” or “NOT IN”, the result is not TRUE or FALSE but rather NULL (unknown).

Let’s say we have a customer_id of ‘123’. SQL is essentially checking if ‘123’ is NOT IN the list (which includes NULLs). However, when it compares ‘123’ to a NULL in the list, SQL cannot definitively say that ‘123’ is NOT IN the list, because NULL is an unknown value. So, the result is not TRUE, and the record for customer ‘123’ would not be included in the final result set. This happens regardless of whether ‘123’ appears in the list or not. In the presence of NULL, NOT IN essentially returns no results.

On the other hand, when using NOT EXISTS, the situation is different. The NOT EXISTS operator returns TRUE if the subquery returns no rows, and it doesn’t get tripped up by NULL values. It only checks for the existence of rows in the subquery result, without comparing specific values, which makes it a safer choice when NULL values may be involved.

The above behavior of the NOT IN operator in SQL can vary across different database systems and SQL dialects. However, it is generally a common behavior. In most SQL dialects, including popular ones like Oracle, MySQL, and SQL Server, the presence of NULL values in the list used with NOT IN can cause this behavior. The result is often no records returned, even if the value being compared exists in the list but is also accompanied by NULLs.

EXISTS/NOT EXISTS vs LEFT JOIN

When you want to find rows in one table that do or do not have matches in another table, you might consider using a LEFT JOIN or EXISTS/NOT EXISTS.

A LEFT JOIN followed by a “WHERE … IS NULL” clause can be used to find rows from the left table that have no matching rows in the right table. However, this approach can be less efficient than using NOT EXISTS, particularly on large datasets. This is because a LEFT JOIN produces a result set that combines all records from both tables, which can be costly in terms of memory and processing time. In contrast, a NOT EXISTS query stops processing as soon as it finds a matching record.

Performance Considerations and Best Practices

  1. Avoid NULL Pitfalls: If your subquery may return NULL values, favor EXISTS/NOT EXISTS over IN/NOT IN to avoid unexpected results.
  2. Consider Query Efficiency: EXISTS/NOT EXISTS often outperform IN/NOT IN and LEFT JOIN, especially on large datasets. This is because EXISTS/NOT EXISTS stop processing as soon as they find a match, while IN/NOT IN and LEFT JOIN process the entire dataset.

Thanks for reading this article. If you found it interesting, consider giving it some claps 👏 to show your support. If you have any questions or comments, feel free to leave them below.

You can connect with me on Linkedin.

--

--

Lasha Dolenjashvili
Lasha Dolenjashvili

Written by Lasha Dolenjashvili

Data Solutions Architect with a proven track record of delivering solutions that provide long-term value and competitive advantage.

No responses yet