Redshift CASE WHEN: SQL Conditionals
Introduction
When using Amazon Redshift data warehouse, you may need to use conditional logic to query and transform your data. The ‘CASE WHEN’ expression helps evaluate conditions and return different results depending on if the conditions are true or false.
In this article, we’ll dive into the basics of using ‘CASE WHEN’ expressions in Redshift SQL. We’ll also cover ELSE statement during the discussion. You’ll learn the syntax, see examples, and understand common use cases. By the end, you’ll be equipped to leverage this essential SQL conditional logic in your own Redshift queries.
CASE WHEN Syntax
The general syntax for a CASE WHEN expression in Redshift is:
CASE WHEN condition1 THEN result1 WHEN condition2 THEN result2 ... ELSE default_result END
Here’s how it works:
- The CASE keyword signals the start of the conditional expression
- You specify one or more WHEN clauses that each contain a condition to evaluate
- For the first condition that evaluates to true, the corresponding THEN result is returned
- If none of the conditions are true, the ELSE default_result is returned
- The END keyword signals the end of the CASE expression
Let’s look at a simple example to illustrate. Imagine a “users” table with columns for id, name, and age. To categorize users as ‘child’, ‘adult’, or ‘senior’ based on age, you could use:
SELECT id, name, CASE WHEN age < 18 THEN 'child' WHEN age < 65 THEN 'adult' ELSE 'senior' END AS age_category FROM users;
This would evaluate each row, check the user’s age, and return the corresponding age_category.
Creating Example Data
To demonstrate more complex CASE WHEN examples, let’s first create some sample data to work with. Run the following SQL in Redshift to create an “orders” table:
CREATE TABLE orders ( order_id INT, customer_id INT, order_date DATE, total DECIMAL(10,2), status VARCHAR(20) ); INSERT INTO orders VALUES (1, 101, '2023-01-15', 150.00, 'completed'), (2, 102, '2023-02-10', 75.50, 'pending'), (3, 101, '2023-03-01', 200.00, 'completed'), (4, 103, '2023-03-05', 120.00, 'cancelled'), (5, 102, '2023-04-12', 300.00, 'completed');
This simple case expression creates an orders table with some sample records that we can now run CASE expressions on.
Using CASE to Derive New Columns
CASE statements can be used to create new columns in a dataset by categorizing existing data. When organizing orders by total amount, a CASE statement can set the rules for each group.
Orders under $100 are “Low Value”. Range of Orders between $100 and $500 are “Medium Value”. Orders over $500 are “High Value”. You can easily categorize data in a dataset by using a CASE statement, which helps with analysis and understanding.
SELECT order_id, customer_id, total, CASE WHEN total < 100 THEN 'small' WHEN total < 250 THEN 'medium' ELSE 'large' END AS order_size FROM orders;
This would return:
order_id | customer_id | total | order_size ---------------------------------------------- 1 | 101 | 150.00 | medium 2 | 102 | 75.50 | small 3 | 101 | 200.00 | medium 4 | 103 | 120.00 | medium 5 | 102 | 300.00 | large
Aggregating with CASE
CASE expressions are also very useful inside aggregate functions like SUM() and COUNT(). For example, to count the number of small, medium and large orders:
SELECT COUNT(CASE WHEN total < 100 THEN 1 END) AS small_orders, COUNT(CASE WHEN total >= 100 AND total < 250 THEN 1 END) AS medium_orders, COUNT(CASE WHEN total >= 250 THEN 1 END) AS large_orders FROM orders;
This would return:
small_orders | medium_orders | large_orders -------------------------------------------- 1 | 3 | 1
Within each COUNT(), the CASE returns 1 when the condition is met, which the COUNT() then sums up.
CASE in WHERE and HAVING Clauses
You can also use CASE expressions in WHERE and HAVING clauses to filter results based on conditional logic. For instance, to find all customers who have placed a ‘large’ order:
SELECT DISTINCT customer_id FROM orders WHERE CASE WHEN total >= 250 THEN 'large' ELSE 'not large' END = 'large';
This would return customer 102, since they are the only one with a ‘large’ order over $250.
Handling NULL Values
CASE expressions are also useful for handling NULL values. You can use CASE to replace NULLs with a default value, or to apply different logic when a value is NULL.
For example, consider if our “orders” table had a “shipped_date” column that could be NULL if the order hasn’t shipped yet. To return ‘Not Shipped’ for these orders:
SELECT order_id, status, CASE WHEN shipped_date IS NULL THEN 'Not Shipped' ELSE CAST(shipped_date AS VARCHAR(10)) END AS shipped FROM orders;
This checks if shipped_date IS NULL, and if so, returns ‘Not Shipped’. Otherwise, it casts the shipped_date to a string to return. We should note here that using NVL or COALESCE may be a better choice.
Tips for Using CASE WHEN
Here are a few tips to keep in mind when using CASE WHEN expressions in Redshift:
- CASE expressions can be nested within each other for more complex logic
- The ELSE clause is optional, and if omitted, NULL will be returned if no conditions match
- CASE expressions can be used almost anywhere in a SQL statement, including SELECT, WHERE, HAVING, GROUP BY, and ORDER BY clauses
- Be careful with data types – the results of a CASE expression should all be convertible to a common data type
Summary
The CASE WHEN expression is a versatile tool for applying conditional logic in Redshift SQL queries. You can use it to derive new columns, aggregate data, filter results, handle NULLs, and much more.
By understanding the syntax and common use cases, you’ll be able to write more powerful and efficient SQL to analyze your data in Redshift.
DataSunrise offers user-friendly and flexible tools for database security, audit, and compliance that integrate seamlessly with Amazon Redshift. Visit the DataSunrise website to request an online demo from our expert team.
Visit the DataSunrise website to request an online demo from our expert team.