How to make your SQL Query more efficient?

Crafting an effective SQL query is like mastering a culinary art – it’s about blending efficiency, data manipulation, and key principles for top-notch database performance, sustainability, and crystal-clear clarity.

Don’t worry if this sounds a bit complex at first. Think of it as a familiar scenario: you fire off a query and end up waiting so long that you can practically brew, sip, and savor a cup of coffee. Or worse, your computer suddenly sounds like it’s running a marathon due to the query gobbling up every ounce of CPU and memory. And then there’s the nightmare scenario: a poorly written query not only throws off your database’s performance, but can even crash the entire server – yikes!

To dodge these SQL pitfalls and ensure a smoother ride through the world of database management, let’s dive into some practical tips and tried-and-true practices that I’ve gathered from my own experience. Your SQL journey is about to get a whole lot smoother.

1.Use the preview option

If you just want a quick overview of your data, you should use the preview option to check the data and its schema. It returns the result instantly and costs nothing.

2.Avoid SELECT *

When writing queries, it would be better to select only the columns that you need, rather than SELECT *. SELECT * retrieves unnecessary data besides that it increases the network traffic used for your queries.

In best practice, you can apply a LIMIT clause, a WHERE clause or a HAVING clause to a SELECT * query does not affect the amount of data read.
You also can use SELECT * EXCEPT statement to retrieve data. This method enables you to eliminate unwanted columns and significantly reduce the volume of data processed by your query.

Example 1
SELECT gender, age, name 
FROM `Employees`
WHERE Salary > 5000;

Example 2
SELECT * EXCEPT (col1, col4, col6, col8)

3.Use Partitioned tables

If your database holds or will hold a lot of historical data, think about using partitioned tables instead of sharded tables (also called date-named tables) when designing it.

A good tip is to not create too many sharding tables. This is because queries that need to gather data from multiple sharding tables can get complicated and slow. This happens because data has to be collected from different places and maybe even put together. Handling a large number of sharding tables can also become tricky, leading to more work to keep everything running smoothly, potential issues, and extra work to keep things going.

When it comes to performance, partitioned tables tend to work better than date-named tables.

For ingestion-time partitioned tables, use the pseudo-column _PARTITIONTIME
For filtering time partitioned tables, use _PARTITIONTIME to specify a date or range of dates

Example 3
SELECT col1 , col2, col3 FROM `purchase`
WHERE _PARTITIONTIME
BETWEEN TIMESTAMP("20230101") AND TIMESTAMP("20230601")

4. Reduce data before JOIN

The JOIN is very resource-intensive, especially when dealing with large datasets. Joining large tables without reducing data first can lead to high memory and CPU usage, potentially causing performance bottlenecks and slowing down the entire database server. Therefore, by reduce data before joining, you can decrease the number of rows that need to be processed and shorten query execution times.

In best practice, you can use filter, aggregation, subquery and CTE to significantly reduce data before using JOIN

Filtering: Apply appropriate filtering conditions (using WHERE clauses) to limit the number of rows that need to be joined.
Aggregation: Perform aggregation operations like SUM, COUNT, AVG, etc., before joining, so you’re working with summarized data.
Subqueries: Use subqueries to pre-select relevant data before joining, which can result in more efficient processing.
Temporary Tables or Common Table Expressions (CTEs): Create temporary tables to hold reduced data subsets that you’ll then join in your main query.

Example 4
# create a temporary table named comments and reduce the data precisely
WITH
  comments AS (
  SELECT
    user_id,
    COUNT(text) AS comments_count
  FROM
    table1
  WHERE
    user_id IS NOT NULL
  GROUP BY user_id
  ORDER BY comments_count DESC
  LIMIT 10
  )

SELECT
  user_id,
  display_name,
  comments_count
FROM comments
JOIN
  table2 AS u
ON
  user_id = u.id
ORDER BY comments_count DESC;

5. Place ORDER BY at the outermost

You should put the complex operations to the end of the query, otherwise they can easily cause very fast break off your table. The best partner of ORDER BY clause is LIMIT clause, especially when you are ordering a very large number of values but don’t need to have all of them returned.

Example 5
SELECT sales_by_category   
FROM   `myproject.mydatabase.sales`
ORDER BY sales_sum DESC  
LIMIT 10;

Conclusion

In the real world, there’s a bunch of ways to make your queries run smoother and faster. Out of these, the five methods we’ve talked about are the ones you’ll likely use the most. They’re practical and hands-on, and they can really give your SQL performance a boost!

Share on Social Media