Firstly, Let’s create a table called personnel and stored in BigQuery.
CREATE TABLE tutorial.personnel ( name STRING, age INT64, salary FLOAT64 ); INSERT INTO tutorial.personnel(name, age, salary) VALUES ('John', 30, 50000.00), ('Alice', 25, 60000.00), ('Bob', 35, NULL), ('Emma', NULL, 70000.00), (NULL, 28, 55000.00), ('Michael', 40, NULL), ('Sara', 33, 65000.00), (NULL, NULL, NULL), ('David', 45, 75000.00), ('Sophia', 27, 58000.00);
Row | name | age | salary |
---|---|---|---|
1 | null | null | null |
2 | Bob | 35 | null |
3 | Michael | 40 | null |
4 | Emma | null | 70000.0 |
5 | Alice | 25 | 60000.0 |
6 | Sophia | 27 | 58000.0 |
7 | John | 30 | 50000.0 |
8 | Sara | 33 | 65000.0 |
9 | null | 28 | 55000.0 |
10 | David | 45 | 75000.0 |
CASE
In BigQuery, the CASE
statement is used to perform conditional logic within SQL queries. It allows you to evaluate conditions and return different values based on those conditions.
For example, we can set threshold to evaluate the salary.
- If the salary is above 65000, it is high level.
- If the salary is between 55000 and 65000, it is middle level.
- If it is under 55000, it is the low level.
- The rest, we set it as others.
SELECT name, salary, CASE WHEN salary >= 65000 THEN 'high' WHEN salary < 65000 and salary >= 55000 THEN 'middle' WHEN salary < 55000 THEN 'low' ELSE 'others' END AS salary_level FROM `instruction-415216.tutorial.personnel`
Row | name | salary | salary_level |
---|---|---|---|
1 | null | null | others |
2 | Bob | null | others |
3 | Michael | null | others |
4 | Emma | 70000.0 | high |
5 | Alice | 60000.0 | middle |
6 | Sophia | 58000.0 | middle |
7 | John | 50000.0 | low |
8 | Sara | 65000.0 | high |
9 | null | 55000.0 | middle |
10 | David | 75000.0 | high |
IF
The IF
function is used to evaluate a condition and return one value if the condition is true, and another value if the condition is false.
For example, I want to create a new column called is_salary, to verify whether the salary data is not missing. If the data is not missing, then return true; otherwise, return false.
SELECT salary, IF(salary IS NOT NULL, TRUE, FALSE) AS is_salary FROM tutorial.personnel;
Row | salary | is_salary |
---|---|---|
1 | null | false |
2 | null | false |
3 | null | false |
4 | 70000.0 | true |
5 | 60000.0 | true |
6 | 58000.0 | true |
7 | 50000.0 | true |
8 | 65000.0 | true |
9 | 55000.0 | true |
10 | 75000.0 | true |
IFNULL
The IFNULL
function is used to return a specified value if the expression is NULL, otherwise, it returns the expression itself.
For example, I want to impute the missing value with 0 in salary column.
- If the element is not null, then nothing changes;
- If it is null, then fill it with value 0.
SELECT name, salary, IFNULL(salary, 0) AS salary_imputed FROM tutorial.personnel;
Row | name | salary | salary_imputed |
---|---|---|---|
1 | null | null | 0.0 |
2 | Bob | null | 0.0 |
3 | Michael | null | 0.0 |
4 | Emma | 70000.0 | 70000.0 |
5 | Alice | 60000.0 | 60000.0 |
6 | Sophia | 58000.0 | 58000.0 |
7 | John | 50000.0 | 50000.0 |
8 | Sara | 65000.0 | 65000.0 |
9 | null | 55000.0 | 55000.0 |
10 | David | 75000.0 | 75000.0 |
NULLIF
The NULLIF
function is used to compare two expressions. If the expressions are equal, it returns NULL, otherwise, it returns the first expression.
SELECT NULLIF(5, 5) as result # the two are identical, it returns NULL
/*--------*
| result |
+--------+
| NULL |
*--------*/
SELECT NULLIF(5, 2) as result # the two are not identical, returns the first exp
/*--------*
| result |
+--------+
| 5 |
*--------*/