The bq command-line tool is part of the Google Cloud SDK and is used for interacting with Google BigQuery. We can use it to manage BigQuery resources such as datasets, tables, jobs, and queries from the command line. With bq, you can:
- Initialize the bq tool
- Create resources such as datasets, tables, or views
- Add a file to your project
- Copy a table or data between tables
- List resources such as projects, datasets, tables, or jobs
- Display details of a table
- Execute a SQL query
- Load data from Google Cloud Storage into BigQuery
- Export data from BigQuery to Google Cloud Storage
- Delete resources such as datasets, tables, or views
Initialize the bq tool
bq init
This command initializes the bq tool by authenticating the user and setting up the default project.
###@cloudshell:~ (instruction-415216)$ bq init
It looks like you are trying to run "/usr/bin/../lib/google-cloud-sdk/bin/bootstrapping/bq.py init".
The "init" command is no longer needed with Google Cloud CLI. To authenticate, run gcloud auth.
Really run this command? (y/N) y

Welcome to BigQuery! This script will walk you through the
process of initializing your .bigqueryrc configuration file.

First, we need to set up your credentials if they do not
already exist.

Setting project_id instruction-415216 as the default.

BigQuery configuration complete! Type "bq" to get started.
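As the message above notes, bq init is no longer needed with the Google Cloud CLI. On a fresh machine you would typically authenticate and set the default project with gcloud instead (a minimal sketch, assuming the gcloud CLI is installed; instruction-415216 is the project used throughout this post):

# authenticate your user account
gcloud auth login

# set the default project that bq will use
gcloud config set project instruction-415216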
Create resources such as datasets, tables, or views
bq mk dataset_id # Creates a dataset
bq mk --table dataset_id.table_id schema.json # Creates a table with a specified schema
# create a dataset called new_dataset
###@cloudshell:~ (instruction-415216)$ bq mk new_dataset
Dataset 'instruction-415216:new_dataset' successfully created.

# create a table called new_table
###@cloudshell:~ (instruction-415216)$ bq mk --table new_dataset.new_table
Table 'instruction-415216:new_dataset.new_table' successfully created.
Add a file to your project
curl -LO [url]
For example, I want to download a file from http://www.ssa.gov/OACT/babynames/names.zip and load it into a new dataset called babynames.
# create a new dataset called babynames
bq mk babynames

# add a file to my project
curl -LO http://www.ssa.gov/OACT/babynames/names.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 7232k  100 7232k    0     0  19.0M      0 --:--:-- --:--:-- --:--:-- 19.0M

# unzip file
unzip names.zip

# The bq load command creates or updates a table and loads data in a single step.
bq load babynames.names2022 yob2022.txt name:string,gender:string,count:integer
Upload complete.
Waiting on bqjob_r4b51bf84c2814453_0000018e70e00ae6_1 ... (1s) Current status: DONE
Copy a table or data between tables
For example, I want to copy the table names2022 to the table new_table. The two tables belong to different datasets. To copy tables across datasets, we need to give the fully qualified address (project_id:dataset_id.table_id) of both the source and the destination.
bq cp project_id:dataset_id.source_table_id project_id:dataset_id.destination_table_id
bq cp instruction-415216:babynames.names2022 instruction-415216:new_dataset.new_table
Waiting on bqjob_r10fd005afbad0ef8_0000018e70e40df0_1 ... (0s) Current status: DONE
Table 'instruction-415216:babynames.names2022' successfully copied to 'instruction-415216:new_dataset.new_table'
List resources such as projects, datasets, tables, or jobs
bq ls # List all the datasets in your default project
bq ls project_id: # List datasets in a specific project
bq ls project_id:dataset_id # List tables in a specific dataset
# list all the datasets in your project
bq ls
   datasetId
 -------------
  babynames
  new_dataset

# list all the datasets in the public dataset
bq ls bigquery-public-data:
                 datasetId
 ------------------------------------------
  america_health_rankings
  austin_311
  austin_bikeshare
  austin_crime
  austin_incidents
  austin_waste
  baseball
  bbc_news
  bigqueryml_ncaa
  bitcoin_blockchain
  blackhole_database
  blockchain_analytics_ethereum_mainnet_us
  bls
  bls_qcew
  breathe
  broadstreet_adi
  catalonian_mobile_coverage
  catalonian_mobile_coverage_eu
  census_bureau_acs
  census_bureau_construction
  census_bureau_international
  census_bureau_usa
  census_opportunity_atlas
  census_utility
  cfpb_complaints
  chicago_crime
  chicago_taxi_trips
  clemson_dice
  cloud_storage_geo_index
  cms_codes
  cms_medicare
  cms_synthetic_patient_data_omop
  country_codes
  covid19_aha
  covid19_covidtracking
  covid19_ecdc
  covid19_ecdc_eu
  covid19_genome_sequence
  covid19_geotab_mobility_impact
  covid19_geotab_mobility_impact_eu
  covid19_google_mobility
  covid19_google_mobility_eu
  covid19_govt_response
  covid19_italy
  covid19_italy_eu
  covid19_jhu_csse
  covid19_jhu_csse_eu
  covid19_nyt
  covid19_open_data
  covid19_open_data_eu

# list all tables in the dataset
bq ls instruction-415216:new_dataset
   tableId    Type    Labels   Time Partitioning   Clustered Fields
 ------------ ------- -------- ------------------- ------------------
  new_table    TABLE
  penguins     TABLE
Display details of a table
bq show project_id:dataset_id.table_id
I want to show the details of a table called shakespeare in the samples dataset, which is stored in the public bigquery-public-data project.
bq show bigquery-public-data:samples.shakespeare
Table bigquery-public-data:samples.shakespeare

   Last modified                 Schema                 Total Rows   Total Bytes   Expiration   Time Partitioning   Clustered Fields   Total Logical Bytes   Total Physical Bytes   Labels
 ----------------- ------------------------------------ ------------ ------------- ------------ ------------------- ------------------ --------------------- ---------------------- --------
  14 Mar 17:16:45   |- word: string (required)           164656       6432064                                                           6432064               485656
                    |- word_count: integer (required)
                    |- corpus: string (required)
                    |- corpus_date: integer (required)
I want to show the details of a table called penguins in the new_dataset dataset, which is stored in my own project.
bq show instruction-415216:new_dataset.penguins
Table instruction-415216:new_dataset.penguins

   Last modified              Schema              Total Rows   Total Bytes   Expiration   Time Partitioning   Clustered Fields   Total Logical Bytes   Total Physical Bytes   Labels
 ----------------- ------------------------------- ------------ ------------- ------------ ------------------- ------------------ --------------------- ---------------------- --------
  24 Mar 14:04:53   |- species: string              344          19012                                                             19012                 5080
                    |- island: string
                    |- bill_length_mm: float
                    |- bill_depth_mm: float
                    |- flipper_length_mm: integer
                    |- body_mass_g: integer
                    |- sex: string
For tables in your own project, the project ID can be omitted. For example, I want to show the details of the names2022 table in the babynames dataset.
bq show dataset_id.table_id
bq show babynames.names2022
Table instruction-415216:babynames.names2022

   Last modified        Schema        Total Rows   Total Bytes   Expiration   Time Partitioning   Clustered Fields   Total Logical Bytes   Total Physical Bytes   Labels
 ----------------- ------------------- ------------ ------------- ------------ ------------------- ------------------ --------------------- ---------------------- --------
  24 Mar 14:31:58   |- name: string     31915        608048                                                            608048                165912
                    |- gender: string
                    |- count: integer
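If you want just the schema without the table statistics, bq show also accepts a --schema flag:

# print only the schema, formatted as JSON
bq show --schema --format=prettyjson babynames.names2022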
Execute a SQL query
You can execute SQL queries from the command line, using either legacy SQL or standard SQL. For example, I want to use a standard SQL statement to count how many times the word “thee” appears in the shakespeare table. The result is 42.
bq query --use_legacy_sql=false \
'SELECT Count(word)
 FROM `bigquery-public-data`.samples.shakespeare
 WHERE word = "thee"'
+-----+
| f0_ |
+-----+
|  42 |
+-----+
Another example: I want to find the top 5 most popular female names in 2022.
bq query --use_legacy_sql=false \
'SELECT *
 FROM `instruction-415216.babynames.names2022`
 WHERE gender = "F"
 ORDER BY count DESC
 LIMIT 5'
+-----------+--------+-------+
|   name    | gender | count |
+-----------+--------+-------+
| Olivia    | F      | 16573 |
| Emma      | F      | 14435 |
| Charlotte | F      | 12891 |
| Amelia    | F      | 12333 |
| Sophia    | F      | 12310 |
+-----------+--------+-------+
Load data from Google Cloud Storage into BigQuery
bq --location=location load \ # specify the location
--source_format=format \ # specify the data format
dataset.table \ # specify the destination
path_to_source \ # specify the source
schema # specify the table schema
The schema can be a local JSON file, or it can be typed inline as part of the command. You can also use the --autodetect flag instead of supplying a schema definition.
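For instance, here is a minimal sketch of loading a CSV file from a bucket with schema auto-detection (gs://my-bucket/mydata.csv and loaded_table are placeholder names):

# load a CSV from GCS, skipping the header row and auto-detecting the schema
bq --location=US load \
    --source_format=CSV \
    --autodetect \
    --skip_leading_rows=1 \
    new_dataset.loaded_table \
    gs://my-bucket/mydata.csv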
Export data from BigQuery to Google Cloud Storage
bq extract --location=location \ # specify the location
--destination_format format \ # specify the data format, e.g. CSV, NEWLINE_DELIMITED_JSON, AVRO, or PARQUET
--compression compression_type \ # specify the compression type
--field_delimiter delimiter \ # specify the boundary between columns in CSV
--print_header=boolean \ # specify whether to print a header row, for formats that support headers; the default is true
project_id:dataset.table \
gs://bucket/filename # specify the destination in GCS
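For instance, here is a minimal sketch of exporting the penguins table to a compressed CSV file (gs://my-bucket is a placeholder bucket name):

# export the penguins table to a gzipped CSV in GCS
bq extract --location=US \
    --destination_format CSV \
    --compression GZIP \
    --field_delimiter ',' \
    --print_header=true \
    instruction-415216:new_dataset.penguins \
    gs://my-bucket/penguins.csv.gz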
Delete resources such as datasets, tables, or views
bq rm -r -f dataset_id
This command deletes a dataset and all of its tables recursively: -r removes every table in the dataset, and -f skips the confirmation prompt. For example, if I want to delete the whole dataset called new_dataset, I can simply run bq rm -r -f new_dataset, and all the resources in the dataset will be deleted.
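To remove a single table or view rather than a whole dataset, bq rm also accepts a dataset_id.table_id argument (a minimal sketch, using the table created earlier in this post):

# delete only the new_table table, without prompting for confirmation
bq rm -f new_dataset.new_table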