Skip to content

Evaluate a model and make predictions with BigQuery ML

In the tutorial “Create a basic Regression model with BigQuery ML” we have discussed how to create a dataset and how to create a regression model, today we will go deeper, to learn how to evaluate a model and make predictions with SQL statement.

Evaluate a model

SELECT
  *
FROM
  ml.EVALUATE(MODEL `ga_sample.sample_model`, (
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, "") AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20170601' AND '20170801'));

Here we use ml.EVALUATE() function. It applies the model to the evaluation dataset to make predictions and compares these predictions to the actual target values in the evaluation dataset. Based on this comparison, it calculates various performance metrics to evaluate the model’s effectiveness, such as accuracy, precision, recall, F1 score, and more, depending on the type of machine learning task.

Typically, we can divide dataset into training and testing subsets using a split ratio of 80:20. In this particular instance, we perform the split based on timestamps.

Training data: the data between 2016/01/01 and 2017/05/31 is training data

Testing data: the data between 2017/06/01 and 2017/08/01 is testing data

Then we will see result like this:

precisionrecallaccuracyf1_scorelog_lossroc_auc
0.450980392156862750.102475247524752470.985030374204398870.167002823719241630.0464862562808372810.98130169830169833

Make Predictions

1. Predict purchases per country

SELECT
  country,
  SUM(predicted_label) as total_predicted_purchases
FROM
  ml.PREDICT(MODEL `ga_sample.sample_model`, (
SELECT
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(totals.pageviews, 0) AS pageviews,
  IFNULL(geoNetwork.country, "") AS country
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170801'))
GROUP BY country
ORDER BY total_predicted_purchases DESC
LIMIT 10;

Here we use ML.PREDICT() function, which is a powerful tool for deploying machine learning models in BigQuery. It allows you to easily apply the trained models to new data and obtain predictions without having to write complex prediction code.

In this case, I want to predict the top 10 countries with the highest total predicted purchases.

We will get result like this:


Row
countrytotal_predicted_purchases
1United States3700
2Canada101
3Venezuela66
4Japan35
5Taiwan27
6Australia26
7India20
8Singapore19
9Hong Kong19
10Mexico15

2. Predict purchases per user

SELECT
  fullVisitorId,
  SUM(predicted_label) as total_predicted_purchases
FROM
  ml.PREDICT(MODEL `ga_sample.sample_model`, (
SELECT
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(totals.pageviews, 0) AS pageviews,
  IFNULL(geoNetwork.country, "") AS country,
  fullVisitorId
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY fullVisitorId
ORDER BY total_predicted_purchases DESC
LIMIT 10;

In this case, I want to predict the top 10 users with the highest total predicted purchases.

We will get result like this:

RowfullVisitorIdtotal_predicted_purchases
194178574712951310454
21122883309288959422
329694186761262587982
421051223760168976292
512809936612043474502
683889310329550527462
774203005015230124602
803763940560921891132
980646251500335083962
108069922490326866502