How to use BigQuery for machine learning on Google Analytics data
BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse that makes it easy to analyze all your data very quickly. Google Analytics is a powerful tool that can help you track and analyze your website traffic. By combining the power of BigQuery and Google Analytics, you can use machine learning to gain valuable insights into your data and make better decisions for your business.
Here are some of the things you can do with machine learning on Google Analytics data in BigQuery:
Predict customer churn: You can use machine learning to predict which customers are likely to churn, so you can take steps to retain them.
Identify trends: You can use machine learning to identify trends in your data, such as which products are most popular or which pages are most visited.
Personalize your website: You can use machine learning to personalize your website for each individual user, based on their past behaviour.
Improve your marketing campaigns: You can use machine learning to improve your marketing campaigns, by targeting the right people with the right message at the right time.
We can use the bqml_ga4_gaming_propensity_to_churn notebook to get a comprehensive overview of how to start using machine learning on Google Analytics data in BigQuery.
To interact with the code, you must first create an access credentials file. To create a Service account credentials JSON file, please refer to the GCP documentation.
Authenticating with a service account key file. Please refer to the GCP documentation.
from google.cloud import bigquery from google.oauth2 import service_account # TODO(developer): Set key_path to the path to the service account key # file. # key_path = “path/to/service_account.json” credentials = service_account.Credentials.from_service_account_file( key_path, scopes=[“https://www.googleapis.com/auth/cloud-platform”], ) client = bigquery.Client(credentials=credentials, project=credentials.project_id,) |
Query the GA4 event data
#Write Query on BQ QUERY = “”” SELECT * FROM `ga4-demo.events_20210130` LIMIT 1 “”” #Run the query and write result to a pandas data frame Query_Results = bigquery_client.query(QUERY) df = Query_Results.to_dataframe() #View top few rows of result df.head() |
You have successfully run Bigquery on the notebook when you are able to query the results!
Please refer to the notebook to combine the label, demographic, and behavioural data into training data.
Training the propensity model with BigQuery ML
Train a logistic regression model
#Write Query on BQ QUERY = “”” CREATE OR REPLACE MODEL bqmlga4.churn_logreg OPTIONS( MODEL_TYPE=”LOGISTIC_REG”, INPUT_LABEL_COLS=[“churned”] ) AS SELECT * FROM bqmlga4.train “”” #Run the query and write result to a pandas data frame Query_Results = bigquery_client.query(QUERY) df = Query_Results.to_dataframe() #View top few rows of result df.head() |
Train an AutoML Tables model (optional)
AutoML Tables enables you to automatically build state-of-the-art machine learning models on structured data at massively increased speed and scale. AutoML Tables automatically searches through Google’s model zoo for structured data to find the best model for your needs, ranging from linear/logistic regression models for simpler datasets to advanced deep, ensemble, and architecture-search methods for larger, more complex ones.
You can train an AutoML model directly with BigQuery ML, as in the code below.
Note that the BUDGET_HOURS parameter is for AutoML Tables training, specified in hours. The default value is 1.0 hour and must be between 1.0 and 72.0. The total query processing time can be greater than the budgeted hours specified in the query.
Note: This may take a few hours to train.
%%bigquery –project $PROJECT_ID CREATE OR REPLACE MODEL bqmlga4.churn_automl OPTIONS( MODEL_TYPE=”AUTOML_CLASSIFIER”, INPUT_LABEL_COLS=[“churned”], BUDGET_HOURS=1.0 ) AS SELECT * EXCEPT(user_pseudo_id) FROM bqmlga4.train |
Model Evaluation
#Write Query on BQ QUERY = “”” SELECT * FROM ML.EVALUATE(MODEL bqmlga4.churn_logreg) “”” #Run the query and write result to a pandas data frame Query_Results = bigquery_client.query(QUERY) df = Query_Results.to_dataframe() #View top few rows of result df.head() |
The result is promising, and we can use a confusion matrix and ROC curve to further examine it in order to fine-tune the model later. Please refer to the notebook for more instructions.
Conclusion
BigQuery and Google Analytics can be used together to gain valuable insights into your data and make better decisions for your business. Machine learning can be used to predict customer churn, identify trends, personalize your website, and improve your marketing campaigns. The bqml_ga4_gaming_propensity_to_churn notebook provides a comprehensive overview of how to start using machine learning on Google Analytics data in BigQuery.
For more information on how to use BigQuery for machine learning, please refer to the following resources:
Admazes helped their client build up their GCP Marketing solution by using BigQuery to:
Store and analyze large amounts of data from multiple sources.
Identify trends and patterns in customer behaviour.
Create targeted marketing campaigns.
Measure the effectiveness of marketing campaigns.
Contact us and start the journey to a more powerful and flexible analytics platform.