BigQuery datasets: Views
This post explores the intricacies of managing logical and materialized views in BigQuery. Logical views, serving as virtual tables, simplify data access with SQL queries without storing data. Materialized views, meanwhile, cache query results for quicker access, updating automatically with base table changes.
We'll cover how to create, update, and effectively use these views in BigQuery, highlighting their roles in optimizing data analysis and query performance. Whether you're new to BigQuery or looking to refine your data management strategies, understanding these views is key to leveraging BigQuery's full analytical power.
Source:
Contents:
Logical views
A logical view is a virtual table in a database that presents data from one or more tables through a SQL query. It does not store data physically but provides a customized representation of data from underlying tables, allowing for simplified access and manipulation without altering the original data.
Create a view
You can create a view in BigQuery in the following ways:
Using the Google Cloud console.
Using the bq command-line tool's
bq mkcommand.Calling the
tables.insertAPI method.Using the client libraries.
Submitting a
CREATE VIEWdata definition language (DDL) statement.
CREATE VIEW your_dataset_name.view_name(column1, column2) AS (
SELECT
column1,
column2
FROM
source_dataset_name.source_table_name
WHERE
condition_column = 'YourCondition'
ORDER BY
order_column DESC
);Get information about views
You can list views in a dataset by querying the INFORMATION_SCHEMA.VIEWS in SQL or using corresponding commands in various languages supported by client libraries.
SELECT table_name
FROM DATASET_ID.INFORMATION_SCHEMA.VIEWS;Manage views
Update a view
After creating a view, you can update the following view properties:
SQL query
Expiration time
ALTER VIEW dataset_id.my_view
SET OPTIONS (
expiration_timestamp = TIMESTAMP('new_timestamp'));Description
ALTER VIEW dataset_id.my_view
SET OPTIONS (
description = 'new_description');Labels
Copy a view
Follow the official instructions to copy a view via console, bq or API.
Rename a view
You can only rename a view by copying it in the Google Cloud console. To do so, check the instructions for copying a view. The bq command-line tool, API, and client libraries don't allow renaming existing views; you'll need to recreate the view with the new name instead.
Delete a view
Use the Google Cloud console, bq command-line tool's bq rm command, or call the tables.delete API method.
DROP VIEW mydataset.myview;Only one view can be deleted at a time.
Set default expiration time at the dataset level or upon view creation for automatic deletion.
Deleting a view removes it from the authorized views list of the source dataset, which may take up to 24 hours.
Caution: This action cannot be undone. Recreating a view requires re-adding it to the authorized views list.
Associated permissions are also deleted; they must be manually reconfigured when recreating the view.
Note: Direct view recovery isn't possible, but the creation statement can be retrieved from audit logs using the log explorer or the projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2F activity path.
Limitations
Views are read-only; modifications are not allowed.
Views and referenced tables must be in the same location.
Views require dataset-qualified references; default datasets are not applicable.
The TableDataList JSON API cannot retrieve view data.
GoogleSQL and legacy SQL cannot be mixed in views.
Query parameters cannot be used in views.
View schemas are static; changes in underlying tables require view updates for accurate schema representation, but query results remain correct.
Legacy SQL views cannot be automatically updated to GoogleSQL; manual updates are needed via the Google Cloud console, bq command-line tool, or BigQuery Client libraries.
Temporary functions or tables cannot define a view.
Views cannot be used in wildcard table queries.
Materialized view
Materialized views in BigQuery are precomputed views that cache query results for improved performance. They're automatically updated with incremental changes from base tables, requiring no user maintenance. Queries utilizing materialized views are typically faster and consume fewer resources than those solely from base tables. Key characteristics include:
Zero maintenance: Materialized views update automatically with changes to base tables, requiring no user action.
Fresh data: Materialized views provide up-to-date data, retrieving directly from base tables if necessary.
Smart tuning: BigQuery optimizes queries by rerouting them to utilize materialized views when possible, enhancing performance.
Use cases
Materialized views optimize queries with high computation cost and small dataset results, benefiting processes like OLAP operations and ETL processes in BI pipelines.
They are valuable for:
Pre-aggregating data from streaming sources.
Pre-filtering data to read specific subsets of a table.
Pre-joining data, especially for large and small tables.
Reclustering data for queries that benefit from different clustering schemes than the base tables.
Compare to other BQ techniques
Interaction with Other BQ Features
Query Plan Explanation: The query plan shows materialized views scanned and bytes read from both materialized views and base tables.
Query Caching: Query results using materialized views can be cached, following usual limitations.
Cost Restriction: Queries exceeding maximum bytes billed fail without charge, regardless of materialized views or base tables.
Cost Estimation with Dry Run: A dry run estimates costs using available materialized views, aiding in testing query utilization.
Data freshness
Once created, the materialized view replica updates data based on the replication interval specified during creation.
Data freshness depends on:
Replication interval set in the CREATE MATERIALIZED VIEW AS REPLICA OF statement.
Refresh frequency of the source materialized view.
Refresh frequency of the metadata cache of the Amazon S3 table used by the materialized view.
Check freshness via the Google Cloud console:
For replica: Check "Last modified" in the replica's Details pane.
For source: Check "Last modified" in the materialized view's Details pane.
For Amazon S3 metadata cache: Check "Max staleness" in the materialized view's Details pane.
Limitations
Direct data manipulation operations like COPY, EXPORT, LOAD, WRITE, or DML statements aren't supported on materialized views.
You can't replace an existing materialized view with one of the same name.
View SQL cannot be updated after creation.
Materialized views must be in the same organization or project as their base tables.
Each base table can only be referenced by a limited number of materialized views.
Smart tuning considers only materialized views within the same dataset.
Materialized views have restricted SQL syntax and aggregation functions.
Nesting materialized views or querying external tables isn't supported.
Only the GoogleSQL dialect is supported.
Descriptions can be set for materialized views but not for individual columns.
Deleting a base table without deleting associated materialized views causes queries and refreshes to fail; recreation of the base table requires recreating the materialized view.
Limitations of materialized views over BigLake tables:
Partitioning of materialized views is unsupported, leading to full refreshes upon any base table deletion.
The -max_staleness option value must be higher than that of the BigLake base table.
Joins between BigQuery managed and BigLake tables aren't supported within a single materialized view definition.
Create a materialized view
Required permissions: bigquery.tables.create IAM permission.
SQL command
CREATE MATERIALIZED VIEW project_id.dataset.materialized_view_name AS (
query_expression
);Query support
Materialized views use a restricted SQL syntax. Queries must use the following pattern:
[ WITH cte [, …]]
SELECT [{ ALL | DISTINCT }]
expression [ [ AS ] alias ] [, ...]
FROM from_item [, ...]
[ WHERE bool_expression ]
[ GROUP BY expression [, ...] ]
from_item:
{
table_name [ as_alias ]
| { join_operation | ( join_operation ) }
| field_path
| unnest_operator
| cte_name [ as_alias ]
}
as_alias:
[ AS ] aliasQuery limitations for materialized views include:
Aggregate requirements:
Aggregates in the query must produce outputs; computing or filtering based on aggregated values isn't supported.
Supported aggregation functions include:
ANY_VALUE(but not overSTRUCT)APPROX_COUNT_DISTINCTARRAY_AGG(but not overARRAYorSTRUCT)AVGBIT_ANDBIT_ORBIT_XORCOUNTCOUNTIFHLL_COUNT.INITLOGICAL_ANDLOGICAL_ORMAXMINMAX_BY(but not overSTRUCT)MIN_BY(but not overSTRUCT)SUM
Unsupported SQL features:
Features such as left/right/full outer joins, self-joins, window functions, ARRAY subqueries, non-deterministic functions, user-defined functions, TABLESAMPLE, and FOR SYSTEM_TIME AS OF are not supported.
Access control restrictions:
Queries may fail if users attempt to access base table columns they don't have permissions for.
If users lack full access to all rows in the base tables, queries will run against the base tables instead of materialized view data, respecting access control constraints, including data masking.
WITH clause and CTEs
Materialized views support WITH clauses and common table expressions (CTEs), but they must adhere to the same limitations as materialized views without them.
For example, you can use a WITH clause in a materialized view to perform operations on the data before aggregating it. However, using a WITH clause with two GROUP BY clauses is not supported.
Partitioning materialized views
Partitioned materialized views align with the partitioning strategy of the underlying tables. This alignment provides benefits when queries access specific partitions frequently. Partitioning can improve performance and behavior, especially during data modifications or deletions.
If the base table is partitioned, the materialized view can also be partitioned using the same partitioning column. Granularity (hourly, daily, etc.) or range specifications must match exactly.
For tables partitioned by ingestion time, the materialized view can be grouped and partitioned by the _PARTITIONDATE column.
Consider partitioning the materialized view to reduce maintenance and query costs, especially if the base table is partitioned.
Partition expiration settings are inherited from the base table. Materialized view partitions expire synchronously with base table partitions. Note that non-partitioned materialized views based on tables with partition expiration require full refresh when a partition expires. Therefore, it's advisable to partition the materialized view to avoid additional costs.
Clustering materialized views
Cluster materialized views by their output columns, adhering to BigQuery's clustered table limitations. Aggregate output columns cannot serve as clustering columns. Adding clustering columns to materialized views can enhance query performance, especially for filtered queries.
Considerations when creating materialized views
When creating materialized views, tailor them to reflect common query patterns on the base tables. Avoid creating views for every query variation due to the limit of 20 views per table. Instead, focus on accommodating broader query sets. For instance, group and cluster by columns frequently used for filtering, and incorporate common date filters directly into the view definition.
For materialized views with JOINs:
Prioritize the most frequently changing table: Ensure the first/leftmost table in the view query is the largest or undergoes frequent changes. This optimizes incremental queries and refreshes.
Avoid joining on clustering keys: Materialized views with joins are best suited for heavily aggregated data or expensive join queries. For selective queries, BigQuery may efficiently handle joins without the need for a materialized view. Test queries with and without the materialized view to determine efficiency.
Query materialized view
Incremental updates: BigQuery updates materialized views incrementally by combining cached data with new data. For single-table views, this happens when the original table hasn't changed much or when only new data is added. But for views with multiple tables, it can only add new data to one table at a time. If the original data changes a lot or if many tables change at once, BigQuery goes back to using the original query instead of the cached view. Operations like changing data or expiration rules can trigger this. If a materialized view can't be updated bit by bit, it just waits until it's refreshed entirely.
Partition Alignment: When a materialized view is partitioned, BigQuery ensures each partition matches the base table's partitions. Data from a base table partition goes to the corresponding materialized view partition. If a base table partition changes, only the affected materialized view partitions are updated. However, materialized views with inner joins align with only one base table, so changes in any non-aligned base table affect the entire view.
Smart Tuning:
BigQuery automatically optimizes queries by utilizing materialized views when possible. This enhances query performance and reduces costs without altering results. However, querying doesn't automatically trigger a materialized view refresh. For a query to be rewritten, the materialized view must:
Belong to the same dataset as one of its base tables.
Use the same set of base tables as the query.
Include all columns and rows being read.
To check if a query was rewritten, inspect the query plan. If rewritten, the plan will include a READ step for the materialized view used.
Scheduled queries vs materialized views
Scheduled queries are suitable for periodic complex calculations but execute the full query each time without reusing previous results. They work well when freshest data isn't critical and you're tolerant of data staleness.
Materialized views are ideal for querying the latest data while reducing latency and cost by reusing previously computed results. They act like pseudo-indexes, speeding up queries without disrupting existing workflows.
Use materialized views whenever possible, especially if your calculations aren't overly complex.
Manage materialized views
Alter: Modify materialized views
ALTER MATERIALIZED VIEW project.dataset.materialized_view
SET OPTIONS (enable_refresh = true);List: View materialized views
SELECT * FROM project.dataset.INFORMATION_SCHEMA.TABLES
WHERE table_type = 'MATERIALIZED VIEW';Delete: Remove materialized views
DROP MATERIALIZED VIEW project.dataset.materialized_view;Caution: Deleting a materialized view cannot be undone.
Refresh: Update materialized views with latest data
Refreshing a materialized view updates its cached results to reflect the latest base table data. Automatic refresh is enabled by default, with intervals managed by BigQuery. Manual refresh allows immediate updates. Here's how to configure both:
Automatic Refresh:
By default, materialized views are automatically refreshed within 5 minutes of base table changes, up to every 30 minutes.
Enable or disable automatic refresh at any time.
Adjust the refresh frequency cap as needed.
Refreshes are performed on a best-effort basis, with timing subject to system capacity.
CREATE MATERIALIZED VIEW project.dataset.materialized_view OPTIONS (enable_refresh = true, refresh_interval_minutes = 60) AS SELECT ...
Manual Refresh:
Manually refresh a materialized view at any time.
Required permissions: bigquery.tables.getData, bigquery.tables.update, and bigquery.tables.updateData.
Use the BQ.REFRESH_MATERIALIZED_VIEW system procedure to update the view's data.
Avoid concurrent refreshes to prevent conflicts and ensure successful updates.
CALL BQ.REFRESH_MATERIALIZED_VIEW('project.dataset.materialized_view');


