Partitions of the analytical view do not have to match the partitions of the underlying base table. However, when refreshing the analytical views, having the same partitions significantly reduces computational time and memory usage in many scenarios.
Consider the following case:
default.events
.
CREATE EXTERNAL TABLE default.events ( event_id STRING, app_id STRING, time TIMESTAMP, user_id STRING, device_id STRING, platform STRING, sdk_version STRING, version_code INT ) PARTITIONED BY ( year INT, month INT, day INT ) location '/warehouse/sales/events';
events_aview1
is not partitioned. Therefore, it cannot be refreshed incrementally. Any changes to the base table data results in a full refresh of the analytical view, and this recomputes the entire associated query. For very large base tables, it may cause serious problems, such as 'out of memory' errors.
create analytical view events_aview1 stored as parquet as (select count(*), platform, sdk_version from events group by 2, 3);
The analytical view events_aview2
partitions the data along the same dimensions as the base table, and in the same order. This facilitates incremental refresh of the analytical view. Only the partitions of the analytical view that correspond to changed partitions of the base table must be recomputed.
create analytical view events_aview2 partitioned by (year, month, day) stored as parquet as (select count(*), platform, sdk_version, year, month, day from events group by 2, 3, 4, 5, 6);