CREATE ANALYTICAL VIEW; Partition-Based

To take advantage of Smart Acceleration, Eager Aggregation, and other performance enhancements native to Arcadia Engine, you must create analytical views.

This article describes how to create partition-based analytical views. For sequence-based analytical views, see CREATE ANALYTICAL VIEW; Sequence-Based.

Developer Note. Before creating analytical views, consider the restrictions on Impala Identifiers.

The CREATE ANALYTICAL VIEW command has the following syntax.

This topic contains the following sections:

Syntax

CREATE ANALYTICAL VIEW [if not exists] database_name.analytical_view_name 
  [COMMENT table_comment]
  [PARTITIONED BY (column_name, ... )]
  STORED AS PARQUET
  [LOCATION location]
  AS select_statement;

Parameters

database_name
The name of the database where the analytical view is stored. If not specified, the system creates the analytical view in the current database.
analytical_view_name
The name of the new analytical view
table_comment
The optional comment that describes the analytical view
column_name
The partitioning column(s) of the base table
location
The HDFS directory path, or S3 bucket name, that stores analytical view data
select_statement

The actual statement that describes the structure of the analytical view:

SELECT columns FROM table [WHERE conditions] [GROUP BY g_columns] [ORDER BY o_columns ASC|DESC]

Notes

COMMENT
You can add an optional comment that describes the purpose of the analytical view, and provides other information.
PARTITIONED BY
Note that PARTITIONED BY columns must be the final columns in the SELECT list.
STORED AS PARQUET
Parquet file format provides the best compression and performance.
AS
The AS select_statement mandatory clause defines the structure of the analytical view.

Example

Simple Partition-Based Analytical View

Here is an example usage of create analytical view. Note the select(...) statement becomes associated with the analytical view.

CREATE ANALYTICAL VIEW events_aview
  STORED AS PARQUET
  AS (SELECT count(*), platform, sdk_version
         FROM events
         GROUP BY platform, sdk_version);