Using Partition-Based Analytical Views

A typical sequence of operations for analytical views is:

  1. Creating Base Tables
  2. Loading or Inserting Data into Base Tables
  3. Creating Analytical Views
  4. Refreshing Analytical Views (populating data into analytical views)
  5. Processing User Queries

Creating Base Tables

The following script is an example of how to create a base table events, which contains columns event_id, app_id, time, user_id, device_id, platform, sdk_version, and version_code. Note that the table is partitioned on year, month, and day columns.


               CREATE EXTERNAL TABLE default.events (
               event_id STRING, 
               app_id STRING,  
               time TIMESTAMP,
               user_id STRING, 
               device_id STRING,  
               platform STRING, 
               sdk_version STRING, 
               version_code INT     
               )
               
               PARTITIONED BY (
               year INT, 
               month INT, 
               day INT    
               )
               
               location '/warehouse/sales/events';
            

Loading or Inserting Data into Base Tables

Load the data into the base table in one of the following ways:

  • Issue an INSERT command.
  • Add files to HDFS path for the external table, and issue ALTER TABLE ADD PARTITION command whenever relevant.

Creating Analytical Views

This is an example analytical view. Let's call the query it represents QA.

CREATE ANALYTICAL VIEW events_aview
               STORED AS PARQUET
               LOCATION 'hdfs://my_warehouse.com:8020/user/warehouse/event.db/'
               AS (SELECT   count(*), platform, sdk_version
               FROM     events    
               GROUP BY platform, sdk_version);

Refreshing Analytical Views

To ensure valid results with analytical views, you must first check the state of analytical views by running the following command:

SHOW ANALYTICAL VIEWS;

If the state of the views is either STALE or UNUSABLE, refresh the analytical views, as described in REFRESH ANALYTICAL VIEW; Partition-Based. There are several options, depending on your business requirements; we describe them in Examples.

If the state of the analytical view is INVALID, recreate the analytical view after correcting its definition.

Processing User Queries

Arcadia Engine automatically routes queries to the analytical view, if it finds a match.

When we created the analytical view, we based it on query QA (see Creating Analytical Views). Now, let us consider other queries: QP, QD, and QPD.

Comparing Three Queries
QP QD QPD
This query is issued by a bar chart that shows totals of events, by platform. This query is issued by a line chart that shows totals of events, by day. This query is issued by a line chart that shows totals of events, by platform by day.
SELECT (*),  platform
               FROM events
               GROUP BY platform
SELECT count(*), day
                  FROM events
                  GROUP BY day
SELECT count(*), platform, day
                     FROM events
                     GROUP BY platform, day

QP matches QA.

It is processed using the analytical view.

QD does not match QA.

It is processed using the base table.

QPD does not match QA.

It is processed using the base table.

Note that while platform is in QA, day is not.