Routing and Compensation Examples

Consider that we have the following data table, where each row tracks actions of cell phone users: the application they are using, time o the action, the user id, device id, type of the device, and so on.

CREATE EXTERNAL TABLE default.events (
        event_id STRING,
        app_id STRING,
        time TIMESTAMP,
        user_id STRING,
        device_id STRING,
        platform STRING,
        sdk_version STRING,
        version_code INT
      )
      PARTITIONED BY (
        year INT,
        month INT,
        day INT
      )

location '/warehouse/events';

In this section, we demonstrate typical queries that analysts would expect to run against the events table, and how a single analytical view can accelerate the performance of these queries.

We are using the analytical view av_event_counts, based on the events table:

CREATE ANALYTICAL VIEW av_event_counts partitioned by (year, month, day) 
  STORED AS PARQUET AS 
  SELECT name, sum(1), year, month, day 
  FROM events group by 1, 3, 4, 5;
    

Event Counts by Day

This query asks for a number of events for each day.

SELECT year, month, day, name, sum(1) 
  FROM events 
  GROUP BY 1, 2, 3, 4
  ORDER BY 5 DESC
  LIMIT 10;

This query would successfully route to the analytical view av_event_counts. Arcadia Engine sorts the data from the analytical view, and then applies the LIMIT clause.

Event counts by year ordered by name

This query asks for event counts by year. Additionally, it requests that the results be arranged by event name, in descending order.

SELECT year, name, sum(1) 
  FROM events 
  GROUP BY by 1, 2 
  ORDER BY 2 DESC 
  LIMIT 10;

This query would successfully route to the analytical view av_event_counts. Arcadia Engine rolls-up the data from the analytical view, recomputes the sums, and orders the data as specified by the query.

Events counts for specific year and month

This query requests data for the specified year (2014) and month (6).

SELECT year, name, sum(1) 
FROM events 
WHERE year = 2014 and month = 6 group by 1, 2 
ORDER BY 2 desc limit 10;

This query would successfully route to the analytical view av_event_counts. Arcadia Engine applies the year and month predicates to the data from the analytical view. Because the analytical view is partitioned on year and month, and the query uses these predicates, Arcadia Engine can prune the unnecessary partitions.

Build expressions

In this query, the concat_ws function builds a STRING from the component year, month and day columns. The CASE statement demonstrates the use of a complex expression in the SELECT list.

SELECT concat_ws('-', cast(year as string), cast(month as string), cast(day as string)) my_date,
CASE when day between 1 and 7 then sum(1) * -0.5
     when day between 8 and 15 then sum(1)
     else sum(1) * 2
END AS my_sum
FROM events
GROUP BY year, month, day
ORDER BY 2 desc
LIMIT 10;

This query would successfully route to the analytical view av_event_counts. Arcadia Engine computes the date string and the complex expression in the select list based on data retrieved from the analytical view.

Filter out negative sums

This query extends the query in Build expressions by filtering out negative sums.

SELECT concat_ws('-', cast(year as string), cast(month as string), cast(day as string)) my_date,
CASE when day between 1 and 7 then sum(1) * -0.5
     when day between 8 and 15 then sum(1)
     else sum(1) * 2
END AS my_sum
FROM events
GROUP BYyear, month, day
HAVING my_sum > 0
ORDER BY by 2 desc
LIMIT 10;

This query would successfully route to the analytical view av_event_counts. Arcadia Engine applies the function in the HAVING predicate, my_sum > 0, to the to the data retrieved from the analytical view.