Consider that we have the following data table, where each row tracks actions of cell phone users: the application they are using, time o the action, the user id, device id, type of the device, and so on.
CREATE EXTERNAL TABLE default.events (
event_id STRING,
app_id STRING,
time TIMESTAMP,
user_id STRING,
device_id STRING,
platform STRING,
sdk_version STRING,
version_code INT
)
PARTITIONED BY (
year INT,
month INT,
day INT
)
location '/warehouse/events';
In this section, we demonstrate typical queries that analysts would expect to run against the
events
table, and how a single analytical view can accelerate the
performance of these queries.
We are using the analytical view av_event_counts
, based on the
events
table:
CREATE ANALYTICAL VIEW av_event_counts partitioned by (year, month, day)
STORED AS PARQUET AS
SELECT name, sum(1), year, month, day
FROM events group by 1, 3, 4, 5;
This query asks for a number of events for each day.
SELECT year, month, day, name, sum(1)
FROM events
GROUP BY 1, 2, 3, 4
ORDER BY 5 DESC
LIMIT 10;
This query would successfully route to the analytical view av_event_counts
.
Arcadia Engine sorts the data from the analytical view, and then applies the
LIMIT
clause.
This query asks for event counts by year. Additionally, it requests that the results be arranged by event name, in descending order.
SELECT year, name, sum(1)
FROM events
GROUP BY by 1, 2
ORDER BY 2 DESC
LIMIT 10;
This query would successfully route to the analytical view
av_event_counts
. Arcadia Engine rolls-up the data from the analytical view,
recomputes the sums, and orders the data as specified by the query.
This query requests data for the specified year (2014) and month (6).
SELECT year, name, sum(1)
FROM events
WHERE year = 2014 and month = 6 group by 1, 2
ORDER BY 2 desc limit 10;
This query would successfully route to the analytical view
av_event_counts
. Arcadia Engine applies the year and month predicates to
the data from the analytical view. Because the analytical view is partitioned on year and
month, and the query uses these predicates, Arcadia Engine can prune the unnecessary
partitions.
In this query, the concat_ws function builds a STRING
from the component
year, month and day columns. The CASE
statement demonstrates the use of a
complex expression in the SELECT
list.
SELECT concat_ws('-', cast(year as string), cast(month as string), cast(day as string)) my_date,
CASE when day between 1 and 7 then sum(1) * -0.5
when day between 8 and 15 then sum(1)
else sum(1) * 2
END AS my_sum
FROM events
GROUP BY year, month, day
ORDER BY 2 desc
LIMIT 10;
This query would successfully route to the analytical view
av_event_counts
. Arcadia Engine computes the date string and the complex
expression in the select list based on data retrieved from the analytical view.
This query extends the query in Build expressions by filtering out negative sums.
SELECT concat_ws('-', cast(year as string), cast(month as string), cast(day as string)) my_date,
CASE when day between 1 and 7 then sum(1) * -0.5
when day between 8 and 15 then sum(1)
else sum(1) * 2
END AS my_sum
FROM events
GROUP BYyear, month, day
HAVING my_sum > 0
ORDER BY by 2 desc
LIMIT 10;
This query would successfully route to the analytical view
av_event_counts
. Arcadia Engine applies the function in the
HAVING
predicate, my_sum > 0
, to the to the data
retrieved from the analytical view.