The arcadia_catalog.completed_queries
table relies on the parsing profile
files that the arcengined
(ArcEngine daemon) process generates. These files
reside on the coordinator node, where the query ran originally. They are not replicated, and
take much longer to process than parquet
tables.
In scenarios that rely on extensive use of these tables, we strongly recommend that you extract this information into a parquet-formated destination table. Running regularly-scheduled extraction jobs to update this table improves the speed for analytics, replication, and durability; it also mitigates the persistence and performance issues we identify in the discussion on Limitations of Arcadia Catalog Tables.
For example, to copy data into a new table, use the following approach to extract information on completed tables into more permanent storage:
To create a parquet destination table (historical_queries
), we use the same
schema as arcadia_catalog.completed_queries
, and use an additional
partitioning column start_date
.
Issue the following command:
CREATE TABLE historical_queries
PARTITIONED BY (start_date)
SORT BY (start_time, end_time)
STORED AS PARQUET
AS SELECT *, to_date(start_time)
AS start_date
FROM arcadia_catalog.completed_queries;
Note that this statement fully populates the new table with all data in the
arcadia_catalog.completed_queries
table. Also, the SORT
BY
clause enables you to run more efficient queries by referencing
start_time
and end_time
columns.
To update the historical_queries
table periodically with new queries in the
arcadia_catalog.completed_queries
table, run the following
command:
INSERT INTO historical_queries
PARTITION (start_date)
SELECT *, to_date(start_time)
AS start_date
FROM arcadia_catalog.completed_queries
WHERE end_time > (SELECT max(end_time) FROM historical_queries);
The end_time
predicate copies all queries that ran after the previous
data upload into the historical_queries
table.
Note
that for simple comparison predicates greater-than, equal, or less-than
(>
, =
, <
) on the
end_time
column, Arcadia Enterprise optimizes queries
through predicate push down, allowing the arcengined
process to skip over
profile files that do not match the end_time
predicate.