Because Arcadia does not store catalog tables in persistent data storage, we recommend that
you do not access the arcadia_catalog.completed_queries
table directly for
analytical operations. Instead, you must create a corresponding parquet table, and
periodically update it with entries from the
arcadia_catalog.completed_queries
file. See Best Practices for Arcadia Catalog Tables.
To achieve this, create a job in the ArcViz job scheduler; it will look something like the following query:
insert into completed_queries_parq select * from arcadia_catalog.completed_queries where
end_time > <max_end_time_from_completed_queries_parq_table>'.
There are multiple benefits to this approach:
completed_queries_parq
table. In contrast, catalog tables do not support analytical views.arcadia_catalog.completed_queries
table, Arcadia
Engine reads profiles from the local file system of ArcEngine data nodes. Using a parquet
table ensures better query performance, and also guarantees persistent storage for the
data of interest.In general, the catalog tables do not persist any data. When you query these tables through Impala or Hive, they do not return any results. We recommend that you issue queries against catalog tables through a separate data connection, which does not have the fallback feature enabled.
When the Beeswax protocol connects Arcadia shell (arcadia-shell) or other connections to
ArcEngine, it uses tabs to separate record fields. This becomes problematic when tab
characters appear in the data, resulting in an excess of fields. Arcadia shell subsequently
detects this excess of fields, switches to tab delimited mode, and stops "pretty-printing"
the output. When using the CREATE TABLE AS SELECT
command, it defaults to
delimiting rows by new line characters.
ArcViz uses the HS2 protocol,
To work around these inconsistencies, follow our recommendations:
When running a CREATE TABLE AS SELECT
command against
arcadia_catalog
tables, specify the STORED AS
PARQUET
option. For
example,
CREATE TABLE completed_queries_copy
STORED AS PARQUET
AS SELECT *
FROM arcadia_catalog.completed_queries;
Arcadia shell may display the following message:
Prettytable cannot resolve string columns values that have embedded tabs.
Reverting to tab delimited text output.
This means that pretty printing is turned off because a tab character in the data created row-width inconsistency. This does not affect the query results. However, the printing is turned off.
When planning to write query results to a file through Arcadia shell, or when using third-party
tools that connect through the beeswax interface, we recommend that you remove tab
characters using the regex_replace
function, as in the following
example:
SELECT regexp_replace(query_status, '\t', ' ')
AS query_status
FROM arcadia_catalog.completed_queries;