Recommendations for Arcadia Catalog Tables

Arcadia Enterprise recommends that you follow these recommendations when using Arcadia Catalog tables:

Data Persistence on Arcadia Catalog Tables

Because Arcadia does not store catalog tables in persistent data storage, we recommend that you do not access the arcadia_catalog.completed_queries table directly for analytical operations. Instead, you must create a corresponding parquet table, and periodically update it with entries from the arcadia_catalog.completed_queries file. See Best Practices for Arcadia Catalog Tables.

To achieve this, create a job in the ArcViz job scheduler; it will look something like the following query:

insert into completed_queries_parq select * from arcadia_catalog.completed_queries where
        end_time > <max_end_time_from_completed_queries_parq_table>'. 

There are multiple benefits to this approach:

  • It is possible to create Analytical views on the completed_queries_parq table. In contrast, catalog tables do not support analytical views.
  • To run a query against arcadia_catalog.completed_queries table, Arcadia Engine reads profiles from the local file system of ArcEngine data nodes. Using a parquet table ensures better query performance, and also guarantees persistent storage for the data of interest.

Data Persistence on Hive and Impala Catalog Tables

In general, the catalog tables do not persist any data. When you query these tables through Impala or Hive, they do not return any results. Therefore, ArcEngine never allows queries against catalog tables to execute on the fallback engine; such queries always execute in ArcEngine.

Supporting Tab Characters in Data Inside Beeswax

When the Beeswax protocol connects Arcadia shell (arcadia-shell) or other connections to ArcEngine, it uses tabs to separate record fields. This becomes problematic when tab characters appear in the data, resulting in an excess of fields. Arcadia shell subsequently detects this excess of fields, switches to tab delimited mode, and stops "pretty-printing" the output. When using the CREATE TABLE AS SELECT command, it defaults to delimiting rows by new line characters.

ArcViz uses the HS2 protocol,

To work around these inconsistencies, follow our recommendations:

  • When running a CREATE TABLE AS SELECT command against arcadia_catalog tables, specify the STORED AS PARQUET option. For example,

    CREATE TABLE completed_queries_copy 
      STORED AS PARQUET 
      AS SELECT * 
      FROM arcadia_catalog.completed_queries;
  • Arcadia shell may display the following message:

    Prettytable cannot resolve string columns values that have embedded tabs. 
    Reverting to tab delimited text output.

    This means that pretty printing is turned off because a tab character in the data created row-width inconsistency. This does not affect the query results. However, the printing is turned off.

  • When planning to write query results to a file through Arcadia shell, or when using third-party tools that connect through the beeswax interface, we recommend that you remove tab characters using the regex_replace function, as in the following example:

    SELECT regexp_replace(query_status, '\t', ' ') 
      AS query_status 
      FROM arcadia_catalog.completed_queries;