Updating ArcEngine Metadata Automatically with Hive Notification

When you enable the Hive Notification feature, supported changes made in Hive propagate automatically to ArcEngine.

Consider the following two scenarios:

  • Request against Arcadia Engine
    1. DDL Request 1 goes directly to the Arcadia Engine, which processes the change.
    2. Arcadia Engine notifies Hive of changes in metadata storage.

    This behavior is in place at all times, even with Hive Notification disabled.

  • Request against Hive
    1. DDL Request 2 goes directly to Hive, which processes the change.
    2. When configured to work with a message queue server, Hive issues a notification of the change.
    3. The message queue server listens for metadata changes from Hive and sends the relevant change notifications to Arcadia Engine.

    This behavior is only possible when you explicitly enable the notification feature.
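The Hive-side path works like a publish/subscribe pipeline. The following Python sketch simulates it in memory. The standard library's queue.Queue stands in for the MQ server. The names HiveEvent, hive_ddl, mq_listener and TO_ARCENGINE are hypothetical and do not correspond to any ActiveMQ, Hive or ArcEngine API. The sketch shows only that events reach ArcEngine when notification is enabled.

```python
import queue
from dataclasses import dataclass

# Hypothetical translation for two of the event types listed under
# Notification Types; statements without an entry are not forwarded.
TO_ARCENGINE = {
    "CREATE TABLE": "invalidate metadata",
    "INSERT INTO": "refresh",
}


@dataclass
class HiveEvent:
    """A metadata change published by Hive (hypothetical message shape)."""
    statement: str
    table: str


def hive_ddl(mq: queue.Queue, statement: str, table: str,
             notification_enabled: bool) -> None:
    """Hive applies the DDL, then publishes an event only if notification is enabled."""
    # Hive's own metastore update is not modeled here.
    if notification_enabled:
        mq.put(HiveEvent(statement, table))


def mq_listener(mq: queue.Queue) -> list[str]:
    """Drain pending events and return the commands forwarded to ArcEngine."""
    forwarded = []
    while True:
        try:
            event = mq.get_nowait()
        except queue.Empty:
            return forwarded
        command = TO_ARCENGINE.get(event.statement)
        if command is not None:
            forwarded.append(f"{command} {event.table}")


if __name__ == "__main__":
    mq = queue.Queue()
    hive_ddl(mq, "CREATE TABLE", "sales", notification_enabled=True)
    hive_ddl(mq, "CREATE TABLE", "returns", notification_enabled=False)
    print(mq_listener(mq))  # ['invalidate metadata sales']
```

The change to the returns table never reaches the queue. This matches the behavior described above when the notification feature is disabled.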

Figure: The architecture of Hive Notification with a Message Queue Server

To enable this feature, perform the following tasks:

  1. Setting up Message Queue Server
  2. Configuring Hive with MQ Server
  3. Configuring ArcEngine with MQ Server

To review the possible effect of Hive notification on various DDL and DML changes, see Notification Types.

To understand the limitations of this feature, see Limitations of the Hive Notification Mechanism.

Setting up Message Queue Server

You must install an active Message Queue (MQ) server that listens for relevant notifications and routes them to ArcEngine.

Follow these steps to set up the MQ server:

  1. Log in to the Cloudera machine as the arcadia user.
  2. Download Apache ActiveMQ from the following URL: http://activemq.apache.org/activemq-5153-release.html.

    This site offers both Windows and Unix/Linux/Cygwin distributions.

  3. Unzip the downloaded file.

    On Unix/Linux/Cygwin platforms, run the following command from the extracted directory. By default, this starts a listener for Hive notifications on port 61616:
    > bin/activemq start
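Before you configure Hive and ArcEngine, you may want to confirm that the broker accepts connections. The following Python sketch attempts a TCP connection to localhost on the default port 61616. It checks only that some process accepts connections on that port. It does not verify that the listener is ActiveMQ. The function name broker_is_listening is hypothetical.

```python
import socket


def broker_is_listening(host: str = "localhost", port: int = 61616,
                        timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host could not be resolved
        return False


if __name__ == "__main__":
    status = "listening" if broker_is_listening() else "not reachable"
    print(f"MQ server on localhost:61616 is {status}")
```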

Configuring Hive with MQ Server

After you successfully install the MQ server, you must configure Hive to point to it.

Please contact support@arcadiadata.com for help with this task.

Configuring ArcEngine with MQ Server

Cloudera Manager

In the Arcadia Catalog Cache Advanced Configuration Snippet (Safety Valve) for flagfile, add the following information:

arcadia_hive_notify_mq_host=localhost
arcadia_hive_notify_mq_port=61616
arcadia_invalidate_metadata_on_restart=true
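If you manage several clusters, you can generate these lines instead of typing them. The following Python sketch reproduces only the three flags shown above. It assumes the MQ server might run on a host other than localhost. The helper name render_mq_flags is hypothetical and is not part of any Arcadia tooling.

```python
def render_mq_flags(host: str = "localhost", port: int = 61616,
                    invalidate_on_restart: bool = True) -> str:
    """Build the flagfile lines for the Arcadia Catalog Cache safety valve."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid TCP port: {port}")
    lines = [
        f"arcadia_hive_notify_mq_host={host}",
        f"arcadia_hive_notify_mq_port={port}",
        # Write the boolean in lowercase, matching the "true" used above
        f"arcadia_invalidate_metadata_on_restart={str(invalidate_on_restart).lower()}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_mq_flags())
```

The defaults produce exactly the three lines shown above. In Ambari, the host and port values go into the separate MQ Host and MQ Port options instead.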

Ambari Stacks

In Ambari Stacks, set the following parameters:

  • Set Hive Notification MQ Host option to localhost.
  • Set Hive Notification MQ Port option to 61616.
  • Add arcadia_invalidate_metadata_on_restart=true to the Optional Parameters for Arcadia Catalog Cache option.

Notification Types

Automatic Hive Notification supports the following types of events:

CREATE TABLE

A CREATE TABLE statement sends a notification to ArcEngine in the form of an invalidate metadata table_name command.

DROP TABLE

A DROP TABLE statement sends a notification to ArcEngine in the form of an invalidate metadata table_name command. In addition, any analytical view that uses the table becomes invalid the next time a query attempts to use it.

LOAD DATA … INTO TABLE table_name [PARTITION …]
INSERT INTO table_name …

When you load data into a table, either as rows or as an entire partition, or insert rows into a table, ArcEngine receives the notification as a refresh table_name command. The next time ArcEngine routes a query to an analytical view based on that table, it marks the view as stale, unless a manual or scheduled refresh has updated the view before then.
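The DROP TABLE and LOAD/INSERT rules above differ in two ways. A dropped table makes its views invalid, while new data makes them stale. The following Python sketch models this difference. Both states are evaluated only when a query reaches the view. This is a hypothetical illustration, not ArcEngine's implementation. It assumes that each table carries a version counter, which a refresh notification increments, and that each view records the version it was last built from. The classes Table and AnalyticalView are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    version: int = 0        # incremented by each refresh table_name notification
    dropped: bool = False   # set by the DROP TABLE notification


@dataclass
class AnalyticalView:
    base: Table
    built_from_version: int = 0   # table version at the view's last refresh
    state: str = "fresh"

    def refresh(self) -> None:
        """Manual or scheduled refresh: rebuild from the current table data."""
        self.built_from_version = self.base.version
        self.state = "fresh"

    def on_query(self) -> str:
        """Evaluate the view's state when ArcEngine routes a query to it."""
        if self.base.dropped:
            self.state = "invalid"
        elif self.base.version > self.built_from_version:
            self.state = "stale"
        return self.state
```

Usage: after an INSERT, increment the table's version. The next on_query() call returns "stale", unless refresh() ran first.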

ALTER TABLE table_name ...

Whether an ALTER TABLE statement affects ArcEngine depends on the clause it uses. Some of the following clauses affect only metadata.

  • RENAME

    The RENAME TO new_table_name clause is equivalent to creating a new table and issuing an invalidate metadata command. This affects not only analytical views but also datasets and any dependent datasets and apps. Arcadia Enterprise does not support the ALTER TABLE RENAME command.

  • SET TBLPROPERTIES

    Some table properties affect ArcEngine.

  • SET SERDEPROPERTIES

    This clause deals with processing data, and does not affect ArcEngine.

  • CLUSTERED BY

    This clause does not affect the directories where the data resides, so it does not affect ArcEngine.

  • SKEWED BY or NOT SKEWED

    This clause applies to future storage of data, so it has no effect on ArcEngine.

  • SKEWED LOCATION

    Similarly, this clause does not affect the directories where the data resides, only the future data, so it can be ignored.

  • NOT STORED AS DIRECTORIES

    This clause does not affect the directories where the data resides, only the future data, so it can be ignored.

  • ADD CONSTRAINT or DROP CONSTRAINT

    These clauses do not affect data, and may be ignored.

  • CHANGE

    The CHANGE column_name clause results in an invalidate metadata table_name notification. This notification invalidates all analytical views associated with the table, not only the views that use the changed column.

  • ADD PARTITION, DROP PARTITION, or PARTITION ... RENAME TO

    These clauses generate a refresh table_name notification. The next time ArcEngine routes a query to an analytical view based on that table, it marks the view as stale, unless a manual or scheduled refresh has updated the view before then.

  • RECOVER PARTITIONS or MSCK REPAIR TABLE

    These clauses require a change to the metadata, and result in a refresh table_name notification to ArcEngine.

  • ARCHIVE PARTITION or UNARCHIVE PARTITION

    This clause removes files, and therefore affects data. It results in a refresh table_name notification to ArcEngine.

  • SET FILEFORMAT

    This clause does not affect data, and so it does not issue notifications.

  • TOUCH

    This clause does not affect data, and so it does not issue notifications.

  • ENABLE NO_DROP or DISABLE NO_DROP

    This clause does not affect data, and so it does not issue notifications.

  • ENABLE or DISABLE OFFLINE

    This clause sends a refresh table_name notification to ArcEngine.

  • COMPACT

    This clause does not affect data, and so it does not issue notifications.

  • CONCATENATE

    This clause does not affect data, and so it does not issue notifications.
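The list above can be kept as a lookup table, for example to predict which Hive changes will reach ArcEngine. The following Python sketch encodes only the entries that explicitly state a notification or explicitly state that none is issued. It leaves out clauses whose effect the list describes only as not affecting ArcEngine, or as depending on the property (SET TBLPROPERTIES). Those clauses raise KeyError instead of silently returning None. The names Notify, NOTIFICATIONS and arcengine_command are hypothetical.

```python
from enum import Enum
from typing import Optional


class Notify(Enum):
    INVALIDATE = "invalidate metadata"
    REFRESH = "refresh"


# Clause -> notification sent to ArcEngine; None means no notification.
# RENAME is omitted because Arcadia Enterprise does not support it.
NOTIFICATIONS: dict[str, Optional[Notify]] = {
    "CREATE TABLE": Notify.INVALIDATE,
    "DROP TABLE": Notify.INVALIDATE,
    "CHANGE": Notify.INVALIDATE,
    "LOAD DATA": Notify.REFRESH,
    "INSERT INTO": Notify.REFRESH,
    "ADD PARTITION": Notify.REFRESH,
    "DROP PARTITION": Notify.REFRESH,
    "PARTITION RENAME TO": Notify.REFRESH,
    "RECOVER PARTITIONS": Notify.REFRESH,
    "MSCK REPAIR TABLE": Notify.REFRESH,
    "ARCHIVE PARTITION": Notify.REFRESH,
    "UNARCHIVE PARTITION": Notify.REFRESH,
    "ENABLE OFFLINE": Notify.REFRESH,
    "DISABLE OFFLINE": Notify.REFRESH,
    "SET FILEFORMAT": None,
    "TOUCH": None,
    "ENABLE NO_DROP": None,
    "DISABLE NO_DROP": None,
    "COMPACT": None,
    "CONCATENATE": None,
}


def arcengine_command(clause: str, table: str) -> Optional[str]:
    """Return the command ArcEngine receives, or None if no notification is sent.

    Raises KeyError for clauses whose behavior is not listed explicitly.
    """
    action = NOTIFICATIONS[clause.upper()]
    return f"{action.value} {table}" if action is not None else None
```

For example, arcengine_command("ADD PARTITION", "sales") returns "refresh sales", and arcengine_command("TOUCH", "sales") returns None.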