Post-Installation Tasks for Amazon EMR

After deploying or upgrading Arcadia Enterprise on Amazon EMR, you must complete several post-installation tasks.

Perform these tasks after completing the installation process:

Deploy Hive-site on Core Nodes Running ArcEngine

This step is mandatory.

For complete Arcadia Enterprise functionality, you must deploy Hive-site on every core node that runs ArcEngine. The /etc/hive/conf/hive-site.xml file must be present at the same location on each core node in the cluster. EMR deployments do not push client configurations to core nodes, so perform this step after the Arcadia Enterprise deployment is complete, and then restart all arcengined services. See Perform Ongoing Operations.
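The copy-and-restart step can be scripted along the following lines. This is only a sketch, not part of the Arcadia tooling: it assumes passwordless SSH from the master node to each core node as a user with sudo rights, and a hypothetical host list file with one core-node hostname per line. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: copy hive-site.xml to every core node, then restart arcengined.
# Assumptions (not from the Arcadia documentation): passwordless SSH with
# sudo on each core node, and a host list with one hostname per line.

HIVE_SITE=/etc/hive/conf/hive-site.xml

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# deploy_hive_site HOST: stage the file in /tmp, move it into place, restart.
deploy_hive_site() {
  local host="$1"
  run scp "$HIVE_SITE" "$host:/tmp/hive-site.xml"
  run ssh "$host" "sudo mkdir -p /etc/hive/conf && sudo mv /tmp/hive-site.xml $HIVE_SITE"
  run ssh "$host" "sudo service arcengined restart"
}

# deploy_all FILE: apply deploy_hive_site to each hostname listed in FILE.
# The list is read on file descriptor 3 so that ssh cannot consume it.
deploy_all() {
  local host
  while IFS= read -r host <&3; do
    if [ -n "$host" ]; then
      deploy_hive_site "$host"
    fi
  done 3< "$1"
}
```

For example, DRY_RUN=1 deploy_all core_nodes.txt previews the commands for every host in the hypothetical core_nodes.txt file; running it without DRY_RUN applies them.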

Back Up Local Settings File

Whenever you manually change the Arcadia Service init scripts or the settings_local.py file, back up the changes outside of the EMR cluster. When an EMR cluster is terminated, all data on its systems is lost.
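One possible way to keep such a copy is to upload each changed file to S3 under a timestamped name. The sketch below assumes the AWS CLI is installed on the node and has write access to the target bucket. The bucket prefix and the file path in the usage note are placeholders, because the location of settings_local.py depends on your installation.

```shell
#!/usr/bin/env bash
# Sketch: copy a locally edited Arcadia file to S3 under a timestamped key.
# Assumptions: AWS CLI available with write access to the destination;
# the destination prefix and file path are supplied by the caller.

# backup_file FILE S3_PREFIX
backup_file() {
  local file="$1" dest="${2%/}"   # strip one trailing slash from the prefix
  local stamp key
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  key="$dest/$(basename "$file").$stamp"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "aws s3 cp $file $key"   # preview only
  else
    aws s3 cp "$file" "$key"
  fi
}
```

A call such as backup_file /path/to/settings_local.py s3://example-bucket/arcadia-config keeps every version, so an earlier state can be restored after the cluster is gone.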

Arcadia bootstrap.py deploys cron jobs on the EMR cluster for the following action:

Backing up SQLite database

Due to the ephemeral nature of EMR deployments, the default SQLite ArcViz metastore is periodically dumped and backed up in the installation folder in the S3 installation bucket. This dump can be used to recreate the SQLite database when attempting to recover from an EMR cluster failure or when switching to an EMR deployment with a newer Arcadia installation during upgrade.
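Recreating the database from a dump might look like the sketch below. It assumes the dump is plain SQL text of the kind produced by the sqlite3 command-line tool's .dump command, and that it has already been downloaded from the S3 installation bucket. The source does not state the dump format or file names, so verify both before relying on this.

```shell
#!/usr/bin/env bash
# Sketch: rebuild a SQLite database from a SQL text dump.
# Assumptions: the dump is SQL text (as written by sqlite3 ".dump") and the
# sqlite3 CLI is installed; both file paths are supplied by the caller.

# restore_metastore DUMP_FILE NEW_DB_FILE
restore_metastore() {
  local dump="$1" db="$2"
  # Never write into an existing database file.
  if [ -e "$db" ]; then
    echo "refusing to overwrite existing database: $db" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sqlite3 $db < $dump"    # preview only
  else
    sqlite3 "$db" < "$dump"
  fi
}
```

Restoring into a new file and swapping it in only after a successful load leaves the current metastore untouched if the dump turns out to be incomplete.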

Connect to an External ArcViz Metastore

We recommend that you connect to an external ArcViz metastore. Connecting a newer version of Arcadia Enterprise on EMR to an external metastore that already contains an ArcViz metastore automatically upgrades the ArcViz metastore at startup.

Perform Ongoing Operations

You can perform the following service actions on the core nodes to start, stop, or restart Arcadia Service:

service arcengined [status|start|stop|restart] 

You can perform the following service actions on the master node:

service catalogd [status|start|stop|restart]
service statestored [status|start|stop|restart]
service arcviz [status|start|stop|restart]
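
To apply one action to all three master-node services in a single call, a small wrapper such as the following could be used. It is a convenience sketch only: the function name is hypothetical, the use of sudo is an assumption, and the services are processed in the order listed above, which is not a documented ordering requirement.

```shell
#!/usr/bin/env bash
# Sketch: run one service action against all Arcadia master-node services.
# Assumptions: run on the master node by a user with sudo rights.

MASTER_SERVICES="catalogd statestored arcviz"

# master_services ACTION, where ACTION is status, start, stop, or restart.
master_services() {
  local action="$1" svc
  case "$action" in
    status|start|stop|restart) ;;
    *) echo "usage: master_services status|start|stop|restart" >&2
       return 2 ;;
  esac
  for svc in $MASTER_SERVICES; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "sudo service $svc $action"   # preview only
    else
      sudo service "$svc" "$action"
    fi
  done
}
```

For example, master_services status reports on catalogd, statestored, and arcviz in turn.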

For general operational guidance and assistance with maintaining the cluster, see AWS: Manage Clusters.