This article describes how to deploy Arcadia Enterprise on an Amazon EMR Cluster.
After completing the prerequisites for EMR installation, run the following commands on AWS:
Extract the EMR deployment package that contains all Arcadia Enterprise scripts and binaries. Use the following command:
tar -xf ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1.tar.gz
Copy the entire deployment package to an Amazon S3 bucket, preferably one that is used exclusively for Arcadia deployments and backups. Sync the package into a folder with the same name as the package, so that the deployment scripts can find the installable binaries.
aws s3 sync ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1 \
    s3://arc-emr-test-alex/ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1
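Because the S3 folder name must match the package name, it can help to derive the destination from the local directory name instead of typing it twice. The following is a minimal sketch, not part of the Arcadia package: the function names s3_package_uri and print_sync_command are hypothetical, and the sync command is only printed so that you can review it before running it.

```shell
# Build the S3 destination so that the folder name matches the package name.
# Usage: s3_package_uri <bucket-name> <local-package-dir>
s3_package_uri() {
    s3_bucket=$1
    # Drop any trailing slash, then keep only the directory name.
    s3_pkg_name=$(basename "${2%/}")
    printf 's3://%s/%s\n' "$s3_bucket" "$s3_pkg_name"
}

# Print (rather than run) the sync command, so it can be reviewed first.
# Usage: print_sync_command <bucket-name> <local-package-dir>
print_sync_command() {
    printf 'aws s3 sync %s %s\n' "${2%/}" "$(s3_package_uri "$1" "$2")"
}
```

For example, print_sync_command arc-emr-test-alex ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1 prints the sync command shown above on a single line.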
The scripts extract the following three files and install them on the machine that runs the AWS CLI.
./ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1/local/config_template.json
./ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1/local/run_emr_arc
./ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1/local/sample_deployment.conf
You may choose to delete the other files, as they are not required on the AWS CLI machine.
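Before deleting anything, you may want to confirm that the three files listed above are present. The following sketch assumes the directory layout shown in this article; check_local_files is a hypothetical helper name, not a script shipped with Arcadia Enterprise.

```shell
# Check that the three files needed on the AWS CLI machine exist.
# Usage: check_local_files <extracted-package-dir>
# Returns 0 when all files are present, 1 otherwise.
check_local_files() {
    chk_dir=${1%/}
    chk_missing=0
    for chk_name in config_template.json run_emr_arc sample_deployment.conf; do
        if [ ! -f "$chk_dir/local/$chk_name" ]; then
            echo "missing: $chk_dir/local/$chk_name" >&2
            chk_missing=1
        fi
    done
    return "$chk_missing"
}
```

A typical call would be check_local_files ./ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1, which reports each missing file on standard error.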
Complete the config_template.json template file, or create your own template that uses the values required for the deployment. At a minimum, we recommend that you enter the following values in the template:
LOCATION s3://...
as a Hive clause. If you use an external Hive metastore, fill out the hive-site section in the template.
Remove this section from the template if you are not using an external metastore.
Arcadia Enterprise works even if you do not configure an external Hive metastore.
Execute the run_emr_arc command from the top-level directory, and enter the relevant information from the template in the EMR wizard. This generates an AWS CLI command that deploys Arcadia Enterprise on EMR.
When choosing the instance type, note our sizing recommendations in Prerequisites for Deploying Arcadia Enterprise on Amazon EMR.
The CLI wizard has the following format:
[arcuser@local]$ ./run_emr_arc
Provisioning EMR with Arcadia Enterprise <version>
Arcadia Enterprise Version?: <version>
S3 bucket which contains the directory <version>? []: <bucket-name>
s3://<bucket-name> Access Key? []: <Access key of S3 bucket>
s3://<bucket-name> Secret Key? []: <Secret key of S3 bucket>
Location of additional configurations to supply (file://)? []: <Path of the file generated from the config_template.json template file>
Instance count?: <Number of EMR nodes>
Instance type?: <AWS EC2 instance type>
AWS ssh key to use for ec2 instances? []: <Preconfigured SSH Key to use for EMR spawned instances>
Cluster Name?: <Name of Arcadia Enterprise EMR Cluster>
Skip deployment and save aws cli output to a file (y/n)? [n]: <y/n>
The following example shows a completed wizard session:
[arcuser@local]$ ./run_emr_arc
Provisioning EMR with Arcadia Enterprise <version> (ver. ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1)
Arcadia Enterprise Version? [ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1]:
S3 bucket which contains the directory ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1? []: arc-emr-test-alex
s3://arc-emr-test-alex Access Key? []: AKIAJOAJINGNOKZVPWNQ
s3://arc-emr-test-alex Secret Key? []: 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
Location of additional configurations to supply (file://)? []: file://./config.json
Instance count? [3]:
Instance type? [m5.2xlarge]:
AWS ssh key to use for ec2 instances? []: devops
Cluster Name? [Arcadia Cluster ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1]:
Skip deployment and save aws cli output to a file (y/n)?[n]:n
j-29M75IXAOT0LP
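The wizard expects the additional configurations as a file:// location, such as file://./config.json. The sketch below checks such a location before you start the wizard. It assumes the usual AWS CLI convention that everything after file:// is a local path, relative or absolute; config_path_from_uri is a hypothetical helper name.

```shell
# Convert a file:// location, as entered in the wizard, to a local path
# and confirm that the file is readable.
# Usage: config_path_from_uri file://./config.json
# Prints the local path; returns 2 for a bad prefix, 1 for an unreadable file.
config_path_from_uri() {
    case $1 in
        file://*) cfg_path=${1#file://} ;;
        *) echo "expected a file:// location, got: $1" >&2; return 2 ;;
    esac
    if [ ! -r "$cfg_path" ]; then
        echo "cannot read: $cfg_path" >&2
        return 1
    fi
    printf '%s\n' "$cfg_path"
}
```

With this convention, file://./config.json resolves to ./config.json in the current directory, and file:///tmp/config.json resolves to /tmp/config.json.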
[Optional] If you have an EMR deployment script of your own and want to use the generated bootstrap action as part of a larger EMR deployment, you can skip the deployment step and save the AWS CLI command to the emr_deployment.txt output file:
Skip deployment and save aws cli output to a file (y/n)? [n]:y
Deployment command saved in emr_deployment.txt
Run the deployment script saved in the emr_deployment.txt file.
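One way to run the saved command is to review it first and then pass the file to the shell. This is a minimal sketch that assumes emr_deployment.txt contains a plain shell command line, as generated by the wizard; run_saved_deployment is a hypothetical helper name.

```shell
# Review, then run, the AWS CLI command that run_emr_arc saved.
# Usage: run_saved_deployment [emr_deployment.txt]
run_saved_deployment() {
    dep_file=${1:-emr_deployment.txt}
    # -s is true only when the file exists and is not empty.
    if [ ! -s "$dep_file" ]; then
        echo "missing or empty: $dep_file" >&2
        return 1
    fi
    echo "About to run the command in $dep_file:" >&2
    cat "$dep_file" >&2
    # Assumption: the file holds a shell command that sh can execute as-is.
    sh "$dep_file"
}
```

The review text goes to standard error, so standard output carries only what the deployment command itself prints.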
[arcuser@local]$ ./run_emr_arc -f "./sample_deployment.conf" -d
Provisioning EMR with Arcadia Enterprise (script ver. ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1)
Loading configuration file: ./sample_deployment.conf
ARCADIA-ENTERPRISE Version: ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1
INSTALL BUCKET: emr-deployment-bucket
CONFIG PATH: file:///tmp/config.json
INSTANCE COUNT: 3
INSTANCE TYPE: m4.xlarge
SSH KEY NAME: devops
CLUSTER NAME:[Arcadia Cluster ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1]
j-29M75IXAOT0LP
The last line of the output, j-29M75IXAOT0LP, is the ID of the new EMR cluster. You can use this ID to check the status of the EMR cluster deployment and the Arcadia installation from the EMR console. Log in to the EMR console in AWS. The cluster status changes from Starting to Bootstrapping. Wait for the instances to build; this process may take a few minutes. The Arcadia Enterprise deployment is complete when the status changes to Waiting.
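The same status check can also be done from the command line with the cluster ID. The sketch below uses aws emr describe-cluster and polls until the cluster reports WAITING. The function names and the default 30-second polling interval are assumptions made for illustration.

```shell
# Print the current state of an EMR cluster, for example BOOTSTRAPPING.
# Usage: emr_cluster_state <cluster-id>
emr_cluster_state() {
    aws emr describe-cluster --cluster-id "$1" \
        --query 'Cluster.Status.State' --output text
}

# Poll until the cluster reaches WAITING (success) or starts terminating.
# Usage: wait_for_emr_cluster <cluster-id> [poll-seconds]
# Returns 0 on WAITING, 1 on termination, 2 if the state cannot be read.
wait_for_emr_cluster() {
    while :; do
        emr_state=$(emr_cluster_state "$1") || return 2
        echo "cluster $1: $emr_state" >&2
        case $emr_state in
            WAITING) return 0 ;;
            TERMINATING|TERMINATED|TERMINATED_WITH_ERRORS) return 1 ;;
        esac
        sleep "${2:-30}"
    done
}
```

For the example above, wait_for_emr_cluster j-29M75IXAOT0LP would print each observed state on standard error and return once the cluster is ready or has failed.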
Click Arcadia Cluster ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1 to view the summary and configuration details of the Arcadia cluster.
The following image shows the summary and configuration details of the Arcadia Cluster:
To check the status of a bootstrapping EMR cluster, view the details in the /tmp/bootrap.log file on each node. By default, the EMR deployment command generated by run_emr_arc archives EMR logs to the install bucket.
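To look at the bootstrap log without opening a separate SSH session, you can use aws emr ssh with the --command option. The following sketch makes several assumptions: tail_bootstrap_log is a hypothetical helper name, the key file is the private key that matches the SSH key chosen in the wizard, and the log path is taken as written in this article. Note that aws emr ssh connects to the master node only; other nodes need a regular SSH connection.

```shell
# Show the last lines of the bootstrap log on the master node.
# Usage: tail_bootstrap_log <cluster-id> <path-to-private-key> [lines]
tail_bootstrap_log() {
    aws emr ssh --cluster-id "$1" --key-pair-file "$2" \
        --command "tail -n ${3:-50} /tmp/bootrap.log"
}
```

If the third argument is omitted, the helper shows the last 50 lines.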
For more information on troubleshooting an EMR cluster, see Troubleshoot a Cluster.