Deploying Arcadia Enterprise on EMR Cluster

Deploying Arcadia Enterprise on an Amazon EMR Cluster.

Developer Notes:
  • You can run only a single ArcViz process on EMR.
  • For complete Arcadia Enterprise functionality, you must perform the post-installation tasks for EMR. See Post Installation Tasks for EMR.

After completing the prerequisites for EMR installation, perform the following steps:

  1. Download the Arcadia Enterprise EMR deployment package provided by the Arcadia Data support team to a local directory. See Arcadia Enterprise Deployment Package.
  2. Extract the EMR deployment package, which has all the Arcadia Enterprise scripts and binaries, using the following command:
    tar -xf ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1.tar.gz
  3. Copy the entire deployment package to an Amazon S3 bucket. Use a bucket that is dedicated to Arcadia deployments and backups. Sync the package to a folder that has the same name as the package; the deployment scripts rely on this name to find the binaries to install.

    aws s3 sync ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1 \
      s3://arc-emr-test-alex/ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1
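
    The naming requirement in this step can be scripted. The following is a minimal sketch, not part of the Arcadia package: the helper name build_sync_command is hypothetical, and it only prints the aws s3 sync command so that you can review it before you run it.

```shell
# Hypothetical helper (not shipped with Arcadia Enterprise): prints the
# `aws s3 sync` command that places the package in an S3 folder whose
# name matches the extracted package directory.
build_sync_command() {
  local package_dir="${1%/}"   # strip one trailing slash, if present
  local bucket="$2"            # bucket name without the s3:// prefix
  local package_name
  package_name="$(basename "$package_dir")"
  printf 'aws s3 sync %s s3://%s/%s\n' "$package_dir" "$bucket" "$package_name"
}
```

    For example, build_sync_command ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1 arc-emr-test-alex prints the sync command used in this step as a single line.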
  4. The following three files, extracted from the package, are used on the machine that runs the AWS CLI commands:

    ./ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1/local/config_template.json
    ./ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1/local/run_emr_arc
    ./ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1/local/sample_deployment.conf

    (Optional) Delete the rest of the files; they are not required on the AWS CLI machine.
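
    Before you delete anything, it may help to confirm that the three files listed in this step are present. The following is a small sketch under that assumption; check_local_files is a hypothetical helper name, not an Arcadia command.

```shell
# Hypothetical helper: confirms that the three files used on the AWS CLI
# machine exist under <package_dir>/local. Reports each missing file on
# stderr and returns non-zero if any file is missing.
check_local_files() {
  local package_dir="${1%/}"
  local missing=0
  local name
  for name in config_template.json run_emr_arc sample_deployment.conf; do
    if [ ! -f "$package_dir/local/$name" ]; then
      echo "missing: $package_dir/local/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

    A typical call would be check_local_files ./ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1, which succeeds silently when all three files are found.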
  5. Fill out the config_template.json template, or create your own template, with the values required for the deployment. At a minimum, we recommend that you enter the following values in the template:

    • Specify the access and secret keys for storing data in S3 buckets when using EMRFS (EMR File System). These keys are used when you specify LOCATION s3://... as a Hive clause.
    • If you need to configure an external Hive metastore, fill out the Hive-site section in the template. Remove the section from the template if you do not need it. Arcadia Enterprise works without an external Hive metastore.
  6. To deploy Arcadia Enterprise on EMR, you can use either of the following methods:
    1. Execute run_emr_arc Command

      Execute the run_emr_arc command from the top-level directory, and enter the relevant information from the template at the wizard prompts. The wizard generates an AWS CLI command that deploys Arcadia Enterprise on EMR. The wizard prompts have the following format:

      [arcuser@local]$ ./run_emr_arc
      Provisioning EMR with Arcadia Enterprise <version>
      Arcadia Enterprise Version?: <version>
      S3 bucket which contains the directory <version>? []: <bucket-name>
      s3://<bucket-name> Access Key? []: <Access key of S3 bucket>
      s3://<bucket-name> Secret Key? []: <Secret key of S3 bucket>
      Location of additional configurations to supply (file://)? []: <Path of the file generated from the config_template.json template file>
      Instance count?: <Number of EMR nodes>
      Instance type?: <AWS EC2 instance type>
      AWS ssh key to use for ec2 instances? []: <Preconfigured SSH Key to use for EMR spawned instances>
      Cluster Name?: <Name of Arcadia Enterprise EMR Cluster>
      Skip deployment and save aws cli output to a file (y/n)?: <Yes/No>

      For example:

      [arcuser@local]$ ./run_emr_arc
      Provisioning EMR with Arcadia Enterprise <version> (ver. ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1)
      Arcadia Enterprise Version? [ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1]:
      S3 bucket which contains the directory ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1? []: arc-emr-test-alex
      s3://arc-emr-test-alex Access Key? []: AKIAIOSFODNN7EXAMPLE
      s3://arc-emr-test-alex Secret Key? []: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
      Location of additional configurations to supply (file://)? []: file://./config.json
      Instance count? [3]:
      Instance type? [m4.xlarge]:
      AWS ssh key to use for ec2 instances? []: devops
      Cluster Name? [Arcadia Cluster ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1]:
      Skip deployment and save aws cli output to a file (y/n)?[n]:n
      j-29M75IXAOT0LP

      (Optional) If you have an EMR deployment script of your own and want to use the generated bootstrap action as part of a larger EMR deployment, you can skip the deployment step and save the AWS CLI command to the output file emr_deployment.txt:

      Skip deployment and save aws cli output to a file (y/n)? [n]:y
      Deployment command saved in emr_deployment.txt
    2. Run Deployment Script

      Run the deployment script saved in the emr_deployment.txt file.

      [arcuser@local]$ ./run_emr_arc -f "./sample_deployment.conf" -d
      Provisioning EMR with Arcadia Enterprise (script ver. ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1)
      Loading configuration file: ./sample_deployment.conf
      ARCADIA-ENTERPRISE Version: ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1
      INSTALL BUCKET: emr-deployment-bucket
      CONFIG PATH: file:///tmp/config.json
      INSTANCE COUNT: 3
      INSTANCE TYPE: m4.xlarge
      SSH KEY NAME: devops
      CLUSTER NAME:[Arcadia Cluster ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1]
  7. If you do not skip the deployment, the AWS CLI output shows the EMR cluster ID at the end. In the example above, the EMR cluster ID is j-29M75IXAOT0LP. You can use this ID to check the status of the EMR cluster deployment and the Arcadia installation from the EMR console.
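
    If you capture the run_emr_arc output in a script, the cluster ID can be pulled out of it. The sketch below assumes the ID has the form shown in the example (j- followed by uppercase letters and digits); extract_cluster_id is a hypothetical helper name.

```shell
# Hypothetical helper: reads deployment output on stdin and prints the
# last token that looks like an EMR cluster ID (j- followed by uppercase
# letters and digits). Prints nothing if no such token is found.
extract_cluster_id() {
  grep -Eo 'j-[A-Z0-9]+' | tail -n 1
}
```

    For instance, piping the saved console output of the example run through extract_cluster_id would print the ID from its final line.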
  8. Log in to the EMR console in AWS. The cluster status changes from Starting to Bootstrapping. Wait for the instances to build; this process may take a few minutes. The Arcadia Enterprise deployment is complete when the Waiting status appears.
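
    The same status can also be checked from the AWS CLI instead of the console. The following is a sketch, not part of the Arcadia tooling: the function names are hypothetical, the 30-second default polling interval is an arbitrary choice, and it assumes the AWS CLI is configured with permission to describe the cluster.

```shell
# Hypothetical helper: prints the current state of an EMR cluster,
# for example STARTING, BOOTSTRAPPING, or WAITING.
emr_cluster_state() {
  aws emr describe-cluster --cluster-id "$1" \
    --query 'Cluster.Status.State' --output text
}

# Polls until the cluster reaches WAITING. Returns non-zero if the
# cluster is terminating or terminated, or if the state lookup fails.
wait_for_waiting() {
  local cluster_id="$1"
  local interval="${2:-30}"   # seconds between polls (arbitrary default)
  local state
  while :; do
    state="$(emr_cluster_state "$cluster_id")" || return 1
    echo "cluster $cluster_id: $state"
    case "$state" in
      WAITING) return 0 ;;
      TERMINATING|TERMINATED|TERMINATED_WITH_ERRORS) return 1 ;;
    esac
    sleep "$interval"
  done
}
```

    With the cluster ID from the example, the call would be wait_for_waiting j-29M75IXAOT0LP.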

  9. Click Arcadia Cluster ARCADIA-ENTERPRISE-5.0.0.0_1547496452-1.amzn1 to view the summary and configuration details of the Arcadia Cluster.

    [Figure: Display Arcadia Cluster Status. EMR console in AWS showing the 'Arcadia Cluster' status as 'Starting'.]

    The following image shows the summary and configuration details of the Arcadia Cluster:

    [Figure: Display Summary and Configuration Details of Arcadia Cluster. EMR console in AWS showing the summary and configuration details of the Arcadia Cluster.]

    To check the status of a bootstrapping EMR cluster, view the details in the /tmp/bootrap.log file on each node. By default, the EMR deployment command generated by run_emr_arc archives EMR logs to the install bucket.

    For more information on troubleshooting an EMR cluster, see Troubleshoot a Cluster.

  10. To verify your Arcadia Enterprise installation on EMR, connect to ArcViz.