Prerequisites for Deploying Arcadia Enterprise on Amazon EMR

Arcadia Enterprise has several prerequisites for deployment on Amazon EMR.

Before starting the Arcadia Enterprise deployment process on Amazon EMR, obtain the following:

EMR Access Policy

An AWS Identity and Access Management (IAM) account to configure permissions.

This account must have the AmazonElasticMapReduceFullAccess policy attached; see IAM Managed Policy for Full Access. You must also configure the install script and procedure to use these credentials.

After installation, Arcadia Data software no longer requires these elevated permissions.

AWS CLI
The AWS Command Line Interface (AWS CLI) installed and configured with the admin account. See Installing the AWS CLI.
S3 Bucket Policy

At least one S3 bucket for storing installation files, SQLite backups, and access and secret keys. The bucket must grant the following permissions, where __ARCADIA_BUCKET__ stands for the name of your bucket:

{
    "Effect": "Allow",
    "Action": ["s3:*"],
    "Resource": [
        "arn:aws:s3:::__ARCADIA_BUCKET__",
        "arn:aws:s3:::__ARCADIA_BUCKET__/*"
    ]
}

See Creating and Configuring an Amazon S3 Bucket.
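
The permissions above are a single policy statement, and IAM expects statements to be wrapped in a policy document with "Version" and "Statement" fields. The following is a minimal sketch of that wrapping, not part of the Arcadia install script. It assumes the statement is attached as an identity-based IAM policy (it has no "Principal" element); the function name build_bucket_policy and the bucket name my-arcadia-bucket are hypothetical.

```python
import json


def build_bucket_policy(bucket_name: str) -> dict:
    """Wrap the Allow statement for one bucket in a complete IAM policy document.

    Hypothetical helper for illustration; it substitutes bucket_name for the
    __ARCADIA_BUCKET__ placeholder in both resource ARNs.
    """
    if not bucket_name or "/" in bucket_name:
        raise ValueError("bucket_name must be a bare S3 bucket name")
    return {
        "Version": "2012-10-17",  # current IAM policy language version
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:*"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself
                    f"arn:aws:s3:::{bucket_name}/*",  # every object in the bucket
                ],
            }
        ],
    }


if __name__ == "__main__":
    # "my-arcadia-bucket" is a placeholder; use the name of your own bucket.
    print(json.dumps(build_bucket_policy("my-arcadia-bucket"), indent=2))
```

The printed JSON can be saved to a file and attached to the IAM account through the AWS console or CLI. Both resource ARNs are needed: the first covers bucket-level operations such as listing, and the second covers the objects stored in the bucket.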

SSH Key
The name of a pre-configured SSH key to use for EMR-spawned instances.
Instance Types
Arcadia Data recommends general-purpose instance types, such as m5.2xlarge. Contact our support team (support@arcadiadata.com) to determine optimal sizing for your workload and configuration.
Other Buckets
(Optional) Other buckets and appropriate access and secret keys for EMRFS (EMR File System) data. See Work with Storage and File Systems.
Metastore
(Optional) We highly recommend that you configure an external Hive metastore. See Using an External Database.