Creating Joins

This article demonstrates how to create new data joins in a dataset.

To demonstrate how to create new data joins, we used dataset Flight Delays, based on data previously imported into Arcadia from the datafile flights-2014.zip

The additional datafiles are airline-id.csv, airport-codes.csv, cancellation-code.csv, airport-lat-long.csv, and state-abbreviations.csv.

The following steps demonstrate how to make new joins:

  1. On the main navigation bar, click Data.

    Click DATA on main navigation bar

    The Data view appears, open on the Datasets tab.

    Main landing page of DATA
  2. Create a new dataset based on the datafile flights_2014.csv (compressed).

  3. Find the dataset in the list of datasets, either by scrolling or by using search, and click on it.

    Dataset side navigation appears, open at Dataset Detail view.

  4. In the side navigation menu, click Data Model.

    The Data Model view appears, and shows the name of the only table in the dataset. You may click Show Data to display the data of that table.

    data model, showing sample data for one table
    Data Model for Dataset 'Flight Delays', with a Singe Table and Sample Data Showing
  5. Click Edit Data Model to edit the data model.

    Editing the Data Model
    Editing Data Model
  6. Click the sign on the table representation.

    Clicking the table to add a Table Join
    Adding a Table Join

    The Table Browser modal window appears.

  7. In the Table Browser modal window, make the following selections:

    • In the Database Name selector, choose the database documentation.

      Note that you can join tables from different databases. This value is pre-populated to match the dataset's existing table, but it may be changed.

    • In the Table Name selector, choose the table name airline_id.

      This value is pre-populated to match the existing table of the dataset, but it may be changed.

    • Click Select.
    Selecting a table for a join
    Selecting a Table to Join

    The Edit Join modal window appears.

  8. In the Edit Join modal window, the following options are available:

    1. [Optional] Click Clear Fields to clear all already defined joins between the two tables.
    2. [Optional] Click sample data to preview the data. Click again to hide sample data.
    3. [Optional] Click Add Join Pair to add another column connection between the same two tables.
    4. [Optional] Click Add Join Expression to add a join between the two tables based on a custom SQL expression.
    5. [Optional] Click icon (minus) to remove an existing join pair or an existing join expression.
    6. [Optional] Under Join Expressions, click the text box to open the Join Expression interface. There, specify or update a custom SQL expression that defines the join conditions.
    7. Click Apply to save the changes.
    Viewing options in the 'Edit Join' Interface
    Edit Join Interface
  9. In the Edit Join modal window, do the following:
    • Select the matching columns for both tables. On the left side, select the field airlineid. On the right side, select the field code.
    • [Optional] Click Sample Data to view some data in both columns, and verify that the join makes sense. Click again to hide sample data.
  10. Click Apply.

    Specifying the join key
    Specifying the Join Key
  11. Repeat the previous two steps to create seven more joins as follows:

    • The table airport_codes has two joins to the main table, and you must create each join separately as follows:

      • Left column origin = right column code.
      • Left column dest = right column code.
    • The table cancellation_code has a join for left column cancellationcode = right column code.
    • The table airport_lat_long has two joins to the main table, and you must create each join separately as follows:
      • Left column origin = right column locationid.
      • Left column dest = right column locationid.
    • The table state_abbreviations has two joins to the main table, and you must create each join separately:
      • Left column deststate = right column abbreviation.
      • Left column originstate = right column abbreviation.

      This step is optional, and depends on whether your flights_* table has fully extended state names.

  12. Click Save.
    Data model with 8 joins to the primary table
    Data Model with Distinct Joins
  13. [Optional] Click (link) icon to edit joins or to change join type.