Migrating Analytical Views

During normal operations, it may become necessary to create back-ups of analytical views, and to migrate them from one environment to another. Currently, this is supported through the command line interface.

ArcViz creates several artifacts, including analytical views (aggregates). You may also create analytical views on the command line. Arcadia stores these objects as tables directly in the HDFS, making the metadata available through the Hive metastore.

The following steps demonstrate how to migrate analytical views.

  1. Log into ArcEngine through the beeline in the source system.

  2. Specify the data source.

    [localhost:31050] > use tvdata;
  3. Run show [all] analytical views to retrieve the list of all analytical views.

    [localhost:31050] > show analytical views;

    The output that appears is similar to the following:

    +--------------------------------------------------------------------------------------------------------+
    | name                         |base                    | select                              | state    |
    +--------------------------------------------------------------------------------------------------------+
    | tvdata.tv_viewing_data_av_1  | tvdata.tv_viewing_data | SELECT hour((TA_0.ts)) '_c0', TA_0. | FRESH    |
    |                              |                        | channel channel, TA_0.program progr |          |
    |                              |                        | am, sum(1) '_c3', datestring datest |          |
    |                              |                        | ring FROM tvdata.tv_viewing_data TA |          |
    |                              |                        | _0 GROUP BY 1, 2, 3, 5              |          |
    | tvdata.tv_viewing_data_av_10 | tvdata.tv_viewing_data | SELECT TA_0.channel channel, TA_0.p | UNUSABLE |
    |                              |                        | rogram program, sum(1) '_c2', dates |          |
    |                              |                        | tring datestring FROM tvdata.tv_vie |          |
    |                              |                        | wing_data TA_0 GROUP BY 1, 2, 4     |          |
    | tvdata.tv_viewing_data_av_11 | tvdata.tv_viewing_data | SELECT TA_0.channel channel, TA_0.p | FRESH    |
    |                              |                        | rogram program, sum(1) '_ c2', date |          |
    |                              |                        | string datestring FROM tvdata.tv_vi |          |
    |                              |                        | ewing_data TA_0 GROUP BY 1, 2, 4    |          |
    +--------------------------------------------------------------------------------------------------------+
    
  4. Choose the first item in the list, tvdata.tv_viewing_data_av_1.

    Note the SQL SELECT statement that defines it:

    SELECT hour((TA_0.ts)) '_c0', 
      TA_0.channel channel, 
      TA_0.program program, 
      sum(1) '_c3', 
      datestring datestring 
    FROM tvdata.tv_viewing_data TA_0 GROUP BY 1, 2, 3, 5  
  5. Run the show statement for analytical view tvdata.tv_viewing_data_av_1, and note the partitioning information.

    [localhost:31050] > show create table tv_viewing_data_av_1;
    +--------------------------------------------------------------+
    | result                                                       |
    +--------------------------------------------------------------+
    | CREATE TABLE tvdata.tv_viewing_data_av_1 (  
      _c0 	INT COMMENT  'SQL Expression: 'hour((TA_0.ts))'', 
      channel STRING COMMENT 'SQL Expression: 'TA_0.channel'', 
      program STRING COMMENT 'SQL Expression: 'TA_0.program'', 
      _c3 BIGINT COMMENT 'SQL Expression: 'sum(1)''
    )
    PARTITIONED BY (
      datestring STRING COMMENT 'SQL Expression: 'datestring''
    )
    STORED AS PARQUET
    LOCATION 'hdfs://larry1:8020/apps/hive/warehouse/tvdata.db/tv_viewing_data_av_1'
  6. On the target system, combine the information from the two previous steps to create a new analytical view by issuing the following CREATE ANALYTICAL VIEW command:

    CREATE ANALYTICAL VIEW  tv_viewing_data_av_1
    PARTITIONED BY (datestring)
    STORED AS PARQUET
    AS
    (SELECT hour((TA_0.ts)) '_c0', 
    TA_0.channel channel, 
    TA_0.program program, sum(1) '_c3', 
    datestring datestring 
    FROM 	tvdata.tv_viewing_data TA_0 
    GROUP BY 1, 2, 3, 5)
  7. Although created, the new analytical view is in UNUSABLE state. Run the refresh command to materialize the analytical view.

    refresh analytical view tv_viewing_data_av_1