Upgrade Considerations for Analytical Views

There are many improvements in the new release, and here are some upgrade/migration notes:

Refresh Analytical Views

You must refresh the following types of analytical views when migrating to ArcEngine 2.9:

  • Appx_Median

    The storage format for approx_median() function has changed. The old storage format is no longer compatible with the new ArcEngine.

  • Sampling functions

    The storage format for sampling functions has changed. The old format is no longer compatible with the new ArcEngine. This applies to stddev_samp, variance_samp, and var_samp.

  • Sum on Decimals

    Prior to Impala Release 2.9, the summing function on decimal data type had an error because it did not use sufficient precision during the calculation. Under certain conditions, it produced incorrect output. We corrected the error, and the summing function works correctly.

Issue the REFRESH ANALYTICAL VIEW command to force a comprehensive refresh of these analytical views.

Deprecation of Refresh Parallelism

This release deprecates the refresh parallelism feature, controlled by the query option ANALYTICAL_VIEW_MAX_REFRESH_THREADS. Our article on Incremental Refresh of Analytical Views discusses the new approach for tuning the performance of refresh operations on analytical views.

Migrating Analytical View Metadata

If this is your first installation of Arcadia Enterprise, these steps are not necessary.

Total time necessary to load all metadata depends on the number of databases, tables, and the number of files that constitute the tables in the system. For systems with a large number of tables and files, we have previously observed metadata load time as high as two hours. Arcadia Enterprise changed the startup process to significantly improve performance, where the metadata for the same tables loads in under 30 seconds.

To realize this improvement, you must migrate the existing analytical views after upgrading to the Arcadia Enterprise Release 4.1.1.0 or later. Follow these steps:

  1. During the initial start after update to the new release, start the catalogd process with the following command:

    --load_catalog_in_background=true

    This loads all the metadata for all the objects, and migrates it into the new metadata representation.

    Notice that this migration operation takes a long time to run, and results in a slow start.

  2. To ensure that all the metadata loads before issuing the next statement, monitor the logging to catalogd.INFO. When catalogd initially starts, it logs messages to this file. When the metadata loading completes, it does not write to catalogd.INFO until the system receives a ddl statement.

    Also, note that the contents of the catalogd.INFO should look something like this after loading metadata completes:

    I0720 19:51:04.566013 18594 HdfsTable.java:339] load block md for tab2_av_stale file 41471208ed374ca8-8e4dccc4bfc0f9ae_885057525_data.0.parq ...
    I0720 19:51:04.640791 18603 HdfsTable.java:1041] load table from Hive Metastore: xdb_1.traditional_phones...
    I0720 19:51:04.642539 18616 Table.java:167] Loading column stats for table: tab3_av_invalid... 
    I0720 19:51:05.822299 18592 catalog-server.cc:313] Publishing update: TABLE:xdb_1.tab2_av_unusable@815...          
    I0720 19:51:05.822319 18592 catalog-server.cc:313] Publishing update: TABLE:xdb_1.h@800...          
    I0720 19:51:05.822371 18592 catalog-server.cc:313] Publishing update: CATALOG:aef12bf8b40b4b2c:aaec9d243d25591f@829...          
    I0720 20:01:02.110677 18602 catalog-server.cc:229] Catalog Version: 830 Last Catalog Version: 830

    The last two lines often signal that metadata has been loaded successfully.

  3. After all metadata loads, and all older analytical views are successfully migrated, start the catalogd with the following command, for better ongoing performance at start time:

    --load_catalog_in_background=false

This operation is only necessary when migrating analytical views due to upgrades from earlier releases of Arcadia Enterprise. The default behavior in the current release is equivalent to load_catalog_in_background=false, and this option works for fresh installs.