There are many improvements in the new release, and here are some upgrade/migration notes:
You must refresh the following types of analytical views when migrating to ArcEngine 2.9:
The storage format for approx_median()
function has changed. The old storage format is no longer compatible with the new
ArcEngine.
The storage format for sampling functions has changed.
The old format is no longer compatible with the new ArcEngine. This applies to
stddev_samp
, variance_samp
, and
var_samp
.
Prior to Impala Release 2.9, the summing function on decimal data type had an error because it did not use sufficient precision during the calculation. Under certain conditions, it produced incorrect output. We corrected the error, and the summing function works correctly.
Issue the REFRESH ANALYTICAL VIEW
command to force a comprehensive refresh
of these analytical views.
This release deprecates the refresh parallelism feature, controlled by the query option
ANALYTICAL_VIEW_MAX_REFRESH_THREADS
. Our article on Incremental Refresh of Analytical Views discusses the new
approach for tuning the performance of refresh operations on analytical views.
If this is your first installation of Arcadia Enterprise, these steps are not necessary.
Total time necessary to load all metadata depends on the number of databases, tables, and the number of files that constitute the tables in the system. For systems with a large number of tables and files, we have previously observed metadata load time as high as two hours. Arcadia Enterprise changed the startup process to significantly improve performance, where the metadata for the same tables loads in under 30 seconds.
To realize this improvement, you must migrate the existing analytical views after upgrading to the Arcadia Enterprise Release 4.1.1.0 or later. Follow these steps:
During the initial start after update to the new release, start the
catalogd
process with the following command:
--load_catalog_in_background=true
This loads all the metadata for all the objects, and migrates it into the new metadata representation.
Notice that this migration operation takes a long time to run, and results in a slow start.
To ensure that all the metadata loads before issuing the next statement, monitor the
logging to catalogd.INFO
. When catalogd
initially starts, it logs messages to this file. When the metadata loading completes, it
does not write to catalogd.INFO
until the system receives a
ddl
statement.
Also, note that the contents of the catalogd.INFO
should look
something like this after loading metadata completes:
I0720 19:51:04.566013 18594 HdfsTable.java:339] load block md for tab2_av_stale file 41471208ed374ca8-8e4dccc4bfc0f9ae_885057525_data.0.parq ...
I0720 19:51:04.640791 18603 HdfsTable.java:1041] load table from Hive Metastore: xdb_1.traditional_phones...
I0720 19:51:04.642539 18616 Table.java:167] Loading column stats for table: tab3_av_invalid...
I0720 19:51:05.822299 18592 catalog-server.cc:313] Publishing update: TABLE:xdb_1.tab2_av_unusable@815...
I0720 19:51:05.822319 18592 catalog-server.cc:313] Publishing update: TABLE:xdb_1.h@800...
I0720 19:51:05.822371 18592 catalog-server.cc:313] Publishing update: CATALOG:aef12bf8b40b4b2c:aaec9d243d25591f@829...
I0720 20:01:02.110677 18602 catalog-server.cc:229] Catalog Version: 830 Last Catalog Version: 830
The last two lines often signal that metadata has been loaded successfully.
After all metadata loads, and all older analytical views are successfully migrated,
start the catalogd
with the following command, for better ongoing
performance at start time:
--load_catalog_in_background=false
This operation is only necessary when migrating analytical views due to upgrades from
earlier releases of Arcadia Enterprise. The default behavior in the current
release is equivalent to load_catalog_in_background=false
, and this option
works for fresh installs.