Using the Hive Schema Tool in CDH
Use the Hive command-line schematool to upgrade or validate the Hive metastore database schema for unmanaged clusters.
Continue reading:
Schema Version Verification and Validation
Hive records the schema version in the metastore database and verifies that the metastore schema version is compatible with the Hive binaries that are going to access the metastore. The Hive configuration properties that implicitly create or alter the existing schema are disabled by default. Consequently, Hive does not attempt to change the metastore schema implicitly. When you execute a Hive query against a metastore where the schema is not initialized or the schema is old, it fails to access the metastore and an entry similar to the following example appears in the error log:
... Caused by: MetaException(message:Version information not found in metastore. ) at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:5638) ...
Use Hive schematool to repair the condition that causes this error by either initializing the schema or upgrading it.
Using schematool
Use the Hive schematool to initialize the metastore schema for the current Hive version or to upgrade the schema from an older version. The tool tries to find the current schema from the metastore if it is available there.
The schematool determines the SQL scripts that are required to initialize or upgrade the schema and then executes those scripts against the metastore database. The metastore database connection information such as JDBC URL, JDBC driver, and database credentials are extracted from the Hive configuration. You can provide alternate database credentials if needed.
$ schematool -help usage: schemaTool -dbType <databaseType> Metastore database type -dryRun List SQL scripts (no execute) -help Print this message -info Show config and schema details -initSchema Schema initialization -initSchemaTo <initTo> Schema initialization to a version -passWord <password> Override config file password -upgradeSchema Schema upgrade -upgradeSchemaFrom <upgradeFrom> Schema upgrade from a version -userName <user> Override config file user name -validate Validate the database -verbose Only print SQL statementsThe dbType option must always be specified and can be one of the following:
derby|mysql|postgres|oracle
Prerequisite Configuration
- javax.jdo.option.ConnectionURL
- javax.jdo.option.ConnectionDriverName
<property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://my_cluster.com:3306/hive1?useUnicode=true&characterEncoding=UTF-8</value> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> </property>
Usage Examples
To use the schematool command-line tool, navigate to the directory where it is located:
-
If you installed CDH using parcels, schematool is usually located at:
/opt/cloudera/parcels/CDH/lib/hive/bin/schematool
-
If you installed CDH using packages, schematool is usually located at:
/usr/lib/hive/bin/schematool
After you locate the executable, you can use schematool to perform the following actions:
-
Initialize your metastore to the current schema for a new Hive setup using the initSchema option.
$ schematool -dbType mysql -initSchema -passWord <db_user_pswd> -userName <db_user_name> Metastore connection URL: jdbc:mysql://<cluster_address>:3306/<user_name>?useUnicode=true&characterEncoding=UTF-8 Metastore Connection Driver : com.mysql.jdbc.Driver Metastore connection User: <user_name> Starting metastore schema initialization to <new_version> Initialization script hive-schema-<new_version_number>.mysql.sql Initialization script completed schemaTool completed
-
Get schema information using the info option.
$ schematool -dbType mysql -info -passWord <db_user_pswd> -userName <db_user_name> Metastore connection URL: jdbc:mysql://<cluster_address>:3306/<user_name>?useUnicode=true&characterEncoding=UTF-8 Metastore Connection Driver : com.mysql.jdbc.Driver Metastore connection User: <user_name> Hive distribution version: <new_version> Required schema version: <new_version> Metastore schema version: <new_version> schemaTool completed
-
If you attempt to get schema information from older metastores that did not store version information or if the schema is not initialized, the tool reports an error as follows.
$ schematool -dbType mysql -info -passWord <db_user_pswd> -userName <db_user_name> Metastore connection URL: jdbc:mysql://<cluster_address>:3306/<user_name>?useUnicode=true&characterEncoding=UTF-8 Metastore Connection Driver : com.mysql.jdbc.Driver Metastore connection User: <user_name> Hive distribution version: <new_version> Required schema version: <new_version> org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version, Cause:<cause_description> *** schemaTool failed ***
-
You can upgrade the schema from a specific release by specifying the -upgradeSchemaFrom option. The -upgradeSchemaFrom option requires the Hive version and not the CDH version. See CDH 5 Packaging and Tarball Information for information about which Hive version ships with each CDH release. The following example shows how to upgrade from CDH 5.2/Hive 0.13.1:
$ schematool -dbType mysql -passWord <db_user_pswd> -upgradeSchemaFrom 0.13.1 -userName <db_user_name> Metastore connection URL: jdbc:mysql://<cluster_address>:3306/<user_name>?useUnicode=true&characterEncoding=UTF-8 Metastore Connection Driver : com.mysql.jdbc.Driver Metastore connection User: <user_name> Starting upgrade metastore schema from version 0.13.1 to <new_version> Upgrade script upgrade-0.13.1-to-<new_version>.mysql.sql Completed pre-0-upgrade-0.13.1-to-<new_version>.mysql.sql Completed upgrade-0.13.1-to-<new_version>.mysql.sql schemaTool completed
-
Use the -validate option to verify the metastore schema. The following example shows the types of validations that are performed against the metastore schema when you use this option with schematool:
$ schematool -dbType mysql -passWord <db_user_pswd> -userName <db_user_name> -validate Starting metastore validation Validating schema version Succeeded in schema version validation. [SUCCESS] Validating sequence number for SEQUENCE_TABLE Succeeded in sequence number validation for SEQUENCE_TABLE [SUCCESS] Validating metastore schema tables Succeeded in schema table validation. [SUCCESS] Validating database/table/partition locations Succeeded in database/table/partition location validation [SUCCESS] Validating columns for incorrect NULL values Succeeded in column validation for incorrect NULL values [SUCCESS] Done with metastore validation: [SUCCESS] schemaTool completed
-
If you want to find out all the required scripts for a schema upgrade, use the dryRun option.
$ schematool -dbType mysql -upgradeSchemaFrom 0.10.0 -dryRun -passWord <db_user_pswd> -userName <db_user_name> Metastore connection URL: jdbc:mysql://<cluster_address>:3306/<user_name>?useUnicode=true&characterEncoding=UTF-8 Metastore Connection Driver : com.mysql.jdbc.Driver Metastore connection User: <user_name> Starting upgrade metastore schema from version 0.10.0 to <new_version> Upgrade script upgrade-0.10.0-to-0.11.0.mysql.sql Upgrade script upgrade-0.11.0-to-0.12.0.mysql.sql Upgrade script upgrade-0.12.0-to-0.13.0.mysql.sql Upgrade script upgrade-0.13.0-to-0.14.0.mysql.sql Upgrade script upgrade-0.14.0-to-1.1.0.mysql.sql Upgrade script upgrade-1.1.0-to-<new_version>.mysql.sql schemaTool completed