Install and Configure PostgreSQL for Cloudera Software
To use a PostgreSQL database, follow these procedures. For information on compatible versions of the PostgreSQL database, see CDH and Cloudera Manager Supported Databases.
Installing PostgreSQL Server
- Use one or more of the following commands to set the locale:
export LANGUAGE=en_US.UTF-8 export LANG=en_US.UTF-8 export LC_ALL=en_US.UTF-8 locale-gen en_US.UTF-8 dpkg-reconfigure locales
- Install PostgreSQL packages:
- RHEL
sudo yum install postgresql-server
- SLES
sudo zypper install postgresql92-server
- Ubuntu
sudo apt-get install postgresql
- RHEL
Configuring and Starting the PostgreSQL Server
By default, PostgreSQL only accepts connections on the loopback interface. You must reconfigure PostgreSQL to accept connections from the fully qualified domain names (FQDN) of the hosts hosting the services for which you are configuring databases. If you do not make these changes, the services cannot connect to and use the database on which they depend.
- Initialize the PostgreSQL database. For some versions of PostgreSQL, this occurs automatically the first time that you start the PostgreSQL server. In this case, issue the command:
sudo service postgresql start
In other versions, you must explicitly initialize the database using one of the following commands:- SLES 12:
sudo service postgresql initdb --pgdata=/var/lib/pgsql/data --encoding=UTF-8
- All other operating systems:
sudo service postgresql initdb
- SLES 12:
- Enable MD5 authentication. Edit pg_hba.conf, which is usually found in /var/lib/pgsql/data or /etc/postgresql/<version>/main. Add the following line:
host all all 127.0.0.1/32 md5
If the default pg_hba.conf file contains the following line:host all all 127.0.0.1/32 ident
then the host line specifying md5 authentication shown above must be inserted before this ident line. Failure to do so may cause an authentication error when services connect to the database. You can modify the contents of the md5 line shown above to support different configurations. For example, if you want to access PostgreSQL from a different host, replace 127.0.0.1 with your IP address and update postgresql.conf, which is typically found in the same place as pg_hba.conf, to include:listen_addresses = '*'
- Configure settings to ensure your system performs as expected. Update these settings in the /var/lib/pgsql/data/postgresql.conf or /var/lib/postgresql/data/postgresql.conf file. Settings vary based on cluster size and resources as follows:
- Small to mid-sized clusters - Consider the following settings as starting points. If resources are limited, consider reducing the buffer sizes and checkpoint segments further. Ongoing
tuning may be required based on each host's resource utilization. For example, the following values may be acceptable:
- shared_buffers - 256MB
- wal_buffers - 8MB
- checkpoint_segments - 16
- checkpoint_completion_target - 0.9
- Large clusters - Can contain up to 1000 hosts. Consider the following settings as starting points.
- max_connection - For large clusters, each database is typically hosted on a different host. In general, allow each database on a host 100 maximum connections and then add 50 extra connections. You may have to increase the system resources available to PostgreSQL, as described at Connection Settings.
- shared_buffers - 1024 MB. This requires that the operating system can allocate sufficient shared memory. See PostgreSQL information on Managing Kernel Resources for more information on setting kernel resources.
- wal_buffers - 16 MB. This value is derived from the shared_buffers value. Setting wal_buffers to be approximately 3% of shared_buffers up to a maximum of approximately 16 MB is sufficient in most cases.
- checkpoint_segments - 128. The PostgreSQL Tuning Guide recommends values between 32 and 256 for write-intensive systems, such as this one.
- checkpoint_completion_target - 0.9.
- Small to mid-sized clusters - Consider the following settings as starting points. If resources are limited, consider reducing the buffer sizes and checkpoint segments further. Ongoing
tuning may be required based on each host's resource utilization. For example, the following values may be acceptable:
- Configure the PostgreSQL server to start at boot.
OS Command RHEL 7 compatible sudo systemctl enable postgresql
RHEL 6 compatible sudo chkconfig postgresql on
SLES sudo chkconfig --add postgresql
Ubuntu sudo chkconfig postgresql on
- Restart the PostgreSQL database:
sudo service postgresql restart
Creating Databases for Cloudera Software
- Each Hive metastore
- Sentry Server
- Sqoop Server
- Hue
- Oozie
The databases must be configured to support the PostgreSQL UTF8 character set encoding.
Record the values you enter for database names, usernames, and passwords. You will need them to configure the individual services later.
- Connect to PostgreSQL:
sudo -u postgres psql
- Create databases for each service you are using from the below table:
postgres=# CREATE ROLE <user> LOGIN PASSWORD '<password>'; postgres=# CREATE DATABASE <database> OWNER <user> ENCODING 'UTF8';
You can use any value you want for <database>, <user>, and <password>. The following examples are the suggested names, but you are not required to use them:Databases for Cloudera Software Service Database User Hue hue hue Hive Metastore Server metastore hive Sentry Server sentry sentry Oozie oozie oozie Record the databases, usernames, and passwords chosen because you will need them later.
- For PostgreSQL 8.4 and higher, set standard_conforming_strings=off for the Hive Metastore and Oozie databases:
postgres=# ALTER DATABASE <database> SET standard_conforming_strings=off;
Install CDH Packages
After completing the above instructions to install and configure PostgreSQL databases for Cloudera software, continue to Step 4: Install CDH Packages.