Setting Up Hue Using the Command Line
Hue Configuration
This section describes the Hue configuration file, hue.ini. The location of hue.ini varies depending on how Hue is installed and is displayed in Cloudera Manager at … .
You can configure the Hue apps using the properties described in the following sections:
- Viewing the Hue Configuration
- Hue Server Configuration
- Beeswax Configuration
- Impala Query UI Configuration
- DB Query Configuration
- Pig Editor Configuration
- Sqoop Configuration
- Job Browser Configuration
- Job Designer
- Oozie Editor/Dashboard Configuration
- Search Configuration
- HBase Configuration
- User Admin Configuration
- Hadoop Configuration
- Liboozie Configuration
- Sentry Configuration
- ZooKeeper Configuration
Viewing the Hue Configuration
When you log in to Hue, the start-up page displays information about any misconfiguration detected.
To view the Hue configuration, do one of the following:
- Visit http://myserver:port and click the Configuration tab.
- Visit http://myserver:port/desktop/dump_config.
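When Hue is not running, you can still see which values hue.ini sets by reading the file itself. The following is a minimal sketch, not a Hue tool: it assumes the nested-section style hue.ini uses ([section], [[subsection]], [[[subsubsection]]]) and flattens active settings into dotted names such as hadoop.hdfs_clusters.default.webhdfs_url. Commented-out lines (# and ##) are skipped; quoting, inline comments, and multi-line values are not handled. The function name flatten_hue_ini is hypothetical.

```python
import re

# Matches a section header such as [desktop], [[auth]], or [[[default]]].
SECTION_RE = re.compile(r"^(\[+)\s*([^\[\]]+?)\s*(\]+)$")


def flatten_hue_ini(text: str) -> dict[str, str]:
    """Flatten active key=value settings into dotted names (sketch only)."""
    path: list[str] = []
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments; "## key=value" lines are commented out.
        if not line or line.startswith("#"):
            continue
        match = SECTION_RE.match(line)
        if match:
            depth = len(match.group(1))
            # A header at depth N replaces everything at depth N and below.
            path = path[: depth - 1] + [match.group(2)]
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            settings[".".join(path + [key.strip()])] = value.strip()
    return settings
```

Quoted values such as nice_name="My SQL DB" come back with their quotes intact; strip them yourself if needed.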
Hue Server Configuration
This section describes Hue Server settings.
Specifying the Hue Server HTTP Address
These configuration properties are under the [desktop] section in the Hue configuration file.
[desktop]
# Webserver listens on this address and port
http_host=0.0.0.0
http_port=8888
Specifying the Secret Key
For security, you should specify the secret key that is used for secure hashing in the session store:
- Open the Hue configuration file.
- In the [desktop] section, set the secret_key property to a long series of random characters (30 to 60 characters is recommended). For example:
secret_key=qpbdxoewsqlkhztybvfidtvwekftusgdlofbcfghaswuicmqp
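One way to produce such a value is Python's standard secrets module, which draws from the operating system's cryptographically secure random source. This is a minimal sketch, not a Hue utility; the 50-character default and the letters-and-digits alphabet are assumptions chosen to stay inside the recommended range and to avoid characters such as # or , that INI-style parsers can treat specially.

```python
import secrets
import string

# Letters and digits only, so the value needs no quoting in hue.ini.
ALPHABET = string.ascii_letters + string.digits


def make_secret_key(length: int = 50) -> str:
    """Return a random secret_key value; 30 to 60 characters is recommended."""
    if not 30 <= length <= 60:
        raise ValueError("length should be between 30 and 60")
    # secrets.choice uses a CSPRNG, unlike random.choice.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(f"secret_key={make_secret_key()}")
```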
Authentication
In a non-secure deployment, the first user who logs in to Hue can choose any username and password and automatically becomes an administrator. This user can create other user and administrator accounts. Hue users should correspond to the Linux users who use Hue; make sure you use the same name as the Linux username.
By default, user information is stored in the Hue database. However, the authentication system is pluggable. You can authenticate Hue with LDAP (Active Directory or OpenLDAP), or you can import users and groups from an LDAP directory.
Configuring the Hue Server for TLS/SSL
You can optionally configure Hue to serve over HTTPS. As of CDH 5, pyOpenSSL is part of the Hue build and does not need to be installed manually. To configure TLS/SSL, perform the following steps from the root of your Hue installation path:
- Configure Hue to use your certificate and private key by adding the following options to the [desktop] section of the Hue configuration file:
ssl_certificate=/path/to/certificate
ssl_private_key=/path/to/key
- On a production system, you should have an appropriate certificate signed by a well-known Certificate Authority. If you're just testing, you can create a self-signed certificate using the openssl command, if it is installed on your system:
# Create a 2048-bit RSA key
$ openssl genrsa 2048 > host.key
# Create a self-signed certificate
$ openssl req -new -x509 -nodes -sha256 -key host.key > host.cert
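Before restarting Hue, it can help to confirm that the certificate and key load together. The sketch below uses Python's standard ssl module to perform the same load a TLS server does; it is an illustrative check that Hue itself does not run, and host.cert and host.key are simply the file names from the commands above.

```python
import ssl


def check_tls_pair(certfile: str, keyfile: str) -> tuple[bool, str]:
    """Try to load a certificate and private key the way a TLS server would."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # Raises ssl.SSLError if a file is malformed or the key does not
        # match the certificate, and OSError if a file cannot be read.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    except ssl.SSLError as err:
        # ssl.SSLError subclasses OSError, so it must be caught first.
        return False, f"TLS error: {err}"
    except OSError as err:
        return False, f"cannot read file: {err}"
    return True, "certificate and key match"
```

For example, check_tls_pair("host.cert", "host.key") returns (True, ...) when the pair is usable.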
Authentication Backend Options for Hue
The following table lists the authentication backends that Hue can be configured with, including the SAML backend, which enables single sign-on (SSO) authentication. Set the backend configuration property in the [[auth]] subsection under [desktop].
Backend | Description |
---|---|
django.contrib.auth.backends.ModelBackend | The default authentication backend used by Django. |
desktop.auth.backend.AllowAllBackend | Does not require a password for users to log in. All users are automatically authenticated, and the username is set to whatever is provided. |
desktop.auth.backend.AllowFirstUserDjangoBackend | The default Hue backend. It makes the first user who logs in a superuser. After that, it relies on Django and the user manager to authenticate users. |
desktop.auth.backend.LdapBackend | Authenticates users against an LDAP service. |
desktop.auth.backend.PamBackend | Authenticates users with PAM (Pluggable Authentication Modules). The authentication mode depends on the PAM module used. |
desktop.auth.backend.SpnegoDjangoBackend | SPNEGO is an authentication mechanism negotiation protocol. Authentication can be delegated to an authentication server, such as a Kerberos KDC, depending on the mechanism negotiated. |
desktop.auth.backend.RemoteUserDjangoBackend | Authenticates remote users through Django's remote-user mechanism, in which an upstream web server or proxy authenticates the user and passes the username to Hue. |
desktop.auth.backend.OAuthBackend | Delegates authentication to a third-party OAuth server. |
libsaml.backend.SAML2Backend | Security Assertion Markup Language (SAML) single sign-on (SSO) backend. Delegates authentication to the configured Identity Provider. |
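Because the backend value is a full class path, a small typo is easy to miss. The sketch below is a hypothetical helper, not part of Hue: it renders the [[auth]] fragment only for class paths listed in the table above, with indentation that mirrors the nested-section style of hue.ini.

```python
# Backends listed in the table above (class paths as given in this document).
KNOWN_BACKENDS = {
    "django.contrib.auth.backends.ModelBackend",
    "desktop.auth.backend.AllowAllBackend",
    "desktop.auth.backend.AllowFirstUserDjangoBackend",
    "desktop.auth.backend.LdapBackend",
    "desktop.auth.backend.PamBackend",
    "desktop.auth.backend.SpnegoDjangoBackend",
    "desktop.auth.backend.RemoteUserDjangoBackend",
    "desktop.auth.backend.OAuthBackend",
    "libsaml.backend.SAML2Backend",
}


def auth_section(backend: str) -> str:
    """Render the [desktop] / [[auth]] fragment that selects a backend."""
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend: {backend}")
    return f"[desktop]\n  [[auth]]\n    backend={backend}\n"
```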
Beeswax Configuration
In the [beeswax] section of the configuration file, you can optionally specify the following:
Property | Description |
---|---|
hive_server_host | The fully qualified domain name or IP address of the host running HiveServer2. |
hive_server_port | The port of the HiveServer2 Thrift server. Default: 10000. |
hive_conf_dir | The directory containing hive-site.xml, the HiveServer2 configuration file. |
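A quick way to confirm that hive_server_host and hive_server_port point at a listening server is a plain TCP connection test from the Hue host. The sketch below only checks that something accepts connections on that port; it does not speak the HiveServer2 Thrift protocol. The same check works for the Impala server_port (21050 by default). The function name is hypothetical.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, port_is_open("hs2.example.com", 10000), where the host name is a placeholder.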
Impala Query UI Configuration
In the [impala] section of the configuration file, you can optionally specify the following:
Property | Description |
---|---|
server_host | The hostname or IP address of the Impala server. Default: localhost. |
server_port | The port of the Impala daemon (impalad). Default: 21050. |
impersonation_enabled | Enables or disables impersonation when Hue talks to Impala. Default: False. |
DB Query Configuration
The DB Query app can have any number of databases configured in the [[databases]] section under [librdbms]. A database is known by its section name (mysql, postgresql, and oracle as in the list below).
For a MySQL, Oracle, or PostgreSQL database, add a subsection such as [[[mysql]]] with the following configuration properties:
[librdbms]
  [[databases]]
    [[[mysql]]]
      # Name to show in the UI.
      ## nice_name="My SQL DB"
      # For MySQL and PostgreSQL, name is the name of the database.
      # For Oracle, name is the instance of the Oracle server. For Express Edition,
      # this is 'xe' by default.
      ## name=mysqldb
      # Database backend to use. This can be:
      # 1. mysql
      # 2. postgresql
      # 3. oracle
      ## engine=mysql
      # IP or hostname of the database to connect to.
      ## host=localhost
      # Port the database server is listening to. Defaults are:
      # 1. MySQL: 3306
      # 2. PostgreSQL: 5432
      # 3. Oracle Express Edition: 1521
      ## port=3306
      # Username to authenticate with when connecting to the database.
      ## user=example
      # Password matching the username to authenticate with when
      # connecting to the database.
      ## password=example
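When several databases are configured, generating each subsection can keep the engine names and default ports consistent. The sketch below is a hypothetical helper, not part of Hue; it accepts only the three engines listed above and falls back to the default ports from the configuration comments when no port is given.

```python
from typing import Optional

# Default ports from the comments in the configuration above.
DEFAULT_PORTS = {"mysql": 3306, "postgresql": 5432, "oracle": 1521}


def librdbms_section(section: str, engine: str, host: str, name: str,
                     user: str, password: str, port: Optional[int] = None) -> str:
    """Render one [[databases]] entry for the DB Query app (sketch only)."""
    if engine not in DEFAULT_PORTS:
        raise ValueError(f"engine must be one of {sorted(DEFAULT_PORTS)}")
    effective_port = port if port is not None else DEFAULT_PORTS[engine]
    lines = [
        "[librdbms]",
        "  [[databases]]",
        f"    [[[{section}]]]",
        f"      engine={engine}",
        f"      host={host}",
        f"      port={effective_port}",
        f"      name={name}",
        f"      user={user}",
        f"      password={password}",
    ]
    return "\n".join(lines) + "\n"
```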
Pig Editor Configuration
In the [pig] section of the configuration file, you can optionally specify the following:
Property | Description |
---|---|
remote_data_dir | Location in HDFS where the Pig examples are stored. |
Sqoop Configuration
In the [sqoop] section of the configuration file, you can optionally specify the following:
Property | Description |
---|---|
server_url | The URL of the Sqoop 2 server. |
Job Browser Configuration
By default, any user can see submitted job information for all users. You can restrict viewing of submitted job information by optionally setting the following property under the [jobbrowser] section in the Hue configuration file:
Property | Description |
---|---|
share_jobs | Whether jobs are shared with all users. If set to false, jobs are visible only to their owner and to administrators. |
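The rule that share_jobs controls can be stated as a small predicate. This is only a restatement of the behavior described above, for illustration; it is not Hue's code, and the function and parameter names are hypothetical.

```python
def can_view_job(viewer: str, owner: str, viewer_is_admin: bool,
                 share_jobs: bool) -> bool:
    """Visibility rule for submitted jobs as described for share_jobs."""
    if share_jobs:
        # Shared: every user can see every job.
        return True
    # Not shared: only the job owner and administrators can see it.
    return viewer == owner or viewer_is_admin
```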
Job Designer
Property | Description |
---|---|
remote_data_dir | Location in HDFS where the Job Designer examples and templates are stored. |
Oozie Editor/Dashboard Configuration
By default, any user can see all workflows, coordinators, and bundles. You can optionally specify the following properties under the [oozie] section of the Hue configuration file:
Property | Description |
---|---|
oozie_jobs_count | Maximum number of Oozie workflows, coordinators, or bundles to retrieve in one API call. |
remote_data_dir | The location in HDFS where Oozie workflows are stored. |
As of CDH 5.4, Hue uses a new editor for Oozie documents. If documents were created in the old editor, they won't immediately be available to users other than the document owner. To resolve this problem, the document owner can share any documents again. Alternatively, you can revert to the old editor by setting the flag use_new_editor=false in the [oozie] section of the Hue configuration file.
Also see Liboozie Configuration.
Search Configuration
Property | Description |
---|---|
security_enabled | Indicates whether Solr requires clients to perform Kerberos authentication. |
empty_query | Query sent when no term is entered. Default: *:*. |
solr_url | URL of the Solr server. |
HBase Configuration
Property | Description |
---|---|
truncate_limit | Hard limit of rows or columns per row fetched before truncating. Default: 500. |
hbase_clusters | Comma-separated list of HBase Thrift servers for clusters in the format "(name|host:port)". Default: (Cluster|localhost:9090) |
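Hue parses this value itself; the sketch below is only a hypothetical pre-flight check that a hand-edited hbase_clusters value follows the documented "(name|host:port)" format. It requires an explicit numeric port and does not handle IPv6 literals.

```python
import re

# One "(name|host:port)" entry; the port is required in this sketch.
ENTRY_RE = re.compile(r"\(\s*([^|()]+?)\s*\|\s*([^:()\s]+):(\d+)\s*\)")


def parse_hbase_clusters(value: str) -> list[tuple[str, str, int]]:
    """Split an hbase_clusters value into (name, host, port) tuples."""
    entries = [part.strip() for part in value.split(",") if part.strip()]
    clusters = []
    for entry in entries:
        match = ENTRY_RE.fullmatch(entry)
        if not match:
            raise ValueError(f"expected (name|host:port), got {entry!r}")
        name, host, port = match.groups()
        clusters.append((name, host, int(port)))
    return clusters
```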
- Ensure you have a secure HBase Thrift server.
- Enable impersonation for the Thrift server by adding the following properties to hbase-site.xml on each Thrift gateway:
<property>
  <name>hbase.regionserver.thrift.http</name>
  <value>true</value>
</property>
<property>
  <name>hbase.thrift.support.proxyuser</name>
  <value>true</value>
</property>
See: Configure doAs Impersonation for the HBase Thrift Gateway.
- Configure Hue to point to a valid HBase configuration directory. You will find this property under the [hbase] section of the hue.ini file.
Property | Description |
---|---|
hbase_conf_dir | HBase configuration directory, where hbase-site.xml is located. Default: /etc/hbase/conf |
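To confirm that each Thrift gateway's hbase-site.xml carries both impersonation properties, the file can be checked with Python's standard xml.etree.ElementTree. This is an illustrative sketch, not a Cloudera tool; it reports required properties that are missing or not set to true.

```python
import xml.etree.ElementTree as ET

# Properties the impersonation step above requires to be "true".
REQUIRED = ("hbase.regionserver.thrift.http", "hbase.thrift.support.proxyuser")


def missing_thrift_settings(xml_text: str) -> list[str]:
    """Return required properties that are absent or not set to true."""
    root = ET.fromstring(xml_text)
    values = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        values[name] = prop.findtext("value", default="").strip().lower()
    return [name for name in REQUIRED if values.get(name) != "true"]
```

A malformed file, such as one with an unclosed <value> tag, makes ET.fromstring raise xml.etree.ElementTree.ParseError, which also catches that kind of typo.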
User Admin Configuration
In the [useradmin] section of the configuration file, you can optionally specify the following:
Property | Description |
---|---|
default_user_group | The name of the group to which a manually created user is automatically assigned. Default: default. |
Configuring an LDAP Server for User Admin
See Authenticate Hue Users with LDAP and Synchronize Hue with LDAP Server.
User Admin can interact with an LDAP server, such as Active Directory, in one of two ways:
- You can import user and group information from your current Active Directory infrastructure using the LDAP Import feature in the User Admin application. User authentication is then performed by User Admin based on the imported user and password information. You can then manage the imported users, along with any users you create directly in User Admin.
- You can configure User Admin to use an LDAP server as the authentication back end, which means users logging in to Hue will authenticate to the LDAP server, rather than against a username and password kept in User Admin. In this scenario, your users must all reside in the LDAP directory.
Enabling Import of Users and Groups from an LDAP Directory
User Admin can import users and groups from Active Directory using the Lightweight Directory Access Protocol (LDAP). To use this feature, you must configure User Admin with a set of LDAP settings in the Hue configuration file.
- In the Hue configuration file, configure the following properties in the [[ldap]] section:
Property | Description | Example |
---|---|---|
base_dn | The search base for finding users and groups. | base_dn="DC=mycompany,DC=com" |
nt_domain | The NT domain to connect to (only for use with Active Directory). | nt_domain=mycompany.com |
ldap_url | URL of the LDAP server. | ldap_url=ldap://auth.mycompany.com |
ldap_cert | Path to certificate for authentication over TLS (optional). | ldap_cert=/mycertsdir/myTLScert |
bind_dn | Distinguished name of the user to bind as – not necessary if the LDAP server supports anonymous searches. | bind_dn="CN=ServiceAccount,DC=mycompany,DC=com" |
bind_password | Password of the bind user – not necessary if the LDAP server supports anonymous searches. | bind_password=P@ssw0rd |
- Configure the following properties in the [[[users]]] section:
Property | Description | Example |
---|---|---|
user_filter | Base filter for searching for users. | user_filter="objectclass=*" |
user_name_attr | The username attribute in the LDAP schema. | user_name_attr=sAMAccountName |
- Configure the following properties in the [[[groups]]] section:
Property | Description | Example |
---|---|---|
group_filter | Base filter for searching for groups. | group_filter="objectclass=*" |
group_name_attr | The group name attribute in the LDAP schema. | group_name_attr=cn |
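To see how user_filter and user_name_attr fit together, the sketch below composes the kind of LDAP search filter a lookup for a single user might use. How Hue combines them internally is an assumption here, not something this document states; the escaping follows RFC 4515 so that characters such as * in a username are matched literally rather than as wildcards. The function names are hypothetical.

```python
def escape_filter_value(value: str) -> str:
    """Escape a value for use in an LDAP search filter (RFC 4515)."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def user_search_filter(user_filter: str, user_name_attr: str, username: str) -> str:
    """Combine user_filter with an equality match on user_name_attr."""
    base = user_filter.strip()
    # Values such as objectclass=* are written without parentheses in hue.ini.
    if not base.startswith("("):
        base = f"({base})"
    return f"(&{base}({user_name_attr}={escape_filter_value(username)}))"
```

For the example values above, user_search_filter("objectclass=*", "sAMAccountName", "jdoe") yields (&(objectclass=*)(sAMAccountName=jdoe)).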
Enabling the LDAP Server for User Authentication
You can configure User Admin to use an LDAP server as the authentication back end, which means users logging in to Hue will authenticate to the LDAP server, rather than against usernames and passwords managed by User Admin.
- In the Hue configuration file, configure the following properties in the [[ldap]] section:
Property | Description | Example |
---|---|---|
ldap_url | URL of the LDAP server, prefixed by ldap:// or ldaps:// | ldap_url=ldap://auth.mycompany.com |
search_bind_authentication | Search bind authentication is now the default instead of direct bind. To revert to direct bind, set this property to false. When using search bind semantics, Hue ignores the nt_domain and ldap_username_pattern properties below. | search_bind_authentication=false |
nt_domain | The NT domain over which the user connects (not strictly necessary if using ldap_username_pattern). | nt_domain=mycompany.com |
ldap_username_pattern | Pattern for searching for usernames; use <username> for the username parameter. Used when LdapBackend is the Hue authentication backend. | ldap_username_pattern="uid=<username>,ou=People,dc=mycompany,dc=com" |
- If you are using TLS or secure ports, add the following property to specify the path to a TLS certificate file:
Property | Description | Example |
---|---|---|
ldap_cert | Path to certificate for authentication over TLS. | ldap_cert=/mycertsdir/myTLScert |
- In the [[auth]] subsection inside [desktop], change the backend setting from
backend=desktop.auth.backend.AllowFirstUserDjangoBackend
to
backend=desktop.auth.backend.LdapBackend
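With direct bind (search_bind_authentication=false), ldap_username_pattern yields the distinguished name used for binding. The sketch below shows that substitution. The RFC 4514 escaping is an addition of this sketch, so that a username containing a comma cannot change the DN's structure; it is not a claim about what Hue does internally, and the function names are hypothetical.

```python
# Characters RFC 4514 requires to be backslash-escaped anywhere in a value.
SPECIAL = set('"+,;<>\\')


def escape_dn_value(value: str) -> str:
    """Escape an attribute value for use inside a DN (RFC 4514)."""
    out = []
    for i, ch in enumerate(value):
        if ch in SPECIAL:
            out.append("\\" + ch)
        elif ch == "\0":
            out.append("\\00")
        elif ch == "#" and i == 0:
            out.append("\\#")
        elif ch == " " and (i == 0 or i == len(value) - 1):
            out.append("\\ ")
        else:
            out.append(ch)
    return "".join(out)


def bind_dn(pattern: str, username: str) -> str:
    """Substitute a username into an ldap_username_pattern value."""
    if "<username>" not in pattern:
        raise ValueError("pattern must contain <username>")
    return pattern.replace("<username>", escape_dn_value(username))
```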
Hadoop Configuration
The following configuration variables are under the [hadoop] section in the Hue configuration file.
HDFS Cluster Configuration
Hue currently supports only one HDFS cluster, which you define under the [[hdfs_clusters]] sub-section. The following properties are supported:
Property | Description |
---|---|
[[[default]]] | The section containing the default settings. |
fs_defaultfs | The equivalent of fs.defaultFS (also referred to as fs.default.name) in a Hadoop configuration. |
webhdfs_url | The WebHDFS or HttpFS URL. The default value is the HTTP port on the NameNode. |
YARN (MRv2) and MapReduce (MRv1) Cluster Configuration
Job Browser can display both MRv1 and MRv2 jobs, but must be configured to display one type at a time by specifying either [[yarn_clusters]] or [[mapred_clusters]] sections in the Hue configuration file.
The following YARN cluster properties are defined under the [[yarn_clusters]] sub-section:
Property | Description |
---|---|
[[[default]]] | The section containing the default settings. |
resourcemanager_host | The fully qualified domain name of the host running the ResourceManager. |
resourcemanager_port | The port for the ResourceManager IPC service. |
submit_to | Whether Hue should submit jobs to this YARN cluster. Set this to true if Oozie is configured to use a YARN cluster. |
proxy_api_url | URL of the ProxyServer API. Default: http://localhost:8088 |
history_server_api_url | URL of the HistoryServer API. Default: http://localhost:19888 |
The following MapReduce (MRv1) cluster properties are defined under the [[mapred_clusters]] sub-section:
Property | Description |
---|---|
[[[default]]] | The section containing the default settings. |
jobtracker_host | The fully qualified domain name of the host running the JobTracker. |
jobtracker_port | The port for the JobTracker IPC service. |
submit_to | Whether Hue should submit jobs to this MapReduce cluster. Set this to true if Oozie is configured to use a 0.20 MapReduce service. |
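Because Job Browser displays one cluster type at a time, a hand-edited configuration can be sanity-checked for a single submission target. The sketch below assumes settings have already been flattened into dotted names such as hadoop.yarn_clusters.default.submit_to, the same naming this document uses for hadoop.hdfs_clusters.default.webhdfs_url. The rule "exactly one submit_to=true" is this sketch's assumption, not a documented Hue check, and the function name is hypothetical.

```python
def submit_target(settings: dict[str, str]) -> str:
    """Return "yarn" or "mapred" if exactly one default cluster has submit_to=true."""
    enabled = [
        kind
        for kind in ("yarn", "mapred")
        if settings.get(f"hadoop.{kind}_clusters.default.submit_to", "").lower() == "true"
    ]
    if len(enabled) != 1:
        raise ValueError(f"expected exactly one submit_to=true, found {enabled or 'none'}")
    return enabled[0]
```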
Liboozie Configuration
Property | Description |
---|---|
security_enabled | Indicates whether Oozie requires clients to perform Kerberos authentication. |
remote_deployment_dir | The location in HDFS where the workflows and coordinators are deployed when submitted by a non-owner. |
oozie_url | The URL of the Oozie server. |
Sentry Configuration
In the [libsentry] section of the configuration file, specify the following:
Property | Description |
---|---|
hostname | Hostname or IP address of the Sentry server. Default: localhost |
port | The port where the Sentry service is running. Default: 8038 |
sentry_conf_dir | Sentry configuration directory, where sentry-site.xml is located. Default: /etc/sentry/conf |
Hue will also automatically pick up the HiveServer2 server name from Hive's sentry-site.xml file at /etc/hive/conf.
Also make sure that hue is included in the sentry.service.allow.connect property in sentry-site.xml:
<property>
  <name>sentry.service.allow.connect</name>
  <value>impala,hive,solr,hue</value>
</property>
ZooKeeper Configuration
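In the [[[default]]] subsection under [zookeeper] [[clusters]] of the Hue configuration file, you can specify the following:
Property | Description |
---|---|
host_ports | Comma-separated list of ZooKeeper servers in the format "host:port". Example: localhost:2181,localhost:2182,localhost:2183 |
rest_url | The URL of the REST Contrib service (required for znode browsing). Default: http://localhost:9998 |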
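A host_ports value is easy to mistype when it lists a whole ensemble. The sketch below is a hypothetical check, not part of Hue, that splits the value into (host, port) pairs and rejects entries without a numeric port.

```python
def parse_host_ports(value: str) -> list[tuple[str, int]]:
    """Split a host_ports value such as "zk1:2181,zk2:2181" into (host, port) pairs."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # rpartition keeps everything before the last colon as the host.
        host, sep, port = item.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {item!r}")
        pairs.append((host, int(port)))
    return pairs
```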
Setting up REST Service for ZooKeeper
ZooKeeper Browser requires the ZooKeeper REST service to be running. Follow the instructions below to set this up.
Step 1: Clone and build the ZooKeeper repository
$ git clone https://github.com/apache/zookeeper
$ cd zookeeper
$ ant
Buildfile: /home/hue/Development/zookeeper/build.xml
init:
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/classes
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/lib
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/package/lib
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/test/lib
…
Step 2: Start the REST service
$ cd src/contrib/rest
$ nohup ant run &
Step 3: Update ZooKeeper configuration properties (if required)
If ZooKeeper and the REST service are not on the same machine as Hue, update the Hue configuration file and specify the correct hostnames and ports as shown in the sample configuration below:
[zookeeper]
  ...
  [[clusters]]
    ...
    [[[default]]]
      # Zookeeper ensemble. Comma separated list of Host/Port.
      # e.g. localhost:2181,localhost:2182,localhost:2183
      ## host_ports=localhost:2181
      # The URL of the REST contrib service
      ## rest_url=http://localhost:9998
You should now be able to successfully run the ZooKeeper Browser app.
Configuring CDH Components for Hue
To enable communication between the Hue Server and CDH components, you must make minor changes to your CDH installation by adding the properties described in this section to your CDH configuration files in /etc/hadoop/conf/. If you are installing on a cluster, make the following configuration changes to your existing CDH installation on each node in your cluster.
WebHDFS or HttpFS Configuration
Hue can use either of the following to access HDFS data:
- WebHDFS provides high-speed data transfer with good locality because clients talk directly to the DataNodes inside the Hadoop cluster.
- HttpFS is a proxy service appropriate for integration with external systems that are not behind the cluster's firewall.
Both WebHDFS and HttpFS use the HTTP REST API so they are fully interoperable, but Hue must be configured to use one or the other. For HDFS HA deployments, you must use HttpFS.
To configure Hue to use either WebHDFS or HttpFS, do the following steps:
- For WebHDFS only:
- Add the following property in hdfs-site.xml to enable WebHDFS in the NameNode and DataNodes:
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
- Restart your HDFS cluster.
- Configure Hue as a proxy user for all other users and groups, meaning it may submit a request on behalf of any other user:
WebHDFS: Add to core-site.xml:
<!-- Hue WebHDFS proxy user setting -->
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>
HttpFS: Verify that /etc/hadoop-httpfs/conf/httpfs-site.xml has the following configuration:
<!-- Hue HttpFS proxy user setting -->
<property>
  <name>httpfs.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>httpfs.proxyuser.hue.groups</name>
  <value>*</value>
</property>
If the configuration is not present, add it to /etc/hadoop-httpfs/conf/httpfs-site.xml and restart the HttpFS daemon.
- Verify that core-site.xml has the following configuration:
<property>
  <name>hadoop.proxyuser.httpfs.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.httpfs.groups</name>
  <value>*</value>
</property>
If the configuration is not present, add it to /etc/hadoop/conf/core-site.xml and restart Hadoop.
- With root privileges, update hadoop.hdfs_clusters.default.webhdfs_url in hue.ini to point to the address of either WebHDFS or HttpFS.
[hadoop]
  [[hdfs_clusters]]
    [[[default]]]
      # Use WebHdfs/HttpFs as the communication mechanism.
WebHDFS:
      ...
      webhdfs_url=http://FQDN:50070/webhdfs/v1/
HttpFS:
      ...
      webhdfs_url=http://FQDN:14000/webhdfs/v1/
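The proxy-user settings above are what let Hue act for other users over the WebHDFS REST API, which HttpFS also serves: Hue authenticates as hue and names the end user in the doas query parameter. The sketch below builds such a request URL for illustration. It assumes simple (non-Kerberos) authentication, where the caller is identified by the user.name parameter; LISTSTATUS is one of the standard WebHDFS operations, and the host name in the test is a placeholder. The function name is hypothetical.

```python
from urllib.parse import quote, urlencode


def webhdfs_request_url(webhdfs_url: str, hdfs_path: str, op: str,
                        proxy_user: str, end_user: str) -> str:
    """Build a WebHDFS/HttpFS REST URL that acts on behalf of end_user."""
    base = webhdfs_url.rstrip("/")
    # HDFS paths are absolute; quote() keeps "/" but escapes spaces and the like.
    path = quote("/" + hdfs_path.lstrip("/"))
    # user.name identifies the caller (simple auth); doas names the impersonated
    # user, which the server allows only if the proxyuser settings permit it.
    query = urlencode({"op": op, "user.name": proxy_user, "doas": end_user})
    return f"{base}{path}?{query}"
```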
Oozie Configuration
To run DistCp, Streaming, Pig, Sqoop, and Hive jobs in Job Designer or the Oozie Editor/Dashboard application, see Installing the Oozie ShareLib in Hadoop HDFS for instructions.
Also make sure oozie-site.xml allows Hue to act as a proxy user:
<!-- Default proxyuser configuration for Hue -->
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
  <value>*</value>
</property>
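The same two-property pattern, <prefix>.hue.hosts and <prefix>.hue.groups, appears in core-site.xml (hadoop.proxyuser), httpfs-site.xml (httpfs.proxyuser), and oozie-site.xml (oozie.service.ProxyUserService.proxyuser). The sketch below generates those properties for any of the prefixes with Python's xml.etree.ElementTree. It is an illustrative helper; the wildcard defaults reproduce the settings above, and narrower values, such as the Hue server's hostname for hosts, can be passed instead.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(prefix: str, user: str = "hue",
                         hosts: str = "*", groups: str = "*") -> str:
    """Render <prefix>.<user>.hosts and .groups properties as XML."""
    config = ET.Element("configuration")
    for suffix, value in (("hosts", hosts), ("groups", groups)):
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = f"{prefix}.{user}.{suffix}"
        ET.SubElement(prop, "value").text = value
    return ET.tostring(config, encoding="unicode")
```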
Search Configuration
See Search Configuration for details on how to configure the Search application for Hue.
HBase Configuration
Hive Configuration
The Beeswax daemon has been replaced by HiveServer2. Hue should therefore point to a running HiveServer2. This change involved the following major updates to the [beeswax] section of the Hue configuration file, hue.ini.
[beeswax]
  # Host where Hive server Thrift daemon is running.
  # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
  ## hive_server_host=<FQDN of HiveServer2>
  # Port where HiveServer2 Thrift server runs on.
  ## hive_server_port=10000
Existing Hive Installation
In the Hue configuration file hue.ini, modify hive_conf_dir to point to the directory containing hive-site.xml.
No Existing Hive Installation
Familiarize yourself with the configuration options in hive-site.xml. See Hive Installation. Having a hive-site.xml is optional but often useful, particularly for setting up a metastore. Point Hue to it with the hive_conf_dir configuration variable.