Setting Up Cloudera Search Using the Command Line
This documentation describes how to install Cloudera Search powered by Solr. It also explains how to install and start supporting tools and services such as the ZooKeeper Server, MapReduce tools for use with Cloudera Search, and Flume Solr Sink.
After installing Cloudera Search as described in this document, you can configure and use Cloudera Search as described in the Cloudera Search Guide. The user guide includes the Cloudera Search Tutorial, as well as topics that describe extracting, transforming, and loading data, establishing high availability, and troubleshooting.
Initializing Solr
Configure ZooKeeper Quorum Addresses
After the ZooKeeper service is running, configure each Solr host with the ZooKeeper quorum addresses. This can be a single address if you have only one ZooKeeper server, or multiple addresses if you are using multiple servers.
Configure the ZooKeeper Quorum addresses in /etc/solr/conf/solr-env.sh on each Solr server host. For example:
$ cat /etc/solr/conf/solr-env.sh
export SOLR_ZK_ENSEMBLE=zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181/solr
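The ensemble value is a comma-separated list of host:port pairs, optionally followed by a chroot path (/solr here). Because solr-env.sh is a plain shell fragment, you can source a copy of it and inspect the addresses. The following sketch uses a throwaway file under /tmp and placeholder hostnames rather than your real /etc/solr/conf/solr-env.sh:

```shell
# Sketch: solr-env.sh is plain shell, so it can be sourced and inspected.
# The /tmp path and zk* hostnames are placeholders, not real cluster values.
cat > /tmp/solr-env-sample.sh <<'EOF'
export SOLR_ZK_ENSEMBLE=zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181/solr
EOF
. /tmp/solr-env-sample.sh
hosts=${SOLR_ZK_ENSEMBLE%/solr}    # strip the /solr chroot suffix
echo "$hosts" | tr ',' '\n'        # one ZooKeeper address per line
```

If the host count printed does not match the number of ZooKeeper servers in your quorum, the ensemble string is incomplete.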
Configure Solr for Use with HDFS
To use Solr with your established HDFS service, perform the following configurations:
- Configure the HDFS URI for Solr to use as a backing store in /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr. On every Solr Server host, edit the following property to configure the location of Solr index data in HDFS:
SOLR_HDFS_HOME=hdfs://nn01.example.com:8020/solr
Replace nn01.example.com with the hostname of your HDFS NameNode (as specified by fs.default.name or fs.defaultFS in your /etc/hadoop/conf/core-site.xml file). You might also need to change the port number from the default (8020) if your NameNode runs on a non-default port. On an HA-enabled cluster, use the HDFS URI of the nameservice designated for your cluster; this value must match fs.defaultFS (for example, hdfs://nameservice1).
- In some cases, such as configuring Solr to work with HDFS High Availability (HA), you might want to configure the Solr HDFS client by setting the HDFS configuration directory in /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr. On every Solr Server host, locate the appropriate HDFS configuration directory and edit the following property with the absolute path to this directory:
SOLR_HDFS_CONFIG=/etc/hadoop/conf
Replace the path with the correct directory containing the proper HDFS configuration files, core-site.xml and hdfs-site.xml.
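Because SOLR_HDFS_HOME must agree with the filesystem URI in core-site.xml, one sanity check is to extract fs.defaultFS from that file and append /solr. The sketch below creates a small sample core-site.xml under /tmp; on a real host you would point it at /etc/hadoop/conf/core-site.xml instead, and the NameNode URI shown is a placeholder:

```shell
# Sketch: pull fs.defaultFS out of core-site.xml to build SOLR_HDFS_HOME.
# A sample file is created here; on a real host, read /etc/hadoop/conf/core-site.xml.
cat > /tmp/core-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nn01.example.com:8020</value>
  </property>
</configuration>
EOF
fs_uri=$(grep -A1 '<name>fs.defaultFS</name>' /tmp/core-site-sample.xml \
  | sed -n 's|.*<value>\(.*\)</value>.*|\1|p')
echo "SOLR_HDFS_HOME=${fs_uri}/solr"
```

The grep/sed pipeline is a convenience for flat files like this one; any XML-aware tool works equally well.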
Configuring Solr to Use Secure HDFS
If security is enabled, perform the following steps:
- Create the Kerberos principals and Keytab files for every host in your cluster:
- Create the Solr principal using either kadmin or kadmin.local.
kadmin: addprinc -randkey solr/fully.qualified.domain.name@YOUR-REALM.COM
kadmin: xst -norandkey -k solr.keytab solr/fully.qualified.domain.name
For more information, see Step 4: Create and Deploy the Kerberos Principals and Keytab Files.
- Deploy the Kerberos Keytab files on every host in your cluster:
- Copy or move the keytab files to a directory that Solr can access, such as /etc/solr/conf.
$ sudo mv solr.keytab /etc/solr/conf/
$ sudo chown solr:hadoop /etc/solr/conf/solr.keytab
$ sudo chmod 400 /etc/solr/conf/solr.keytab
- Add Kerberos-related settings to /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr on every host in your cluster, substituting appropriate values. For a package-based installation, use something similar to the following:
SOLR_KERBEROS_ENABLED=true
SOLR_KERBEROS_KEYTAB=/etc/solr/conf/solr.keytab
SOLR_KERBEROS_PRINCIPAL=solr/fully.qualified.domain.name@YOUR-REALM.COM
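Since /etc/default/solr is read as a shell fragment, a quick way to catch typos before restarting Solr is to source a copy of it and confirm the Kerberos variables are set. The values below are samples for illustration, not values from your cluster:

```shell
# Sketch: /etc/default/solr is plain shell, so source a copy and confirm the
# Kerberos variables are present. All values below are sample placeholders.
cat > /tmp/solr-defaults-sample <<'EOF'
SOLR_KERBEROS_ENABLED=true
SOLR_KERBEROS_KEYTAB=/etc/solr/conf/solr.keytab
SOLR_KERBEROS_PRINCIPAL=solr/fully.qualified.domain.name@YOUR-REALM.COM
EOF
. /tmp/solr-defaults-sample
[ "$SOLR_KERBEROS_ENABLED" = "true" ] && [ -n "$SOLR_KERBEROS_PRINCIPAL" ] \
  && echo "Kerberos settings look complete"
```

On a real host you would source /etc/default/solr itself (in a subshell, to avoid polluting your environment) and also verify that the keytab file named in SOLR_KERBEROS_KEYTAB exists and is readable by the solr user.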
Create the /solr Directory in HDFS
Before starting the Cloudera Search server, you must create the /solr directory in HDFS. The Cloudera Search service runs as the solr user by default, and that user does not have the permissions required to create a top-level directory, so create it as the HDFS superuser.
$ sudo -u hdfs hdfs dfs -mkdir /solr
$ sudo -u hdfs hdfs dfs -chown solr /solr
If you are using a Kerberos-enabled cluster, you must authenticate with the hdfs account or another superuser before creating the directory:
$ kinit hdfs@EXAMPLE.COM
$ hdfs dfs -mkdir /solr
$ hdfs dfs -chown solr /solr
Initialize the ZooKeeper Namespace
$ solrctl init
Start Solr
$ sudo service solr-server restart
$ sudo jps -lm
31407 sun.tools.jps.Jps -lm
31236 org.apache.catalina.startup.Bootstrap start
Install Hue Search
You must install and configure Hue before you can use Search with Hue.
- Follow the instructions for Installing Hue.
- Use one of the following commands to install Search applications on the Hue machine:
- RHEL compatible:
sudo yum install hue-search
- Ubuntu/Debian:
sudo apt-get install hue-search
- SLES:
sudo zypper install hue-search
- Update the configuration information for the Solr Server:
In a Cloudera Manager environment:
- Connect to Cloudera Manager.
- Select the Hue service.
- Click the Configuration tab.
- Search for the word "safety".
- Add information about your Solr host to Hue Server Advanced Configuration Snippet (Safety Valve) for hue_safety_valve_server.ini. For example, if your hostname is SOLR_HOST, you might add the following:
[search]
# URL of the Solr Server
solr_url=http://SOLR_HOST:8983/solr
- (Optional) To enable Hue in environments where Kerberos authentication is required, update the security_enabled property as follows:
# Requires FQDN in solr_url if enabled
security_enabled=true
In an environment without Cloudera Manager:
- Update configuration information in /etc/hue/hue.ini.
- Specify the Solr URL. For example, to use localhost as your Solr host, you would add the following:
[search]
# URL of the Solr Server, replace 'localhost' if Solr is running on another host
solr_url=http://localhost:8983/solr/
- (Optional) To enable Hue in environments where Kerberos authentication is required, update the security_enabled property as follows:
# Requires FQDN in solr_url if enabled
security_enabled=true
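The [search] section of hue.ini is plain INI, so a simple check that the URL you entered is the one Hue will read is to grep it back out. The sketch below writes a sample fragment to /tmp (on a real host the file is /etc/hue/hue.ini, and localhost is a placeholder for your Solr host):

```shell
# Sketch: a minimal [search] section written to a sample file for illustration.
# On a real host, edit /etc/hue/hue.ini; 'localhost' is a placeholder host.
cat > /tmp/hue-sample.ini <<'EOF'
[search]
# URL of the Solr Server, replace 'localhost' if Solr is running on another host
solr_url=http://localhost:8983/solr/
# Requires FQDN in solr_url if enabled
security_enabled=false
EOF
sed -n 's/^solr_url=//p' /tmp/hue-sample.ini   # echo back the configured URL
```

Remember that when security_enabled=true, the host in solr_url must be a fully qualified domain name, not localhost or an IP address.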
- Configure secure impersonation for Hue.
- If you are using Search in an environment that uses Cloudera Manager 4.8 and higher, secure impersonation for Hue is automatically configured. To review secure impersonation settings from the Cloudera Manager home page:
- Go to the HDFS service.
- Click the Configuration tab.
- Select .
- Select .
- Type hue proxy in the Search box.
- Note the Service-Wide wild card setting for Hue Proxy Hosts and Hue Proxy User Groups.
- If you are not using Cloudera Manager or are using a version earlier than Cloudera Manager 4.8, configure Hue to impersonate any user that makes requests by modifying /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr. The changes you make may vary according to the users for which you want to configure secure impersonation. For example, you might make the following changes:
SOLR_SECURITY_ALLOWED_PROXYUSERS=hue
SOLR_SECURITY_PROXYUSER_hue_HOSTS=*
SOLR_SECURITY_PROXYUSER_hue_GROUPS=*
For more information about Secure Impersonation or to set up additional users for Secure Impersonation, see Enabling Secure Impersonation.
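The proxy-user variable names embed the user being granted impersonation rights (hue in the settings above), which is how additional users are added. The sketch below sources a sample copy of the file and shows the naming pattern with a second, hypothetical proxy user named flume:

```shell
# Sketch: the variable names embed the proxy user ("hue", plus a hypothetical
# "flume" to show the pattern). A sample file stands in for /etc/default/solr.
cat > /tmp/solr-proxy-sample <<'EOF'
SOLR_SECURITY_ALLOWED_PROXYUSERS=hue,flume
SOLR_SECURITY_PROXYUSER_hue_HOSTS=*
SOLR_SECURITY_PROXYUSER_hue_GROUPS=*
SOLR_SECURITY_PROXYUSER_flume_HOSTS=*
SOLR_SECURITY_PROXYUSER_flume_GROUPS=*
EOF
. /tmp/solr-proxy-sample
echo "allowed proxy users: $SOLR_SECURITY_ALLOWED_PROXYUSERS"
```

Each user listed in SOLR_SECURITY_ALLOWED_PROXYUSERS needs its own matching _HOSTS and _GROUPS variables; a wildcard (*) allows requests from any host or group, which you may want to narrow in production.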
- (Optional) To view files in HDFS, ensure that the correct webhdfs_url is included in hue.ini and WebHDFS is properly configured as described in Configuring CDH Components for Hue.
- Restart Hue:
$ sudo /etc/init.d/hue restart
- Open http://hue-host.com:8888/search/ in your browser.