Deploying Cloudera Search
When you deploy Cloudera Search, SolrCloud partitions your data set into multiple indexes and processes, and uses ZooKeeper to simplify management, which results in a cluster of coordinating Apache Solr servers.
Installing and Starting ZooKeeper Server
SolrCloud mode uses Apache ZooKeeper as a highly available, central location for cluster management. For a small cluster, running ZooKeeper collocated with the NameNode is recommended. For larger clusters, use multiple ZooKeeper servers. For more information, see Installing ZooKeeper in a Production Environment.
If you do not already have a ZooKeeper service added to your cluster, add it using the instructions in Adding a Service for Cloudera Manager installations. For package-based unmanaged clusters, see ZooKeeper Installation.
Initializing Solr
For Cloudera Manager installations, if you have not yet added the Solr service to your cluster, do so now using the instructions in Adding a Service. The Add a Service wizard automatically configures and initializes the Solr service.
For unmanaged clusters, you must perform the following initialization steps manually:
Configuring ZooKeeper Quorum Addresses
After the ZooKeeper service is running, configure each Solr host with the ZooKeeper quorum addresses. This can be a single address if you have only one ZooKeeper server, or multiple addresses if you are using multiple servers.
Configure the ZooKeeper Quorum addresses in /etc/solr/conf/solr-env.sh on each Solr server host. For example:
$ cat /etc/solr/conf/solr-env.sh
export SOLR_ZK_ENSEMBLE=zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181/solr
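To confirm that a Solr host can reach the quorum, you can list the ZooKeeper root from that host. This is an optional sanity check, assuming the zookeeper-client wrapper script that the CDH ZooKeeper packages install:
$ zookeeper-client -server zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181 ls /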
Configuring Solr for Use with HDFS
To use Solr with your established HDFS service, perform the following configurations:
- Configure the HDFS URI for Solr to use as a backing store in /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr. On every Solr Server host, edit the following property to configure the location of Solr index data in HDFS:
SOLR_HDFS_HOME=hdfs://nn01.example.com:8020/solr
Replace nn01.example.com with the hostname of your HDFS NameNode, as specified by fs.defaultFS (or the deprecated fs.default.name) in your /etc/hadoop/conf/core-site.xml file. You might also need to change the port number from the default (8020) if your NameNode runs on a non-default port. On an HA-enabled cluster, make sure that the HDFS URI reflects the name service your cluster uses and matches the fs.defaultFS value (for example, hdfs://nameservice1). A quick way to confirm these values from the command line is shown after this list.
- In some cases, such as configuring Solr to work with HDFS High Availability (HA), you might want to configure the Solr HDFS client by setting the HDFS configuration directory in /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr. On every Solr Server host, locate the appropriate HDFS configuration directory and edit the following property with the absolute path to that directory:
SOLR_HDFS_CONFIG=/etc/hadoop/conf
Replace the path with the correct directory containing the proper HDFS configuration files, core-site.xml and hdfs-site.xml.
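Before restarting Solr, you can confirm both settings against the live client configuration. These optional checks use hdfs getconf, a standard HDFS utility, and assume the /etc/hadoop/conf path shown above:
$ hdfs getconf -confKey fs.defaultFS
$ ls /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml
The first command prints the URI that SOLR_HDFS_HOME should be rooted under (on an HA cluster, the name service URI); the second confirms that the directory referenced by SOLR_HDFS_CONFIG contains the required files.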
Configuring Solr to Use Secure HDFS
If security is enabled, perform the following steps:
- Create the Kerberos principals and Keytab files for every host in your cluster (a scripted sketch for automating these steps across many hosts follows this list):
- Create the Solr principal using either kadmin or kadmin.local.
kadmin: addprinc -randkey solr/fully.qualified.domain.name@YOUR-REALM.COM
kadmin: xst -norandkey -k solr.keytab solr/fully.qualified.domain.name
For more information, see Step 4: Create and Deploy the Kerberos Principals and Keytab Files.
- Deploy the Kerberos Keytab files on every host in your cluster:
- Copy or move the keytab files to a directory that Solr can access, such as /etc/solr/conf.
$ sudo mv solr.keytab /etc/solr/conf/
$ sudo chown solr:hadoop /etc/solr/conf/solr.keytab
$ sudo chmod 400 /etc/solr/conf/solr.keytab
- Add Kerberos-related settings to /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr on every host in your cluster, substituting appropriate values. For a package-based installation, use something similar to the following:
SOLR_KERBEROS_ENABLED=true
SOLR_KERBEROS_KEYTAB=/etc/solr/conf/solr.keytab
SOLR_KERBEROS_PRINCIPAL=solr/fully.qualified.domain.name@YOUR-REALM.COM
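Because the principal, keytab, and settings above are repeated for every host, applying them by hand across a large cluster is error-prone. The following sketch automates the per-host steps under stated assumptions: it runs on the KDC host, hosts.txt is a hypothetical file listing one fully qualified Solr hostname per line, root SSH access to each host is available, and kadmin.local is present locally. Adapt it before use:
# Sketch only: create a Solr principal and keytab for each host listed in
# hosts.txt (hypothetical file), then deploy the keytab to that host.
while read -r host; do
  kadmin.local -q "addprinc -randkey solr/${host}@YOUR-REALM.COM"
  kadmin.local -q "xst -norandkey -k solr-${host}.keytab solr/${host}"
  scp "solr-${host}.keytab" "root@${host}:/etc/solr/conf/solr.keytab"
  # -n keeps ssh from consuming the rest of hosts.txt on stdin
  ssh -n "root@${host}" "chown solr:hadoop /etc/solr/conf/solr.keytab && chmod 400 /etc/solr/conf/solr.keytab"
done < hosts.txt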
Creating the /solr Directory in HDFS
Before starting the Cloudera Search server, you must create the /solr directory in HDFS. The Cloudera Search service runs as the solr user by default, so it does not have the required permissions to create a top-level directory.
$ sudo -u hdfs hdfs dfs -mkdir /solr
$ sudo -u hdfs hdfs dfs -chown solr /solr
If you are using a Kerberos-enabled cluster, you must authenticate as the hdfs user (or another HDFS superuser) before creating the directory:
$ kinit hdfs@EXAMPLE.COM
$ hdfs dfs -mkdir /solr
$ hdfs dfs -chown solr /solr
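In either case, you can verify the directory and its ownership before starting Solr:
$ hdfs dfs -ls /
The listing should include a /solr entry owned by the solr user.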
Initializing the ZooKeeper Namespace
Before starting Cloudera Search for the first time, initialize the solr namespace in ZooKeeper:
$ solrctl init
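This creates the znodes that Solr needs under the chroot configured in SOLR_ZK_ENSEMBLE (/solr in the earlier example). As an optional check, you can list that path with the ZooKeeper client:
$ zookeeper-client -server zk01.example.com:2181 ls /solr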
Starting Solr
On each Solr Server host, start (or restart) the service:
$ sudo service solr-server restart
After the service starts, verify that the Solr daemon is running with the jps tool:
$ sudo jps -lm
31407 sun.tools.jps.Jps -lm
31236 org.apache.catalina.startup.Bootstrap start
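The org.apache.catalina.startup.Bootstrap process in the jps output is the Tomcat instance that hosts Solr. As a further check, you can query the CoreAdmin API on the local server; before any collections exist, expect a JSON status response with an empty set of cores:
$ curl "http://localhost:8983/solr/admin/cores?action=STATUS&wt=json"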
Generating Collection Configuration
To start using Solr and indexing data, you must configure a collection to hold the index. A collection requires the following configuration files:
- solrconfig.xml
- schema.xml
- Any additional files referenced in the xml files
The solrconfig.xml file contains all of the Solr settings for a given collection, and the schema.xml file specifies the schema that Solr uses when indexing documents. For more details on how to configure a collection, see http://wiki.apache.org/solr/SchemaXml.
To generate a skeleton of the instance directory that contains these files, run:
$ solrctl instancedir --generate $HOME/solr_configs
You can customize a collection by directly editing the solrconfig.xml and schema.xml files created in $HOME/solr_configs/conf.
When your configuration is complete, make it available to Solr by creating an instance directory:
$ solrctl instancedir --create <collection_name> $HOME/solr_configs
Verify that the instance directory was created:
$ solrctl instancedir --list
For example, if you used the --create command to create an instance directory named weblogs, the --list command should return weblogs.
Creating Collections
Create a new collection with the following command:
$ solrctl collection --create <collection_name> -s <shard_count>
To use the configuration that you provided to Solr in previous steps, use the same collection name (weblogs in our example). The -s <shard_count> parameter specifies the number of SolrCloud shards you want to partition the collection across. The number of shards cannot exceed the total number of Solr servers in your SolrCloud cluster.
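For example, to create the weblogs collection from the earlier steps with two shards:
$ solrctl collection --create weblogs -s 2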
To verify that the collection is active, go to http://search01.example.com:8983/solr/<collection_name>/select?q=*%3A*&wt=json&indent=true in a browser. For example, for the collection weblogs, the URL is http://search01.example.com:8983/solr/weblogs/select?q=*%3A*&wt=json&indent=true. Replace search01.example.com with the hostname of one of the Solr server hosts.
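The same check can be scripted with curl; for a newly created, empty collection, the JSON response should report "numFound":0:
$ curl "http://search01.example.com:8983/solr/weblogs/select?q=*%3A*&wt=json&indent=true"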
You can also view the SolrCloud topology using the URL http://search01.example.com:8983/solr/#/~cloud.
For more information on completing additional collection management tasks, see Managing Solr Using solrctl.