Deploying Cloudera Search

Cloudera Search is built on a SolrCloud deployment of Apache Solr (not standalone Solr). SolrCloud partitions your data set into multiple indexes served by multiple Solr processes, and uses ZooKeeper to coordinate them. The result is a distributed cluster of Solr servers.

Installing and Starting ZooKeeper Server

Cloudera Search uses Apache ZooKeeper as a highly available, central system for cluster management. For a small cluster, running ZooKeeper colocated with the NameNode is recommended. For larger clusters, use multiple ZooKeeper servers. For more information, see Initialize Multiple ZooKeeper Servers in a Production Environment.

If you do not already have a ZooKeeper service added to your cluster, add it using the instructions in Adding a Service for Cloudera Manager installations. For package-based unmanaged clusters, see Setting Up Apache ZooKeeper Using the Command Line.

Initializing Solr

For Cloudera Manager installations, if you have not yet added the Solr service to your cluster, do so now using the instructions in Adding a Service. The Add a Service wizard automatically configures and initializes the Solr service.

Generating Collection Configuration

To start using Solr and indexing data, you must configure a collection to hold the index. A collection requires the following configuration files:

solrconfig.xml
schema.xml
Any additional files referenced in the XML files

The solrconfig.xml file contains all of the Solr settings for a given collection, and the schema.xml file specifies the schema that Solr uses when indexing documents. For more details on how to configure a collection, see http://wiki.apache.org/solr/SchemaXml.

Configuration files for a collection are contained in a directory called an instance directory. To generate a template instance directory, run the following command:

solrctl instancedir --generate $HOME/solr_configs

You can customize a collection by directly editing the solrconfig.xml and schema.xml files created in $HOME/solr_configs/conf.

After you complete the configuration, make it available to Solr by running the following command, which uploads the contents of the instance directory to ZooKeeper:

solrctl instancedir --create <collection_name> $HOME/solr_configs

Use the solrctl utility to verify that your instance directory uploaded successfully and is available to ZooKeeper. List the current instance directories as follows:

solrctl instancedir --list

For example, if you used the --create command to upload an instance directory named weblogs, the output of the --list command should include weblogs.
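The upload and verification steps above can be sketched as a pair of shell functions. This is a minimal sketch, not part of solrctl itself: the function names instancedir_exists and upload_and_verify are hypothetical, and the check assumes that solrctl instancedir --list prints one instance directory name per line.

```shell
# Sketch only: assumes solrctl is on PATH and that
# "solrctl instancedir --list" prints one name per line.

# Returns 0 if an instance directory with exactly this name is listed.
instancedir_exists() {
  # $1 = instance directory name
  solrctl instancedir --list | grep -Fxq -- "$1"
}

# Hypothetical helper: upload the directory, then confirm it is listed.
upload_and_verify() {
  # $1 = instance directory name, $2 = local path (for example $HOME/solr_configs)
  solrctl instancedir --create "$1" "$2" && instancedir_exists "$1"
}

# Example usage on a host where solrctl is configured:
# upload_and_verify weblogs "$HOME/solr_configs" && echo "weblogs is available"
```

The whole-line match (grep -x) avoids a false positive when one name is a prefix of another, such as weblogs and weblogs_old.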

Creating Collections

The Solr server does not include any default collections. Create a collection using the following command:

solrctl collection --create <collection_name> -s <shard_count>

To use the configuration that you provided to Solr in previous steps, use the same collection name (weblogs in our example). The -s <shard_count> parameter specifies the number of SolrCloud shards you want to partition the collection across. The number of shards cannot exceed the total number of Solr servers in your SolrCloud cluster.
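The shard-count limit described above can be checked before the collection is created. The following is a minimal sketch under stated assumptions: shard_count_ok and create_collection are hypothetical helper names, and the number of Solr servers is supplied by the caller rather than discovered from the cluster.

```shell
# Sketch only: the Solr server count is passed in by the caller.

# Returns 0 when the shard count is a positive integer that does not
# exceed the number of Solr servers.
shard_count_ok() {
  # $1 = requested shard count, $2 = number of Solr servers
  case "$1" in ''|*[!0-9]*) return 1 ;; esac
  case "$2" in ''|*[!0-9]*) return 1 ;; esac
  [ "$1" -ge 1 ] && [ "$1" -le "$2" ]
}

# Hypothetical wrapper: create the collection only if the check passes.
create_collection() {
  # $1 = collection name, $2 = shard count, $3 = number of Solr servers
  if shard_count_ok "$2" "$3"; then
    solrctl collection --create "$1" -s "$2"
  else
    echo "shard count '$2' is not valid for $3 Solr servers" >&2
    return 1
  fi
}

# Example usage on a three-server cluster:
# create_collection weblogs 2 3
```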

To verify that the collection is active, go to http://search01.example.com:8983/solr/<collection_name>/select?q=*%3A*&wt=json&indent=true in a browser. For example, for the collection weblogs, the URL is http://search01.example.com:8983/solr/weblogs/select?q=*%3A*&wt=json&indent=true. Replace search01.example.com with the hostname of one of the Solr server hosts.
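The same check can be run from a terminal instead of a browser. This sketch assumes curl is installed and that the Solr host is reachable without additional authentication; select_url is a hypothetical helper that only assembles the URL shown above.

```shell
# Sketch only: builds the select URL that matches every document
# (q=*:* with the colon URL-encoded as %3A).
select_url() {
  # $1 = Solr server hostname, $2 = collection name
  printf 'http://%s:8983/solr/%s/select?q=*%%3A*&wt=json&indent=true\n' "$1" "$2"
}

# Example usage (requires curl and network access to a Solr server host):
# curl -sf "$(select_url search01.example.com weblogs)"
```

Quoting the URL matters here, because an unquoted & or * would be interpreted by the shell.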

You can also view the SolrCloud topology using the URL http://search01.example.com:8983/solr/#/~cloud.

For more information on completing additional collection management tasks, see Managing Cloudera Search.

Schemaless Mode

Using Search through a Proxy for High Availability