Deploying Cloudera Search
Cloudera Search is built on a SolrCloud deployment of Apache Solr, not standalone Solr. SolrCloud partitions your data set across multiple indexes and processes and uses ZooKeeper to coordinate them, which produces a distributed cluster of Solr servers.
Installing and Starting ZooKeeper Server
Cloudera Search uses Apache ZooKeeper as a highly available, central system for cluster management. For a small cluster, running ZooKeeper colocated with the NameNode is recommended. For larger clusters, use multiple ZooKeeper servers. For more information, see Initialize Multiple ZooKeeper Servers in a Production Environment.
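The reason extra servers help is that a ZooKeeper ensemble keeps serving requests only while a strict majority (a quorum) of its servers is running. The sketch below uses a hypothetical helper, `zk_fault_tolerance`, that is not part of ZooKeeper or Cloudera Manager; it only illustrates the majority arithmetic.

```python
def zk_fault_tolerance(ensemble_size: int) -> int:
    """Return how many ZooKeeper servers can fail while a majority quorum survives."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    # A quorum is a strict majority: floor(n / 2) + 1 servers must stay up.
    quorum = ensemble_size // 2 + 1
    return ensemble_size - quorum


for n in (1, 3, 4, 5):
    print(n, "server(s) tolerate", zk_fault_tolerance(n), "failure(s)")
```

A single server tolerates no failures, three tolerate one, and five tolerate two. Four servers tolerate only one failure, the same as three, which is why ensembles are usually sized with an odd number of servers.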
If you do not already have a ZooKeeper service in your cluster, add one. For Cloudera Manager installations, follow the instructions in Adding a Service. For package-based unmanaged clusters, see Setting Up Apache ZooKeeper Using the Command Line.
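Before you add Solr, you may want to confirm that a ZooKeeper server is answering. ZooKeeper responds to the four-letter command `ruok` with `imok` when it is running normally. The following is a minimal sketch, assuming the default client port 2181 and that four-letter commands are enabled on your ZooKeeper version (newer releases restrict them through `4lw.commands.whitelist`). The function name `zk_is_ok` is hypothetical.

```python
import socket


def zk_is_ok(host: str, port: int = 2181, timeout: float = 5.0) -> bool:
    """Send ZooKeeper's 'ruok' four-letter command; a healthy server answers 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            # The server writes its reply and then closes the connection,
            # so read until recv() returns an empty bytes object.
            reply = b""
            while True:
                chunk = sock.recv(16)
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
    return reply.strip() == b"imok"
```

For example, `zk_is_ok("zk01.example.com")` (a hypothetical hostname) returns `True` only if that host accepted the connection and replied `imok`.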
Initializing Solr
For Cloudera Manager installations, if you have not yet added the Solr service to your cluster, do so now using the instructions in Adding a Service. The Add a Service wizard automatically configures and initializes the Solr service.
Generating Collection Configuration
To start using Solr and indexing data, you must configure a collection to hold the index. A collection requires the following configuration files:
- solrconfig.xml
- schema.xml
- Any additional files referenced in the XML files
The solrconfig.xml file contains all of the Solr settings for a given collection, and the schema.xml file specifies the schema that Solr uses when indexing documents. For more details on how to configure a collection, see http://wiki.apache.org/solr/SchemaXml.
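The configuration files for a collection are kept in a directory called an instance directory. To generate a template instance directory, run the following command:
solrctl instancedir --generate $HOME/solr_configs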
You can customize a collection by directly editing the solrconfig.xml and schema.xml files created in $HOME/solr_configs/conf.
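After editing, it can help to check that both required files are still present and well-formed before uploading them. This is a minimal sketch, assuming the `conf/` layout described above; `check_instance_dir` is a hypothetical helper. It only checks XML well-formedness, not whether Solr accepts the settings, and it does not look for additional referenced files.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List

REQUIRED_FILES = ("solrconfig.xml", "schema.xml")


def check_instance_dir(instance_dir: str) -> List[str]:
    """Return a list of problems found in <instance_dir>/conf; empty means OK."""
    conf = Path(instance_dir) / "conf"
    problems = []
    for name in REQUIRED_FILES:
        path = conf / name
        if not path.is_file():
            problems.append(f"missing {path}")
            continue
        try:
            ET.parse(path)  # checks well-formedness only, not Solr semantics
        except ET.ParseError as exc:
            problems.append(f"{path} is not well-formed XML: {exc}")
    return problems


if __name__ == "__main__":
    # Defaults to the $HOME/solr_configs directory used in this section.
    target = sys.argv[1] if len(sys.argv) > 1 else str(Path.home() / "solr_configs")
    issues = check_instance_dir(target)
    print("\n".join(issues) or "instance directory looks complete")
```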
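When the configuration is complete, make it available to Solr by running the following command, which uploads the contents of the instance directory to ZooKeeper:
solrctl instancedir --create <collection_name> $HOME/solr_configs
To verify that your new configuration is available, list the uploaded instance directories:
solrctl instancedir --list
For example, if you used the --create command to upload a configuration named weblogs, the --list command should return weblogs.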
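The generate, create, and list steps can also be scripted. This sketch shells out to `solrctl` with exactly the arguments shown in this section; the helper names `instancedir_commands`, `name_is_listed`, and `upload_config` are hypothetical, and the parser assumes that `--list` prints one instance directory name per line.

```python
import subprocess
from typing import List


def instancedir_commands(name: str, config_dir: str) -> List[List[str]]:
    """Build the solrctl calls from this section: generate, create (upload), list."""
    return [
        ["solrctl", "instancedir", "--generate", config_dir],
        ["solrctl", "instancedir", "--create", name, config_dir],
        ["solrctl", "instancedir", "--list"],
    ]


def name_is_listed(list_output: str, name: str) -> bool:
    """Assumes --list output has one instance directory name per line."""
    return name in (line.strip() for line in list_output.splitlines())


def upload_config(name: str, config_dir: str) -> bool:
    """Run all three steps; raises CalledProcessError if any solrctl call fails."""
    generate, create, list_cmd = instancedir_commands(name, config_dir)
    subprocess.run(generate, check=True)
    # Customize conf/solrconfig.xml and conf/schema.xml here, before uploading.
    subprocess.run(create, check=True)
    result = subprocess.run(list_cmd, check=True, capture_output=True, text=True)
    return name_is_listed(result.stdout, name)
```

Because `subprocess.run` does not expand `$HOME`, pass an expanded path, for example `upload_config("weblogs", os.path.expanduser("~/solr_configs"))`.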
Creating Collections
Run the following command to create a new collection:
solrctl collection --create <collection_name> -s <shard_count>
To use the configuration that you uploaded to Solr in the previous steps, use the same collection name (weblogs in this example). The -s <shard_count> parameter specifies the number of SolrCloud shards to partition the collection across. The number of shards cannot exceed the total number of Solr servers in your SolrCloud cluster.
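A script that creates collections could enforce the shard limit stated above before calling `solrctl`. In this sketch, `collection_create_command` and its `solr_server_count` parameter are hypothetical; you would supply the number of Solr servers in your cluster yourself.

```python
from typing import List


def collection_create_command(name: str, shard_count: int, solr_server_count: int) -> List[str]:
    """Build the solrctl collection --create call after checking the shard limit."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    if shard_count > solr_server_count:
        raise ValueError(
            f"{shard_count} shards requested, but the cluster has only "
            f"{solr_server_count} Solr servers"
        )
    return ["solrctl", "collection", "--create", name, "-s", str(shard_count)]
```

The returned list can be passed to `subprocess.run(..., check=True)`; for a three-server cluster, `collection_create_command("weblogs", 2, 3)` yields the same command as typing `solrctl collection --create weblogs -s 2`.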
To verify that the collection is active, go to http://search01.example.com:8983/solr/<collection_name>/select?q=*%3A*&wt=json&indent=true in a browser. For example, for the collection weblogs, the URL is http://search01.example.com:8983/solr/weblogs/select?q=*%3A*&wt=json&indent=true. Replace search01.example.com with the hostname of one of the Solr server hosts.
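The same check can be done programmatically. The sketch below builds the match-all query URL shown above and reads `response.numFound` from Solr's JSON reply; a new, empty collection reports `0`. The function names `select_url`, `num_found`, and `collection_doc_count` are hypothetical, and the default port 8983 is taken from the URLs in this section.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def select_url(host: str, collection: str, port: int = 8983) -> str:
    """Build the q=*:* select URL, keeping '*' literal and encoding ':' as %3A."""
    params = urlencode({"q": "*:*", "wt": "json", "indent": "true"}, safe="*", quote_via=quote)
    return f"http://{host}:{port}/solr/{collection}/select?{params}"


def num_found(response_body: str) -> int:
    """Extract response.numFound from a Solr JSON select response."""
    return json.loads(response_body)["response"]["numFound"]


def collection_doc_count(host: str, collection: str, timeout: float = 10.0) -> int:
    # An HTTP error status (for example, 404 for an unknown collection)
    # raises urllib.error.HTTPError instead of returning a count.
    with urllib.request.urlopen(select_url(host, collection), timeout=timeout) as resp:
        return num_found(resp.read().decode("utf-8"))
```

For example, `collection_doc_count("search01.example.com", "weblogs")` queries the same URL as the browser example above.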
You can also view the SolrCloud topology using the URL http://search01.example.com:8983/solr/#/~cloud.
For more information on completing additional collection management tasks, see Managing Cloudera Search.