SolrCloud Cluster (Single Collection) Deployment

Update: this article has been revised to cover Solr versions up to v4.9.x.

The objective of this article is to provide insight into how to design a SolrCloud cluster (single collection) deployment.  I recommend that you first check out the post SolrCloud (aka Apache Solr) to get to know the Solr distributed terminology, a bird's-eye comparison of SolrCloud vs Classic Solr, and the list of published and upcoming articles.


Designing SolrCloud Collection

The design phase is a vital phase of any project.  We are going to execute the following design plan for our SolrCloud cluster.  Collections and shards are logical elements in a SolrCloud cluster; underneath, each one is backed by physical Solr cores, and multiple Solr cores collectively form the SolrCloud cluster.

  • Creation of single SolrCloud collection named europe-collection in Europe Data Center
  • ZooKeeper ensemble with 5 replicated ZooKeeper server(s)
  • 3 Node(s) of SolrCloud Instance(s) with replication factor 3
  • 3 Shard(s) distributed among 3 SolrCloud Instance(s)
  • Manual distribution of Shard replica(s) among 3 Shard(s) on 3 Node(s)
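
As a quick sanity check on the plan above, the following sketch works out how many Solr cores the design implies. It assumes every shard ends up with three copies (the replication factor) spread evenly over the three nodes; the variable names are illustrative only.

```shell
#!/bin/sh
# Illustrative sizing check for the design above (variable names are assumptions).
SHARDS=3
REPLICATION_FACTOR=3
NODES=3

# Every shard is stored REPLICATION_FACTOR times; each copy is one Solr core.
TOTAL_CORES=$((SHARDS * REPLICATION_FACTOR))
# Spread evenly, each node hosts this many cores.
CORES_PER_NODE=$((TOTAL_CORES / NODES))

echo "total cores: $TOTAL_CORES, cores per node: $CORES_PER_NODE"
```

Three cores per node is also the value passed as maxShardsPerNode when the collection is created later in this article.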

Technical Elements:

  • Apache Tomcat 7.0.x (the same steps apply to other versions too)
  • Apache ZooKeeper 3.3.x
  • Apache Solr 4.x

Kindly note: as far as this article is concerned, the design plan described above is executed on a single box, with the server instances simulated on that same box.  However, the same design plan can be executed across multiple boxes.

Directory structures are –

The described design has been represented in below diagram:


ZooKeeper Ensemble Deployment

Check out the article ZooKeeper Cluster (Multi-Server) Setup to get acquainted with the steps to deploy a ZooKeeper ensemble.  Go ahead and deploy the ZooKeeper server(s).


Handy SolrCloud ZkCLI Commands

SolrCloud comes with really handy ZooKeeper CLI commands for uploading, downloading, and linking a configuration set to a collection.  Preparing them is very simple.  All you have to do is

  • Extract the apache-solr-4.x.x.war
  • Get all the jar(s) from its lib directory and place them in a local directory

Let’s perform these steps:

Download the Apache Solr 4 artifact to a tmp directory

Extract the zip file and the war file into the tmp directory

Create a directory (solr-cli-lib) and copy the jars into it

Copy the logger libraries to solr-cli-lib [this step is applicable only to Solr 4.3 & above]
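
The copy steps above can be sketched as a small shell function. This is a minimal sketch rather than the exact commands used for this article: the function name and all paths are supplied by you, and the only layout it assumes is that the extracted war contains WEB-INF/lib.

```shell
#!/bin/sh
# Sketch: collect the jars needed by the Solr ZooKeeper CLI into one directory.
#   $1 = directory where the Solr war was extracted (contains WEB-INF/lib)
#   $2 = target directory, e.g. solr-cli-lib
#   $3 = optional directory holding the logger jars (Solr 4.3 & above)
prepare_solr_cli_lib() {
  war_dir=$1
  target_dir=$2
  logger_dir=${3:-}

  mkdir -p "$target_dir" || return 1
  cp "$war_dir"/WEB-INF/lib/*.jar "$target_dir"/ || return 1

  # From Solr 4.3 onwards the logging jars are no longer packaged inside the war.
  if [ -n "$logger_dir" ]; then
    cp "$logger_dir"/*.jar "$target_dir"/ || return 1
  fi
}
```

A call could look like `prepare_solr_cli_lib /tmp/solr-war /tmp/solr-cli-lib /tmp/solr-4.x.x/example/lib/ext`, where the first two paths are placeholders and example/lib/ext is where the Solr 4.3+ distribution keeps its logging jars.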

Now we are ready to take advantage of the handy SolrCloud ZooKeeper CLI commands in the subsequent sections of this article.


Creating Solr Configuration schema.xml, solrconfig.xml, etc

Every organization is unique and has its own requirements, so create Solr configuration files that adhere to the business requirements of your organization.  Every Solr deployment around the world is unique in its schema (fields, field types, analyzers, tokenizers), solrconfig (request handlers, caches, etc.), stopwords, boosting & blocking, synonyms, etc.

Create your own Solr configuration and place it in a directory.  For this article I’m going to use the example configuration shipped with the Solr 4 artifacts.

Listing out copied Solr configuration


Uploading Solr Configuration into ZooKeeper ensemble

We have already prepared the required Solr CLI libs, so let us use them now.  Here are the points that we should be aware of –

  • ZooKeeper host addresses and client port numbers for zkhost param
  • Solr Configuration directory for confdir param
  • Configuration name in the ZooKeeper binding, let’s say myconf

Uploading a Solr Configuration
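
A minimal sketch of the upload step, written so that it prints the ZkCLI command for review instead of running it. The ZooKeeper addresses (localhost:2181 to 2185) and the ./solr-cli-lib and ./solr-conf paths are assumptions made for this example; substitute your own values.

```shell
#!/bin/sh
# Sketch: print the ZkCLI "upconfig" command rather than executing it blindly.
# Assumed values (not taken from the article): ZooKeeper on localhost:2181-2185.
ZK_HOSTS="localhost:2181,localhost:2182,localhost:2183,localhost:2184,localhost:2185"

upconfig_cmd() {
  cli_lib=$1    # directory holding the Solr CLI jars
  conf_dir=$2   # local Solr configuration directory (confdir)
  conf_name=$3  # name under which ZooKeeper stores the configuration (confname)
  echo "java -classpath '.:$cli_lib/*' org.apache.solr.cloud.ZkCLI" \
       "-cmd upconfig -zkhost $ZK_HOSTS -confdir $conf_dir -confname $conf_name"
}

upconfig_cmd ./solr-cli-lib ./solr-conf myconf
```

Once the printed command looks right, it can be pasted into the shell to perform the actual upload.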

Linking Uploaded Solr configuration with collection
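
The linking step uses the same CLI class with the linkconfig command. The sketch below again only prints the command; the single ZooKeeper address and the solr-cli-lib location are assumptions, while europe-collection and myconf are the names used in this article.

```shell
#!/bin/sh
# Sketch: print the ZkCLI "linkconfig" command that binds a collection to a config name.
linkconfig_cmd() {
  zk_hosts=$1    # comma separated ZooKeeper host:port list
  collection=$2  # collection to link, e.g. europe-collection
  conf_name=$3   # previously uploaded configuration name, e.g. myconf
  echo "java -classpath '.:solr-cli-lib/*' org.apache.solr.cloud.ZkCLI" \
       "-cmd linkconfig -zkhost $zk_hosts -collection $collection -confname $conf_name"
}

linkconfig_cmd localhost:2181 europe-collection myconf
```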

We have uploaded and linked the Solr configuration; let’s verify it.

Connecting to ZooKeeper (we have five servers, let’s connect to one among the 5 servers)

Querying uploaded configuration and collection path in ZooKeeper

Looks good, let’s move on!


Deploying SolrCloud in Tomcat

In this section we will set up the Tomcat instance(s) and the SolrCloud cluster.  Understanding the SolrCloud cluster deployment takes quite a few steps, so I’m going to describe them in a simplified way.  Stay with me!

Step 1

Directory Structure creation for our deployment
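
One possible way to create such a structure is sketched below. The tomcatN directory names follow the server.xml paths shown later in this article; the solr/homeN location for the Solr homes is an assumption made for this example.

```shell
#!/bin/sh
# Sketch: create one Tomcat directory and one Solr home per node under a base directory.
# The solr/homeN layout is an assumption; adjust it to your own structure.
create_layout() {
  base_dir=$1
  for n in 1 2 3; do
    mkdir -p "$base_dir/tomcat$n" "$base_dir/solr/home$n" || return 1
  done
}
```

For the single-box setup of this article it could be invoked as `create_layout /Users/jeeva/dc-1`.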

Step 2

Downloading Tomcat 7 & Solr 4 artifacts

Step 3

Extract the downloaded artifacts and place them in their respective directories

Step 4

Create a setenv.sh for each Tomcat server to hold our customization.  I’m describing the steps for one Tomcat instance; follow the same steps for the other two Tomcat servers with the appropriate values (Solr home, app server port no.)

Place the following configuration snippet in it and FORGET NOT to save setenv.sh
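
A minimal sketch of how such a setenv.sh could be generated per node. It assumes that solr.xml reads the host and port system properties and that localhost is the right value for -Dhost in a single-box setup; the function name and argument order are illustrative only.

```shell
#!/bin/sh
# Sketch: generate bin/setenv.sh for one Tomcat instance.
# Assumption: solr.xml picks up the "host" and "port" system properties.
write_setenv() {
  tomcat_dir=$1  # e.g. /Users/jeeva/dc-1/tomcat1
  solr_home=$2   # Solr home of this node
  port=$3        # HTTP port of this Tomcat instance
  zk_hosts=$4    # comma separated ZooKeeper host:port list

  mkdir -p "$tomcat_dir/bin" || return 1
  cat > "$tomcat_dir/bin/setenv.sh" <<EOF
#!/bin/sh
SOLR_OPTS="-Dsolr.solr.home=$solr_home -Dhost=localhost -Dport=$port -DzkHost=$zk_hosts"
JAVA_OPTS="\$JAVA_OPTS \$SOLR_OPTS"
export JAVA_OPTS
EOF
  chmod +x "$tomcat_dir/bin/setenv.sh"
}
```

Calling it three times with ports 7070, 8080 and 9090 and the matching Solr homes would cover all three nodes.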

Incorporated notes from Flavio Pompermaier (via comment):

  • It is important to specify the host parameter in solr.xml and then in SOLR_OPTS too; otherwise every node will advertise itself with the local IP & hostname (127.0.0.1 & localhost), since it defaults to the first localhost address found
  • If you keep the Solr configs in a specific directory on ZooKeeper, kindly append that directory name to the last node of the -DzkHost param.  For example, SOLR_OPTS will look like:
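
To illustrate the second note, the sketch below appends a ZooKeeper directory to the last node of the zkHost value. The directory name solr and the two ZooKeeper addresses are hypothetical values chosen for the example.

```shell
#!/bin/sh
# Sketch: append a ZooKeeper directory (chroot) to the last node of zkHost.
with_chroot() {
  zk_hosts=$1  # e.g. localhost:2181,localhost:2182
  zk_dir=$2    # directory on ZooKeeper holding the Solr configs, without leading slash
  printf '%s/%s\n' "$zk_hosts" "$zk_dir"
}

# "solr" is a hypothetical directory name used only for illustration.
SOLR_OPTS="-DzkHost=$(with_chroot "localhost:2181,localhost:2182" "solr")"
printf '%s\n' "$SOLR_OPTS"
```

Note that the directory is written only once, after the final host:port pair, and not after each ZooKeeper address.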

Step 5

Creating solr.xml for Solr Home

Place the following lines into it and save solr.xml

Configuration for Solr v4.4.0 and above
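
As a hedged reconstruction of what a new-style solr.xml for Solr 4.4.0 and above can look like, the sketch below writes one into a Solr home. The property names host, port, hostContext and zkClientTimeout are assumed to match the system properties passed through SOLR_OPTS; the function name is illustrative.

```shell
#!/bin/sh
# Sketch: write a new-style (Solr 4.4+) solr.xml into the given Solr home.
# The ${...} placeholders are resolved by Solr from system properties, not by the shell.
write_solr_xml() {
  solr_home=$1
  mkdir -p "$solr_home" || return 1
  cat > "$solr_home/solr.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8" ?>
<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${port:}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>
EOF
}
```

Because hostPort has no default here, the -Dport property must be set for every Tomcat instance.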

Configuration up to Solr v4.3.1
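
For the older format, a comparable sketch is shown below. It uses the legacy cores element with persistent="true" so that cores created through the API are written back to solr.xml; the property names are again assumed to match those passed in SOLR_OPTS.

```shell
#!/bin/sh
# Sketch: write a legacy-style (up to Solr 4.3.1) solr.xml into the given Solr home.
# persistent="true" lets Solr save cores created via the API back into this file.
write_legacy_solr_xml() {
  solr_home=$1
  mkdir -p "$solr_home" || return 1
  cat > "$solr_home/solr.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" host="${host:}" hostPort="${port:}"
         hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
  </cores>
</solr>
EOF
}
```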

Now copy the above solr.xml to home2 and home3

Step 6

Applying necessary permissions

Step 7

Update each Tomcat server.xml with its port number and shutdown port number, and comment out the AJP connector (we don’t need it here).  After the modification, don’t forget to save server.xml

Tomcat 1:  /Users/jeeva/dc-1/tomcat1/conf/server.xml

  • Port No. => 7070
  • Shutdown port no. => 7005
  • comment out Java AJP connector

Tomcat 2:  /Users/jeeva/dc-1/tomcat2/conf/server.xml

  • Port No. => 8080
  • Shutdown port no. => 8005
  • comment out Java AJP connector

Tomcat 3:  /Users/jeeva/dc-1/tomcat3/conf/server.xml

  • Port No. => 9090
  • Shutdown port no. => 9005
  • comment out Java AJP connector
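
The port changes can also be scripted. The sketch below assumes a stock Tomcat 7 server.xml in which the HTTP connector listens on 8080 and the shutdown port is 8005; commenting out the AJP connector is left as a manual edit. The function name is illustrative.

```shell
#!/bin/sh
# Sketch: rewrite the HTTP and shutdown ports in a stock Tomcat 7 server.xml.
# Assumes the defaults <Connector port="8080" ...> and <Server port="8005" ...>.
set_tomcat_ports() {
  server_xml=$1
  http_port=$2
  shutdown_port=$3
  sed -e "s/<Connector port=\"8080\"/<Connector port=\"$http_port\"/" \
      -e "s/<Server port=\"8005\"/<Server port=\"$shutdown_port\"/" \
      "$server_xml" > "$server_xml.tmp" && mv "$server_xml.tmp" "$server_xml"
}
```

For Tomcat 1 this would be `set_tomcat_ports /Users/jeeva/dc-1/tomcat1/conf/server.xml 7070 7005`; Tomcat 2 already matches the defaults.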

Step 8

Start the Tomcat servers one by one and then access any of them, e.g. http://localhost:8080/solr; you should see a page similar to the one shown below

Starting Tomcat(s)

Screenshot – Solr Admin UI without Cores:


Now we are ready to use the Collections and CoreAdmin APIs to create our SolrCloud collection.


Creating Collection, Shard(s), Replica(s) in SolrCloud

Let’s make use of the Solr Collections and CoreAdmin APIs to create the collection, shard(s) and replica(s), and to set the replication factor.  These handy APIs let you control on which specific Solr node the Solr core of a replica is created.

Create a collection named ‘europe-collection’, passing the following parameters

  • action => CREATE
  • collection name => europe-collection
  • Number of Shards => 3
  • Replication Factor => 3 (no. of document copies in the collection)
  • maxShardsPerNode => 3 (Since Solr v4.2, thanks to Ariel Lieberman)
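
The parameters listed above map onto a Collections API request as sketched below. The function only assembles the URL so it can be reviewed first; sending it to the node on port 7070 is an assumption, since any running node of the cluster can accept the request.

```shell
#!/bin/sh
# Sketch: assemble the Collections API CREATE URL from the parameters listed above.
create_collection_url() {
  node=$1        # host:port of any running Solr node
  name=$2        # collection name
  shards=$3      # numShards
  replication=$4 # replicationFactor
  max_per_node=$5 # maxShardsPerNode
  echo "http://$node/solr/admin/collections?action=CREATE&name=$name&numShards=$shards&replicationFactor=$replication&maxShardsPerNode=$max_per_node"
}

create_collection_url localhost:7070 europe-collection 3 3 3
```

The request can then be sent with `curl "$(create_collection_url localhost:7070 europe-collection 3 3 3)"`; the quotes matter because the URL contains & characters.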

Notice that we are not providing a collection.configName param, because we have already linked the Solr configuration with the collection.

The above command creates the collection, a shard on each SolrCloud node, and a Solr core in each shard.

Create the shard replicas and distribute them among the 3 Solr node(s) as per the above design

We specifically mention the shard name and choose a particular Solr node for creating each replica
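
A replica is created by sending a CoreAdmin CREATE request to the node that should host it. The sketch below builds such a URL; the core name shard1-replica-1 is a hypothetical naming scheme, and the node chosen in the example does not reproduce the exact placement from the design diagram.

```shell
#!/bin/sh
# Sketch: assemble a CoreAdmin CREATE URL that adds a core for a given shard on a chosen node.
create_replica_url() {
  node=$1        # host:port of the Solr node that will host the replica
  core=$2        # name of the new Solr core (hypothetical naming scheme)
  collection=$3  # collection the core belongs to
  shard=$4       # shard the core replicates, e.g. shard1
  echo "http://$node/solr/admin/cores?action=CREATE&name=$core&collection=$collection&shard=$shard"
}

create_replica_url localhost:8080 shard1-replica-1 europe-collection shard1
```

Repeating the call with a different node, core name and shard places each remaining replica where the design requires it.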

Shard 1 Replicas

Shard 2 Replicas

Shard 3 Replicas

We have achieved the SolrCloud replica distribution shown in the following diagram

Now take a look at the following solr.xml and the newly created Solr core configuration persisted in it.


Let’s Index a Few Documents

Let us use the exampledocs shipped with the Solr 4 artifacts for indexing; we will use each of the Solr nodes we have created above.

Query the indexed documents by hitting the following URL in the browser; the result should display 5 documents.
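
One way to do this is with the post.jar tool found in the exampledocs directory, pointing it at a different node for each file. The sketch below only prints the commands; the choice of files (ipod_video.xml, monitor.xml, mem.xml) is an assumption and not necessarily the set used for this article.

```shell
#!/bin/sh
# Sketch: print post.jar commands that index one example file through each Solr node.
# Run the printed commands from the exampledocs directory of the Solr distribution.
post_cmd() {
  node=$1  # host:port of the Solr node receiving the update
  file=$2  # example document file to index
  echo "java -Durl=http://$node/solr/europe-collection/update -jar post.jar $file"
}

post_cmd localhost:7070 ipod_video.xml
post_cmd localhost:8080 monitor.xml
post_cmd localhost:9090 mem.xml
```

Whichever node receives a document, SolrCloud routes it to the shard it belongs to.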


Exploring Newly Created SolrCloud Cluster Availability

As per the above design, we have evenly distributed the shards and their replicas among the 3 Solr node(s).  Hence we can expect good availability, because the shard replicas are deployed across Solr node(s).  The design used in this article is a starting point; it can be scaled further to deploy high-availability clusters.  Let’s test it.

Before checking the availability, verify the number of documents in the europe-collection =>  http://localhost:9090/solr/europe-collection/select?q=*:*

This returns 5 documents in total.
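
To repeat this check quickly after each node is stopped, the document count can be pulled out of a JSON response on the command line. This is a minimal sketch that assumes the query is sent with wt=json; the helper name num_found is made up for this example.

```shell
#!/bin/sh
# Sketch: read a Solr JSON response on stdin and print its numFound value.
num_found() {
  sed -n 's/.*"numFound":\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example with a canned response; against the cluster, pipe curl output in instead.
echo '{"responseHeader":{"status":0},"response":{"numFound":5,"start":0,"docs":[]}}' | num_found
```

Against a live node the same helper can be fed with `curl -s 'http://localhost:9090/solr/europe-collection/select?q=*:*&wt=json&rows=0' | num_found`.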

Stop/Kill Tomcat 1 (running on port 7070) – Solr Node 1

Query the Solr: http://localhost:9090/solr/europe-collection/select?q=*:*

This will provide 5 documents as search results.

Stop/Kill Tomcat 2 (running on port 8080) – Solr Node 2

Query the Solr: http://localhost:9090/solr/europe-collection/select?q=*:*

This will still provide 5 documents as search results.

Take a look at the Graph on Solr Node 3 (http://localhost:9090/solr/#/~cloud) and you will find the following representation of the cluster.  The surviving replicas have become leaders and take care of user queries.

Screenshot – SolrCloud cluster availability check

Test it again by indexing some more documents into the Solr node that is up and running, then start Solr Node 1 & Node 2; they will sync with Node 3 through the recovery process of the Solr cluster.


Your Journey Starts Here

I hope this article gives you an idea of, and insight into, deploying a SolrCloud cluster (single collection) customized for your needs.  All the best!

Please leave a comment if you have any queries 🙂