ZooKeeper Cluster (Multi-Server) Setup

ZooKeeper is a distributed coordination service for distributed applications.  ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace, which is organized similarly to a standard file system.  The namespace consists of data registers (called znodes, in ZooKeeper parlance), and these are similar to files and directories.  Unlike a typical file system, which is designed for storage, ZooKeeper keeps its data in memory.  ZooKeeper runs in Java and has bindings for both Java and C.


ZooKeeper Cluster – Terminology

The ZooKeeper service is replicated over a set of hosts called an ensemble.  A replicated group of servers in the same application is called a quorum.  All servers in the quorum have copies of the same configuration file.  QuorumPeers form a ZooKeeper ensemble.  Because ZooKeeper requires a majority of servers to be up, it’s recommended to use an odd number of machines/servers.  For example, with five machines ZooKeeper can handle the failure of two machines.
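
The majority rule can be checked with a little arithmetic.  The sketch below is only an illustration; the function names are made up for this article and are not part of ZooKeeper.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still up."""
    return ensemble_size - majority(ensemble_size)
```

For example, tolerated_failures(5) gives 2, while tolerated_failures(4) gives only 1, which is why an even-sized ensemble buys no extra fault tolerance over the odd size below it.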


Designing ZooKeeper Deployment

Yes, designing.  There are a few things we have to be clear about before we begin the deployment of ZooKeeper.  Let’s answer the following questions:

  • How many ZooKeeper servers do we plan to deploy (odd numbers are best)?
  • How many physical machines (boxes) will participate in the deployment?
  • Prepare ZooKeeper port #’s for deployment
    • Client port #
    • Quorum port #
    • Leader election port #
  • Where to put the data directory (dataDir) for the ZooKeeper server?
  • Where to put the data log directory (dataLogDir) for the ZooKeeper server?
  • Memory allocation for ZooKeeper?

Deployment Diagram of this Article:

Zookeeper Cluster (Multi-Server) - Deployment Diagram

Of course, let’s have answers for this article, shall we:

Q1: How many ZooKeeper servers do we plan to deploy (odd numbers are best)?
Answer: Planning to deploy 5 ZooKeeper servers
Q2: How many physical machines (boxes) will participate in the deployment?
Answer: This article uses one physical box for the ZooKeeper deployment
Q3: Prepare ZooKeeper port #'s for deployment?

Answer: As I said above, this article runs 5 ZooKeeper servers on one box, so I have to choose unique port #’s for each server.  Ensure the selected port #’s are open on your box; ZooKeeper servers use TCP to communicate with each other.

Note: Plan the port #’s carefully, otherwise the ZooKeeper servers may not be able to communicate with each other, which can delay a successful deployment!
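
One way to plan the ports is to generate them and check for clashes before touching any configuration.  This is a rough sketch only: the base ports (2181, 2888, 3888) and the consecutive numbering are assumptions made for illustration, and the helper names are hypothetical.

```python
import socket


def port_plan(servers=5, client_base=2181, quorum_base=2888, election_base=3888):
    """Return {server_id: (client, quorum, election)} using consecutive ports."""
    return {
        sid: (client_base + sid - 1, quorum_base + sid - 1, election_base + sid - 1)
        for sid in range(1, servers + 1)
    }


def assert_unique(plan):
    """Raise ValueError if any port is assigned more than once on the box."""
    ports = [port for triple in plan.values() for port in triple]
    duplicates = sorted({port for port in ports if ports.count(port) > 1})
    if duplicates:
        raise ValueError(f"ports assigned more than once: {duplicates}")


def is_free(port, host="127.0.0.1"):
    """Best-effort check: True if the TCP port can be bound right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True
```

Running assert_unique(port_plan()) and then is_free() on each port before starting the servers should catch the usual "Address already in use" surprises early.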

Q4: Where to put Data directory (dataDir) for ZooKeeper Server?

Answer: I’m planning to place it in /Users/jeeva/zookeeper/data and use this directory as the data home

Note: The data directory is one of the performance factors; see Point #3 under Performance & Availability Considerations

Q5: Where to put Data Log directory (dataLogDir) for ZooKeeper Server?

Answer: I’m planning to place it in /Users/jeeva/zookeeper/log and use this directory as the log home

Note: The log directory is one of the performance factors; see Point #4 under Performance & Availability Considerations

Q6: Memory allocation for ZooKeeper?

Answer: For this demo article, I’m keeping the heap size at its default.  To customize it, create a file called java.env in {ZooKeeperHome}/conf/

Note: JVM heap size contributes significantly to performance; see Point #5 under Performance & Availability Considerations

Okay, now that we have answers, let’s move on.


Deploying ZooKeeper Cluster (Multi-Server) Setup

Let’s begin installation and configuration of ZooKeeper.

Step 1: Create the directory structure, as decided in the design section
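
A minimal sketch of this step in Python is shown here.  The per-server sub-directory name zk-server-N is an assumption made for illustration (only zk-server-1 is named in this article), and create_layout is a hypothetical helper.

```python
from pathlib import Path


def create_layout(data_home, log_home, servers=5):
    """Create one data and one log directory per server; return the data dirs."""
    data_dirs = []
    for sid in range(1, servers + 1):
        data_dir = Path(data_home) / f"zk-server-{sid}"
        log_dir = Path(log_home) / f"zk-server-{sid}"
        # parents=True also creates the data/log home if it does not exist yet
        data_dir.mkdir(parents=True, exist_ok=True)
        log_dir.mkdir(parents=True, exist_ok=True)
        data_dirs.append(data_dir)
    return data_dirs
```

With the answers from the design section, the call would be create_layout("/Users/jeeva/zookeeper/data", "/Users/jeeva/zookeeper/log").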

Let’s take a look at the directory structure created above:

Okay, looks good!

Step 2: Create a ZooKeeper server ID.  This is a file named myid that resides in the ZooKeeper data directory.  Go ahead and use your favorite text editor.
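
If you would rather script it than use an editor, something along these lines should work.  The helper name write_myid is hypothetical; the 1 to 255 range follows the ZooKeeper documentation for server IDs.

```python
from pathlib import Path


def write_myid(data_dir, server_id):
    """Write the server ID as the only content of <data_dir>/myid."""
    if not 1 <= server_id <= 255:
        raise ValueError("server_id must be between 1 and 255")
    path = Path(data_dir) / "myid"
    # The file holds a single line containing just the ID
    path.write_text(f"{server_id}\n", encoding="ascii")
    return path
```

Each server needs its own ID in its own data directory, and the ID has to match the server.N entry used for that server in zoo.cfg.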

Step 3: Downloading ZooKeeper Release

Download ZooKeeper from http://hadoop.apache.org/zookeeper/releases.html; this article uses version 3.4.4 of ZooKeeper.  However, the same principles apply to other versions too.

Step 4: Extract & prepare ZooKeeper for deployment

Once done, don’t forget to clean up /tmp/zookeeper-3.4.4

Step 5: Prepare the ZooKeeper configuration file, zoo.cfg, at {zk-server-1}/conf/zoo.cfg.  Here I will show it for Server 1; perform the same steps with appropriate values (clientPort, dataDir, dataLogDir) for each of the other ZooKeeper servers.

Place below configuration into it.
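
As an illustration of what such a file contains, here is a small sketch that renders a zoo.cfg for any of the five servers.  It is not the exact file used in this article: the port numbers, the syncLimit value and the zk-server-N sub-directories are assumptions, and render_zoo_cfg is a hypothetical helper.  The keys themselves (tickTime, initLimit, syncLimit, dataDir, dataLogDir, clientPort, server.N) are standard ZooKeeper settings.

```python
def render_zoo_cfg(server_id, servers=5, host="localhost",
                   data_home="/Users/jeeva/zookeeper/data",
                   log_home="/Users/jeeva/zookeeper/log"):
    """Render zoo.cfg text for one server of a single-box ensemble."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",  # assumed value
        f"dataDir={data_home}/zk-server-{server_id}",
        f"dataLogDir={log_home}/zk-server-{server_id}",
        f"clientPort={2181 + server_id - 1}",  # assumed: 2181..2185
    ]
    # Every server lists the whole ensemble:
    # server.<id>=<host>:<quorum port>:<leader election port>
    for sid in range(1, servers + 1):
        lines.append(f"server.{sid}={host}:{2888 + sid - 1}:{3888 + sid - 1}")
    return "\n".join(lines) + "\n"
```

Only dataDir, dataLogDir and clientPort differ between the servers; the server.N lines are identical in every file.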

Screenshot: zk-server-1 directory structure along with conf/zoo.cfg


Step 6: Configure the ZooKeeper logger for deployment.  The following are the default values of log4j.properties, which are geared toward development; update them as per your environment and needs:

Step 7: Once zoo.cfg has been created for all the servers, we can start the ZooKeeper servers.  Let’s start zk-server-1:

Now, go ahead and start the remaining 4 ZooKeeper servers.  Tail the zookeeper.out file in the bin directory to see more information.

zkServer.sh supports the following commands:

We will use the ‘status’ command to see the ZooKeeper server status:

ZooKeeper CLI Client

The ZooKeeper command line interface is handy for administration.  See How to Connect ZooKeeper through CLI? and the famous four-letter commands for it.
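
The four-letter commands are plain text sent over the client port, so they can also be issued from a script.  Below is a minimal sketch, assuming a server is listening on the given host and port; four_letter_word is a hypothetical helper name.

```python
import socket


def four_letter_word(host, port, command="ruok", timeout=5.0):
    """Send a four-letter command to a ZooKeeper server and return its reply."""
    if len(command) != 4:
        raise ValueError("command must be exactly four characters")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

A healthy server is documented to answer the ruok command with imok; stat is another commonly used command that reports, among other things, whether the server is a leader or a follower.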


Integrating ZooKeeper Cluster

Typically, a ZooKeeper-enabled application will be able to connect right away.  To integrate ZooKeeper with an application, all you need to know is the following for all the ZooKeeper servers:

  • host-address/host-ip
  • port-no

For example: SolrCloud, Elasticsearch, etc.
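
ZooKeeper clients usually take these as a single connection string of comma-separated host:port pairs.  A small sketch for building one follows; connect_string is a hypothetical helper, and the ports in the usage note are assumed values.

```python
def connect_string(servers):
    """Join (host, port) pairs into 'host1:port1,host2:port2,...'."""
    if not servers:
        raise ValueError("at least one ZooKeeper server is required")
    return ",".join(f"{host}:{port}" for host, port in servers)
```

For five servers on one box with client ports 2181 to 2185, connect_string([("localhost", p) for p in range(2181, 2186)]) yields the string to hand to the application.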


Maintenance – Basic Elements

Typically there are two things to take care of in a ZooKeeper server, and this is also the tricky part.

  • Data directory Cleanup
  • Debug Log Cleanup (log4j)

Why do I call maintenance the tricky part?  Let me explain.  ZooKeeper provides autopurge configuration such as autopurge.snapRetainCount and autopurge.purgeInterval.  However, in real-world scenarios maintenance activities differ from organization to organization.  Basically, it depends on the organization’s IT policy and business requirements.

Please have a look at ZooKeeper Maintenance and plan yours!
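
For reference, the two autopurge settings are ordinary zoo.cfg lines.  The sketch below only renders them; the 24-hour interval is an arbitrary example value, and autopurge_lines is a hypothetical helper.  Per the ZooKeeper administrator's guide, snapRetainCount has a minimum of 3, and a purgeInterval of 0 (the default) leaves automatic purging switched off.

```python
def autopurge_lines(snap_retain_count=3, purge_interval_hours=24):
    """Render the autopurge settings as zoo.cfg lines."""
    if snap_retain_count < 3:
        raise ValueError("snap_retain_count must be at least 3")
    if purge_interval_hours < 0:
        raise ValueError("purge_interval_hours must not be negative")
    return [
        f"autopurge.snapRetainCount={snap_retain_count}",
        f"autopurge.purgeInterval={purge_interval_hours}",  # in hours; 0 disables
    ]
```

Whether automatic purging fits at all, and with which values, is exactly the policy question discussed here.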


Performance & Availability Considerations

  • As long as a majority of the ensemble is up, the ZooKeeper service will be available.  Because it requires a majority, it is best to use an odd number of machines.  For example, with four machines ZooKeeper can only handle the failure of a single machine; if two machines fail, the remaining two machines do not constitute a majority.  However, with five machines ZooKeeper can handle the failure of two machines.
  • It’s critical that you run ZooKeeper under supervision, since ZooKeeper is fail-fast and will exit the process if it encounters any error case.  See the ZooKeeper Administrator's Guide listed under References for more details.
  • The ZooKeeper data directory contains files which are a persistent copy of the znodes stored by a particular serving ensemble.  These are the snapshot and transaction log files.  As changes are made to the znodes, these changes are appended to a transaction log.  Occasionally, when a log grows large, a snapshot of the current state of all znodes is written to the filesystem.  This snapshot supersedes all previous logs.
  • ZooKeeper’s transaction log must be on a dedicated device.  (A dedicated partition is not enough.)  ZooKeeper writes the log sequentially, without seeking.  Sharing your log device with other processes can cause seeks and contention, which in turn can cause multi-second delays.
  • Do not put ZooKeeper in a situation that can cause a swap.  In order for ZooKeeper to function with any sort of timeliness, it simply cannot be allowed to swap.  Therefore, make certain that the maximum heap size given to ZooKeeper is not bigger than the amount of real memory available to ZooKeeper.  For more on this, see Things to Avoid.

References

http://zookeeper.apache.org/doc/r3.3.4/zookeeperOver.html
http://zookeeper.apache.org/doc/r3.3.4/zookeeperStarted.html
http://zookeeper.apache.org/doc/r3.3.4/zookeeperAdmin.html

  • Sigehere

    Hi friend,

    It’s a very good article.  I have followed your documentation/article but I got the following error.  Can you tell me what the problem is in my configuration?

    2013-02-01 18:55:14,285 [myid:] – INFO [main:[email protected]] – Reading configuration from: conf/zoo.cfg

    2013-02-01 18:55:14,297 [myid:] – INFO [main:[email protected]] – Defaulting to majority quorums

    2013-02-01 18:55:14,301 [myid:1] – INFO [main:[email protected]] – autopurge.snapRetainCount set to 3

    2013-02-01 18:55:14,301 [myid:1] – INFO [main:[email protected]] – autopurge.purgeInterval set to 0

    2013-02-01 18:55:14,302 [myid:1] – INFO [main:[email protected]] – Purge task is not scheduled.

    2013-02-01 18:55:14,352 [myid:1] – INFO [main:[email protected]] – Starting quorum peer

    2013-02-01 18:55:14,369 [myid:1] – INFO [main:[email protected]] – binding to port 0.0.0.0/0.0.0.0:2181

    2013-02-01 18:55:14,386 [myid:1] – INFO [main:[email protected]] – tickTime set to 2000

    2013-02-01 18:55:14,386 [myid:1] – INFO [main:[email protected]] – minSessionTimeout set to -1

    2013-02-01 18:55:14,386 [myid:1] – INFO [main:[email protected]] – maxSessionTimeout set to -1

    2013-02-01 18:55:14,387 [myid:1] – INFO [main:[email protected]] – initLimit set to 5

    2013-02-01 18:55:14,402 [myid:1] – INFO [main:[email protected]] – Reading snapshot /home/hduser/zookeeperdata1/version-2/snapshot.300000006

    2013-02-01 18:55:14,479 [myid:1] – INFO [Thread-1:[email protected]] – My election bind port: 0.0.0.0/0.0.0.0:3888

    2013-02-01 18:55:14,493 [myid:1] – INFO [QuorumPeer[myid=1]/0.0.0.0:2181:[email protected]] – LOOKING

    2013-02-01 18:55:14,495 [myid:1] – INFO [QuorumPeer[myid=1]/0.0.0.0:2181:[email protected]] – New election. My id = 1, proposed zxid=0x4000001cf

    2013-02-01 18:55:14,498 [myid:1] – INFO [WorkerReceiver[myid=1]:[email protected]] – Notification: 1 (n.leader), 0x4000001cf (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x4 (n.peerEPoch), LOOKING (my state)

    2013-02-01 18:55:14,501 [myid:1] – WARN [WorkerSender[myid=1]:[email protected]] – Cannot open channel to 2 at election address localhost/127.0.0.1:3889

    java.net.ConnectException: Connection refused

    at java.net.PlainSocketImpl.socketConnect(Native Method)

    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)

    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)

    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)

    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)

    at java.net.Socket.connect(Socket.java:546)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)

    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)

    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)

    at java.lang.Thread.run(Thread.java:679)

    2013-02-01 18:55:14,502 [myid:1] – WARN [WorkerSender[myid=1]:[email protected]] – Cannot open channel to 3 at election address localhost/127.0.0.1:3890

    java.net.ConnectException: Connection refused

    at java.net.PlainSocketImpl.socketConnect(Native Method)

    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)

    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)

    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)

    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)

    at java.net.Socket.connect(Socket.java:546)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)

    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)

    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)

    at java.lang.Thread.run(Thread.java:679)

    2013-02-01 18:55:14,700 [myid:1] – WARN [QuorumPeer[myid=1]/0.0.0.0:2181:[email protected]] – Cannot open channel to 2 at election address localhost/127.0.0.1:3889

    java.net.ConnectException: Connection refused

    at java.net.PlainSocketImpl.socketConnect(Native Method)

    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)

    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)

    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)

    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)

    at java.net.Socket.connect(Socket.java:546)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:388)

    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:765)

    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)

    2013-02-01 18:55:14,701 [myid:1] – WARN [QuorumPeer[myid=1]/0.0.0.0:2181:[email protected]] – Cannot open channel to 3 at election address localhost/127.0.0.1:3890

    java.net.ConnectException: Connection refused

    at java.net.PlainSocketImpl.socketConnect(Native Method)

    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)

    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)

    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)

    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)

    at java.net.Socket.connect(Socket.java:546)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:388)

    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:765)

    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)

    2013-02-01 18:55:14,702 [myid:1] – INFO [QuorumPeer[myid=1]/0.0.0.0:2181:[email protected]] – Notification time out: 400

    2013-02-01 18:55:15,103 [myid:1] – WARN [QuorumPeer[myid=1]/0.0.0.0:2181:[email protected]] – Cannot open channel to 2 at election address localhost/127.0.0.1:3889

    java.net.ConnectException: Connection refused

    at java.net.PlainSocketImpl.socketConnect(Native Method)

    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)

    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)

    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)

    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)

    at java.net.Socket.connect(Socket.java:546)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:388)

    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:765)

    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)

    2013-02-01 18:55:15,104 [myid:1] – WARN [QuorumPeer[myid=1]/0.0.0.0:2181:[email protected]] – Cannot open channel to 3 at election address localhost/127.0.0.1:3890

    java.net.ConnectException: Connection refused

    at java.net.PlainSocketImpl.socketConnect(Native Method)

    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)

    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)

    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)

    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)

    at java.net.Socket.connect(Socket.java:546)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:388)

    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:765)

    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)

    2013-02-01 18:55:15,105 [myid:1] – INFO [QuorumPeer[myid=1]/0.0.0.0:2181:[email protected]] – Notification time out: 800

    2013-02-01 18:55:15,907 [myid:1] – WARN [QuorumPeer[myid=1]/0.0.0.0:2181:[email protected]] – Cannot open channel to 2 at election address localhost/127.0.0.1:3889

    java.net.ConnectException: Connection refused

    at java.net.PlainSocketImpl.socketConnect(Native Method)

    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)

    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)

    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)

    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)

    at java.net.Socket.connect(Socket.java:546)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:388)

    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:765)

    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)

    2013-02-01 18:55:15,908 [myid:1] – WARN [QuorumPeer[myid=1]/0.0.0.0:2181:[email protected]] – Cannot open channel to 3 at election address localhost/127.0.0.1:3890

    java.net.ConnectException: Connection refused

    at java.net.PlainSocketImpl.socketConnect(Native Method)

    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)

    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)

    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)

    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)

    at java.net.Socket.connect(Socket.java:546)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:388)

    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:765)

    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)

    2013-02-01 18:55:15,909 [myid:1] – INFO [QuorumPeer[myid=1]/0.0.0.0:2181:[email protected]] – Notification time out: 1600

    2013-02-01 18:55:17,511 [myid:1] – WARN [QuorumPeer[myid=1]/0.0.0.0:2181:[email protected]] – Cannot open channel to 2 at election address localhost/127.0.0.1:3889

    java.net.ConnectException: Connection refused

    at java.net.PlainSocketImpl.socketConnect(Native Method)

    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)

    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)

    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)

    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)

    at java.net.Socket.connect(Socket.java:546)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:388)

    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:765)

    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)

    • Hello Sigehere –

      From the exception snippet provided above, I can see that you have configured 3 ZooKeeper instances.  You have started ZooKeeper instance id:1, and it’s waiting to connect to the other 2 ZooKeeper instances to form a ZooKeeper quorum.  Start your 3 ZooKeeper instances (id:1, id:2, id:3) one by one.

      Cheers,
      Jeeva

      • rajesh

        Hi Jeeva,

        I’ve configured 3 EC2 instances with the configuration mentioned, using ZooKeeper 3.4.5.  When I try to execute the following command, I get the error below:
        ./zkcli.sh
        -cmd
        upconfig -zkhost
        172.31.11.101:2821,172.31.6.40:2821,172.31.6.165:2821
        -confdir /home/ubuntu/solr-4.7.2/example/multicore/core1/conf/
        -confname
        keeptraxapp

        Stack Trace:

        INFO – 2014-05-27 07:38:31.892; org.apache.zookeeper.Environment; Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT

        INFO – 2014-05-27 07:38:31.894; org.apache.zookeeper.Environment; Client environment:host.name=production2

        INFO – 2014-05-27 07:38:31.896; org.apache.zookeeper.Environment; Client environment:java.version=1.6.0_45

        INFO – 2014-05-27 07:38:31.897; org.apache.zookeeper.Environment; Client environment:java.vendor=Sun Microsystems Inc.

        INFO – 2014-05-27 07:38:31.897; org.apache.zookeeper.Environment; Client environment:java.home=/usr/lib/jvm/java-6-oracle/jre

        INFO – 2014-05-27 07:38:31.897; org.apache.zookeeper.Environment; Client environment:java.class.path=./../../solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-kuromoji-4.7.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/hadoop-auth-2.2.0.jar:./../../solr-webapp/webapp/WEB-INF/lib/lucene-misc-4.7.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/wstx-asl-3.2.7.jar:./../../solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-phonetic-4.7.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/commons-fileupload-1.2.1.jar:./../../solr-webapp/webapp/WEB-INF/lib/commons-codec-1.7.jar:./../../solr-webapp/webapp/WEB-INF/lib/dom4j-1.6.1.jar:./../../solr-webapp/webapp/WEB-INF/lib/lucene-grouping-4.7.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/noggit-0.5.jar:./../../solr-webapp/webapp/WEB-INF/lib/lucene-expressions-4.7.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/lucene-queries-4.7.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/lucene-join-4.7.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/hppc-0.5.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/lucene-queryparser-4.7.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/zookeeper-3.4.5.jar:./../../solr-webapp/webapp/WEB-INF/lib/solr-solrj-4.7.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/commons-cli-1.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/commons-lang-2.6.jar:./../../solr-webapp/webapp/WEB-INF/lib/joda-time-2.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/httpclient-4.3.1.jar:./../../solr-webapp/webapp/WEB-INF/lib/hadoop-common-2.2.0.jar:./../../solr-webapp/webapp/WEB-INF/lib/solr-core-4.7.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/commons-configuration-1.6.jar:./../../solr-webapp/webapp/WEB-INF/lib/httpmime-4.3.1.jar:./../../solr-webapp/webapp/WEB-INF/lib/hadoop-annotations-2.2.0.jar:./../../solr-webapp/webapp/WEB-INF/lib/spatial4j-0.4.1.jar:./../../solr-webapp/webapp/WEB-INF/lib/org.restlet-2.1.1.jar:./../../solr-webapp/webapp/WEB-INF/lib/hadoop-hdfs-2.2.0.jar:./../../solr-webapp/webapp/WEB-INF/lib/commons-io-2.1.jar:./../../solr-webapp/webapp/WEB-I
NF/lib/lucene-spatial-4.7.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/lucene-suggest-4.7.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/antlr-runtime-3.5.jar:./../../solr-webapp/webapp/WEB-INF/lib/concurrentlinkedhashmap-lru-1.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/org.restlet.ext.servlet-2.1.1.jar:./../../solr-webapp/webapp/WEB-INF/lib/lucene-codecs-4.7.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/lucene-highlighter-4.7.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/asm-4.1.jar:./../../solr-webapp/webapp/WEB-INF/lib/asm-commons-4.1.jar:./../../solr-webapp/webapp/WEB-INF/lib/protobuf-java-2.5.0.jar:./../../solr-webapp/webapp/WEB-INF/lib/guava-14.0.1.jar:./../../solr-webapp/webapp/WEB-INF/lib/httpcore-4.3.jar:./../../solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-common-4.7.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/lucene-memory-4.7.2.jar:./../../solr-webapp/webapp/WEB-INF/lib/lucene-core-4.7.2.jar:./../../lib/ext/jcl-over-slf4j-1.6.6.jar:./../../lib/ext/jul-to-slf4j-1.6.6.jar:./../../lib/ext/slf4j-api-1.6.6.jar:./../../lib/ext/slf4j-log4j12-1.6.6.jar:./../../lib/ext/log4j-1.2.16.jar

        INFO – 2014-05-27 07:38:31.897; org.apache.zookeeper.Environment; Client environment:java.library.path=/usr/lib/jvm/java-6-oracle/jre/lib/amd64/server:/usr/lib/jvm/java-6-oracle/jre/lib/amd64:/usr/lib/jvm/java-6-oracle/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib

        INFO – 2014-05-27 07:38:31.897; org.apache.zookeeper.Environment; Client environment:java.io.tmpdir=/tmp

        INFO – 2014-05-27 07:38:31.897; org.apache.zookeeper.Environment; Client environment:java.compiler=

        INFO – 2014-05-27 07:38:31.897; org.apache.zookeeper.Environment; Client environment:os.name=Linux

        INFO – 2014-05-27 07:38:31.898; org.apache.zookeeper.Environment; Client environment:os.arch=amd64

        INFO – 2014-05-27 07:38:31.898; org.apache.zookeeper.Environment; Client environment:os.version=3.13.0-24-generic

        INFO – 2014-05-27 07:38:31.898; org.apache.zookeeper.Environment; Client environment:user.name=ubuntu

        INFO – 2014-05-27 07:38:31.898; org.apache.zookeeper.Environment; Client environment:user.home=/home/ubuntu

        INFO – 2014-05-27 07:38:31.898; org.apache.zookeeper.Environment; Client environment:user.dir=/home/ubuntu/solr-4.7.2/example/scripts/cloud-scripts

        INFO – 2014-05-27 07:38:31.899; org.apache.zookeeper.ZooKeeper; Initiating client connection, connectString=localhost:2821 sessionTimeout=30000 [email protected]

        INFO – 2014-05-27 07:38:31.928; org.apache.solr.common.cloud.ConnectionManager; Waiting for client to connect to ZooKeeper

        INFO – 2014-05-27 07:38:31.933; org.apache.zookeeper.ClientCnxn$SendThread; Opening socket connection to server 127.0.0.1/127.0.0.1:2821. Will not attempt to authenticate using SASL (Unable to locate a login configuration)

        WARN – 2014-05-27 07:38:31.938; org.apache.zookeeper.ClientCnxn$SendThread; Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect

        java.net.ConnectException: Connection refused

        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)

        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)

        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

        INFO – 2014-05-27 07:38:32.041; org.apache.zookeeper.ClientCnxn$SendThread; Opening socket connection to server ip6-localhost/0:0:0:0:0:0:0:1:2821. Will not attempt to authenticate using SASL (Unable to locate a login configuration)

        WARN – 2014-05-27 07:38:32.041; org.apache.zookeeper.ClientCnxn$SendThread; Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect

        java.net.ConnectException: Connection refused

        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)

        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)

        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

        INFO – 2014-05-27 07:38:33.142; org.apache.zookeeper.ClientCnxn$SendThread; Opening socket connection to server 127.0.0.1/127.0.0.1:2821. Will not attempt to authenticate using SASL (Unable to locate a login configuration)

        WARN – 2014-05-27 07:38:33.143; org.apache.zookeeper.ClientCnxn$SendThread; Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect

        Appreciate your help.
        Thanks
        Rajesh

  • Thanks very nice blog!

  • Vitalie Mudrenco

    Great article,

    I have one issue: I have followed the instructions step by step and created the same 5 instances, but when I start the second ZooKeeper and the others I get errors in the logs

    ——————

    2013-11-11 16:26:34,792 [myid:1] – WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2185:[email protected]] – Cannot open channel to 2 at election address localhost/127.0.0.1:3889

    java.net.ConnectException: Connection refused

    at java.net.PlainSocketImpl.socketConnect(Native Method)

    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)

    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)

    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)

    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)

    at java.net.Socket.connect(Socket.java:579)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)

    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:388)

    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:765)

    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)

    ———————————-

    The ports are correct, I have checked them many times.  I am doing this on my Mac mini, and the dir structure is the same.  Can you please suggest how to make it work?

    • @vitaliemudrenco:disqus – As per the article, the above exception is normal, so not to worry.  Since we are setting up 5 servers, a minimum of 3 ZooKeeper servers is required to form a successful quorum.

      The exception will stop once you start the third ZooKeeper server.  Try it and let me know.

      Cheers,
      Jeeva

      • Vitalie Mudrenco

        No, I get the same issue, I started all 5. The same. Recreated all dir structure from scratch again and changed the port numbers too from 2888 to 8888 and from 3888 to 6888, started all 5 servers, the same issue.

        2013-11-11 20:22:57,177 [myid:] – INFO [main:[email protected]] – Reading configuration from: /Users/xvitcoder/ds-1/zookeeper/zk5/zk-server/bin/../conf/zoo.cfg

        2013-11-11 20:22:57,181 [myid:] – INFO [main:[email protected]] – Defaulting to majority quorums

        2013-11-11 20:22:57,185 [myid:1] – INFO [main:[email protected]] – autopurge.snapRetainCount set to 3

        2013-11-11 20:22:57,186 [myid:1] – INFO [main:[email protected]] – autopurge.purgeInterval set to 0

        2013-11-11 20:22:57,186 [myid:1] – INFO [main:[email protected]] – Purge task is not scheduled.

        2013-11-11 20:22:57,197 [myid:1] – INFO [main:[email protected]] – Starting quorum peer

        2013-11-11 20:22:57,228 [myid:1] – INFO [main:[email protected]] – binding to port 0.0.0.0/0.0.0.0:2185

        2013-11-11 20:22:57,249 [myid:1] – INFO [main:[email protected]] – tickTime set to 2000

        2013-11-11 20:22:57,250 [myid:1] – INFO [main:[email protected]] – minSessionTimeout set to -1

        2013-11-11 20:22:57,250 [myid:1] – INFO [main:[email protected]] – maxSessionTimeout set to -1

        2013-11-11 20:22:57,250 [myid:1] – INFO [main:[email protected]] – initLimit set to 10

        2013-11-11 20:22:57,265 [myid:1] – INFO [main:[email protected]] – acceptedEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation

        2013-11-11 20:22:57,278 [myid:1] – INFO [Thread-3:[email protected]] – My election bind port: 0.0.0.0/0.0.0.0:9888

        2013-11-11 20:22:57,279 [myid:1] – ERROR [localhost/127.0.0.1:9888:[email protected]] – Exception while listening

        java.net.BindException: Address already in use

        at java.net.PlainSocketImpl.socketBind(Native Method)

        at java.net.PlainSocketImpl.socketBind(PlainSocketImpl.java:521)

        at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:414)

        at java.net.ServerSocket.bind(ServerSocket.java:326)

        at java.net.ServerSocket.bind(ServerSocket.java:284)

        at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:489)

        2013-11-11 20:22:57,287 [myid:1] – INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0%0:2185:[email protected]] – LOOKING

        2013-11-11 20:22:57,289 [myid:1] – INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0%0:2185:[email protected]] – New election. My id = 1, proposed zxid=0x0

        2013-11-11 20:22:57,290 [myid:1] – INFO [WorkerReceiver[myid=1]:[email protected]] – Notification: 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)

        2013-11-11 20:22:57,294 [myid:1] – WARN [WorkerSender[myid=1]:[email protected]] – Cannot open channel to 2 at election address localhost/127.0.0.1:9889

        java.net.ConnectException: Connection refused

        at java.net.PlainSocketImpl.socketConnect(Native Method)

        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:382)

        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:241)

        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:228)

        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:431)

        at java.net.Socket.connect(Socket.java:527)

        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)

        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)

        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)

        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)

        at java.lang.Thread.run(Thread.java:695)

        I can’t get it working.

        • Hmm, “port already in use” and “connection refused” again; that is weird. I’m in the IST time zone and going to bed soon.

          Let’s do this: tomorrow, after my office hours, we can do a screen share to sort out your issue. I will drop a comment here once I reach home.

          Sounds good?

          Cheers,
          Jeeva

          • Vitalie Mudrenco

            Yes would be great. My timezone is UTC+02:00, let me know when you have free time.

  • Rashmi Patel

    This is a great article.

    But I have one issue when I try to use hosts on different systems.
    server.1=localhost:2888:3888
    server.2=localhost:2889:3889
    server.3=localhost:2890:3890
    server.4=localhost:2891:3891
    server.5=localhost:2892:3892
    With localhost, this works fine.
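
When all servers share one host, as in the localhost entries above, every server needs its own quorum port and its own leader election port; a reused port is one common cause of `java.net.BindException: Address already in use`. Below is a minimal sketch that scans `server.N=host:quorumPort:electionPort` lines for reused host/port pairs. The function name `find_port_conflicts` is made up for illustration and is not part of ZooKeeper.

```python
from collections import defaultdict


def find_port_conflicts(cfg_text):
    """Return {(host, port): [server ids]} for ports claimed more than once.

    Hypothetical helper: it only understands lines of the form
    server.N=host:quorumPort:electionPort and ignores everything else.
    """
    claims = defaultdict(list)
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line.startswith("server.") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        server_id = key.split(".", 1)[1].strip()
        parts = value.strip().split(":")
        if len(parts) < 3:
            continue  # not a host:port:port entry, skip it
        host = parts[0]
        for port in (parts[1], parts[2]):
            claims[(host, int(port))].append(server_id)
    # Keep only the host/port pairs that more than one entry claims.
    return {key: ids for key, ids in claims.items() if len(ids) > 1}


if __name__ == "__main__":
    sample = "\n".join([
        "server.1=localhost:2888:3888",
        "server.2=localhost:2889:3889",
        "server.3=localhost:2890:3890",
    ])
    print(find_port_conflicts(sample) or "no reused ports")
```

The check is keyed on the host name as written, so servers on different machines may still share port numbers, as in the three-host configuration that follows.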

    But it fails when I use hosts on different systems: localhost has IP 192.168.1.208, Rajkumar_Kunta has IP 192.168.10.15, and Sushant_Daware has IP 192.168.1.166.

    server.1=localhost:2888:3888
    server.2=Rajkumar_Kunta:2888:3888
    server.3=Sushant_Daware:2888:3888

    When I start all three ZooKeeper servers, I get the error below when it tries to connect to the Rajkumar_Kunta and Sushant_Daware hosts:

    2014-05-13 13:44:33,616 [myid:3] – WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:[email protected]] – Cannot open channel to 2 at election address Rajkumar_Kunta/192.168.10.15:3888
    java.net.SocketTimeoutException: connect timed out
    at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
    at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
    at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
    at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
    at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.SocksSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)

    Can you please help me to resolve the issue?

    • @disqus_DUyo7hZeIU:disqus Going by the exception, the possibilities are:

      1. Until all the servers in the ZooKeeper quorum have been started (3 servers in your case), you will get exceptions like the one above.

      2. The host name does not resolve to the right IP address. [In your case this is not a problem, since it resolves Rajkumar_Kunta/192.168.10.15:3888 properly.]
      If it does happen, the resolution is to add the following mapping to the hosts file: 192.168.10.15 Rajkumar_Kunta

      3. The port number(s) in use are not open, so you may have to open them in the firewall.

      Try and let me know!

      Cheers,
      Jeeva

      • Rashmi Patel

        Hello Jeeva,

        Thanks for the quick reply.

        1) My hosts file looks like this:

        192.168.1.208 localhost
        192.168.10.15 Rajkumar_Kunta
        192.168.1.166 Sushant_Daware

        Is it fine?

        2) I have asked my networking team to open the port in the firewall, but I am still getting the error. Can you please tell me how I can open a port in the firewall? That would help me.

        Thanks,
        Rashmi

        • Rashmi Patel The hosts file looks okay.

          Let’s move on to the open-port check. While ZooKeeper is running on one box, check the connection from another machine/box like this:
          $ telnet xxx.xxx.xxx.xxx 2888

          Once you receive the response “Connected to xxx.xxx.xxx.xxx.”, you can ignore the rest of the printed text.

          It means there is no port issue!
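
Where telnet is not installed, the same reachability check can be sketched with Python’s standard `socket` module. This is only an illustration: `port_is_open` is a hypothetical helper name, and the address is the same placeholder as in the telnet command above.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable/unknown hosts.
        return False


# Example (placeholder address; run it from another machine/box):
# for port in (2888, 3888):
#     print(port, port_is_open("xxx.xxx.xxx.xxx", port))
```

A `False` result does not say why the connection failed; a firewall, a wrong address and a server that is simply not running all look the same from here.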

          Can you please check that each ZooKeeper server’s ‘myid’ file has a unique value?

          Cheers,
          Jeeva

  • @disqus_5Aev4seDgA:disqus Based on the stack trace, it seems that either ZooKeeper is not running or incorrect ZooKeeper server URLs are being used.

    Cheers,
    Jeeva

    • rajesh

      Jeeva,
      Thanks for your time.
      Figured out that the port number I was using to connect was incorrect.
      I was using 2821 instead of 2181.

      Thanks again
      Rajesh

      • rajesh

        Jeeva,

        As of now I’ve not made any memory optimizations to Solr.
        Do you have any suggestions on memory optimizations for a production setup?

        • @disqus_5Aev4seDgA:disqus The recommended memory allocation, assuming a dedicated virtual machine/physical box with 4 GB RAM and a *nix OS, is to distribute it as follows:
          30% of memory for Operating System
          70% of memory for Solr instance(s)

          Then apply JVM flags for optimal use of the allocated heap. Going by the stack trace, it seems you’re using Java 6. I would recommend moving to Java 7 with one of the following garbage collector flags (they select different collectors, so the JVM will not accept both together):
          -XX:+UseParallelGC or -XX:+UseG1GC

          besides typical params such as (sample):
          -server -Xms128m -Xmx2048m -XX:PermSize=64m -XX:MaxPermSize=128m
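
To make the 30/70 split concrete, here is a small arithmetic sketch for the 4 GB box mentioned above. The helper `split_memory` is hypothetical; it only does the percentage calculation and does not choose JVM settings.

```python
def split_memory(total_mb, os_percent=30):
    """Split total RAM (in MB) into an OS share and an application share.

    Hypothetical helper: the 30% default mirrors the rule of thumb in the
    comment above; it is not a measured recommendation.
    """
    os_mb = total_mb * os_percent // 100
    app_mb = total_mb - os_mb
    return os_mb, app_mb


if __name__ == "__main__":
    os_mb, app_mb = split_memory(4096)
    print(f"OS: {os_mb} MB, Solr instance(s): {app_mb} MB")
```

The application share covers the whole JVM process, not just the heap, so the `-Xmx` value would normally be set somewhat below it, as the 2048 MB sample does.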

          Cheers,
          Jeeva

          • atp

            Hi Jeeva,

            I followed all the steps and installed and configured ZooKeeper on different servers, but I am getting an error. When I restart ZooKeeper on all the servers, it says:

            “Starting zookeeper … already running as process 8538.”

            but the log file shows the following:

            2014-06-05 17:09:15,960 [myid:2] – INFO [QuorumPeer[myid=2]/0.0.0.0:2181:[email protected]] – Notification time out: 6400
            2014-06-05 17:09:22,362 [myid:2] – WARN [QuorumPeer[myid=2]/0.0.0.0:2181:[email protected]] – Cannot open channel to 1 at election address Hadoop-Main/10.133.45.245:3888
            java.net.NoRouteToHostException: No route to host
            at java.net.PlainSocketImpl.socketConnect(Native Method)
            at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
            at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
            at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
            at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
            at java.net.Socket.connect(Socket.java:529)
            at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
            at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
            at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
            at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
            2014-06-05 17:09:22,363 [myid:2] – INFO [QuorumPeer[myid=2]/0.0.0.0:2181:[email protected]] – Notification time out: 12800

            But when I try to start ZooKeeper again on all the servers, it says:

            [[email protected] bin]# ./zkServer.sh start

            JMX enabled by default

            Using config:
            /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg

            Starting zookeeper … already running as process 8538.

            Can you please help me resolve this issue? I have created a myid file under the data directory on all the servers, each with a unique value, but had no luck resolving this error. Ping is also working fine.

          • @disqus_Do20lw5Uof:disqus Did you grep the process list and make sure that no ZooKeeper instances are running?

            Cheers,
            Jeeva

          • atp

            Jeeva ,

            grep shows:

            root 29881 0.0 0.3 4336580 46440 pts/1 Sl 10:58 0:00 /usr/java/jdk1.6.0_35//bin/java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /opt/zookeeper/zookeeper-3.4.6/bin/../build/classes:/opt/zookeeper/zookeeper-3.4.6/bin/../build/lib/*.jar:/opt/zookeeper/zookeeper-3.4.6/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/zookeeper/zookeeper-3.4.6/bin/../lib/slf4j-api-1.6.1.jar:/opt/zookeeper/zookeeper-3.4.6/bin/../lib/netty-3.7.0.Final.jar:/opt/zookeeper/zookeeper-3.4.6/bin/../lib/log4j-1.2.16.jar:/opt/zookeeper/zookeeper-3.4.6/bin/../lib/jline-0.9.94.jar:/opt/zookeeper/zookeeper-3.4.6/bin/../zookeeper-3.4.6.jar:/opt/zookeeper/zookeeper-3.4.6/bin/../src/java/lib/*.jar:/opt/zookeeper/zookeeper-3.4.6/bin/../conf: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg

            but when I check the log file, it shows:

            2014-06-10 11:21:16,889 [myid:1] – INFO [QuorumPeer[myid=1]/0.0.0.0:2184:[email protected]] – Notification time out: 60000
            2014-06-10 11:22:16,891 [myid:1] – WARN [QuorumPeer[myid=1]/0.0.0.0:2184:[email protected]] – Cannot open channel to 2 at election address Hadoop-Node1/10.133.45.245:3888
            java.net.NoRouteToHostException: No route to host

            Will this cause any problem?

            Also, when I check the status, it shows:

            [Main bin]# ./zkServer.sh status
            JMX enabled by default
            Using config: /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
            Error contacting service. It is probably not running.

            Please help me resolve this.
            Thanks

          • atp

            Thanks Jeeva. When I check, it shows:

            root 29881 0.0 0.5 4336580 63704 pts/1 Sl 10:58 0:01 /usr/java/jdk1.6.0_35//bin/java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /opt/zookeeper/zookeeper-3.4.6/bin/../build/classes:/opt/zookeeper/zookeeper-3.4.6/bin/../build/lib/*.jar:/opt/zookeeper/zookeeper-3.4.6/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/zookeeper/zookeeper-3.4.6/bin/../lib/slf4j-api-1.6.1.jar:/opt/zookeeper/zookeeper-3.4.6/bin/../lib/netty-3.7.0.Final.jar:/opt/zookeeper/zookeeper-3.4.6/bin/../lib/log4j-1.2.16.jar:/opt/zookeeper/zookeeper-3.4.6/bin/../lib/jline-0.9.94.jar:/opt/zookeeper/zookeeper-3.4.6/bin/../zookeeper-3.4.6.jar:/opt/zookeeper/zookeeper-3.4.6/bin/../src/java/lib/*.jar:/opt/zookeeper/zookeeper-3.4.6/bin/../conf: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg

            but if I execute ./zkServer.sh status:

            JMX enabled by default
            Using config: /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
            Error contacting service. It is probably not running.

            and if I try to execute the start command again, it shows:

            JMX enabled by default
            Using config: /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
            Starting zookeeper … already running as process 29881.

            Please help me resolve this.
            Thanks.
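
One way to see what the server itself reports is to send it a four-letter command such as `ruok` or `stat` on its client port. A 3.4.x server answers these on that port, and newer releases may require the command to be whitelisted first. The sketch below assumes client port 2181 on localhost, and `four_letter_word` is a made-up helper name. As far as I can tell, `zkServer.sh start` only looks for an existing process, while `status` has to get an answer over the client port, so a process that is alive but still stuck in leader election could produce both messages above.

```python
import socket


def four_letter_word(host, port, command, timeout=3.0):
    """Send a four-letter command (for example 'ruok' or 'stat') and return the reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break  # the server closes the connection once it has replied
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example (assumed host and client port; adjust to your zoo.cfg):
# print(four_letter_word("localhost", 2181, "ruok"))
# print(four_letter_word("localhost", 2181, "stat"))
```

A server that is up normally replies `imok` to `ruok`; the `stat` reply should indicate whether the instance is actually serving requests.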

  • Saurabh Guru

    Hi Jeeva,

    I have recently noticed that the data logs have grown quite a bit, into GBs, and the auto purge isn’t cleaning them up, even though there are more than the threshold of 3. Any idea? Is there any way we could control the size of these logs?

    • @saurabhguru:disqus I believe you have the data dir and the data log dir separated out. ZooKeeper doesn’t provide an option to control the size of the log. It only provides an option for the number of log files to retain, and it cleans up the rest.

      Try digging into the log files to see whether anything goes wrong when ZooKeeper triggers the cleanup. Possibilities are a permission issue or a file that is still in use.

      Cheers,
      Jeeva

      • Saurabh Guru

        Thanks for the quick response. Here is my data directory after last purge.

        -rw-rw-r-- 1 storm storm 449M Jun 6 02:01 log.e5982c
        -rw-rw-r-- 1 storm storm 138K Jun 5 22:51 snapshot.e5982a
        -rw-rw-r-- 1 storm storm 1.2G Jun 5 22:51 log.e4a86e
        -rw-rw-r-- 1 storm storm 139K Jun 5 14:47 snapshot.e4a86c
        -rw-rw-r-- 1 storm storm 1.5G Jun 5 14:47 log.e36b8b
        -rw-rw-r-- 1 storm storm 138K Jun 5 04:04 snapshot.e36b89
        -rw-rw-r-- 1 storm storm 1.7G Jun 5 04:04 log.e1fb87

        There is a 4th log file that it should have cleaned up, as snapRetainCount is set to 3, but it didn’t. The zookeeper.out log doesn’t show any error. Is there any other place I should be looking? Also, what is the difference between the data and data log directories? I guess I have them in the same place.

        • Yes, the default value of snapRetainCount is 3.

          What is the value of autopurge.purgeInterval? It is important for the cleanup job.

          For the difference between the data and data log directories, refer to http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_advancedConfiguration

          • Saurabh Guru

            autopurge.purgeInterval is set to 1, and it runs on schedule, but it did not clean up that last file you see above. Here is what my conf looks like.

            initLimit=10
            syncLimit=5
            dataDir=/mnt/storm/zookeeper-data
            clientPort=2181
            autopurge.snapRetainCount=3
            autopurge.purgeInterval=1

          • Which version of ZooKeeper are you using?

          • Saurabh Guru

            3.4.5

          • Hmm, no errors/exceptions, you have provided the configuration properly, and you are on a 3.4.x version, which has this feature.

            Drop a note to the ZooKeeper dev team mailing list; there could be an environment-specific issue or a version-related bug.

          • Saurabh Guru

            I think it was eventually deleting it later on; the file might have been in use. I was getting impacted because I had very little space to accommodate such big files. After moving the log directory to a separate, much bigger volume, I haven’t noticed any issue yet. Things seem okay.

          • @saurabhguru:disqus Glad to hear, good!

            Cheers,
            Jeeva

  • ATP

    Hi Jeeva ,

    We are getting a “no route to host” error in our Solr configuration setup. Could you please let us know what the cause of this error is?

    • @annamalaitpandiyan:disqus Sorry for the late response. I was a bit occupied with my work. Is this issue resolved?

      Cheers,
      Jeeva

  • Bhaskar

    What’s the need for three different ports for each server?

    I understand that the “client port” is necessary to accept requests from the clients that try to connect to the ZooKeeper server. But what is the significance of the other two ports (quorum port and leader election port)?

    • Notes from the ZooKeeper documentation: “Finally, note the two port numbers after each server name: “2888” and “3888”. Peers use the former port to connect to other peers. Such a connection is necessary so that peers can communicate, for example, to agree upon the order of updates. More specifically, a ZooKeeper server uses this port to connect followers to the leader. When a new leader arises, a follower opens a TCP connection to the leader using this port. Because the default leader election also uses TCP, we currently require another port for leader election. This is the second port in the server entry.” Reference: http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html

      Cheers,
      Jeeva

      • Bhaskar

        okay, thanks for the response.

  • Eldad Cohen

    Hi,

    Thanks for a great article.
    Is ZooKeeper relevant for a simple deployment of an app on three Windows machines and one Linux machine?

    Thanks

  • Landon Campbell

    Jeeva,

    Thanks for the article — it was very helpful.

    I was wondering if you knew how to configure basic authentication correctly for a ZooKeeper/SolrCloud? I followed these instructions (http://stackoverflow.com/questions/28043957/how-to-set-apache-solr-admin-password) to protect my Solr instances, but now the cloud can’t communicate internally — the shards have no leaders. Any idea how to configure this correctly?

    Thanks,
    Landon

    • @landoncampbell:disqus I’m sorry about the delayed response; I was a bit occupied with my day job.

      I know how to configure one. Which container are you using for Solr (Jetty, Tomcat, …)?

      Cheers,
      Jeeva

  • Rajesh Kannan S

    Thank you,
    I was looking for an article to set up zookeeper cluster locally for many days,
    Finally got your awesome article. Thank you so much. !! :)

  • Петр Малков

    Is it possible to combine different server versions, e.g. 3.4.6 and the new 3.5.1, in order to migrate to the newer version by adding new hosts?

  • Петр Малков

    Is it possible to run hosts with different ZooKeeper versions in one cluster?

    E.g. adding 3.5.1 to 3.4.6 for the purpose of upgrading.

    • @disqus_FZXo0AW0UB:disqus I think it is feasible, as long as the storage format is the same within the cluster. Kindly check the storage format.

  • Bhas

    Hi Jeeva,

    I have one doubt regarding ZooKeeper server timeouts with HBase.

    Which parameters cause the server timeout? I suspect minSessionTimeout, maxSessionTimeout and initLimit, and have set them as

    minSessionTimeout = 4,
    maxSessionTimeout = 40,
    initLimit = 10

    But we have been facing the same ZooKeeper server timeout issue in a 3-node cluster. Do you have any solution to rectify it?
    Thanks,
    Bhas
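
For reference, the ZooKeeper administration guide describes minSessionTimeout and maxSessionTimeout as values in milliseconds, defaulting to 2 and 20 times tickTime, and the server keeps the timeout a client asks for inside that range. The sketch below only illustrates that clamping; `negotiated_session_timeout` is a hypothetical name, and -1 stands for “use the default”, as in the “set to -1” log lines earlier in this thread.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_session_timeout_ms=-1,
                               max_session_timeout_ms=-1):
    """Clamp a client's requested session timeout to the server's allowed range.

    Illustrative only: -1 means "use the default", i.e. 2 * tickTime for
    the minimum and 20 * tickTime for the maximum.
    """
    lower = min_session_timeout_ms if min_session_timeout_ms != -1 else 2 * tick_time_ms
    upper = max_session_timeout_ms if max_session_timeout_ms != -1 else 20 * tick_time_ms
    timeout = requested_ms
    if timeout < lower:
        timeout = lower
    if timeout > upper:
        timeout = upper
    return timeout


if __name__ == "__main__":
    # Defaults with tickTime=2000 allow 4000..40000 ms.
    print(negotiated_session_timeout(30000))
    # The values 4 and 40 from the comment above, read literally as milliseconds.
    print(negotiated_session_timeout(30000, min_session_timeout_ms=4,
                                     max_session_timeout_ms=40))
```

If the values 4 and 40 were entered literally in zoo.cfg, every session would be capped at 40 ms, which would be worth ruling out before looking elsewhere.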

  • venkat

    Hi Jeeva, how can I get the ZooKeeper server instances, i.e. their IP addresses? (We have 3 instances set up under ZooKeeper.)

  • venkat

    Is it possible to get them using this command?

    curl -X GET -u admin:admin http://localhost:7180/api/v9/clusters/cluster/services/zookeeper/?????????

  • manohar

    Hi sir, can you provide steps for deploying SolrCloud 5.4.1 with ZooKeeper on multiple Windows servers?

  • Klas W

    Thanks for the article! I have two books that explain this, but your example was several times better and made me understand how to set this up. Kudos!

  • Sasi Kumar

    It’s been more than a year since this post, but the link for Integrating ZooKeeper ensemble with Elasticsearch Cluster – (upcoming article) is still missing.

  • Kind of curious, what tool did you use to draw the deployment diagram?