ZooKeeper Cluster (Multi-Server) Setup

ZooKeeper is a distributed coordination service for distributed applications.  ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace that is organized much like a standard file system.  The namespace consists of data registers (called znodes, in ZooKeeper parlance), and these are similar to files and directories.  ZooKeeper runs in Java and has bindings for both Java and C.


ZooKeeper Cluster – Terminology

The ZooKeeper service is replicated over a set of hosts called an ensemble.  A replicated group of servers in the same application is called a quorum.  All servers in the quorum have copies of the same configuration file.  QuorumPeers form a ZooKeeper ensemble.  Because ZooKeeper requires a majority of the servers to be up, it’s recommended to use an odd number of machines/servers.  For example, with five machines ZooKeeper can handle the failure of two machines.
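The majority rule above is simple arithmetic, and a small sketch makes the odd-number advice visible.  The function names quorum_size and tolerated_failures are made up for this illustration; they are not part of ZooKeeper.

```shell
# Smallest number of servers that forms a majority of an ensemble of size $1.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of server failures an ensemble of size $1 can survive
# while a majority is still up.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5 6; do
    echo "ensemble=$n majority=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Going from 3 to 4 servers (or from 5 to 6) raises the majority size without raising the number of failures tolerated, which is why even ensemble sizes buy nothing extra.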


Designing ZooKeeper Deployment

Yes, designing: there are a few things we have to be clear about before we begin the deployment of ZooKeeper.  Let’s answer the following questions:

  • How many ZooKeeper servers are planned for deployment (odd numbers are best)?
  • How many physical machines (boxes) will participate in the deployment?
  • Prepare ZooKeeper port #’s for deployment
    • Client port #
    • Quorum port #
    • Leader election port #
  • Where to put the data directory (dataDir) for each ZooKeeper Server?
  • Where to put the data log directory (dataLogDir) for each ZooKeeper Server?
  • Memory allocation for ZooKeeper?

Deployment diagram for this article:

[Figure: ZooKeeper Cluster (Multi-Server) - Deployment Diagram]

Now, here are the answers for this article:

Q1: How many ZooKeeper servers are planned for deployment (odd numbers are best)?
Answer: Planning to deploy 5 ZooKeeper servers
Q2: How many physical machines (boxes) will participate in the deployment?
Answer: This article uses one physical box for the ZooKeeper deployment
Q3: Prepare ZooKeeper port #'s for deployment?

Answer: As I said above, this article runs 5 ZooKeeper servers on one box, so I have to choose unique port #’s for each server.  Ensure the selected port #’s are open on your box; ZooKeeper servers use TCP to communicate with each other.

Note: Plan the port #’s carefully, otherwise the ZooKeeper servers may not be able to communicate with each other, and that will delay a successful deployment!

----------------------------------------------------------------
| Server ID | Client Port | Quorum Port | Leader Election Port |
----------------------------------------------------------------
|     1     |    2181     |     2888    |        3888          |
|     2     |    2182     |     2889    |        3889          |
|     3     |    2183     |     2890    |        3890          |
|     4     |    2184     |     2891    |        3891          |
|     5     |    2185     |     2892    |        3892          |
----------------------------------------------------------------
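The table follows a simple pattern: each port is a base value plus the server ID.  Here is a minimal sketch of that convention; zk_ports is a hypothetical helper name, and the offsets only reflect the choices made in this article, not anything ZooKeeper requires.

```shell
# Print "client quorum election" ports for a 1-based server ID,
# using this article's convention of counting up from 2181/2888/3888.
zk_ports() {
    id=$1
    echo "$((2180 + id)) $((2887 + id)) $((3887 + id))"
}

for id in 1 2 3 4 5; do
    echo "server $id: $(zk_ports "$id")"
done
```

Any scheme works as long as every port on the box is unique; deriving the ports from the ID just makes the configuration files easy to generate and check.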

Q4: Where to put Data directory (dataDir) for ZooKeeper Server?

Answer: I’m planning to place it in /Users/jeeva/zookeeper/data and treat this directory as the data home

Note: The data directory is one of the performance factors, see Point #3 at Performance & Availability Considerations

Q5: Where to put Data Log directory (dataLogDir) for ZooKeeper Server?

Answer: I’m planning to place it in /Users/jeeva/zookeeper/log and treat this directory as the log home

Note: The log directory is one of the performance factors, see Point #4 at Performance & Availability Considerations

Q6: Memory allocation for ZooKeeper?

Answer: For this demo article, I’m keeping the heap size at its default.  To customize it, create a file called java.env in {ZooKeeperHome}/conf/

Note: JVM heap size contributes significantly to performance, see Point #5 at Performance & Availability Considerations

Okay, now that we have the answers, let’s move on.
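For reference, here is a rough sketch of what creating a java.env could look like.  It assumes the startup scripts pass the JVMFLAGS variable on to the JVM; the helper name write_java_env and the 512m value are only illustrations, so size the heap for your own box.

```shell
# Write a java.env file into the given conf directory.
# Assumption: the ZooKeeper startup scripts source conf/java.env
# and hand JVMFLAGS to the JVM.
write_java_env() {
    conf_dir=$1
    heap=$2     # for example 512m; illustrative, not a recommendation
    cat > "$conf_dir/java.env" <<EOF
export JVMFLAGS="-Xms$heap -Xmx$heap"
EOF
}

# Usage, with a path from this article:
# write_java_env /Users/jeeva/zookeeper/zk-server-1/conf 512m
```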


Deploying ZooKeeper Cluster (Multi-Server) Setup

Let’s begin installation and configuration of ZooKeeper.

Step 1: Create the directory structure, as decided in the design section

mac-book-pro:demo jeeva$ mkdir -p /Users/jeeva/zookeeper/zk-server-1 /Users/jeeva/zookeeper/zk-server-2 /Users/jeeva/zookeeper/zk-server-3 /Users/jeeva/zookeeper/zk-server-4 /Users/jeeva/zookeeper/zk-server-5

mac-book-pro:demo jeeva$ mkdir -p /Users/jeeva/zookeeper/data/zk1 /Users/jeeva/zookeeper/data/zk2 /Users/jeeva/zookeeper/data/zk3 /Users/jeeva/zookeeper/data/zk4 /Users/jeeva/zookeeper/data/zk5

mac-book-pro:demo jeeva$ mkdir -p /Users/jeeva/zookeeper/log/zk1 /Users/jeeva/zookeeper/log/zk2 /Users/jeeva/zookeeper/log/zk3 /Users/jeeva/zookeeper/log/zk4 /Users/jeeva/zookeeper/log/zk5

Let’s take a look at the directory structure created above:

mac-book-pro:demo jeeva$ tree /Users/jeeva/zookeeper

/Users/jeeva/zookeeper
|-data
|---zk1
|---zk2
|---zk3
|---zk4
|---zk5
|-log
|---zk1
|---zk2
|---zk3
|---zk4
|---zk5
|-zk-server-1
|-zk-server-2
|-zk-server-3
|-zk-server-4
|-zk-server-5

mac-book-pro:demo jeeva$

Okay, looks good!
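If you prefer not to type every path, the same layout can be produced with a loop.  This is a minimal sketch; make_zk_dirs is a hypothetical helper name, and the directory names simply mirror the ones used above.

```shell
# Create the per-server install, data, and log directories under a base dir.
make_zk_dirs() {
    base=$1     # for example /Users/jeeva/zookeeper
    count=$2    # number of ZooKeeper servers
    i=1
    while [ "$i" -le "$count" ]; do
        mkdir -p "$base/zk-server-$i" "$base/data/zk$i" "$base/log/zk$i"
        i=$((i + 1))
    done
}

# Usage:
# make_zk_dirs /Users/jeeva/zookeeper 5
```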

Step 2: Create a ZooKeeper Server ID.  This is a file named myid that resides in the ZooKeeper data directory.  Go ahead and choose your favorite text editor.

# just enter the value '1' in the file and save it; do the same for the rest of the ZooKeeper servers
mac-book-pro:demo jeeva$ vi /Users/jeeva/zookeeper/data/zk1/myid

# follow the same way to fill in the server id (2 to 5) for the other servers
vi /Users/jeeva/zookeeper/data/zk2/myid
vi /Users/jeeva/zookeeper/data/zk3/myid
vi /Users/jeeva/zookeeper/data/zk4/myid
vi /Users/jeeva/zookeeper/data/zk5/myid
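Since each myid file holds nothing but the server's number, the files can also be written without an editor.  A minimal sketch, assuming the data/zkN layout from Step 1 (write_myids is a made-up helper name):

```shell
# Write the server ID into <data_home>/zk<N>/myid for N = 1..count.
write_myids() {
    data_home=$1    # for example /Users/jeeva/zookeeper/data
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        mkdir -p "$data_home/zk$i"
        echo "$i" > "$data_home/zk$i/myid"
        i=$((i + 1))
    done
}

# Usage:
# write_myids /Users/jeeva/zookeeper/data 5
```

The number in myid must match the N in the corresponding server.N line of zoo.cfg.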

Step 3: Downloading ZooKeeper Release

Download ZooKeeper from http://hadoop.apache.org/zookeeper/releases.html; this article uses version 3.4.4 of ZooKeeper.  However, the same principles apply to other versions too.

Step 4: Extract & prepare ZooKeeper for deployment

mac-book-pro:demo jeeva$ gzip -dc ~/Downloads/soft/zookeeper-3.4.4.tar.gz | tar -xf - -C /tmp
mac-book-pro:demo jeeva$ cp -r /tmp/zookeeper-3.4.4/* /Users/jeeva/zookeeper/zk-server-1/
mac-book-pro:demo jeeva$ cp -r /tmp/zookeeper-3.4.4/* /Users/jeeva/zookeeper/zk-server-2/
mac-book-pro:demo jeeva$ cp -r /tmp/zookeeper-3.4.4/* /Users/jeeva/zookeeper/zk-server-3/
mac-book-pro:demo jeeva$ cp -r /tmp/zookeeper-3.4.4/* /Users/jeeva/zookeeper/zk-server-4/
mac-book-pro:demo jeeva$ cp -r /tmp/zookeeper-3.4.4/* /Users/jeeva/zookeeper/zk-server-5/

Once done, don’t forget to clean up /tmp/zookeeper-3.4.4

Step 5: Prepare the ZooKeeper configuration file, zoo.cfg, at {zk-server-1}/conf/zoo.cfg.  Here I will show it for Server 1; perform the same steps with the appropriate values (clientPort, dataDir, dataLogDir) for each of the other ZooKeeper servers.

mac-book-pro:demo jeeva$ vi /Users/jeeva/zookeeper/zk-server-1/conf/zoo.cfg

Place the configuration below into it.

# The number of milliseconds of each tick
tickTime=2000

# The number of ticks that the initial synchronization phase can take
initLimit=10

# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5

# the directory where the snapshot is stored.
# Choose appropriately for your environment
dataDir=/Users/jeeva/zookeeper/data/zk1

# the port at which the clients will connect
clientPort=2181

# the directory where transaction log is stored.
# this parameter provides dedicated log device for ZooKeeper
dataLogDir=/Users/jeeva/zookeeper/log/zk1

# ZooKeeper server and its port no.
# ZooKeeper ensemble should know about every other machine in the ensemble
# specify server id by creating 'myid' file in the dataDir
# use hostname instead of IP address for convenient maintenance
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
server.4=localhost:2891:3891
server.5=localhost:2892:3892
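Only clientPort, dataDir and dataLogDir differ between the five files, so they can be generated instead of edited by hand.  The following is a sketch under this article's assumptions (everything on localhost, ports counted up from 2181/2888/3888, the directory layout from Step 1); write_zoo_cfg is a hypothetical helper name.

```shell
# Generate conf/zoo.cfg for one server of a single-box ensemble.
write_zoo_cfg() {
    base=$1     # for example /Users/jeeva/zookeeper
    id=$2       # ID of the server this file belongs to
    count=$3    # total number of servers in the ensemble
    cfg="$base/zk-server-$id/conf/zoo.cfg"
    mkdir -p "$base/zk-server-$id/conf"
    cat > "$cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$base/data/zk$id
clientPort=$((2180 + id))
dataLogDir=$base/log/zk$id
EOF
    # Every server lists the whole ensemble, itself included.
    n=1
    while [ "$n" -le "$count" ]; do
        echo "server.$n=localhost:$((2887 + n)):$((3887 + n))" >> "$cfg"
        n=$((n + 1))
    done
}

# Usage:
# for id in 1 2 3 4 5; do write_zoo_cfg /Users/jeeva/zookeeper "$id" 5; done
```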

Screenshot: zk-server-1 directory structure along with conf/zoo.cfg

Step 6: Configure the ZooKeeper logger for deployment.  The following are the default values in log4j.properties, which suit a development setup; update them as per your environment and needs:

zookeeper.root.logger=INFO, CONSOLE
zookeeper.console.threshold=INFO
zookeeper.log.dir=.
zookeeper.log.file=zookeeper.log
zookeeper.log.threshold=DEBUG
zookeeper.tracelog.dir=.
zookeeper.tracelog.file=zookeeper_trace.log
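One way to move away from console logging without editing log4j.properties is to set environment variables when starting a server.  This sketch assumes that your release's zkEnv.sh/zkServer.sh honor ZOO_LOG_DIR and ZOO_LOG4J_PROP and that log4j.properties defines a ROLLINGFILE appender; please verify both in your copy before relying on it.  The helper name start_with_file_logging is made up.

```shell
# Start one server with logs written to a file in log_dir.
# Assumptions: zkEnv.sh reads ZOO_LOG_DIR and ZOO_LOG4J_PROP, and
# log4j.properties defines an appender named ROLLINGFILE.
start_with_file_logging() {
    server_home=$1   # for example /Users/jeeva/zookeeper/zk-server-1
    log_dir=$2       # where the log file should go
    mkdir -p "$log_dir"
    ZOO_LOG_DIR="$log_dir" ZOO_LOG4J_PROP="INFO,ROLLINGFILE" \
        "$server_home/bin/zkServer.sh" start
}

# Usage (the log directory here is only an example path):
# start_with_file_logging /Users/jeeva/zookeeper/zk-server-1 /tmp/zk1-logs
```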

Step 7: Once zoo.cfg has been created for all the servers, we can start the ZooKeeper servers.  Let’s start zk-server-1

mac-book-pro:demo jeeva$ cd /Users/jeeva/zookeeper/zk-server-1/bin/
mac-book-pro:bin jeeva$ ./zkServer.sh start
JMX enabled by default
Using config: /Users/jeeva/zookeeper/zk-server-1/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

mac-book-pro:bin jeeva$

Now, go ahead and start the remaining 4 ZooKeeper servers.  Tail the zookeeper.out file in each server’s bin directory to see more information.
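Starting the servers one by one gets tedious, so here is a small sketch of a loop that runs the same zkServer.sh command against every server.  The helper name zk_each is hypothetical, and it assumes the zk-server-N layout used in this article.

```shell
# Run one zkServer.sh action (start, stop, status, ...) on servers 1..count.
zk_each() {
    base=$1      # for example /Users/jeeva/zookeeper
    count=$2
    action=$3
    i=1
    while [ "$i" -le "$count" ]; do
        echo "== zk-server-$i: $action"
        # Run from bin/, as done above, so each server keeps its own output file.
        ( cd "$base/zk-server-$i/bin" && ./zkServer.sh "$action" )
        i=$((i + 1))
    done
}

# Usage:
# zk_each /Users/jeeva/zookeeper 5 start
# zk_each /Users/jeeva/zookeeper 5 status
```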

zkServer.sh supports the following commands:

start
start-foreground
stop
restart
status
upgrade
print-cmd

We will use the ‘status’ command to see the ZooKeeper Server status:

mac-book-pro:demo jeeva$ /Users/jeeva/zookeeper/zk-server-3/bin/zkServer.sh status
JMX enabled by default
Using config: /Users/jeeva/zookeeper/zk-server-3/bin/../conf/zoo.cfg
Mode: leader

mac-book-pro:demo jeeva$

mac-book-pro:demo jeeva$ /Users/jeeva/zookeeper/zk-server-5/bin/zkServer.sh status
JMX enabled by default
Using config: /Users/jeeva/zookeeper/zk-server-5/bin/../conf/zoo.cfg
Mode: follower

mac-book-pro:demo jeeva$

ZooKeeper CLI Client

ZooKeeper ships with a command line interface for handy administration.  See How to Connect ZooKeeper through CLI? and the famous four-letter commands for it.
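As a quick taste, a four-letter command is just four characters sent to a server's client port.  The sketch below assumes a netcat binary named nc is installed; zk_4lw is a made-up helper name, and some nc variants need extra options to close the connection.

```shell
# Send a four-letter command (for example ruok or stat) to a client port
# and print the server's reply.  Assumes `nc` (netcat) is available.
zk_4lw() {
    host=$1
    port=$2
    cmd=$3
    printf '%s' "$cmd" | nc "$host" "$port"
}

# Usage, against the ports chosen in this article:
# zk_4lw localhost 2181 ruok     # a healthy server answers "imok"
# zk_4lw localhost 2183 stat     # the reply includes a "Mode:" line
#
# The interactive client shipped with ZooKeeper:
# /Users/jeeva/zookeeper/zk-server-1/bin/zkCli.sh -server localhost:2181
```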


Integrating ZooKeeper Cluster

Typically, a ZooKeeper-enabled application will be able to connect right away.  To integrate ZooKeeper with an application, all you need to know, for all the ZooKeeper server(s), is:

  • host-address/host-ip
  • port-no

For example: SolrCloud, Elasticsearch, etc.
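ZooKeeper clients generally take these details as one comma-separated list of host:port pairs.  Here is a minimal sketch that builds such a list for this article's single-box ensemble; zk_connect_string is a hypothetical helper, and how the string is handed over depends on the application.

```shell
# Build "host:2181,host:2182,..." for servers 1..count, using this
# article's client ports (2180 + server ID).
zk_connect_string() {
    host=$1
    count=$2
    result=""
    i=1
    while [ "$i" -le "$count" ]; do
        # ${result:+,} adds a comma only when result is already non-empty.
        result="${result}${result:+,}$host:$((2180 + i))"
        i=$((i + 1))
    done
    echo "$result"
}

zk_connect_string localhost 5
```

Listing all the servers lets a client fail over to another server when the one it is connected to goes down.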


Maintenance – Basic Elements

Typically there are two things to take care of in a ZooKeeper Server, and this is the tricky part.

  • Data directory Cleanup
  • Debug Log Cleanup (log4j)

Why do I call maintenance a tricky part? Let me describe: ZooKeeper provides autopurge configuration options such as autopurge.snapRetainCount and autopurge.purgeInterval.  However, in real-world scenarios maintenance activities differ from organization to organization.  Basically, it depends on the organization’s IT policy and business requirements.

Please have a look at ZooKeeper Maintenance and plan yours!
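If the built-in autopurge fits your policy, enabling it is a matter of adding two lines to each zoo.cfg.  A sketch follows; enable_autopurge is a made-up helper name, and the values in the usage line are examples, not recommendations.

```shell
# Append autopurge settings to an existing zoo.cfg.
enable_autopurge() {
    cfg=$1       # path to zoo.cfg
    retain=$2    # number of recent snapshots (and their logs) to keep
    hours=$3     # purge interval in hours
    cat >> "$cfg" <<EOF

# Keep the $retain most recent snapshots and their transaction logs,
# and run the purge task every $hours hour(s)
autopurge.snapRetainCount=$retain
autopurge.purgeInterval=$hours
EOF
}

# Usage:
# enable_autopurge /Users/jeeva/zookeeper/zk-server-1/conf/zoo.cfg 3 24
```

A server reads zoo.cfg at startup, so restart it after changing the file.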


Performance & Availability Considerations

  • As long as a majority of the ensemble is up, the ZooKeeper service will be available.  Because it requires a majority, it is best to use an odd number of machines.  For example, with four machines ZooKeeper can only handle the failure of a single machine; if two machines fail, the remaining two machines do not constitute a majority.  However, with five machines ZooKeeper can handle the failure of two machines.
  • It’s critical that you run ZooKeeper under supervision, since ZooKeeper is fail-fast and will exit the process if it encounters any error case.  See the ZooKeeper Administrator’s Guide for more details.
  • The ZooKeeper data directory contains files which are a persistent copy of the znodes stored by a particular serving ensemble.  These are the snapshot files.  As changes are made to the znodes, these changes are appended to a transaction log; occasionally, when a log grows large, a snapshot of the current state of all znodes is written to the filesystem.  This snapshot supersedes all previous logs.
  • ZooKeeper’s transaction log must be on a dedicated device. (A dedicated partition is not enough.) ZooKeeper writes the log sequentially, without seeking.  Sharing your log device with other processes can cause seeks and contention, which in turn can cause multi-second delays.
  • Do not put ZooKeeper in a situation that can cause a swap.  In order for ZooKeeper to function with any sort of timeliness, it simply cannot be allowed to swap.  Therefore, make certain that the maximum heap size given to ZooKeeper is not bigger than the amount of real memory available to ZooKeeper.  For more on this, see Things to Avoid.

References

http://zookeeper.apache.org/doc/r3.3.4/zookeeperOver.html
http://zookeeper.apache.org/doc/r3.3.4/zookeeperStarted.html
http://zookeeper.apache.org/doc/r3.3.4/zookeeperAdmin.html