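15.2 Installing and Configuring ZooKeeper

This section first describes how to install and configure the ZooKeeper service in different environments, then explains how to manage ZooKeeper through its configuration file, and finally shows how to start the ZooKeeper service in each environment.

15.2.1 Installing ZooKeeper

ZooKeeper can run in three kinds of environments: standalone, cluster, and pseudo-distributed cluster. The following subsections describe how to install the ZooKeeper service in each of them and briefly explain how they differ and how they relate.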

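1. System Requirements

This subsection lists the system and software requirements for installing ZooKeeper.

(1) Supported Platforms

ZooKeeper runs on a variety of operating systems; Table 15-1 gives a brief summary.

(2) Software Requirements

ZooKeeper requires Java, version 1.6 or later. A cluster installation needs at least three ZooKeeper nodes, and we recommend deploying them on three separate machines. For example, Yahoo! runs ZooKeeper on Red Hat Linux machines, each with a multi-core CPU, 2 GB of memory, and an 80 GB IDE hard disk.

JDK installation was covered in detail in an earlier chapter and is not repeated here.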

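Note: Frequent swapping severely degrades system performance. To avoid it, set the Java heap size to a suitable value; as a rule, the heap should not exceed the memory that is actually available. The exact value is best determined by load testing. For example, on a machine with 4 GB of memory we suggest a Java heap of 3 GB.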

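A ZooKeeper ensemble requires a majority of its machines to be available. To tolerate the failure of m machines, the ensemble needs at least 2m+1 machines, because only then do the remaining m+1 machines form a majority. For example, a three-machine ensemble keeps serving requests when one machine fails.

It is also best to use an odd number of machines. A four-machine ensemble can tolerate only one failure: if two machines fail, the remaining two cannot form a majority (three of four machines are needed). A five-machine ensemble, by contrast, tolerates two failures.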
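The majority arithmetic above can be checked with a few lines of Python. This is only a sketch of the counting rule, not ZooKeeper code; the function names min_ensemble_size and tolerated_failures are our own.

```python
def min_ensemble_size(m: int) -> int:
    """Smallest ensemble that survives m failed machines: 2m + 1."""
    if m < 0:
        raise ValueError("m must be non-negative")
    return 2 * m + 1


def tolerated_failures(n: int) -> int:
    """Most machines that may fail while a strict majority of n remains."""
    if n < 1:
        raise ValueError("n must be at least 1")
    quorum = n // 2 + 1  # strict majority of n machines
    return n - quorum


for n in (3, 4, 5):
    print(f"{n} machines: quorum {n // 2 + 1}, tolerates {tolerated_failures(n)}")
```

Running it shows that going from three machines to four raises the quorum from two to three without raising the number of tolerated failures, which is why odd sizes are preferred.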

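2. Installing ZooKeeper in Standalone Mode

(1) Downloading ZooKeeper

If you are new to ZooKeeper, we recommend configuring a server in standalone mode first. Standalone mode is much simpler to configure and use, and it makes ZooKeeper's working principles easier to understand, which helps considerably with later study.

Download the latest stable release of ZooKeeper from the Apache website: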

http://hadoop.apache.org/zookeeper/releases.html

Users in China can save a good deal of time by choosing the nearest mirror, for example:

http://labs.renren.com/apache-mirror/hadoop/zookeeper/

(2) Installing ZooKeeper

For convenience in later operations, configure the ZooKeeper environment variables by adding the following lines to /etc/profile:

# Set ZooKeeper Environment
export ZOOKEEPER_HOME=$HADOOP_HOME/zookeeper-3.4.3
export PATH=$PATH:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf

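The ZooKeeper server is packaged in a single JAR file. To install the service, you create a configuration file and fill in its settings. In the conf folder of the ZooKeeper-*.*.* directory (this book uses 3.4.3, the latest ZooKeeper release at the time of writing, so "ZooKeeper-*.*.*" is written as "ZooKeeper-3.4.3" from here on), create a file named zoo.cfg with the following content: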

tickTime=2000

dataDir=$HADOOP_HOME/zookeeper-3.4.3/data

clientPort=2181

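In this file, $HADOOP_HOME stands for the Hadoop installation directory; we place ZooKeeper under it purely for convenience. Because zoo.cfg is read as a plain properties file and is not processed by the shell, replace $HADOOP_HOME with the actual absolute path when you write the file. Note that ZooKeeper does not depend on Hadoop, HBase, or any other Hadoop-related project. You must also set dataDir to a directory that is empty when the server first starts. The parameters have the following meanings: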

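tickTime: the basic time unit, in milliseconds. It drives heartbeats, and the minimum session timeout is twice the tickTime.

dataDir: the location where snapshots of the in-memory database are stored. Unless dataLogDir is set, the transaction log of updates is stored here as well.

clientPort: the port on which the server listens for client connections.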
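To make the relationship between these parameters concrete, the following sketch reads a minimal configuration and derives the minimum session timeout from tickTime. It is our own illustration rather than ZooKeeper's parser: the helper names parse_zoo_cfg and min_session_timeout_ms and the /tmp/zookeeper-demo/data path are assumptions made for the example.

```python
def parse_zoo_cfg(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


def min_session_timeout_ms(settings: dict) -> int:
    """Minimum session timeout, which is twice tickTime (milliseconds)."""
    return 2 * int(settings["tickTime"])


# Hypothetical sample; the dataDir path is a placeholder for this sketch.
SAMPLE = """\
# the three parameters described above
tickTime=2000
dataDir=/tmp/zookeeper-demo/data
clientPort=2181
"""

cfg = parse_zoo_cfg(SAMPLE)
print(cfg["clientPort"], min_session_timeout_ms(cfg))
```

With tickTime=2000, the smallest session timeout a client can obtain is therefore 4000 milliseconds.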

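A caution about standalone mode: this configuration has no ZooKeeper replicas, so if the ZooKeeper server fails, the ZooKeeper service stops.

Listing 15-1 shows the zoo.cfg configuration file we use for our own setup.

Listing 15-1 ZooKeeper configuration file zoo.cfg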

# The number of milliseconds of each tick
tickTime=2000
# the directory where the snapshot is stored.
dataDir=$HADOOP_HOME/zookeeper-3.4.3/data
# the port at which the clients will connect
clientPort=2181
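As far as we know, ZooKeeper loads zoo.cfg as an ordinary properties file, so a literal $HADOOP_HOME in dataDir is not expanded the way it would be in a shell. One way around this is to generate the file with the path already resolved. The sketch below does that; the function name render_standalone_cfg and the /opt/hadoop fallback are assumptions for illustration only.

```python
import os
from pathlib import Path


def render_standalone_cfg(install_dir: str, tick_time: int = 2000,
                          client_port: int = 2181) -> str:
    """Return zoo.cfg text whose dataDir is a resolved absolute path."""
    data_dir = Path(install_dir).expanduser().resolve() / "data"
    return (
        "# The number of milliseconds of each tick\n"
        f"tickTime={tick_time}\n"
        "# the directory where the snapshot is stored.\n"
        f"dataDir={data_dir}\n"
        "# the port at which the clients will connect\n"
        f"clientPort={client_port}\n"
    )


# Assumption: fall back to a hypothetical /opt/hadoop if HADOOP_HOME is unset.
hadoop_home = os.environ.get("HADOOP_HOME", "/opt/hadoop")
text = render_standalone_cfg(os.path.join(hadoop_home, "zookeeper-3.4.3"))
print(text)
```

The printed text can be saved as conf/zoo.cfg; the data directory itself still has to be created before the server starts.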

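3. Installing ZooKeeper on a Cluster

For a reliable ZooKeeper service, deploy ZooKeeper on a cluster. As long as a majority of the ZooKeeper servers in the cluster are running, the service as a whole remains available.

The procedure is similar to the standalone installation: set up the Java environment, download the latest stable ZooKeeper release, and configure the environment variables. The conf/zoo.cfg file is identical on every machine; Listing 15-2 shows the settings.

Listing 15-2 Parameter settings in zoo.cfg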

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the port at which the clients will connect
clientPort=2181
# the directory where the snapshot is stored.
dataDir=$HADOOP_HOME/zookeeper-3.4.3/data
# the location of the log file
dataLogDir=$HADOOP_HOME/zookeeper-3.4.3/log
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

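Section 15.2.2 covers the ZooKeeper parameters in more detail. Each "server.id=host:port:port" line describes one ZooKeeper server. Every server that is part of the cluster needs to know the other machines in the ensemble^[1], and it reads this information from the "server.id=host:port:port" lines. The host and port fields are self-explanatory. The id identifies the server: in the server's data directory (the directory named by dataDir), create a file called myid that contains a single line holding that server's own id. Server "1", for example, writes "1" into its myid file. The id must be unique within the ensemble and lie between 1 and 255. Of the two ports on the line, the first is the port that follower machines use to connect to the leader, and the second is used for leader election. In this example each machine therefore uses three ports: clientPort 2181, the follower-to-leader port 2888, and the election port 3888.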
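The rules for the server lines (unique ids between 1 and 255, two ports per line, one id per myid file) can be expressed as a small validator. This is a sketch written for this discussion, not part of ZooKeeper; Server, parse_servers, and myid_content are hypothetical names.

```python
import re
from typing import NamedTuple


class Server(NamedTuple):
    sid: int            # value that goes into this server's myid file
    host: str
    peer_port: int      # followers connect to the leader on this port
    election_port: int  # used for leader election


SERVER_RE = re.compile(r"^server\.(\d+)=([^:]+):(\d+):(\d+)$")


def parse_servers(lines):
    """Parse server.id=host:port:port lines and check the id rules."""
    servers = {}
    for line in lines:
        m = SERVER_RE.match(line.strip())
        if m is None:
            continue  # not a server line, e.g. tickTime=2000
        sid = int(m.group(1))
        if not 1 <= sid <= 255:
            raise ValueError(f"id {sid} is outside 1..255")
        if sid in servers:
            raise ValueError(f"id {sid} appears more than once")
        servers[sid] = Server(sid, m.group(2), int(m.group(3)), int(m.group(4)))
    return servers


def myid_content(sid: int) -> str:
    """The single line that the myid file of server `sid` should hold."""
    return f"{sid}\n"


ensemble = parse_servers([
    "tickTime=2000",
    "server.1=zoo1:2888:3888",
    "server.2=zoo2:2888:3888",
    "server.3=zoo3:2888:3888",
])
print(sorted(ensemble), repr(myid_content(1)))
```

On the machine zoo1, the string returned by myid_content(1) is what would be written to the myid file under dataDir.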

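We tested the ZooKeeper installation on a three-machine Hadoop cluster; Listing 15-2 above is the configuration file we used for that setup.

In the listing, zoo1, zoo2, and zoo3 are the host names of the three machines. They must be defined in the host settings of Ubuntu. That topic is outside the scope of this book and is not covered here; consult the Ubuntu and Linux documentation for details.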

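4. Installing ZooKeeper in Pseudo-Distributed Cluster Mode

Earlier chapters showed that Hadoop can simulate a distributed deployment in pseudo-distributed mode. ZooKeeper goes further: on a single machine it can run not only in standalone mode but also as a simulated cluster, with the different nodes all running on the same machine. We call this "pseudo-distributed cluster mode" to distinguish it from "standalone mode". Operating Hadoop in pseudo-distributed mode differs considerably from operating it in fully distributed mode, whereas operating ZooKeeper in pseudo-distributed cluster mode is essentially the same as in cluster mode. This makes pseudo-distributed cluster mode very convenient for trying out ZooKeeper and for exploratory experiments. For example, you can first test with a small amount of data in pseudo-distributed cluster mode and, once the test works, move to cluster mode for experiments with real data. This both confirms feasibility and greatly improves experimental efficiency.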

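How is pseudo-distributed cluster mode configured? It is quite simple. In the ZooKeeper configuration file, the clientPort parameter sets the port on which clients connect to ZooKeeper. In a line such as server.1=IP1:2887:3887, IP1 is the IP address of a machine in the ZooKeeper service, 2887 is the port that followers use to connect to the leader, and 3887 is the port used for leader election. In pseudo-distributed cluster mode, each configuration file simulates one machine, so a single machine runs several ZooKeeper instances. The ports in the different configuration files must therefore not conflict with one another.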
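Because every instance runs on the same machine, a quick check for port collisions is worthwhile before starting anything. The following sketch assumes a hypothetical helper, find_port_conflicts, and uses illustrative port values for three local instances.

```python
from collections import defaultdict


def find_port_conflicts(instances: dict) -> dict:
    """Map each port claimed more than once to its (instance, role) claimants."""
    claimed = defaultdict(list)
    for name, ports in instances.items():
        for role, port in ports.items():
            claimed[port].append((name, role))
    return {port: users for port, users in claimed.items() if len(users) > 1}


# Ports each local instance would listen on (illustrative values).
instances = {
    "zoo1": {"clientPort": 2181, "peer": 2887, "election": 3887},
    "zoo2": {"clientPort": 2182, "peer": 2888, "election": 3888},
    "zoo3": {"clientPort": 2183, "peer": 2889, "election": 3889},
}
print(find_port_conflicts(instances))  # prints {}

# A deliberately broken layout: zoo3 reuses the client port of zoo1.
bad = dict(instances, zoo3={"clientPort": 2181, "peer": 2889, "election": 3889})
print(find_port_conflicts(bad))
```

An empty result means no two instances claim the same port; any entry in the result names the instances that would collide.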

Below is our pseudo-distributed cluster configuration, which uses zoo1.cfg, zoo2.cfg, and zoo3.cfg to simulate a three-machine ZooKeeper cluster; see Listings 15-3 through 15-5.

Listing 15-3 zoo1.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=$HADOOP_HOME/zookeeper-3.4.3/d_1
# the port at which the clients will connect
clientPort=2181
# the location of the log file
dataLogDir=$HADOOP_HOME/zookeeper-3.4.3/logs_1
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889

Listing 15-4 zoo2.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=$HADOOP_HOME/zookeeper-3.4.3/d_2
# the port at which the clients will connect
clientPort=2182
# the location of the log file
dataLogDir=$HADOOP_HOME/zookeeper-3.4.3/logs_2
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889

Listing 15-5 zoo3.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=$HADOOP_HOME/zookeeper-3.4.3/d_3
# the port at which the clients will connect
clientPort=2183
# the location of the log file
dataLogDir=$HADOOP_HOME/zookeeper-3.4.3/logs_3
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889

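As the three listings show, the files differ in clientPort, dataDir, and dataLogDir. Also remember to create a myid file in the directory named by each dataDir to identify the corresponding ZooKeeper server instance.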
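Since the three files differ only in a few values, they and their myid files can be generated from one template. The sketch below writes everything into a temporary directory that stands in for the real installation directory; write_instance and the directory prefix are our own names, and the comment lines of the listings are omitted for brevity.

```python
import tempfile
from pathlib import Path

TEMPLATE = """\
tickTime=2000
initLimit=10
syncLimit=5
dataDir={data_dir}
clientPort={client_port}
dataLogDir={log_dir}
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889
"""


def write_instance(base: Path, sid: int) -> Path:
    """Create d_<sid>, logs_<sid>, myid and zoo<sid>.cfg under base."""
    data_dir = base / f"d_{sid}"
    log_dir = base / f"logs_{sid}"
    data_dir.mkdir(parents=True, exist_ok=True)
    log_dir.mkdir(parents=True, exist_ok=True)
    # myid holds a single line: the id used in the matching server.<id> entry.
    (data_dir / "myid").write_text(f"{sid}\n")
    cfg_path = base / f"zoo{sid}.cfg"
    cfg_path.write_text(TEMPLATE.format(
        data_dir=data_dir, log_dir=log_dir, client_port=2180 + sid))
    return cfg_path


# Assumption: a temporary directory replaces the real installation directory.
base = Path(tempfile.mkdtemp(prefix="zk-pseudo-"))
paths = [write_instance(base, sid) for sid in (1, 2, 3)]
for sid, path in zip((1, 2, 3), paths):
    print(path.name, (base / f"d_{sid}" / "myid").read_text().strip())
```

Writing absolute paths into dataDir and dataLogDir avoids relying on any variable expansion inside the configuration file.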

[1] The ensemble is the full set of servers, as opposed to a quorum (a majority of them).
