ZooKeeper uses a custom atomic messaging protocol. Since the messaging layer is atomic, ZooKeeper can guarantee that the local replicas never diverge. When the leader server receives a write request, it calculates what the state of the system will be when the write is applied and transforms this into a transaction that captures this new state.
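As an illustration of this idea, here is a minimal sketch (hypothetical names, not ZooKeeper's actual implementation): the leader turns a state-mutating request into a transaction that records the computed result, so replicas can apply it idempotently instead of re-executing the operation.

```python
# Toy model: the leader computes the new state up front and ships a
# transaction describing that state, not the original operation.

def to_txn(store, path, new_data):
    """Convert a setData-style request into a state-capturing transaction."""
    version = store.get(path, {"version": -1})["version"] + 1
    # The transaction carries the computed result; re-applying it
    # always yields the same state.
    return {"path": path, "data": new_data, "version": version}

def apply_txn(store, txn):
    """Apply a transaction to a replica's local store (idempotent)."""
    store[txn["path"]] = {"data": txn["data"], "version": txn["version"]}
    return store

store = {}
txn = to_txn(store, "/config", b"v1")
apply_txn(store, txn)
apply_txn(store, txn)  # applying the same transaction twice is harmless
print(store["/config"])  # {'data': b'v1', 'version': 0}
```

Because each transaction names the resulting state and version explicitly, a replica that replays the log converges to the same state regardless of how many times a transaction is delivered.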
Uses

The programming interface to ZooKeeper is deliberately simple. With it, however, you can implement higher-order operations, such as synchronization primitives, group membership, ownership, etc. Some distributed applications have used it to: [tbd: add uses from white paper and video presentation.] For more information, see [tbd]
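Group membership is a good example of such a higher-order operation. The usual recipe has each member create an ephemeral znode under a group path; when a member's session ends, its node disappears automatically, so listing the children of the group path always yields the live membership. Below is a minimal in-memory model of that recipe (a `FakeEnsemble` class invented for illustration; a real client would use the ZooKeeper API with ephemeral create mode):

```python
# In-memory model of the group-membership recipe. Ephemeral nodes are
# tied to the session that created them and vanish when it closes.

class FakeEnsemble:
    def __init__(self):
        self.nodes = {}  # znode path -> owning session id

    def create_ephemeral(self, path, session):
        """Register an ephemeral znode owned by the given session."""
        self.nodes[path] = session

    def get_children(self, prefix):
        """List child node names under a path, like getChildren()."""
        return sorted(p.rsplit("/", 1)[1] for p in self.nodes
                      if p.startswith(prefix + "/"))

    def close_session(self, session):
        # Ephemeral nodes disappear with their owning session.
        self.nodes = {p: s for p, s in self.nodes.items() if s != session}

zk = FakeEnsemble()
zk.create_ephemeral("/group/member-1", session=101)
zk.create_ephemeral("/group/member-2", session=102)
print(zk.get_children("/group"))   # ['member-1', 'member-2']
zk.close_session(101)              # member-1 crashes or disconnects
print(zk.get_children("/group"))   # ['member-2']
```

The same create/list/watch pattern underlies the lock and leader-election recipes as well; only the node-naming and ordering rules differ.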
Performance

ZooKeeper is designed to be highly performant. But is it? The results of the ZooKeeper development team at Yahoo! Research indicate that it is. (See .) It is especially high performance in applications where reads outnumber writes, since writes involve synchronizing the state of all servers. (Reads outnumbering writes is typically the case for a coordination service.)
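The read/write asymmetry can be captured in a back-of-the-envelope capacity model (an illustrative assumption with made-up numbers, not benchmark data): any server answers reads from its local replica, so aggregate read capacity grows with ensemble size, while every write must be synchronized across all servers and is bounded by a single coordinated pipeline.

```python
# Toy capacity model. The constants are hypothetical, chosen only to
# show how the read/write mix changes what adding servers buys you.

READS_PER_SERVER = 10_000    # assumed per-server read ops/sec
WRITES_PER_CLUSTER = 2_000   # assumed cluster-wide write ops/sec

def max_throughput(servers, write_fraction):
    """Largest total request rate before reads or writes saturate."""
    read_cap = servers * READS_PER_SERVER / (1 - write_fraction)
    write_cap = WRITES_PER_CLUSTER / write_fraction
    return min(read_cap, write_cap)

# A read-dominated workload benefits from a larger ensemble...
print(max_throughput(3, 0.01) < max_throughput(9, 0.01))   # True
# ...while a write-dominated workload is capped by write coordination.
print(max_throughput(3, 0.90) == max_throughput(9, 0.90))  # True
```

This simplification ignores that write cost actually grows somewhat with ensemble size, but it matches the qualitative shape of the throughput graph below: more servers help most when reads dominate.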
ZooKeeper Throughput as the Read-Write Ratio Varies
The figure is a throughput graph of ZooKeeper release 3.2 running on servers with dual 2GHz Xeon processors and two SATA 15K RPM drives. One drive was used as a dedicated ZooKeeper log device. The snapshots were written to the OS drive. Write requests were 1K writes and the reads were 1K reads. "Servers" indicates the size of the ZooKeeper ensemble, the number of servers that make up the service. Approximately 30 other servers were used to simulate the clients. The ZooKeeper ensemble was configured such that leaders do not allow connections from clients.
Note

In version 3.2 r/w performance improved by ~2x compared to the previous 3.1 release.
Benchmarks also indicate that it is reliable. The figure below shows how a deployment responds to various failures. The events marked in the figure are the following:
Failure and recovery of a follower
Failure and recovery of a different follower
Failure of the leader
Failure and recovery of two followers
Failure of another leader
Reliability

To show the behavior of the system over time as failures are injected, we ran a ZooKeeper service made up of 7 machines. We ran the same saturation benchmark as before, but this time we kept the write percentage at a constant 30%, which is a conservative ratio of our expected workloads.
Reliability in the Presence of Errors
There are a few important observations from this graph. First, if followers fail and recover quickly, ZooKeeper is able to sustain high throughput despite the failure. Second, and maybe more importantly, the leader election algorithm allows the system to recover fast enough to prevent throughput from dropping substantially. In our observations, ZooKeeper takes less than 200ms to elect a new leader. Third, as followers recover, ZooKeeper is able to raise throughput again once they start processing requests.