1) With large partition and thread counts, Kafka consumer throughput peaked at 34 MB/s in our test scenario.
2) Replication factor: minor impact. The replication factor does not affect the consumer throughput results, because a consumer reads each partition's data from that partition's leader, regardless of the replication factor. Likewise, consumer throughput is unaffected by whether replication is synchronous or asynchronous.
3) Threads, partitions, and throughput
When the partition count is large, increasing the number of consumer threads significantly improves consumer throughput.
Note, however, that the thread count should not exceed the partition count: the surplus threads log "No broker partitions consumed
by consumer" and contribute nothing to throughput.
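The "no more threads than partitions" rule follows from how partitions are divided among consumer threads: each thread owns a disjoint slice of partitions, so surplus threads receive nothing. A simplified range-style assignment sketch (illustrative only, not Kafka's actual rebalancing code):

```python
# Simplified sketch of range-style partition assignment: each consumer
# thread gets a contiguous slice of partitions. Threads beyond the
# partition count receive an empty slice and sit idle.
def assign_partitions(num_partitions, num_threads):
    partitions = list(range(num_partitions))
    per_thread = num_partitions // num_threads
    extra = num_partitions % num_threads
    assignment, start = {}, 0
    for t in range(num_threads):
        count = per_thread + (1 if t < extra else 0)
        assignment[t] = partitions[start:start + count]
        start += count
    return assignment

# 4 partitions, 6 threads: threads 4 and 5 get no partitions at all.
print(assign_partitions(4, 6))
```

With 4 partitions and 6 threads the last two threads own nothing, which is exactly the situation that triggers the warning above.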
4) Batch size and throughput
Changing the batch size has little effect on consumer throughput.
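For reference, batch size in this Kafka generation is a producer-side setting; a fragment of the corresponding producer.properties might look like the following (values are illustrative, not the ones used in our tests):

```
# producer.properties (illustrative values)
# batching only applies to the async producer
producer.type=async
# number of messages sent per batch
batch.num.messages=200
```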
5) Compression and throughput
Compression has little effect on consumer throughput.
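Compression trades consumer-side CPU (decompression) for fewer bytes on the wire; in these tests that CPU cost was too small to move the throughput numbers. A quick illustration with Python's zlib (gzip-style, comparable in spirit to Kafka's gzip codec; not Kafka code):

```python
import zlib

# A repetitive ~900 KB payload compresses dramatically, yet the
# round trip (compress + decompress) is lossless and cheap.
payload = b'{"sensor": 42, "value": 3.14}\n' * 30000
compressed = zlib.compress(payload)

assert zlib.decompress(compressed) == payload   # lossless round trip
print(len(payload), len(compressed))            # compressed is far smaller
```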
Appendix: the tuned configuration file:
broker.id=1
listeners=PLAINTEXT://0.0.0.0:6667
advertised.listeners=PLAINTEXT://203.150.54.215:6667
port=6667
host.name=203.150.54.215
# Replication configurations
num.replica.fetchers=1
replica.fetch.max.bytes=1048576
replica.fetch.wait.max.ms=500
replica.high.watermark.checkpoint.interval.ms=5000
replica.socket.timeout.ms=30000
replica.socket.receive.buffer.bytes=65536
replica.lag.time.max.ms=10000
replica.lag.max.messages=4000
compression.codec=none
controller.socket.timeout.ms=30000
controller.message.queue.size=10
controlled.shutdown.enable=true
default.replication.factor=2
# Log configuration
num.partitions=1
num.recovery.threads.per.data.dir=1
message.max.bytes=1000000
auto.create.topics.enable=true
auto.leader.rebalance.enable=true
log.dirs=/mnt/kafka-logs/kafka00
log.index.interval.bytes=4096
log.index.size.max.bytes=10485760
# Keep logs for three days; can be shorter
log.retention.hours=72
# Flush data to disk every 10 seconds
log.flush.interval.ms=10000
# Flush policy: flush after every 20000 messages
log.flush.interval.messages=20000
log.flush.scheduler.interval.ms=2000
log.roll.hours=72
log.retention.check.interval.ms=300000
# On startup, Kafka scans all data files under log.dirs in a single thread
log.segment.bytes=1073741824
# ZK configuration
zookeeper.connection.timeout.ms=6000
zookeeper.sync.time.ms=2000
zookeeper.connect=203.150.54.215:2181,203.150.54.216:2182,203.150.54.217:2183
# Socket server configuration
# Set to number of CPU cores + 1
num.io.threads=5
# Set to 2x the number of CPU cores, at most 3x
num.network.threads=8
socket.request.max.bytes=104857600
socket.receive.buffer.bytes=1048576
socket.send.buffer.bytes=1048576
queued.max.requests=500
fetch.purgatory.purge.interval.requests=1000
producer.purgatory.purge.interval.requests=1000
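One caveat on the file above: Java-style .properties files have no inline comments, so a `#` placed after a value becomes part of the value; that is why every comment sits on its own line. A minimal parser sketch in that spirit (`parse_properties` is an illustrative helper, not Kafka code) makes the failure mode concrete:

```python
def parse_properties(text):
    """Minimal key=value parser in the spirit of Java properties:
    full-line comments (# or !) are skipped, but a '#' placed after
    a value is NOT a comment -- it stays inside the value."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

good = parse_properties("# keep three days\nlog.retention.hours=72")
bad = parse_properties("log.retention.hours=72 # keep three days")

print(good["log.retention.hours"])  # '72' -- parses cleanly as an int
print(bad["log.retention.hours"])   # '72 # keep three days' -- int() would fail
```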