Log analysis is an important way to understand how your services are behaving, and the most widely used and mature approach today is the ELK stack. It comes in many flavors: rsyslog->ES->Kibana, rsyslog->Redis->Logstash->ES->Kibana, rsyslog->Kafka->Logstash->ES->Kibana, and so on, with more elaborate setups bringing in Spark. Each variant suits different scenarios; none is inherently better or worse. My cluster uses rsyslog->Kafka->Logstash->ES->Kibana, plus rsyslog->rsyslog relay->Kafka->Logstash->ES->Kibana. Four ES nodes currently index more than a billion log entries per day, covering nginx, Redis, PHP, and more, and the cluster has been quite robust. Each log entry indexes roughly 10 fields, and the primary shards take in about 500 GB per day. Given the performance cost and the nature of the logs, we use no replica shards and retain logs for 7 days.
To summarize, the pipeline boils down to a few stages: collect -> clean -> index -> visualize, with caches and queues considered at each stage. Below is how this cluster is built and configured. I hope it helps fellow engineers; many thanks to Yuanchuan and Feng Chao for their generous discussions during my ELK exploration.
1. Collection (using rsyslog)
Clients run rsyslog 8.19.0 for collection, installed directly as CentOS rpm packages. With the yum repository configured, install:
yum install rsyslog
yum install rsyslog-kafka
After installation, the corresponding rsyslog configuration is as follows:
module(load="imfile")
module(load="omkafka")
$PreserveFQDN on
main_queue(
queue.workerthreads="10" # threads to work on the queue
queue.dequeueBatchSize="1000" # max number of messages to process at once
queue.size="50000" # max queue size
)
##########################nginx log################################
$template nginxlog,"%$myhostname%`%msg%"
if $syslogfacility-text == 'local6' then {
action(
broker=["10.13.88.190:9092","10.13.88.191:9092","10.13.88.192:9092","10.13.88.193:9092"]
type="omkafka"
topic="cms-nginx"
template="nginxlog"
partitions.auto="on"
)
stop
}
############################redis log#########################
$template redislog,"%$myhostname%`%msg%"
ruleset(name="redis7215-log") {
action(
broker=["10.13.88.190:9092","10.13.88.191:9092","10.13.88.192:9092","10.13.88.193:9092"]
type="omkafka"
topic="redis-log"
template="redislog"
partitions.auto="on"
)
}
input(type="imfile"
File="/data1/ms/log/front/redis7215.log"
Tag=""
ruleset="redis7215-log"
freshStartTail="on" # on a fresh start, begin tailing at the end of the file
reopenOnTruncate="on" # reopen the file when it is truncated
)
input(type="imfile"
File="/data1/ms/log/front/redis7243.log"
Tag=""
ruleset="redis7215-log"
freshStartTail="on"
reopenOnTruncate="on"
)
############################php curl log#############################
$template phpcurl-log,"%$myhostname%`%msg%"
ruleset(name="phpcurl-log") {
action(
broker=["10.13.88.190:9092","10.13.88.191:9092","10.13.88.192:9092","10.13.88.193:9092"]
type="omkafka"
topic="phpcurl-log"
template="phpcurl-log"
partitions.auto="on"
)
}
input(type="imfile"
File="/data1/ms/log/php_common/php_slow_log"
Tag=""
ruleset="phpcurl-log"
freshStartTail="on"
reopenOnTruncate="on"
)
To keep a send failure from dumping every log line into the messages log and filling the disk in moments, also configure a discard rule:
*.info;mail.none;authpriv.none;cron.none;local6.none /var/log/messages
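The templates above glue the hostname and the raw message together with a backtick delimiter ("%$myhostname%`%msg%"), so whatever consumes the Kafka topics (Logstash in this setup) only has to split once to recover both fields. A minimal sketch of that parsing step in Python (the function name and sample line are mine, purely for illustration):

```python
# The rsyslog templates join hostname and raw message with a backtick.
# Split only on the FIRST backtick: the message body may itself contain
# backticks, and we must not lose them.

def split_rsyslog_line(line):
    hostname, _, msg = line.partition("`")
    return hostname, msg

# e.g. a line shaped like the output of the nginxlog template:
host, msg = split_rsyslog_line("web01`10.0.0.5 GET /index.php 200 0.032")
```

The same one-line split works for all three templates (nginxlog, redislog, phpcurl-log), since they share the identical hostname`msg layout.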
Three kinds of logs are collected so far: nginx, Redis, and PHP curl. Here is the collection scheme for each.
1. For nginx
Option 1: use nginx's rsyslog module to send access logs to the local6 facility, with the following nginx configuration:
##########elk#############################
access_log syslog:local6 STAT;
The rsyslog configuration above then pushes the logs straight into the Kafka queue; the Kafka cluster has 4 brokers.
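To sanity-check the local6 routing without touching nginx, you can hand-craft a syslog datagram yourself: facility local6 is 22 and severity info is 6, so the RFC 3164 PRI value is 22*8 + 6 = 182. A minimal sketch, assuming rsyslog listens on UDP 514 locally (the helper names are my own, not part of the stack):

```python
import socket

def syslog_frame(msg, facility=22, severity=6):
    # Build a minimal RFC 3164-style frame: "<PRI>message".
    # local6 (22) + info (6) gives PRI 182, matching the local6 rule above.
    pri = facility * 8 + severity
    return "<%d>%s" % (pri, msg)

def send_test_line(msg, host="127.0.0.1", port=514):
    # Fire the frame at rsyslog over UDP; no delivery guarantee, but enough
    # to see whether the line shows up in the cms-nginx Kafka topic.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(syslog_frame(msg).encode("utf-8"), (host, port))
    sock.close()

send_test_line("pipeline check via local6")
```

If the test line reaches the topic, the nginx side is the only remaining variable.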