编程 google apache 云计算 java linux Windows centos Python wordpress shell 微软 程序员 php Android Ubuntu nginx mysql Firefox 开源

Flume 正则方式收集nginx日志到Hbase hadoop

式收集Nginx日志到Hbase数据库里面, 这个我也是折腾好久才, 以前也没弄过JAVA. Flume 中文资料真的很少… 下面是一个Flume collector收集其他的Flume agent数据并通过sink正则放数据到hbase的配置文件内容, 这里就简单笔记下了.

agent.sources = source1
agent.sinks = hbase1
agent.channels = channel1

#agent.sources.source1.type = exec
#agent.sources.source1.command =  tail -F /home/hadoop/access.log
#agent.sources.source1.batchSize = 100

agent.sources.source1.type = avro
agent.sources.source1.bind = 10.15.200.47
agent.sources.source1.port = 44444

agent.sources.source1.channels = channel1
agent.sinks.sink1.channel = channel1
# Configure Hbase Sink sink1
#agent.sources.source1.interceptors.hbase1.regex=
agent.sinks.hbase1.type = org.apache.flume.sink.hbase.HBaseSink
agent.sinks.hbase1.channel = channel1
agent.sinks.hbase1.table = test
agent.sinks.hbase1.columnFamily = log

agent.sinks.hbase1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
// nginx日志都是分割成: ([xxx] [yyy] [zzz] [nnn] ...) 这种格式的, 所以用下面的正则
agent.sinks.hbase1.serializer.regex = \[(.*?)\]\ \[(.*?)\]\ \[(.*?)\]\ \[(.*?)\]\ \[(.*?)\]\ \[(.*?)\]\ \[(.*?)\]\ \[(.*?)\]\ \[(.*?)\]\ \[(.*?)\]\ \[(.*?)\]\ \[(.*?)\]\ \[(.*?)\]\ \[(.*?)\]
// 指定上面正则匹配到的数据对应的hbase列名
agent.sinks.hbase1.serializer.colNames = a,b,c,d,e,f,g,h,i,j,k,l,m,n
#agent.sinks.hbase1.serializer.payloadColumn = test

# Use a channel which buffers events in memory
agent.channels.channel1.type = memory
agent.channels.channel1.capacity = 800
agent.channels.channel1.transactionCapactiy = 100

延伸阅读

评论