Parsing Nagios log files with fluentd
Recently I’ve been experimenting with EFK (Elasticsearch, Fluentd and Kibana) to see how we can extract value from our machine logs. We also use Nagios to monitor various services and processes within our infrastructure. The text logs produced by Nagios are not very useful in their raw form, as you can see…
[1405413255] Auto-save of retention data completed successfully.
[1405413285] SERVICE ALERT: servername;t 3306;OK;SOFT;2;QUERY OK: 'SELECT COUNT(*) FROM t' returned 32063.000000
[1405413745] SERVICE ALERT: servername;Memory;OK;HARD;3;OK Memory 9% used. Largest process: nscd (537) = 715.14MB (18%)
[1405414075] SERVICE NOTIFICATION: nagiosadmin;servername;MySQL Uptime 3306;WARNING;notify-service-by-email;WARNING: MySQL uptime, 1105 is below threshold: 4320.
[1405414315] SERVICE ALERT: servername;PING;WARNING;SOFT;1;PING WARNING - Packet loss = 28%, RTA = 34.29 ms
[1405414325] SERVICE ALERT: servername;PING;OK;SOFT;2;PING OK - Packet loss = 0%, RTA = 33.32 ms
[1405414345] SERVICE ALERT: servername;Memory;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1405414365] SERVICE NOTIFICATION: dash;servername;Service last results loaded;WARNING;notify-service-by-email;QUERY WARNING: SELECT COUNT(*) FROM t) AS t returned 0.000000
[1405414465] SERVICE ALERT: servername;Memory;CRITICAL;SOFT;2;CHECK_NRPE: Socket timeout after 10 seconds.
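Part of what makes these lines hard to read is the leading number in brackets, which is a Unix epoch timestamp in seconds. Here is a minimal Python sketch of splitting a line into a readable UTC time and the message, assuming every line starts with `[epoch]`. The function name `split_nagios_line` is hypothetical and only for illustration; it is not part of Nagios or fluentd.

```python
import re
from datetime import datetime, timezone

# Matches "[<epoch seconds>] <rest of the line>"
LINE_RE = re.compile(r"^\[(\d+)\] (.*)$")

def split_nagios_line(line):
    """Return (UTC datetime, message), or None if the line has no [epoch] prefix."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    when = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
    return when, m.group(2)

when, message = split_nagios_line(
    "[1405413255] Auto-save of retention data completed successfully."
)
print(when.isoformat(), message)
```

This only makes the timestamp readable; the semicolon-separated fields in the alert lines still need a proper pattern, which is where grok comes in.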
I wanted to get the service alerts in the log files into EFK. Here’s how I did it. First install the fluent-plugin-grok-parser plugin. If you are using td-agent…
/usr/lib64/fluent/ruby/bin/fluent-gem install fluent-plugin-grok-parser
Or if you are using the pure Ruby version…
gem install fluent-plugin-grok-parser
Next we need to create a file containing the patterns we want to match. I used the one that can be found here. There’s also a useful grok debugger here if you want to test your own patterns. Click the “Nagios” link and copy and paste the text into a file, e.g. /usr/bin/scripts/nagios_grok_patterns.txt
Make sure td-agent can read the file…
chown td-agent:td-agent /usr/bin/scripts/nagios_grok_patterns.txt
The example here will parse a Nagios Service alert. The following log message…
[1405363825] SERVICE ALERT: servername;Memory;OK;SOFT;2;OK Memory 9% used. Largest process: nscd (537) = 715.14MB (18%)
Will be parsed by the following grok expression…
(?<nagios_type>SERVICE ALERT): (?<nagios_hostname>.*?);(?<nagios_service>.*?);(?<nagios_state>.*?);(?<nagios_statelevel>.*?);(?<nagios_attempt>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))));(?<nagios_message>.*)
and converted into the following JSON…
{
  "nagios_type": [
    [
      "SERVICE ALERT"
    ]
  ],
  "nagios_hostname": [
    [
      "servername"
    ]
  ],
  "nagios_service": [
    [
      "Memory"
    ]
  ],
  "nagios_state": [
    [
      "OK"
    ]
  ],
  "nagios_statelevel": [
    [
      "SOFT"
    ]
  ],
  "nagios_attempt": [
    [
      "2"
    ]
  ],
  "nagios_message": [
    [
      "OK Memory 9% used. Largest process: nscd (537) = 715.14MB (18%)"
    ]
  ]
}
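If you want to sanity-check the field split without a grok debugger, the same idea can be tried in plain Python. This is a simplified sketch, not the grok expression itself: Python's `re` module uses `(?P<name>...)` for named groups, and I have assumed the attempt number can be matched with a plain `\d+` instead of grok's full number pattern.

```python
import re

# Simplified Python rendering of the service alert expression.
# Assumption: the attempt count is a plain integer, so \d+ is enough.
SERVICE_ALERT_RE = re.compile(
    r"(?P<nagios_type>SERVICE ALERT): "
    r"(?P<nagios_hostname>.*?);"
    r"(?P<nagios_service>.*?);"
    r"(?P<nagios_state>.*?);"
    r"(?P<nagios_statelevel>.*?);"
    r"(?P<nagios_attempt>\d+);"
    r"(?P<nagios_message>.*)"
)

line = ("[1405363825] SERVICE ALERT: servername;Memory;OK;SOFT;2;"
        "OK Memory 9% used. Largest process: nscd (537) = 715.14MB (18%)")

# search() rather than match(), because the line starts with the [epoch] prefix
match = SERVICE_ALERT_RE.search(line)
record = match.groupdict() if match else {}
print(record)
```

The result is a flat dictionary with one string per field, rather than the nested arrays the debugger displays.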
The following configuration should be placed into /etc/td-agent/td-agent.conf to send Nagios service alerts to your main server. Note that the grok_pattern parameter refers to the name of the expression defined in the file pointed at by custom_pattern_path.
<source>
  type tail
  format grok
  grok_pattern %{NAGIOS_SERVICE_ALERT}
  custom_pattern_path /usr/bin/scripts/nagios_grok_patterns.txt
  path /usr/local/nagios/var/nagios.log
  pos_file /var/log/td-agent/nagios_log.pos
  tag nagios
</source>
<match nagios>
  type record_reformer
  tag nagios.source
  source nagios
</match>
<match nagios.source>
  type forward
  <server>
    host XXX.XXX.XXX.XXX
    port 42186
  </server>
</match>
Restart td-agent…
/etc/init.d/td-agent restart
The td-agent log file, probably /var/log/td-agent/td-agent.log, should contain the following message if the previous steps have been set up correctly.
2014-07-14 19:50:08 +0100 [info]: Expanded the pattern (?<nagios_type>SERVICE ALERT): (?<nagios_hostname>.*?);(?<nagios_service>.*?);(?<nagios_state>.*?);(?<nagios_statelevel>.*?);(?<nagios_attempt>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))));(?<nagios_message>.*) into (?<nagios_type>SERVICE ALERT): (?<nagios_hostname>.*?);(?<nagios_service>.*?);(?<nagios_state>.*?);(?<nagios_statelevel>.*?);(?<nagios_attempt>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))));(?<nagios_message>.*)
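The "Expanded the pattern" message shows the plugin replacing named references such as %{NAGIOS_SERVICE_ALERT} with the regular expressions defined in the patterns file. The sketch below illustrates that idea in Python. It is not the plugin's actual code: the functions `load_patterns` and `expand` and the three DEMO_ patterns are made up for this example, and I have assumed the patterns file uses one "NAME regex" definition per line.

```python
import re

# Matches a grok reference: %{NAME} or %{NAME:field}
REF_RE = re.compile(r"%\{(\w+)(?::(\w+))?\}")

def load_patterns(text):
    """Parse 'NAME regex' lines into a dict, skipping blanks and # comments."""
    patterns = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, regex = line.split(None, 1)
        patterns[name] = regex
    return patterns

def expand(pattern, patterns):
    """Recursively replace grok references with the regex they name."""
    def substitute(m):
        name, field = m.group(1), m.group(2)
        body = expand(patterns[name], patterns)
        # A field name turns the reference into a named capture group.
        return f"(?<{field}>{body})" if field else body
    return REF_RE.sub(substitute, pattern)

# Hypothetical pattern definitions, much smaller than the real Nagios set.
demo_file = """
DEMO_NUM [0-9]+
DEMO_STATE OK|WARNING|CRITICAL
DEMO_ALERT %{DEMO_STATE:state};%{DEMO_NUM:attempt}
"""

expanded = expand("%{DEMO_ALERT}", load_patterns(demo_file))
print(expanded)
```

The expanded string uses the Ruby-style `(?<name>...)` group syntax seen in the td-agent log, so it is meant to be read rather than compiled with Python's `re` module.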