Logstash using dissect instead of grok for filtering
Some time ago I came across the dissect filter for Logstash, which I now use to extract data from my access_logs before handing them over to Elasticsearch. Dissect is a different type of filter than grok: it does not use regular expressions, so it is an alternative way to approach the data. Dissect is not intended to replace grok, and there are situations where a combination of dissect and grok is preferred. Dissect works best when every line has the same number of fields; otherwise grok is the better option.
Dissect is a split-style filter: it cuts a line into fields using whatever delimiters you specify.
A set of fields and delimiters is called a dissection. A dissection is written as a series of %{} sections. A field is the text starting with %{ and ending with }, and a delimiter is the text between a } and the next %{. Delimiters cannot contain any of the following characters: %, {, }.
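To make the field and delimiter terminology concrete, here is a minimal sketch. The input line and the field names (date, level, program, msg) are made up for illustration and do not come from my access_logs.

```
filter {
  dissect {
    # Hypothetical input line: 2016-12-14 INFO nginx: worker started
    # The delimiters are " ", " " and ": "; everything inside %{...} is a field.
    mapping => { 'message' => '%{date} %{level} %{program}: %{msg}' }
  }
}
# Fields I would expect under these assumptions:
#   date => "2016-12-14", level => "INFO", program => "nginx", msg => "worker started"
```

The last field has no delimiter after it, so it takes the rest of the line.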
At the time of writing, Logstash does not come with the dissect filter installed by default, so it has to be installed manually by running the following commands:
# cd /usr/share/logstash
# bin/logstash-plugin install logstash-filter-dissect
Once that is done, you can start building the config file that handles the input. In our case we will use the nginx access_log, which is shipped by a running Filebeat. Its configuration is shown in this post.
Here are a couple of lines from the access_log, for reference when reading the Logstash mapping.
66.249.66.207 - - [14/Dec/2016:19:31:37 -0500] "GET /setup-and-configure-elasticsearch-logstash-logstash-forwarder-and-kibana-on-debian-jessie/ HTTP/1.1" 200 8707 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
66.249.66.207 - - [14/Dec/2016:19:31:39 -0500] "GET /comments/plugins/nodebb-plugin-blog-comments/css/comments.css HTTP/1.1" 200 1256 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
Our config file, located in /etc/logstash/conf.d, would look like the following:
input {
  beats {
    host => "0.0.0.0"
    port => "{some_port}"
  }
}
filter {
  if [type] == "nginx-access" {
    dissect {
      mapping => { 'message' => '%{clientip} %{ident} %{agent} [%{timestamp} %{+timestamp}] "%{method} %{request} HTTP/%{httpversion}" %{answer} %{byte} "%{referrer}" "%{ua} %{+ua} %{+ua}/%{ua_version} (%{OS} %{+OS} %{+OS}) %{subversion}/%{+subversion} %{browser}/%{browser_version}" "%{agent}"'}
    }
  }
  geoip {
    source => "clientip"
    database => "/etc/logstash/GeoLite2-City.mmdb"
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
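One caveat with the mapping above: the user-agent part expects a fixed number of space-separated pieces, which the Googlebot lines shown earlier do not seem to have, and the name agent is used for two different fields. A simpler sketch, assuming you are happy to keep the whole user agent as a single field, could look like this. The names auth and forwarded_for are my own placeholders, and I am assuming the last quoted value is nginx's $http_x_forwarded_for.

```
filter {
  if [type] == "nginx-access" {
    dissect {
      # Same idea as the mapping above, but the user agent stays in one field.
      # "auth" and "forwarded_for" are placeholder names, not taken from the original config.
      mapping => { 'message' => '%{clientip} %{ident} %{auth} [%{timestamp}] "%{method} %{request} HTTP/%{httpversion}" %{answer} %{byte} "%{referrer}" "%{ua}" "%{forwarded_for}"' }
    }
  }
}
```

If you still need browser and OS details, the useragent filter or a grok pattern applied to the ua field is probably a better fit than dissect, since user agents vary in the number of pieces they contain.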
Here is how the grok pattern compares with the dissect mapping.
Grok filtering looks like this:
match => { 'message' => '%{IPORHOST:clientip} %{USER:ident} %{USER:agent} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{URIPATHPARAM:request}(?: HTTP/%{NUMBER:httpversion})?|)\" %{NUMBER:answer} (?:%{NUMBER:byte}|-) (?:\"(?:%{URI:referrer}|-))\" (?:%{QS:referree}) %{QS:agent}' }
Dissect filtering looks like this:
mapping => { 'message' => '%{clientip} %{ident} %{agent} [%{timestamp} %{+timestamp}] "%{method} %{request} HTTP/%{httpversion}" %{answer} %{byte} "%{referrer}" "%{ua} %{+ua} %{+ua}/%{ua_version} (%{OS} %{+OS} %{+OS}) %{subversion}/%{+subversion} %{browser}/%{browser_version}" "%{agent}"'}
While grok uses regular expressions, dissect uses plain fields with the delimiters spelled out between them. Grok handles a field like the timestamp in one go with %{HTTPDATE:timestamp}, whereas dissect treats each space-separated piece of text as a separate field unless you use the append modifier, which means putting + before the field name, as in %{timestamp} %{+timestamp}.
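As a small illustration of the append modifier, the sketch below dissects just the bracketed timestamp and then parses it with the date filter. The date filter is not part of my config; it is an optional extra, added here on the assumption that you want the event time taken from the log line.

```
filter {
  dissect {
    # %{} is an empty skip field: it matches text without storing it.
    # For "... [14/Dec/2016:19:31:37 -0500] ..." the two appended pieces are joined
    # again, so timestamp becomes "14/Dec/2016:19:31:37 -0500".
    mapping => { 'message' => '%{} [%{timestamp} %{+timestamp}] %{}' }
  }
  date {
    # Assumption: parse the dissected value into @timestamp.
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
```

Since the closing bracket is itself a delimiter, writing [%{timestamp}] without the append should capture the same value here; the append form matters more when the pieces you want to join are separated by delimiters you also need elsewhere.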
The above logs would look like the following in Kibana:
In short, the dissect filter can be used successfully to map fields in Logstash for simple cases where grok would be too complicated. I don't have any metrics yet, but as soon as I find a way to get some I will post the difference between grok and dissect in numbers.
I hope I gave you a good idea of how to use dissect. I wish you all happy searching.