Set up and configure elasticsearch, logstash, logstash-forwarder and kibana on Debian Jessie

INTRODUCTION##

This is the first part of a series of tutorials on how to install, configure and set up elasticsearch, logstash and kibana on Debian Jessie using the VPSie SSD VPS service.
Elastic is the company behind the three open-source projects (Elasticsearch, Logstash, and Kibana) designed to take data from any source and search, analyze, and visualize it in real time, helping people make sense of data. From stock quotes to Twitter streams, Apache logs to WordPress blogs, their products extend what's possible with data, delivering on the promise that good things come from connecting the dots.

ASSUMPTIONS##

I am assuming that you already have a VPS service and know how to deploy a VPS. For this tutorial series I will be using VPSie as the service and will also refer to the VPS as a vpsie.
This tutorial will not cover the installation of Debian Jessie on the vpsie; for that you can check their blog. Having clarified all this, I am also assuming that you know how to SSH into your VPS, which means your VPS must have fully working networking configured.

INSTALLATION##

ELASTICSEARCH###

First we will add the apt key and the apt repository to the VPS.

#wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
OK
#echo "deb http://packages.elastic.co/elasticsearch/1.7/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch-1.7.list

Then update the apt package list to read the repository and install elasticsearch.

#apt-get update
#apt-get install -y elasticsearch
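
Both elasticsearch and (later) logstash need a Java runtime, and the packages do not always pull one in on a minimal Jessie install. If no Java runtime is present yet, you can install OpenJDK 7 from the Jessie repositories:

#apt-get install -y openjdk-7-jre-headless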

After it is installed you need to edit the elasticsearch configuration file to set a cluster name. If no cluster name is specified, the default name will be used.
Note: If you want to have separate elasticsearch clusters for different services, it is highly recommended to use separate cluster names, because otherwise the nodes will automatically join existing clusters that use the default name.

To configure the elasticsearch cluster name, edit /etc/elasticsearch/elasticsearch.yml and set cluster.name as in this example:

cluster.name: z0z0.tk-1.5

If you want to set the node name yourself you can also set the node.name variable; otherwise elasticsearch will assign a random name from a list of superheroes. To do so, set node.name in the same configuration file:

node.name: "Franz Kafka"

There are plenty of other variables that can be set, but we will only cover the basic configuration. Save your configuration file; now it's time to enable elasticsearch to start automatically at boot and start the service.

#systemctl enable elasticsearch
#systemctl start elasticsearch

If your elasticsearch has started properly, running netstat -ntl will show it listening on two different ports: 9200 and 9300. Port 9200 is the one you will be using to query data from your elasticsearch cluster.
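
As an additional sanity check you can query the HTTP API with curl (run it on the elasticsearch host itself, or replace localhost with the node's IP). It should report the cluster name you configured and a green or yellow status:

#curl http://localhost:9200/_cluster/health?pretty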

We can also install a couple of useful plugins like head and bigdesk in elasticsearch. To do so you need to cd to /usr/share/elasticsearch and run the following commands:

#cd /usr/share/elasticsearch
#bin/plugin -install mobz/elasticsearch-head
#bin/plugin -url https://github.com/lukas-vlcek/bigdesk/archive/master.zip -install bigdesk
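
You can check that both plugins were registered by listing them with the same plugin script:

#bin/plugin --list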

Once you have the plugins installed you need to restart elasticsearch:

#systemctl restart elasticsearch

You can access the plugins by browsing to your elasticsearch IP on port 9200, for example:

http://<elasticsearch_ip>:9200/_plugin/plugin_name
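
For example, with the head plugin installed you would browse to:

http://<elasticsearch_ip>:9200/_plugin/head/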

LOGSTASH###

Now it's time to install logstash. First we need to add the repository for it:

#echo "deb http://packages.elasticsearch.org/logstash/1.5/debian stable main" | sudo tee -a /etc/apt/sources.list

After the logstash repository is added we need to update the package list and install logstash:

#apt-get update
#apt-get install -y logstash

Once your logstash is installed it is time to configure it.
A logstash configuration file consists of three main parts. The first is the input, where you define the sources of the data you want to load into elasticsearch. The filter (and codec) section is where you define the processing you want to run on the data you have loaded, and the output is where you configure where the loaded and parsed data should be sent.
Since we will be configuring logstash to read data from a remote server, for the input we will use a plugin called lumberjack.
First of all we need to create an SSL certificate. The /etc/pki/tls directories do not exist by default on Debian, so create them first, then generate the certificate with the following commands:

#mkdir -p /etc/pki/tls/certs /etc/pki/tls/private
#cd /etc/pki/tls
#openssl req -x509 -batch -nodes -newkey rsa:2048 -keyout private/logstash-forwarder.key -out certs/logstash-forwarder.crt -subj /CN=*.example.com
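
If you want to double-check what was generated, openssl can print the subject and validity dates of the new certificate (the paths are the ones used in the command above):

#openssl x509 -in /etc/pki/tls/certs/logstash-forwarder.crt -noout -subject -dates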

Note: As the CN you can use a full domain name, an IP address, or a wildcard domain. What matters is that if you use an IP address as the CN, it has to be the IP address of the VPS on which logstash is running. If you are using some kind of virtualisation or have your logstash behind NAT, I strongly suggest using a full domain or a wildcard domain, because otherwise logstash-forwarder will refuse the connections.

In this tutorial we will make elasticsearch store the logs from the nginx access log. The logstash configuration files live in /etc/logstash/conf.d/; I will name the file logstash.conf to keep the confusion to a minimum.

input {
    lumberjack {
            port => 5000
            type => "logs"
            ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
            ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"
    }
}
filter {
    if [type] == "nginx-access" {
            grok {
                match => { 'message' => '%{IPORHOST:clientip} %{USER:ident} %{USER:agent} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{URIPATHPARAM:request}(?: HTTP/%{NUMBER:httpversion})?|)\" %{NUMBER:answer} (?:%{NUMBER:byte}|-) (?:\"(?:%{URI:referrer}|-))\" (?:%{QS:referree}) %{QS:agent}' }
            }
      }
}
output {
    stdout {
            codec => rubydebug
      }
    elasticsearch {
            host => "127.0.0.1"
            cluster => "z0z0.tk-1.5"
            flush_size => 2000
      }
}

There are nginx grok patterns to be found on the internet, but the information in the logs can differ depending on how your nginx is configured, which is why I strongly advise building your own patterns that match the log files you are actually generating.
There are two great tools for creating grok patterns or for checking whether a pattern matches your log files: the first is called Grok Debugger and the second is the Grok Incremental Constructor, which is great for building your own patterns step by step.

Let's take a short look at the logstash.conf

The input section has the lumberjack plugin, which consists of:

port - the port which logstash-forwarder will use to connect to logstash
type - the type assigned to the events sent to logstash
ssl_certificate - the certificate generated above, used to secure the connection
ssl_key - the key belonging to that certificate

In the filter section we check whether the type is nginx-access, and if it is we apply the grok pattern to the log line.
In the output section we first use the rubydebug codec, which prints a debug view of each event to stdout, and we also send the results to elasticsearch on localhost with the cluster name z0z0.tk-1.5 (this must match the cluster.name set in elasticsearch.yml).
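
Before starting the service it is worth checking the configuration syntax. With the logstash 1.5 package the binary lives under /opt/logstash, so a quick test (adjust the path if your layout differs) looks like this:

#/opt/logstash/bin/logstash agent --configtest -f /etc/logstash/conf.d/logstash.conf
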
When your logstash is configured properly, enable it to start at boot and start it up.

#systemctl enable logstash
#systemctl start logstash

Once logstash has started up, running netstat -ntlp will show that it is listening on port 5000.

KIBANA###

Now that we have logstash and elasticsearch up and running, it is time to install kibana.
We need to download kibana to our server using the following command:

#wget https://download.elastic.co/kibana/kibana/kibana-4.1.1-linux-x64.tar.gz

Once kibana is downloaded we need to extract it to /opt:

#tar -xzvf kibana-4.1.1-linux-x64.tar.gz -C /opt

Now we need to rename the folder and set up the init script so we can run it as a service:

#mv /opt/kibana-* /opt/kibana
#wget -O /etc/init.d/kibana4 https://gist.githubusercontent.com/thisismitch/8b15ac909aed214ad04a/raw/bce61d85643c2dcdfbc2728c55a41dab444dca20/kibana4
#chmod +x /etc/init.d/kibana4
#sed -i '/^NAME/d' /etc/init.d/kibana4
#sed -i '/^KIBANA_BIN/a NAME=kibana4' /etc/init.d/kibana4

Now that we have everything in place we need to edit the kibana configuration file to point it at the IP address where elasticsearch is listening. If you are running kibana on the same server where elasticsearch is running then you don't need to change anything. The configuration file is located at /opt/kibana/config/kibana.yml; edit the following line by replacing localhost with your elasticsearch IP address:

elasticsearch_url: "http://localhost:9200"
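
For example, if elasticsearch were listening on 10.1.1.10 (a placeholder address), the line would become:

elasticsearch_url: "http://10.1.1.10:9200"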

Once that is done you are ready to enable the startup script and start the kibana4 service:

#update-rc.d kibana4 defaults
#service kibana4 start

You will be able to access kibana by browsing to the following address:

http://{your IP address}:5601
Now it's time to install logstash-forwarder on the server where nginx is running and set it up to send the logs to logstash.

LOGSTASH-FORWARDER###
Connect to your remote server, download logstash-forwarder.crt from the logstash server, and place it in /etc/pki/tls/certs (create the directory first if it does not exist):

#scp root@{logstash server IP}:/etc/pki/tls/certs/logstash-forwarder.crt /etc/pki/tls/certs/

Download the logstash-forwarder package for the distribution you are using. I will assume that the nginx server is running on Debian, so I will be downloading the deb file; for CentOS download the RPM file.

For Debian run the following command:

 #wget https://download.elastic.co/logstash-forwarder/binaries/logstash-forwarder_0.4.0_amd64.deb
 #dpkg -i logstash-forwarder_0.4.0_amd64.deb

For CentOS run the following commands:

 #wget https://download.elastic.co/logstash-forwarder/binaries/logstash-forwarder-0.4.0-1.x86_64.rpm
 #yum install -y logstash-forwarder-0.4.0-1.x86_64.rpm

Or you can install it directly with yum:

 #yum install -y https://download.elastic.co/logstash-forwarder/binaries/logstash-forwarder-0.4.0-1.x86_64.rpm

The configuration file is located at /etc/logstash-forwarder.conf.
Now it's time to set it up to read the nginx log files and send them to logstash:

{
  "network": {
    "servers": [ "elk.z0z0.tk:5000" ],
    "ssl ca": "/etc/pki/tls/certs/z0z0.tk.crt",
    "timeout": 15
  },
  "files": [
    {
       "paths": [ "/var/log/nginx/access.log" ],
      "fields": { "type": "nginx-access" }
    }
  ]
}

As I mentioned in the logstash section, I created the certificate with a wildcard domain as the CN because the ELK stack is behind NAT and cannot be reached directly from the server on which nginx is running.
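
If the forwarder cannot connect, you can test the TLS endpoint by hand from the nginx server with openssl, using the server name from the network section and the CA file copied earlier:

#openssl s_client -connect elk.z0z0.tk:5000 -CAfile /etc/pki/tls/certs/logstash-forwarder.crt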

Now I have started the logstash-forwarder with nohup

#nohup /opt/logstash-forwarder/bin/logstash-forwarder -c "/etc/logstash-forwarder.conf" -spool-size=100 -t=true &

and set up a crontab entry to run it at startup:

#crontab -e
@reboot nohup /opt/logstash-forwarder/bin/logstash-forwarder -c "/etc/logstash-forwarder.conf" -spool-size=100 -t=true &
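
To confirm that events are actually arriving, list the indices on the elasticsearch host; the logstash output creates daily logstash-YYYY.MM.DD indices by default:

#curl 'http://localhost:9200/_cat/indices?v'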

Now that we have set up logstash-forwarder to read the access.log and send the log entries to logstash, it's time to set up kibana to make graphs out of those logs.

You can start browsing kibana using http://IP_address:5601

KIBANA SETUP####

In the Time-field name dropdown make sure you have @timestamp selected and click Create to start indexing the logs in elasticsearch.
After some time you will see the logs appear in the Discover menu.

These are the basic configurations for the ELK stack. I will be adding more advanced configurations in the next few days, such as geo mapping and how to create visualizations and dashboards in Kibana. Soon I will also show you how to make the ELK stack load MySQL general logs.