30 January 2014

Postfix log centralization and analysis in real time with fluentd, td-agent, Elasticsearch and Kibana - part 3

<< Back to part 2 <<

3. Configure Elasticsearch

Elasticsearch is a powerful and flexible search engine built on Apache Lucene. It supports load balancing and failover through shards and replicas, which lets it scale out to very large deployments with multiple clustered nodes.

In this post, Elasticsearch acts as the search engine and is the final destination of the log stream. The installation and configuration are quite simple.

Apache Lucene is Java based, so we need a Java runtime installed: # yum groupinstall "Java Development"
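Before going any further, it is worth confirming that Java is actually on the path (the exact version string will depend on the OpenJDK build your distribution ships):

# java -version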

Download Elasticsearch : http://www.rpmse.org/#/dashboard?s=elasticsearch

# rpm -ivh ./elasticsearch-0.90.10.noarch.rpm

The main config file : /etc/elasticsearch/elasticsearch.yml

Log file location : /var/log/elasticsearch/

Data files location : /var/lib/elasticsearch/

This server has 8 GB of RAM; we will allocate 4 GB (half of the total) to Elasticsearch. Note that the value is passed straight to the JVM heap flags, so use the g suffix: # vim /etc/sysconfig/elasticsearch

ES_HEAP_SIZE=4g

Start the Elasticsearch service : # /etc/init.d/elasticsearch start
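If you want Elasticsearch to come back automatically after a reboot (assuming the standard init script installed by the RPM), register it with chkconfig as well:

# chkconfig elasticsearch on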

Check to see if the service is running : 

# netstat -pnat | grep 9200
tcp        0      0 :::9200              :::*             LISTEN      6896/java            
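With the process running, a quick way to confirm the heap setting took effect is to check the Java command line; assuming the stock init script, it should contain -Xms4g and -Xmx4g:

# ps aux | grep elasticsearch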

Some basic Elasticsearch commands and their uses:

  • Show all the indexes: curl localhost:9200/_aliases?pretty
  • Create a new index: curl localhost:9200/my-index -XPOST
  • Create a sample record: curl localhost:9200/my-index/my-table/record-id -d '{"name":"Nicolas Cage","sex":"Male"}' -XPOST
  • Show all records in an index type (table): curl localhost:9200/my-index/my-table/_search?pretty
  • Count all the records: curl localhost:9200/my-index/_count?pretty
  • Run a search (quote the URL so the shell does not interpret the &): curl 'localhost:9200/my-index/_search?pretty&q=name:nicolas'
  • Delete a record: curl localhost:9200/my-index/my-table/record-id -XDELETE
  • Delete a whole index: curl localhost:9200/my-index -XDELETE
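Another command worth keeping at hand, although not strictly needed for this setup, is the cluster health API; it reports the cluster status (green/yellow/red) and whether all shards are allocated:

# curl localhost:9200/_cluster/health?pretty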

Useful Elasticsearch plugins worth having:
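For example, the widely used elasticsearch-head plugin provides a simple web front end for browsing indexes and running queries. Assuming the paths of the RPM install, it can be added with the bundled plugin script and then reached at http://localhost:9200/_plugin/head/ :

# /usr/share/elasticsearch/bin/plugin -install mobz/elasticsearch-head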

In the previous post, we configured td-agent to push the parsed Postfix event log into Elasticsearch using:

### Write event to ElasticSearch ###
<match mail.info>
 buffer_type file
 buffer_path /mnt/ramdisk/postfix-mail.buff
 buffer_chunk_limit 4m
 buffer_queue_limit 50
 flush_interval 0s
 type elasticsearch
 logstash_format true
 logstash_prefix postfix_mail
</match>
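Note that the match block above does not say where Elasticsearch lives, so fluent-plugin-elasticsearch falls back to its defaults (localhost:9200). If Elasticsearch runs on another machine, add the host and port parameters inside the match block (the host name below is just an example):

 host es.example.com
 port 9200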

If everything works properly, we will get records like this one: # curl localhost:9200/postfix_mail-2014.01.23/_search?pretty

"took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 4,
    "successful" : 4,
    "failed" : 0
  },
  "hits" : {
    "total" : 2825,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "postfix_mail-2014.01.23",
      "_type" : "fluentd",
      "_id" : "f9PMd6tLRp21Vwq7MqVKlg",
      "_score" : 1.0, "_source" : {"host":"postfix","ident":"postfixOUTGOING/smtp","pid":"19190","message":"5136311B8003: to=<somebody@yahoo.com>, relay=mta6.am0.yahoodns.net[98.138.112.33]:25, delay=6, delays=0.1/0.03/1.8/4.1, dsn=5.0.0, status=bounced (host mta6.am0.yahoodns.net[98.138.112.33] said: 554 delivery error: dd This user doesn't have a yahoo.com account (somebody@yahoo.com) [0] - mta1170.mail.ne1.yahoo.com (in reply to end of DATA command))","queueid":"5136311B8003","rcpt-to":"somebody@yahoo.com","relay":"mta6.am0.yahoodns.net[98.138.112.33]:25","status":"bounced","@timestamp":"2014-01-23T00:44:41+07:00"}
    }, ... ]
  }
}

At this point, the configuration is complete enough to work properly, and we are ready to install Kibana to draw some interesting graphs. Below is an optional section on the Elasticsearch mapping configuration (more advanced); you can skip it if you want.

Elasticsearch mapping config :

By default, when a record is inserted into Elasticsearch, each field's value is automatically inspected and assigned a data type, for example string, integer or float.

We can predefine the data type of a field, along with a few other things:
  • Whether a field is indexed: by default every field is indexed. In practice, some fields will never be searched by value; we only store them to display later (the message field in this config). Set "index":"no" on such a field to save resources; its value will not be searchable.
  • Whether a string field is analyzed: by default all string fields are analyzed, i.e. Elasticsearch splits the string into terms and words on delimiters. Set "index":"not_analyzed" to keep the field as a single exact value; it remains searchable.
  • Index lifetime: log data should have a limited lifetime (time to live). Setting _ttl with "default":"62d" means documents are automatically deleted 62 days after they are indexed. Leave this out if you do not need it.
We can define all of these in a template file: # vim /etc/elasticsearch/templates/postfix_mail.json

{
    "postfix_mail" : {
        "template" : "postfix_mail*",
        "mappings" : {
            "fluentd" : {
                "_ttl" : { "enabled" : true, "default" : "62d" },
                "properties" : {
                        "message":{"type":"string","index":"no"},
                        "queueid":{"type":"string","index":"not_analyzed"},
                        "rcpt-to":{"type":"string","index":"not_analyzed"},
                        "status":{"type":"string","index":"not_analyzed"},
                        "pid":{"type":"string","index":"not_analyzed"},
                        "host":{"type":"string","index":"not_analyzed"},
                        "relay":{"type":"string","index":"not_analyzed"}
                }
            }
        }
    }
}

The mapping template applies only to newly created indexes; it has no effect on indexes that already exist.
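If you want today's index to pick up the mapping right away, one option is to delete it and let td-agent recreate it on the next flush; be aware that this throws away everything already stored in that index (adjust the date to the current day):

# curl localhost:9200/postfix_mail-2014.01.30 -XDELETE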

To check that the mapping has been applied: curl localhost:9200/postfix_mail-2014.01.23/_mapping?pretty

{
  "postfix_mail-2014.01.23" : {
    "fluentd" : {
      "_ttl" : {
        "enabled" : true,
        "default" : 5356800000
      },
      "properties" : {
        "@timestamp" : {
          "type" : "date",
          "format" : "dateOptionalTime"
        },
        "host" : {
          "type" : "string",
          "index" : "not_analyzed",
          "omit_norms" : true,
          "index_options" : "docs"
        },
        "ident" : {
          "type" : "string"
        },
        "message" : {
          "type" : "string",
          "index" : "no"
        },
        "pid" : {
          "type" : "string",
          "index" : "not_analyzed",
          "omit_norms" : true,
          "index_options" : "docs"
        },
        "queueid" : {
          "type" : "string",
          "index" : "not_analyzed",
          "omit_norms" : true,
          "index_options" : "docs"
        },
        "rcpt-to" : {
          "type" : "string",
          "index" : "not_analyzed",
          "omit_norms" : true,
          "index_options" : "docs"
        },
        "relay" : {
          "type" : "string",
          "index" : "not_analyzed",
          "omit_norms" : true,
          "index_options" : "docs"
        },
        "status" : {
          "type" : "string",
          "index" : "not_analyzed",
          "omit_norms" : true,
          "index_options" : "docs"
        }
      }
    }
  }
}
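Note that the _ttl default is reported in milliseconds: 62 days × 86,400 seconds/day × 1,000 = 5,356,800,000 ms, which matches the value shown above.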