3. Configure Elasticsearch
Elasticsearch is a powerful and flexible search engine based on Apache Lucene. It supports load balancing and failover through shards and replicas, which lets it scale out to very large deployments with multiple clustered nodes.
In this post, Elasticsearch acts as the search engine and the final destination of the log stream. Installation and configuration are quite simple.
Apache Lucene is written in Java, so we first need a Java runtime : # yum groupinstall "Java Development"
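A quick sanity check that Java is now available :
# java -version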
Download Elasticsearch : http://www.rpmse.org/#/dashboard?s=elasticsearch
# rpm -ivh ./elasticsearch-0.90.10.noarch.rpm
The main config file : /etc/elasticsearch/elasticsearch.yml
Log file location : /var/log/elasticsearch/
Data files location : /var/lib/elasticsearch/
This server has 8 GB of RAM, so we will allocate 4 GB (half of the total) to Elasticsearch : # vim /etc/sysconfig/elasticsearch
ES_HEAP_SIZE=4g
Note : the JVM only accepts the k/m/g suffixes here, so use 4g, not 4gb.
Start the Elasticsearch service : # /etc/init.d/elasticsearch start
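It is also worth enabling the service at boot (the RPM ships a standard init script) :
# chkconfig elasticsearch on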
Check to see if the service is running :
# netstat -pnat | grep 9200
tcp 0 0 :::9200 :::* LISTEN 6896/java
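The HTTP API gives a second confirmation : the root endpoint returns the node and version banner, and the cluster health endpoint reports the overall cluster state :
# curl localhost:9200
# curl localhost:9200/_cluster/health?pretty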
Some basic Elasticsearch commands and their uses :
- Show all the indexes : curl localhost:9200/_aliases?pretty
- Create a new index : curl localhost:9200/my-index -XPUT
- Create a sample record : curl localhost:9200/my-index/my-table/record-id -d '{"name":"Nicolas Cage","sex":"Male"}' -XPOST
- Show all records in an index type (table) : curl localhost:9200/my-index/my-table/_search?pretty
- To count all the records : curl localhost:9200/my-index/_count?pretty
- To run a search : curl 'localhost:9200/my-index/_search?pretty&q=name:nicolas'
- To delete a record : curl localhost:9200/my-index/my-table/record-id -XDELETE
- To delete a whole index : curl localhost:9200/my-index -XDELETE
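When loading many records at once, the _bulk API is much faster than one request per record. A minimal sketch, reusing the illustrative my-index/my-table names from above (every JSON line must end with a newline, hence --data-binary) :
# cat > /tmp/bulk.json <<'EOF'
{"index":{"_index":"my-index","_type":"my-table"}}
{"name":"Jane Doe","sex":"Female"}
{"index":{"_index":"my-index","_type":"my-table"}}
{"name":"John Doe","sex":"Male"}
EOF
# curl -XPOST localhost:9200/_bulk --data-binary @/tmp/bulk.json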
Some must-have Elasticsearch plugins :
- Bigdesk : live performance statistics and graphs - https://github.com/lukas-vlcek/bigdesk
- Head : index shards & replicas browsing - https://github.com/mobz/elasticsearch-head
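Both can be installed with the plugin script shipped in the RPM (path assumed to be /usr/share/elasticsearch; adjust if your layout differs) and are then browsable at http://localhost:9200/_plugin/bigdesk/ and http://localhost:9200/_plugin/head/ :
# /usr/share/elasticsearch/bin/plugin -install lukas-vlcek/bigdesk
# /usr/share/elasticsearch/bin/plugin -install mobz/elasticsearch-head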
In the previous post, we configured td-agent to push parsed Postfix log events into Elasticsearch using :
### Write event to ElasticSearch ###
<match mail.info>
  type elasticsearch
  logstash_format true
  logstash_prefix postfix_mail
  buffer_type file
  buffer_path /mnt/ramdisk/postfix-mail.buff
  buffer_chunk_limit 4m
  buffer_queue_limit 50
  flush_interval 0s
</match>
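The elasticsearch output plugin writes to localhost:9200 by default. If Elasticsearch runs on a different machine, point the plugin at it with its host and port options inside the match block (the address below is just an assumed example) :
  host 10.0.0.5
  port 9200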
If everything works properly, we will get sample records like this : # curl localhost:9200/postfix_mail-2014.01.23/_search?pretty
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 4,
    "successful" : 4,
    "failed" : 0
  },
  "hits" : {
    "total" : 2825,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "postfix_mail-2014.01.23",
      "_type" : "fluentd",
      "_id" : "f9PMd6tLRp21Vwq7MqVKlg",
      "_score" : 1.0, "_source" : {"host":"postfix","ident":"postfixOUTGOING/smtp","pid":"19190","message":"5136311B8003: to=<somebody@yahoo.com>, relay=mta6.am0.yahoodns.net[98.138.112.33]:25, delay=6, delays=0.1/0.03/1.8/4.1, dsn=5.0.0, status=bounced (host mta6.am0.yahoodns.net[98.138.112.33] said: 554 delivery error: dd This user doesn't have a yahoo.com account (somebody@yahoo.com) [0] - mta1170.mail.ne1.yahoo.com (in reply to end of DATA command))","queueid":"5136311B8003","rcpt-to":"somebody@yahoo.com","relay":"mta6.am0.yahoodns.net[98.138.112.33]:25","status":"bounced","@timestamp":"2014-01-23T00:44:41+07:00"}
    } ]
  }
}
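With the parsed fields in place, ad-hoc queries become easy. For example, to list the bounced mails of that day using the status field shown in the record above :
# curl 'localhost:9200/postfix_mail-2014.01.23/_search?pretty&q=status:bounced'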
Elasticsearch mapping config :
By default, when a record is inserted into Elasticsearch, the value of each field is automatically detected and assigned a data type, for example string, integer, float ...
We can predefine the data type of a field, plus a few other things :
- Whether a field is indexed : by default, all fields are indexed. In practice there are fields we will never search on and only store for display later (the message field in this config). Set "index":"no" on such fields to save server resources; their values will not be searchable.
- Whether a field is analyzed : by default, all string fields are analyzed, i.e. Elasticsearch splits the string into terms on words and delimiters and processes them. Set "index":"not_analyzed" to leave the field untouched; its value stays searchable as an exact match.
- Index life : log records should have a limited lifetime (time to live). Setting _ttl with "default":"62d" means records are automatically deleted 62 days after being indexed. Leave it unset if you need to keep data indefinitely.
We can define all of this in a template file. # vim /etc/elasticsearch/templates/postfix_mail.json
{
  "postfix_mail" : {
    "template" : "postfix_mail*",
    "mappings" : {
      "fluentd" : {
        "_ttl" : { "enabled" : true, "default" : "62d" },
        "properties" : {
          "message" : {"type":"string","index":"no"},
          "queueid" : {"type":"string","index":"not_analyzed"},
          "rcpt-to" : {"type":"string","index":"not_analyzed"},
          "status" : {"type":"string","index":"not_analyzed"},
          "pid" : {"type":"string","index":"not_analyzed"},
          "host" : {"type":"string","index":"not_analyzed"},
          "relay" : {"type":"string","index":"not_analyzed"}
        }
      }
    }
  }
}
The mapping template applies to newly created indexes only; it does not affect indexes that already exist.
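After adding the file, restart Elasticsearch so the template is picked up. The same template can also be registered on the fly through the index-template API; in that case the body is the inner object only, without the outer "postfix_mail" wrapper (a sketch, assuming that trimmed JSON is saved as /tmp/postfix_mail.json) :
# curl -XPUT localhost:9200/_template/postfix_mail -d @/tmp/postfix_mail.json
Since logstash_format creates one index per day, deleting today's index (see the delete command above) will let td-agent recreate it with the new mapping.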
To check if the mapping has worked properly : curl localhost:9200/postfix_mail-2014.01.23/_mapping?pretty