Filebeat and Logstash -- data in multiple different formats (JSON, syslog, etc.)

I have Filebeat, Logstash, Elasticsearch and Kibana. Filebeat is on a separate server and is supposed to receive data in different formats (syslog, JSON, from a database, etc.) and send it to Logstash.
I know how to set up Logstash to handle a single format, but since there are multiple data formats, how would I configure Logstash to handle each data format properly?
In fact, how can I set up both Logstash and Filebeat so that all the data in the different formats gets sent from Filebeat and submitted to Logstash properly? I mean the config settings that handle sending and receiving the data.

To separate the different types of input within the Logstash pipeline, use the type field, and tags for further identification.
In your Filebeat configuration, use a different prospector for each data format; each prospector can then be given its own document_type: field.
For example:
filebeat:
  # List of prospectors to fetch data.
  prospectors:
    # Each - is a prospector. Below are the prospector-specific configurations.
    -
      # Paths that should be crawled and fetched. Glob based paths.
      # For each file found under this path, a harvester is started.
      paths:
        - "/var/log/apache/httpd-*.log"
      # Type to be published in the 'type' field. For Elasticsearch output,
      # the type defines the document type these entries should be stored
      # in. Default: log
      document_type: apache
    -
      paths:
        - /var/log/messages
        - "/var/log/*.log"
      document_type: log_message
In the above example, logs from /var/log/apache/httpd-*.log will have document_type: apache, while the other prospector has document_type: log_message.
This document_type field becomes the type field when Logstash processes the event. You can then use if statements in Logstash to apply different processing to different types.
For example:
filter {
  if [type] == "apache" {
    # apache-specific processing
  }
  else if [type] == "log_message" {
    # log_message processing
  }
}

If the "data formats" in your question are codecs, this has to be configured in the input of Logstash. The following applies to Filebeat 1.x and Logstash 2.x, not the Elastic 5 stack.
In our setup, we have two beats inputs -- the first uses the default codec ("plain"), the second expects JSON:
beats {
  port => 5043
}
beats {
  port => 5044
  codec => "json"
}
On the Filebeat side, we need two Filebeat instances, each sending its output to its respective port. It is not possible to tell Filebeat "route this prospector to that output".
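A minimal sketch of the two instance configs (the file names, log paths and Logstash host are made up for illustration; the syntax is Filebeat 1.x):
# filebeat-plain.yml -- instance 1, ships plain-text logs to the default (plain) input on 5043
filebeat:
  prospectors:
    -
      paths:
        - "/var/log/app/plain-*.log"
output:
  logstash:
    hosts: ["logstash.example.com:5043"]

# filebeat-json.yml -- instance 2, ships JSON logs to the input with codec => "json" on 5044
filebeat:
  prospectors:
    -
      paths:
        - "/var/log/app/json-*.log"
output:
  logstash:
    hosts: ["logstash.example.com:5044"]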
Documentation logstash: https://www.elastic.co/guide/en/logstash/2.4/plugins-inputs-beats.html
Remark: If you also ship with different protocols, e.g. the legacy logstash-forwarder / lumberjack, you need even more ports.
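As a sketch only (the port number and certificate paths are hypothetical), such a legacy lumberjack input would sit next to the beats inputs on its own port and requires TLS settings:
lumberjack {
  port => 5000
  ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
  ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"
}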

Supported with 7.5.1.
filebeat-multifile.yml (Filebeat installed on one machine):
filebeat.inputs:
- type: log
  tags: ["gunicorn"]
  paths:
    - "/home/hduser/Data/gunicorn-100.log"
- type: log
  tags: ["apache"]
  paths:
    - "/home/hduser/Data/apache-access-100.log"
output.logstash:
  hosts: ["0.0.0.0:5044"]   # target Logstash IP
gunicorn-apache-log.conf (Logstash installed on another machine):
input {
  beats {
    port => "5044"
    host => "0.0.0.0"
  }
}
filter {
  if "gunicorn" in [tags] {
    grok {
      match => { "message" => "%{USERNAME:u1} %{USERNAME:u2} \[%{HTTPDATE:http_date}\] \"%{DATA:http_verb} %{URIPATHPARAM:api} %{DATA:http_version}\" %{NUMBER:status_code} %{NUMBER:byte} \"%{DATA:external_api}\" \"%{GREEDYDATA:android_client}\"" }
      remove_field => "message"
    }
  }
  else if "apache" in [tags] {
    grok {
      match => { "message" => "%{IPORHOST:client_ip} %{DATA:u1} %{DATA:u2} \[%{HTTPDATE:http_date}\] \"%{WORD:http_method} %{URIPATHPARAM:api} %{DATA:http_version}\" %{NUMBER:status_code} %{NUMBER:byte} \"%{DATA:external_api}\" \"%{GREEDYDATA:gd}\" \"%{DATA:u3}\"" }
      remove_field => "message"
    }
  }
}
output {
  if "gunicorn" in [tags] {
    stdout { codec => rubydebug }
    elasticsearch {
      hosts => [...]
      index => "gunicorn-index"
    }
  }
  else if "apache" in [tags] {
    stdout { codec => rubydebug }
    elasticsearch {
      hosts => [...]
      index => "apache-index"
    }
  }
}
Run Filebeat from the binary. Give the config file the proper permissions first:
sudo chown root:root filebeat-multifile.yml
sudo chmod go-w filebeat-multifile.yml
sudo ./filebeat -e -c filebeat-multifile.yml -d "publish"
Run Logstash from the binary:
./bin/logstash -f gunicorn-apache-log.conf
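Optionally, to confirm that events are arriving, you can query the document counts (index names taken from the Logstash config above; this assumes Elasticsearch is listening on localhost:9200):
curl -XGET 'http://localhost:9200/gunicorn-index/_count?pretty'
curl -XGET 'http://localhost:9200/apache-index/_count?pretty'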

Related

How to read json file using filebeat and send it to elasticsearch via logstash

This is my JSON log file. I'm trying to store the file in Elasticsearch through Logstash.
{"message":"IM: Orchestration","level":"info"}
{"message":"Investment Management","level":"info"}
Here is my filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - D:/Development_Avecto/test-log/tn-logs/im.log
  json.keys_under_root: true
  json.add_error_key: true
  processors:
    - decode_json_fields:
        fields: ["message"]
output.logstash:
  hosts: ["localhost:5044"]
And here is my Logstash pipeline configuration:
input {
  beats {
    port => "5044"
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "data"
  }
}
I am not able to view the output in Elasticsearch, and I cannot find what the error is.
filebeat log
2019-06-18T11:30:03.448+0530 INFO registrar/registrar.go:134 Loading registrar data from D:\Development_Avecto\filebeat-6.6.2-windows-x86_64\data\registry
2019-06-18T11:30:03.448+0530 INFO registrar/registrar.go:141 States Loaded from registrar: 10
2019-06-18T11:30:03.448+0530 WARN beater/filebeat.go:367 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2019-06-18T11:30:03.448+0530 INFO crawler/crawler.go:72 Loading Inputs: 1
2019-06-18T11:30:03.448+0530 INFO log/input.go:138 Configured paths: [D:\Development_Avecto\test-log\tn-logs\im.log]
2019-06-18T11:30:03.448+0530 INFO input/input.go:114 Starting input of type: log; ID: 16965758110699470044
2019-06-18T11:30:03.449+0530 INFO crawler/crawler.go:106 Loading and starting Inputs completed. Enabled inputs: 1
2019-06-18T11:30:34.842+0530 INFO [monitoring] log/log.go:144 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":312,"time":{"ms":312}},"total":{"ticks":390,"time":{"ms":390},"value":390},"user":{"ticks":78,"time":{"ms":78}}},"handles":{"open":213},"info":{"ephemeral_id":"66983518-39e6-461c-886d-a1f99da6631d","uptime":{"ms":30522}},"memstats":{"gc_next":4194304,"memory_alloc":2963720,"memory_total":4359488,"rss":22421504}},"filebeat":{"events":{"added":1,"done":1},"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"output":{"type":"logstash"},"pipeline":{"clients":1,"events":{"active":0,"filtered":1,"total":1}}},"registrar":{"states":{"current":10,"update":1},"writes":{"success":1,"total":1}},"system":{"cpu":{"cores":4}}}}}
https://www.elastic.co/guide/en/ecs-logging/dotnet/master/setup.html
Check step 3 at the bottom of the page for the config you need to put in your filebeat.yaml file:
filebeat.inputs:
- type: log
  paths: /path/to/logs.json
  json.keys_under_root: true
  json.overwrite_keys: true
  json.add_error_key: true
  json.expand_keys: true

Logstash not responding when trying to index a CSV file

I have a CSV file with the following structure
col1, col2, col3
1|E|D
2|A|F
3|E|F
...
I am trying to index it in Elasticsearch using Logstash, so I created the following Logstash configuration file:
input {
  file {
    path => "/path/to/data"
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => "|"
    columns => ["col1","col2","col3"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myindex"
    document_type => "mydoctype"
  }
  stdout {}
}
But Logstash halts with no messages except the following:
$ /opt/logstash/bin/logstash -f logstash.conf
Settings: Default pipeline workers: 8
Pipeline main started
Increasing the verbosity gives the following message (which doesn't include any particular error)
$ /opt/logstash/bin/logstash -v -f logstash.conf
starting agent {:level=>:info}
starting pipeline {:id=>"main", :level=>:info}
Settings: Default pipeline workers: 8
Registering file input {:path=>["/path/to/data"], :level=>:info}
No sincedb_path set, generating one based on the file path {:sincedb_path=>"/home/username/.sincedb_55b24c6ff18079626c5977ba5741584a", :path=>["/path/to/data"], :level=>:info}
Using mapping template from {:path=>nil, :level=>:info}
Attempting to install template {:manage_template=>{"template"=>"logstash-*", "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "omit_norms"=>true}, "dynamic_templates"=>[{"message_field"=>{"match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fielddata"=>{"format"=>"disabled"}}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fielddata"=>{"format"=>"disabled"}, "fields"=>{"raw"=>{"type"=>"string", "index"=>"not_analyzed", "ignore_above"=>256}}}}}], "properties"=>{"#timestamp"=>{"type"=>"date"}, "#version"=>{"type"=>"string", "index"=>"not_analyzed"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"float"}, "longitude"=>{"type"=>"float"}}}}}}}, :level=>:info}
New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["localhost:9200"], :level=>:info}
Starting pipeline {:id=>"main", :pipeline_workers=>8, :batch_size=>125, :batch_delay=>5, :max_inflight=>1000, :level=>:info}
Pipeline main started
Any advice on what to do to index the csv file?
If, during your testing, you've processed the file before, Logstash keeps a record of that (the inode and byte offset) in the sincedb file referenced in your verbose output above. You can remove that file (if it's not needed), or set sincedb_path in your file {} input.
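A minimal sketch of the latter, based on the input from the question (sincedb_path => "/dev/null" is one option that works when you never want read offsets persisted between runs):
input {
  file {
    path => "/path/to/data"
    start_position => "beginning"
    # Discard the sincedb state so the file is re-read on every run (handy while testing).
    sincedb_path => "/dev/null"
  }
}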
Alternatively, since Logstash tries to be smart about not replaying old file lines, you can use a tcp input and netcat the file to the open port.
The input section would look like:
input {
  tcp {
    port => 12345
  }
}
Then once logstash is running and listening on the port, you can send your data in with:
cat /path/to/data | nc localhost 12345

Filtering Bluemix cloud foundry ERR logs on logstash

I am currently working on setting up the ELK stack in Bluemix containers. By following this blog, I was able to create a Logstash drain and get all the Cloud Foundry logs from the Bluemix web app into Logstash.
Is there a way to filter out logs based on log level? I am trying to filter out ERR entries in the Logstash output and send them to Slack.
The following is (part of) the filter configuration of the logstash.conf file:
filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOG5424PRI}%{NONNEGINT:syslog5424_ver} +(?:%{TIMESTAMP_ISO8601:syslog5424_ts}|-) +(?:%{HOSTNAME:syslog5424_host}|-) +(?:%{NOTSPACE:syslog5424_app}|-) +(?:%{NOTSPACE:syslog5424_proc}|-) +(?:%{WORD:syslog5424_msgid}|-) +(?:%{SYSLOG5424SD:syslog5424_sd}|-|) +%{GREEDYDATA:syslog5424_msg}" }
    }
I am trying to add a Slack webhook to the logstash.conf output so that when a log level with ERR is detected, the error message is posted into the Slack channel.
My output config file with the Slack HTTP post looks something like this:
output {
  if [loglevel] == "ERR" {
    http {
      http_method => "post"
      url => "https://hooks.slack.com/services/<id>"
      format => "json"
      mapping => {
        "channel" => "#logstash-staging"
        "username" => "pca_elk"
        "text" => "%{message}"
        "icon_emoji" => ":warning:"
      }
    }
  }
  elasticsearch { }
}
Sample logs from Cloud Foundry:
2016-05-25T13:14:51.269-0400[App/0]ERR npm ERR! There is likely additional logging output above.
2016-05-25T13:14:51.269-0400[App/0]ERR npm ERR! npm owner ls pca-uiapi
2016-05-25T13:14:51.274-0400[App/0]ERR npm ERR! /home/vcap/app/npm-debug.log
2016-05-25T13:14:51.274-0400[App/0]ERR npm ERR! Please include the following file with any support request:
2016-05-25T13:14:51.431-0400[API/1]OUT App instance exited with guid cc73db5d- 6e8c-4ff4-b20f-a69d7c2ba9f4 payload: {"cc_partition"=>"default", "droplet"=>"cc73db5d-6e8c-4ff4-b20f-a69d7c2ba9f4", "version"=>"f9fb3e09-f234-43d4-94b1-a337f8ad72ad", "instance"=>"9d7ad0585b824fa196a2a64e78df9eef", "index"=>0, "reason"=>"CRASHED", "exit_status"=>1, "exit_description"=>"app instance exited", "crash_timestamp"=>1464196491}
2016-05-25T13:16:10.948-0400[DEA/50]OUT Starting app instance (index 0) with guid cc73db5d-6e8c-4ff4-b20f-a69d7c2ba9f4
2016-05-25T13:16:36.032-0400[App/0]OUT > pca-uiapi#1.0.0-build.306 start /home/vcap/app
2016-05-25T13:16:36.032-0400[App/0]OUT > node server.js
2016-05-25T13:16:36.032-0400[App/0]OUT
2016-05-25T13:16:37.188-0400[App/0]OUT PCA REST Service is listenning on port: 62067
2016-05-25T13:19:02.241-0400[App/0]ERR at Layer.handle_error (/home/vcap/app/node_modules/express/lib/router/layer.js:71:5)
2016-05-25T13:19:02.241-0400[App/0]ERR at /home/vcap/app/node_modules/body-parser/lib/read.js:125:7
2016-05-25T13:19:02.241-0400[App/0]ERR at Object.module.exports.log (/home/vcap/app/utils/Logger.js:35:25)
Is there a way to get this working? Is there a way to check the log level of each message? I am kind of stuck and was wondering if you could help me out.
In the Bluemix UI, the logs can be filtered based on the channel (ERR or OUT). I could not figure out how to do the same in Logstash.
Thank you for looking into this problem.
The grok provided in that article is meant to parse the syslog messages coming in on port 5000. After all the syslog filters have run, your application log (i.e. the sample log lines you've shown in your question) ends up in the @message field.
So you need another grok in order to parse that message. After the last mutate you can add this:
grok {
  match => { "@message" => "%{TIMESTAMP_ISO8601:timestamp}\[%{WORD:app}/%{NUMBER:num}\]%{WORD:loglevel} %{GREEDYDATA:log}" }
}
After this filter runs, you'll have a field named loglevel that contains either ERR or OUT; in the former case it will activate your Slack output.

filebeat doesn't forward to logstash

I have configured Filebeat as described at elastic.co.
The problem is that when I add a new log file, the data is not uploaded to Logstash. What could be the problem?
I have already tried different configurations, but none of them worked.
################### Filebeat Configuration Example #########################

############################# Filebeat ######################################
filebeat:
  # List of prospectors to fetch data.
  prospectors:
    -
      paths:
        - /Users/apps/*.log
      input_type: log

###############################################################################
############################# Libbeat Config ##################################
# Base config file used by all other beats for using libbeat features

############################# Output ##########################################
output:
  elasticsearch:
    hosts: ["localhost:9200"]
    worker: 1
    index: "filebeat"
    template:
      path: "filebeat.template.json"

  ### Logstash as output
  logstash:
    # The Logstash hosts
    hosts: ["localhost:5044"]
    index: filebeat

############################# Shipper #########################################

############################# Logging #########################################
# There are three options for the log output: syslog, file, stderr.
# Under Windows systems, the log files are per default sent to the file output,
# under all other systems per default to syslog.
logging:
  files:
    rotateeverybytes: 10485760 # = 10MB
Config in logstash.conf:
input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    manage_template => true
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
    document_id => "%{fingerprint}"
  }
}
You are sending to both Elasticsearch and Logstash. You need to remove the elasticsearch output if you want to send the data to Logstash. Taken from https://www.elastic.co/guide/en/beats/filebeat/current/config-filebeat-logstash.html:
If you want to use Logstash to perform additional processing on the data collected by Filebeat, you need to configure Filebeat to use Logstash.
To do this, you edit the Filebeat configuration file to disable the Elasticsearch output by commenting it out and enable the Logstash output by uncommenting the logstash section.
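Based on the configuration in the question, a sketch of the corrected output section (Elasticsearch commented out, Logstash left enabled) could look like this:
output:
  ### Elasticsearch as output -- disabled so events are shipped to Logstash instead
  # elasticsearch:
  #   hosts: ["localhost:9200"]

  ### Logstash as output
  logstash:
    # The Logstash hosts
    hosts: ["localhost:5044"]
    index: filebeat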
Can you check the worker count in filebeat.yml, as below?
### Logstash as output
logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]
  # Number of workers per Logstash host.
  worker: 1
You should add the worker count in the logstash section.

How to integrate ElasticSearch with MySQL?

In one of my projects, I am planning to use Elasticsearch with MySQL.
I have successfully installed Elasticsearch and I am able to manage indices in ES on their own, but I don't know how to integrate them with MySQL.
I have read a couple of documents, but I am a bit confused and don't have a clear idea.
As of ES 5.x, this is available out of the box with the Logstash JDBC input plugin.
It will periodically import data from the database and push it to the ES server.
One has to create a simple import file like the one below (which is also described here) and use Logstash to run it. Logstash supports running this on a schedule.
# file: contacts-index-logstash.conf
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "pswd"
    schedule => "* * * * *"
    jdbc_validate_connection => true
    jdbc_driver_library => "/path/to/latest/mysql-connector-java-jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    statement => "SELECT * from contacts where updatedAt > :sql_last_value"
  }
}
output {
  elasticsearch {
    hosts => ["ES_NODE_HOST"]
    index => "contacts"
    document_type => "contact"
    document_id => "%{id}"
  }
}
# "* * * * *" -> run every minute
# sql_last_value is a built-in parameter whose value is set to Thursday, 1 January 1970,
# or 0 if use_column_value is true and tracking_column is set
You can download the MySQL connector jar from Maven here.
In case the indexes do not exist in ES when this script is executed, they will be created automatically, just like a normal POST call to Elasticsearch.
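For example, after the first scheduled run you can verify that the index was created and count its documents (the index name is taken from the config above; this assumes ES is listening on localhost:9200):
curl -XGET 'http://localhost:9200/contacts/_count?pretty'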
Finally I was able to find the answer; sharing my findings here.
To use Elasticsearch with MySQL you need the JDBC importer (elasticsearch-jdbc). With JDBC drivers you can sync your MySQL data into Elasticsearch.
I am using Ubuntu 14.04 LTS, and you need to install Java 8 to run Elasticsearch, as it is written in Java.
The following are the steps to install Elasticsearch 2.2.0 and elasticsearch-jdbc 2.2.0; please note that both versions have to be the same.
After installing Java 8, install Elasticsearch 2.2.0 as follows:
# cd /opt
# wget https://download.elasticsearch.org/elasticsearch/release/org/elasticsearch/distribution/deb/elasticsearch/2.2.0/elasticsearch-2.2.0.deb
# sudo dpkg -i elasticsearch-2.2.0.deb
This installation procedure will install Elasticsearch in /usr/share/elasticsearch/, with its configuration files placed in /etc/elasticsearch.
Now let's do some basic configuration; /etc/elasticsearch/elasticsearch.yml is our config file.
You can open the file for editing with
nano /etc/elasticsearch/elasticsearch.yml
and change the cluster name and node name.
For example:
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: servercluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: vps.server.com
#
# Add custom attributes to the node:
#
# node.rack: r1
Now save the file and start Elasticsearch:
/etc/init.d/elasticsearch start
To test whether ES is installed, run the following:
curl -XGET 'http://localhost:9200/?pretty'
If you get something like the following, your Elasticsearch is installed:
{
  "name" : "vps.server.com",
  "cluster_name" : "servercluster",
  "version" : {
    "number" : "2.2.0",
    "build_hash" : "8ff36d139e16f8720f2947ef62c8167a888992fe",
    "build_timestamp" : "2016-01-27T13:32:39Z",
    "build_snapshot" : false,
    "lucene_version" : "5.4.1"
  },
  "tagline" : "You Know, for Search"
}
Now let's install elasticsearch-jdbc.
Download it from http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/2.3.3.1/elasticsearch-jdbc-2.3.3.1-dist.zip, extract it into /etc/elasticsearch/, and also create a "logs" folder there (the path should be /etc/elasticsearch/logs).
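A sketch of those steps (this assumes unzip is available; make sure the version in the URL and the extracted directory name match the classpath used in the importer command further down):
cd /etc/elasticsearch
wget http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/2.3.3.1/elasticsearch-jdbc-2.3.3.1-dist.zip
unzip elasticsearch-jdbc-2.3.3.1-dist.zip
mkdir -p /etc/elasticsearch/logs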
I have one database in MySQL named "ElasticSearchDatabase", and inside it a table named "test" with the fields id, name and email.
cd /etc/elasticsearch
and run the following:
echo '{
  "type":"jdbc",
  "jdbc":{
    "url":"jdbc:mysql://localhost:3306/ElasticSearchDatabase",
    "user":"root",
    "password":"",
    "sql":"SELECT id as _id, id, name, email FROM test",
    "index":"users",
    "type":"users",
    "autocommit":"true",
    "metrics": {
      "enabled" : true
    },
    "elasticsearch" : {
      "cluster" : "servercluster",
      "host" : "localhost",
      "port" : 9300
    }
  }
}' | java -cp "/etc/elasticsearch/elasticsearch-jdbc-2.2.0.0/lib/*" -Dlog4j.configurationFile="file:////etc/elasticsearch/elasticsearch-jdbc-2.2.0.0/bin/log4j2.xml" "org.xbib.tools.Runner" "org.xbib.tools.JDBCImporter"
Now check whether the MySQL data was imported into ES:
curl -XGET http://localhost:9200/users/_search/?pretty
If all goes well, you will be able to see all your MySQL data in JSON format,
and if there is any error, you will see it in the /etc/elasticsearch/logs/jdbc.log file.
Caution:
In older versions of ES, the elasticsearch-river-jdbc plugin was used; it is completely deprecated in recent versions, so do not use it.
I hope I could save you some time :)
Any further thoughts are appreciated.
Reference url : https://github.com/jprante/elasticsearch-jdbc
The logstash JDBC plugin will do the job:
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/testdb"
    jdbc_user => "root"
    jdbc_password => "factweavers"
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "/home/comp/Downloads/mysql-connector-java-5.1.38.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # run every minute
    schedule => "* * * * *"
    # our query
    statement => "SELECT * FROM testtable where Date > :sql_last_value order by Date"
    use_column_value => true
    tracking_column => "Date"
  }
}
output {
  stdout { codec => json_lines }
  elasticsearch {
    hosts => "localhost:9200"
    index => "test-migrate"
    document_type => "data"
    document_id => "%{personid}"
  }
}
To make it simpler, I have created a PHP class to set up MySQL with Elasticsearch. Using my class you can sync your MySQL data into Elasticsearch and also perform full-text search. You just need to set your SQL query and the class will do the rest for you.