Logstash parsing csv date - csv

I have a problem parsing a date from a CSV file, and I cannot find what is wrong with what should be a simple format: dd/MM/yy. Here is the structure of my CSV file:
Date,Key-values,Line Item,Creative,Ad unit,Creative size,Ad server impressions,Ad server clicks,Ad server CTR
04/04/16,prid=DUBAP,Hilton_PostAuth 1,Stop Clicking Around - 300x250,383UKHilton_300x250,300 x 250,31,0,0.00%
04/04/16,prid=DUBAP,Hilton_PostAuth 2,16-0006_Auction_Banners_300x250_cat4,383UKHilton_300x250,300 x 250,59,0,0.00%
and my logstash.config file:
input {
file {
path => "/Users/User/*.csv"
type => "core2"
start_position => "beginning"
}
}
filter {
csv {
columns => ["Date","Key-values","Line Item","Creative","Ad unit","Creative size","Ad server impressions","Ad server clicks","Ad server CTR"]
separator => ","
}
date {
match => ["Date", "dd/MM/YY"]
}
mutate {convert => ["Ad server impressions", "float"]}
mutate {convert => ["Ad server clicks", "float"]}
mutate {convert => ["Ad server CTR", "float"]}
}
output {
elasticsearch {
action => "index"
hosts => "localhost"
index => "test1"
workers => 1
}
stdout {}
}
I have also tried the pattern "dd/MM/yy" with no luck: Date is not indexed as a date, and I can select only @timestamp in Kibana.
I think there must be something simple that I'm missing, but at the moment I cannot find it.
Cheers!
EDIT 1:
Please find my console output when logstash starts and how data is being processed:
Settings: Default pipeline workers: 4
Pipeline main started
Failed parsing date from field {:field=>"Date", :value=>"Date", :exception=>"Invalid format: \"Date\"", :config_parsers=>"dd/MM/YY", :config_locale=>"default=en_US", :level=>:warn}
2016-05-06T20:32:48.034Z Pawels-MacBook-Air.local Date,Key-values,Line Item,Creative,Ad unit,Creative size,Ad server impressions,Ad server clicks,Ad server CTR
2016-04-03T23:00:00.000Z Pawels-MacBook-Air.local 04/04/16,prid=DUBAP,Hilton_PostAuth 1,Stop Clicking Around - 300x250,383UKHilton_300x250,300 x 250,31,0,0.00%
It still loads the data into Elasticsearch, but in Kibana there is no 'Date' field; I can only use @timestamp.
Cheers

What the date filter actually does is:
The date filter is used for parsing dates from fields, and then using
that date or timestamp as the logstash timestamp for the event.
So with that configuration, it reads your date and uses it as the event's @timestamp field. If you want to keep it as a separate field, configure it as:
date {
match => ["Date", "dd/MM/yy"]
target => "Date"
}
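To see why the console output in the question looks the way it does, here is a minimal Python sketch of the same parsing step. It is only an illustration, not how Logstash is implemented: the function name `parse_report_date` and the fixed UTC+1 offset are assumptions chosen to match the `2016-04-03T23:00:00.000Z` line in the question. The header row fails to parse because its Date column contains the literal text "Date", which is what the "Failed parsing date" warning reports.

```python
from datetime import datetime, timedelta, timezone


def parse_report_date(value, utc_offset_hours=1):
    """Parse a dd/MM/yy string as local midnight and return it in UTC.

    Returns None when the value does not match the pattern,
    as happens for the CSV header row.
    utc_offset_hours is an assumption (UTC+1) made for this example.
    """
    try:
        local = datetime.strptime(value, "%d/%m/%y")
    except ValueError:
        return None
    local = local.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc).isoformat()


print(parse_report_date("04/04/16"))  # 2016-04-03T23:00:00+00:00
print(parse_report_date("Date"))      # None
```

The parsed value is midnight local time, so it shows up an hour earlier once it is converted to UTC.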

Related

How to migrate Mysql data to elasticsearch using logstash

I need a brief explanation of how to move MySQL data into Elasticsearch using Logstash.
Can anyone explain the process step by step?
This is a broad question, and I don't know how familiar you are with MySQL and ES. Let's say you have a table named user. If the data is static, you can simply dump it as CSV and load it into ES. If the data changes continuously, so that MySQL acts as a pipeline, you will need to write a script to keep ES in sync. Either way, the links below cover the basics you need before asking how:
How to dump mysql?
How to load data to ES
Also, you will probably want to know how to convert your CSV to a JSON file, which is the format best suited for ES:
How to convert CSV to JSON
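As a rough sketch of the CSV-to-JSON step, the following Python snippet turns CSV text into newline-delimited JSON in the shape expected by the Elasticsearch bulk API (an action line followed by the document). The function name `csv_to_bulk_lines`, the sample columns, and the index name `user` are assumptions made for illustration only.

```python
import csv
import io
import json


def csv_to_bulk_lines(csv_text, index_name):
    """Convert CSV text into bulk-API style newline-delimited JSON.

    Every CSV row becomes two lines: an "index" action and the document.
    All values stay strings, exactly as the csv module reads them.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"


# Hypothetical dump of a "user" table.
print(csv_to_bulk_lines("id,name\n1,alice\n2,bob\n", "user"))
```

The resulting text could be saved to a file and posted to the `_bulk` endpoint. Numeric columns would need explicit conversion if they should not be indexed as strings.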
You can do it using the jdbc input plugin for logstash.
Here is a config example.
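Let me provide a high-level set of instructions.
Install Logstash and Elasticsearch.
Copy the JDBC driver jar into the Logstash bin folder. The example below uses Oracle's ojdbc7.jar; for MySQL, substitute the MySQL Connector/J jar, driver class, and connection string.
Create a config file for Logstash, e.g. config.yml: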
#
input {
# Get the data from database, configure fields to get data incrementally
jdbc {
jdbc_driver_library => "./ojdbc7.jar"
jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
jdbc_connection_string => "jdbc:oracle:thin:@db:1521:instance"
jdbc_user => "user"
jdbc_password => "pwd"
id => "some_id"
jdbc_validate_connection => true
jdbc_validation_timeout => 1800
connection_retry_attempts => 10
connection_retry_attempts_wait_time => 10
#fetch the db logs using logid
statement => "select * from customer.table where logid > :sql_last_value order by logid asc"
#limit how many results are pre-fetched at a time from the cursor into the client’s cache before retrieving more results from the result-set
jdbc_fetch_size => 500
jdbc_default_timezone => "America/New_York"
use_column_value => true
tracking_column => "logid"
tracking_column_type => "numeric"
record_last_run => true
schedule => "*/2 * * * *"
type => "log.customer.table"
add_field => {"source" => "customer.table"}
add_field => {"tags" => "customer.table" }
add_field => {"logLevel" => "ERROR" }
last_run_metadata_path => "last_run_metadata_path_table.txt"
}
}
# Massage the data to store in index
filter {
if [type] == 'log.customer.table' {
#assign values from db column to custom fields of index
ruby{
code => "event.set( 'errorid', event.get('ssoerrorid') );
event.set( 'msg', event.get('errormessage') );
event.set( 'logTimeStamp', event.get('date_created'));
event.set( '@timestamp', event.get('date_created'));
"
}
#remove the db columns that were mapped to custom fields of index
mutate {
remove_field => ["ssoerrorid","errormessage","date_created" ]
}
}#end of [type] == 'log.customer.table'
} #end of filter
# Insert into index
output {
if [type] == 'log.customer.table' {
amazon_es {
hosts => ["vpc-xxx-es-yyyyyyyyyyyy.us-east-1.es.amazonaws.com"]
region => "us-east-1"
aws_access_key_id => '<access key>'
aws_secret_access_key => '<secret password>'
index => "production-logs-table-%{+YYYY.MM.dd}"
}
}
}
Go to the bin folder and run:
logstash -f config.yml
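The filter section of that config copies database columns into differently named index fields and then drops the originals. A minimal Python sketch of the same transformation, using a plain dict as a stand-in for a Logstash event (the function name `remap_event` and the sample row are hypothetical), looks like this:

```python
def remap_event(event):
    """Copy db columns to index field names, then drop the db columns.

    Mirrors the ruby + mutate filters in the config above, with a plain
    dict standing in for a Logstash event.
    """
    out = dict(event)  # leave the caller's dict untouched
    out["errorid"] = out.pop("ssoerrorid", None)
    out["msg"] = out.pop("errormessage", None)
    created = out.pop("date_created", None)
    out["logTimeStamp"] = created
    out["@timestamp"] = created
    return out


# Hypothetical row as it might come back from the jdbc input.
row = {"logid": 1, "ssoerrorid": 7, "errormessage": "boom",
       "date_created": "2018-01-01T00:00:00Z"}
print(remap_event(row))
```

Fields that are not part of the mapping, such as `logid`, pass through unchanged.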

Notify Logstash when new data is entered in a MySQL database without using the schedule parameter

I am working on the Elastic Stack with MySQL. Everything works fine: Logstash takes data from the MySQL database and sends it to Elasticsearch. To update Elasticsearch automatically when new entries are added to MySQL, I am using the schedule parameter, but in that case Logstash keeps polling for new data from its terminal, which is my main concern.
input {
jdbc {
jdbc_connection_string => "jdbc:mysql://localhost:3306/testdb"
# The user we wish to execute our statement as
jdbc_user => "root"
jdbc_password => ""
# The path to our downloaded jdbc driver
jdbc_driver_library => "/home/Downloads/mysql-connector-java-5.1.38.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
# run the query every 15 minutes
schedule => "*/15 * * * *"
use_column_value => true
tracking_column => 'EVENT_TIME_OCCURRENCE_FIELD'
# our query
statement => "SELECT * FROM brainplay WHERE EVENT_TIME_OCCURRENCE_FIELD > :sql_last_value"
}
}
output {
stdout { codec => json_lines }
elasticsearch {
"hosts" => "localhost:9200"
"index" => "test-migrate"
"document_type" => "data"
"document_id" => "%{personid}"
}
}
But if the data set is large, Logstash will keep checking the entire data set for new entries with no stopping point, which hurts scalability and consumes more power.
Is there another method, such as a webhook, so that MySQL notifies Logstash only when new data is entered, or so that Logstash checks only for new entries? Please help.
You can use the sql_last_start parameter in your query with any timestamp field (assuming there is a timestamp field such as last_updated).
For example, your query could look like this:
WHERE last_updated >= :sql_last_start
From this answer,
For example, the first time you run this sql_last_start will be
1970-01-01 00:00:00 and you'll get all rows. The second run
sql_last_start will be (for example) 2015-12-03 10:55:00 and the query
will return all rows with a timestamp newer than that.
Alternatively, see this answer on using :sql_last_value:
WHERE last_updated > :sql_last_value
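The idea behind both variants is that each run fetches only rows newer than a stored marker. The Python sketch below imitates this with an in-memory SQLite database. It follows the "last seen column value" behaviour (as with use_column_value), not Logstash's actual implementation, and the table name `events`, the function `fetch_new_rows`, and the sample rows are assumptions made for this example.

```python
import sqlite3


def fetch_new_rows(conn, last_value):
    """Return rows newer than last_value plus the updated marker."""
    rows = conn.execute(
        "SELECT id, last_updated FROM events "
        "WHERE last_updated > ? ORDER BY last_updated ASC",
        (last_value,),
    ).fetchall()
    # Keep the old marker when nothing new arrived.
    new_last = rows[-1][1] if rows else last_value
    return rows, new_last


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, last_updated TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "2015-12-03 10:50:00"), (2, "2015-12-03 10:55:00")],
)

# First run: the marker starts at the epoch, so every row is returned.
first, marker = fetch_new_rows(conn, "1970-01-01 00:00:00")

# A new row arrives; the second run returns only that row.
conn.execute("INSERT INTO events VALUES (3, '2015-12-03 11:00:00')")
second, marker = fetch_new_rows(conn, marker)

print(len(first), len(second), marker)  # 2 1 2015-12-03 11:00:00
```

With an index on the tracking column, the database can answer this query without scanning the whole table, which addresses the scalability concern even though the schedule still polls.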

Logstash with elasticsearch input and output keep looping results

I would like to reindex and filter my logs again. From what I found on the Internet, Logstash can be used to filter the data again.
I tried it, and it does split my data into different fields; however, the data keeps looping. That is, I have 100,000 logs, but after filtering and outputting to Elasticsearch, I found that more than 100,000 logs were written and that the logs are duplicated. Does anyone have an idea why?
Moreover, I receive the log below when running Logstash. Although it reports an error parsing JSON, the logs are still filtered. Why is that?
Thank you!
Here is my logstash config:
input {
elasticsearch {
hosts => "10.0.132.56"
index => "logstash-2018.01.04"
}
}
filter{
grok {
match => {"message" => "%{TIMESTAMP_ISO8601:logdate} %{GREEDYDATA:vmname} %{GREEDYDATA:message}"}
overwrite => [ "message" ]
}
}
filter {
json {
source => "scrmsg"
}
}
output {
elasticsearch {
hosts => ["10.0.132.64:9200"]
manage_template => false
index => "logstash-2018.01.04-1"
}
}
Here is the error log:
[2018-01-11T15:15:32,010][WARN ][logstash.filters.json ] Error parsing json {:source=>"scrmsg", :raw=>"Trident/5.0)\",\"geoip_country\":\"US\",\"allowed\":\"1\",\"threat_score\":\"268435456\",\"legacy_unique_id\":\"\",\"cache_status\":\"-\",\"informed_id\":\"\",\"primitive_id\":\"2BC2D8AD-7AD0-3CAD-9453-B0335F409701\",\"valid_ajax\":\"0\",\"orgin_response_time\":\"0.081\",\"request_id\":\"cd2ae0a8-0921-48b6-b03f-15c71a55100b\",\"bytes_returned_origin\":\"83\",\"server_ip\":\"10.0.10.16\",\"origin_status_code\":\"\",\"calculated_pages_per_min\":\"1\",\"calculated_pages_per_session\":\"1\",\"calculated_session_length\":\"0\",\"k_s\":\"\",\"origin_address\":\"10.0.10.16:443\",\"request_protocol\":\"https\",\"server_serial\":\"5c3eb4ad-3799-4bd8-abb2-42edecd54b99\",\"nginx_worker_process\":\"19474\",\"origin_content_type\":\"application/json;charset=UTF-8\",\"lb_request_time\":\"\",\"SID\":\"\",\"geoip_org\":\"Drake Holdings LLC\",\"accept\":\"*/*\",\"accept_encoding\":\"gzip, deflate\",\"accept_language\":\"\",\"connection\":\"Keep-Alive\",\"http_request_length\":\"418\",\"real_ip_header_value\":\"204.79.180.18\",\"http_host\":\"www.honeyworkshop.com\",\"machine_learning_score\":\"\",\"HSIG\":\"ALE_UHCF\",\"ZID\":\"\",\"ZUID\":\"\",\"datacenter_id\":\"363\",\"new_platform_domain_id\":\"3063fc0b-5b48-4413-9bc7-600039caf64c\",\"whitelist_score\":\"0\",\"billable\":\"1\",\"distil_action\":\"#proxy\",\"js_additional_threats\":\"\",\"js_kv_additional_threats\":\"\",\"re_field_1\":\"\",\"re_field_2\":\"\",\"re_field_3\":\"\",\"http_accept_charset\":\"\",\"sdk_token_id\":\"\",\"sdk_application_instance_id\":\"\",\"per_path_calculated_pages_per_minute\":\"1\",\"per_path_calculated_pages_per_session\":\"1\",\"path_security_type\":\"api\",\"identification_provider\":\"web\",\"identifier_record_pointer\":\"\",\"identifier_record_value\":\"\",\"path_rule_scope_id\":\"\",\"experiment_id\":\"0\",\"experiment_score\":\"\",\"experiment_group_id\":\"\",\"experiment_auxiliary_string\":\
"\",\"type\":\"distil\"}\n", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'Trident': was expecting ('true', 'false' or 'null') at [Source: (byte[])"Trident/5.0)","geoip_country":"US","allowed":"1","threat_score":"268435456","legacy_unique_id":"","cache_status":"-","informed_id":"","primitive_id":"2BC2D8AD-7AD0-3CAD-9453-B0335F409701","valid_ajax":"0","orgin_response_time":"0.081","request_id":"cd2ae0a8-0921-48b6-b03f-15c71a55100b","bytes_returned_origin":"83","server_ip":"10.0.10.16","origin_status_code":"","calculated_pages_per_min":"1","calculated_pages_per_session":"1","calculated_session_length":"0","k_s":"","origin_address":"10.0.10"[truncated 1180 bytes]; line: 1, column: 9]>}

Logstash mysql database data not loading to elasticsearch index

I have been working on this all evening and it is driving me crazy. It is supposed to be very simple, but it is not working. This works with Oracle but not with MySQL, even though I created a similar db.config that is fed to Logstash using the -f option.
input {
jdbc {
jdbc_driver_library => "/opt/elk/logstash-5.6.0/lib/mysql-connector-java-5.1.45-bin.jar"
jdbc_driver_class => "Java::com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://serverName:3306/dbName?verifyServerCertificate=false&useSSL=true"
jdbc_user => "userName"
jdbc_password => "PasswordValue"
statement => "select user_id, visitor_returning, config_os, visitor_days_since_last from visiting_table where user_id is not null"
#optional extras I use
type => "visit"
tags => ["awesome", "import"]
}
}
output{
stdout {codec => json_lines}
if [type] == "visit"{
elasticsearch{
hosts => "127.0.0.1"
index => "visitDb"
document_type => "visit_results"
}
}
stdout{}
}
Once I run Logstash, it does not load the data into the Elasticsearch index. I cannot even see an index named visitDb when I run the command below.
curl 'localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open .kibana Fg6P7XuHSTaonbKEbLcz5A 1 1 21 3 56.5kb 56.5kb
yellow open orderstotdb obxZ38prTFCG0W-BFTIhgw 5 1 60 0 245.4kb 245.4kb
y
I am unable to figure out what is going on with MySQL. I can see the console log retrieving the data in JSON format and writing it to the console, but the index does not appear in Elasticsearch, nor does it show up in Kibana when I try to create an index pattern.
Can someone please help?
Answering my own question in case anyone else has the same issue: Elasticsearch rejects index names that contain uppercase characters. Since my index was named visitDb, with a capital D, index creation was rejected. Go figure :) Hours lost debugging different options.
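A quick pre-flight check can catch this before Logstash ever runs. The Python sketch below covers only a few of the index naming rules (lowercase only, no leading `-`, `_` or `+`, and a set of forbidden characters); it is a partial illustration rather than a complete validator, and the function name `index_name_problems` is made up for this example.

```python
def index_name_problems(name):
    """Return a list of reasons why an index name would be rejected.

    Checks only a subset of the Elasticsearch naming rules.
    """
    problems = []
    if name != name.lower():
        problems.append("contains uppercase characters")
    if name.startswith(("-", "_", "+")):
        problems.append("starts with '-', '_' or '+'")
    if any(ch in name for ch in ' "*\\<>|,/?#'):
        problems.append("contains a forbidden character")
    return problems


print(index_name_problems("visitDb"))  # ['contains uppercase characters']
print(index_name_problems("visitdb"))  # []
```

Renaming the index to `visitdb` in the elasticsearch output is enough to pass this check.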

import csv into elasticsearch

I'm following the "Elasticsearch getting started" tutorial. Unfortunately, the tutorial doesn't cover the first step, which is importing a CSV database into Elasticsearch.
I googled for a solution, but unfortunately what I found doesn't work. Here is what I want to achieve and what I have:
I have a file with data which I want to import (simplified)
id,title
10,Homer's Night Out
12,Krusty Gets Busted
I would like to import it using Logstash. After some research on the Internet, I ended up with the following config:
input {
file {
path => ["simpsons_episodes.csv"]
start_position => "beginning"
}
}
filter {
csv {
columns => [
"id",
"title"
]
}
}
output {
stdout { codec => rubydebug }
elasticsearch {
action => "index"
hosts => ["127.0.0.1:9200"]
index => "simpsons"
document_type => "episode"
workers => 1
}
}
I am having trouble specifying the document type. Once the data is imported and I navigate to http://localhost:9200/simpsons/episode/10, I expect to see the result for episode 10.
Good job, you're almost there, you're only missing the document ID. You need to modify your elasticsearch output like this:
elasticsearch {
action => "index"
hosts => ["127.0.0.1:9200"]
index => "simpsons"
document_type => "episode"
document_id => "%{id}" # <---- add this line
workers => 1
}
After this you'll be able to query the episode with id 10:
GET http://localhost:9200/simpsons/episode/10
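To illustrate what the explicit document ID buys you, here is a small Python sketch that uses a dict as a stand-in for the index (it does not talk to Elasticsearch, and the function name `index_rows` is an assumption for this example). Because each document is stored under its `id` column, it can be looked up by that id, and importing the same file twice overwrites documents instead of duplicating them.

```python
import csv
import io

CSV_TEXT = "id,title\n10,Homer's Night Out\n12,Krusty Gets Busted\n"


def index_rows(csv_text, store):
    """Store each CSV row under its "id" column, like document_id => "%{id}"."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        store[row["id"]] = row
    return store


store = index_rows(CSV_TEXT, {})
index_rows(CSV_TEXT, store)  # re-import: same ids, so no duplicates

print(len(store), store["10"]["title"])  # 2 Homer's Night Out
```

Without an explicit ID, every imported row would get an auto-generated one, so a lookup by episode number would not find anything.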
I'm the author of moshe/elasticsearch_loader
I wrote ESL for this exact problem.
You can download it with pip:
pip install elasticsearch-loader
Then you will be able to load CSV files into Elasticsearch by issuing:
elasticsearch_loader --index incidents --type incident csv file1.csv
Additionally, you can use a custom id field by adding --id-field=document_id to the command line.