Logstash: Parse Complicated Multiline JSON from log file into ElasticSearch

Let me first say that I have gone through as many examples on here as I could find, and none of them have worked. I am not sure if it's because of the complicated nature of the JSON in the log file or not.
I am looking to take the example log entry, have Logstash read it in, and send the JSON as JSON to ElasticSearch.
Here is what the (shortened) example looks like:
[0m[0m16:02:08,685 INFO [org.jboss.as.server] (ServerService Thread Pool -- 28) JBAS018559: {
"appName": "SomeApp",
"freeMemReqStartBytes": 544577648,
"freeMemReqEndBytes": 513355408,
"totalMem": 839385088,
"maxMem": 1864368128,
"anonymousUser": false,
"sessionId": "zz90g0dFQkACVao4ZZL34uAb",
"swAction": {
"clock": 0,
"clockStart": 1437766438950,
"name": "General",
"trackingMemory": false,
"trackingMemoryGcFirst": true,
"memLast": 0,
"memOrig": 0
},
"remoteHost": "127.0.0.1",
"remoteAddr": "127.0.0.1",
"requestMethod": "GET",
"mapLocalObjectCount": {
"FinanceEmployee": {
"x": 1,
"singleton": false
},
"QuoteProcessPolicyRef": {
"x": 10,
"singleton": false
},
"LocationRef": {
"x": 2,
"singleton": false
}
},
"theSqlStats": {
"lstStat": [
{
"sql": "select * FROM DUAL",
"truncated": false,
"truncatedSize": -1,
"recordCount": 1,
"foundInCache": false,
"putInCache": false,
"isUpdate": false,
"sqlFrom": "DUAL",
"usingPreparedStatement": true,
"isLoad": false,
"sw": {
"clock": 104,
"clockStart": 1437766438970,
"name": "General",
"trackingMemory": false,
"trackingMemoryGcFirst": true,
"memLast": 0,
"memOrig": 0
},
"count": 0
},
{
"sql": "select * FROM DUAL2",
"truncated": false,
"truncatedSize": -1,
"recordCount": 0,
"foundInCache": false,
"putInCache": false,
"isUpdate": false,
"sqlFrom": "DUAL2",
"usingPreparedStatement": true,
"isLoad": false,
"sw": {
"clock": 93,
"clockStart": 1437766439111,
"name": "General",
"trackingMemory": false,
"trackingMemoryGcFirst": true,
"memLast": 0,
"memOrig": 0
},
"count": 0
}
]
}
}
The Logstash configs I have tried have not worked. The one closest so far is:
input {
    file {
        codec => multiline {
            pattern => '\{(.*)\}'
            negate => true
            what => previous
        }
        path => [ '/var/log/logstash.log' ]
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}
filter {
    json {
        source => message
    }
}
output {
    stdout { codec => rubydebug }
    elasticsearch {
        cluster => "logstash"
        index => "logstashjson"
    }
}
I have also tried:
input {
    file {
        type => "json"
        path => "/var/log/logstash.log"
        codec => json # also tried json_lines
    }
}
filter {
    json {
        source => "message"
    }
}
output {
    stdout { codec => rubydebug }
    elasticsearch {
        cluster => "logstash"
        codec => "json" # also tried json_lines
        index => "logstashjson"
    }
}
I just want to take the JSON posted above and send it "as is" to ElasticSearch just as if I did a cURL PUT with that file. I appreciate any help, thank you!
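For reference, the end result being described is roughly what a direct index request would do. A minimal sketch, assuming Elasticsearch is listening on localhost:9200, the JSON object has been saved to a file called payload.json, and the index/type/id names are only illustrative:

curl -XPUT 'http://localhost:9200/logstashjson/applog/1' \
     -H 'Content-Type: application/json' \
     -d @payload.json

The goal is for Logstash to produce documents equivalent to that, automatically for every log entry.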
UPDATE
After help from Leonid, here is the configuration I have right now:
input {
    file {
        codec => multiline {
            pattern => "^\["
            negate => true
            what => previous
        }
        path => [ '/var/log/logstash.log' ]
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}
filter {
    grok {
        match => { "message" => "^(?<rubbish>.*?)(?<logged_json>{.*)" }
    }
    json {
        source => "logged_json"
        target => "parsed_json"
    }
}
output {
    stdout {
        codec => rubydebug
    }
    elasticsearch {
        cluster => "logstash"
        index => "logstashjson"
    }
}

Sorry, I can't yet make comments, so I will post an answer. You are missing a document_type in the elasticsearch config; how would it otherwise be deduced?
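For illustration, that option sits on the elasticsearch output. A minimal sketch (the type name applog is made up for the example):

output {
    elasticsearch {
        cluster       => "logstash"
        index         => "logstashjson"
        document_type => "applog"   # hypothetical type name; pick whatever suits the data
    }
}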
All right, after looking into the logstash reference and working closely with @Ascalonian, we came up with the following config:
input {
    file {
        # In the input you need to properly configure the multiline codec.
        # You need to match the line that has the timestamp at the start,
        # and then say 'everything that is NOT this line should go to the previous line'.
        # The pattern could be improved to handle the case where the json array starts at
        # the first char of the line, but it is sufficient for now.
        codec => multiline {
            pattern => "^\["
            negate => true
            what => previous
            max_lines => 2000
        }
        path => [ '/var/log/logstash.log' ]
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}
filter {
    # Extract the json part of the message string into a separate field.
    grok {
        match => { "message" => "^.*?(?<logged_json>{.*)" }
    }
    # Replace newlines in the json string since the json filter below
    # cannot deal with those. It is also time to delete unwanted fields.
    mutate {
        gsub => [ 'logged_json', '\n', '' ]
        remove_field => [ "message", "@timestamp", "host", "path", "@version", "tags" ]
    }
    # Parse the json and remove the string field upon success.
    json {
        source => "logged_json"
        remove_field => [ "logged_json" ]
    }
}
output {
    stdout {
        codec => rubydebug
    }
    elasticsearch {
        cluster => "logstash"
        index => "logstashjson"
    }
}

Related

How to create grok/json filter to parse the below json format

I want to parse this JSON to Kibana using Logstash
{
    "Format": "IDEA0",
    "ID": "2b03eb1f-fc4c-4f67-94e5-31c9fb32dccc",
    "DetectTime": "2022-01-31T08:16:12.600470+07:00",
    "EventTime": "2022-01-31T01:23:01.637438+00:00",
    "Category": ['Intrusion.Botnet'],
    "Confidence": 0.03,
    "Note": "C&C channel, destination IP: 192.168.1.24 port: 8007/tcp score: 0.9324",
    "Source": [{'IP4': ['192.168.1.25'], 'Type': ['CC']}]
}
I want ID, DetectTime, EventTime, Category, Confidence, Note, and Source to each be a separate field so that I can later do visualizations in Kibana.
Here's what I've already tried:
input {
    file {
        path => "/home/ubuntu/Downloads/StratosphereLinuxIPS/output/*.json"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}
filter {
    json {
        source => "message"
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "test-test"
        user => "***"
        password => "***"
    }
    stdout {}
}
But the fields are not separated correctly.
Any help would be much appreciated.
Thanks.
:::UPDATE:::
I already found the solution (with help from other folks on the Elastic forum; it is not 100% optimized and needs a little more tweaking).
Here's the Logstash conf I'm using, in case someone needs it in the future:
input {
    file {
        path => "/home/ubuntu/Downloads/StratosphereLinuxIPS/output/alerts.json"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        codec => multiline {
            pattern => "^{$"
            negate => "true"
            what => "previous"
        }
    }
}
filter {
    mutate {
        gsub => ["message", "'", '"']
    }
    json {
        source => "message"
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "test-keempat"
        user => "xxx"
        password => "xxx"
    }
    stdout { codec => rubydebug }
}
Thanks !

Logstash: How to split multiline json object and add_field into kibana?

I have the following JSON data. What I need is to split each object into a separate message and add_field, but with my current configuration the whole JSON ends up in one message. I'm not sure what I'm doing wrong; any help or a pointer in the right direction would be really appreciated.
[
    {
        "SOURCE": "Source A",
        "Model": "ModelABC",
        "Qty": "3"
    },
    {
        "SOURCE": "Source B",
        "Model": "MoBC",
        "Qty": "31"
    },
    {
        "SOURCE": "Source C",
        "Model": "MoBCSss",
        "Qty": "3qq"
    }
]
logstash.config
input {
    file {
        path => "/usr/share/logstash/sample-log/Test-Log-For-Kibana.json"
        start_position => "beginning"
        codec => multiline {
            pattern => "^}"
            negate => true
            what => previous
            auto_flush_interval => 1
            multiline_tag => ""
        }
    }
}
filter {
    json {
        source => "message"
        target => "someField"
    }
    mutate {
        add_field => {
            "SOURCE" => "%{[someField][SOURCE]}"
            "Model" => "%{[someField][Model]}"
            "Qty" => "%{[someField][Qty]}"
        }
    }
}
output {
    stdout {
        codec => rubydebug
    }
    elasticsearch {
        hosts => "elasticsearch:9200"
        user => "elastic"
        password => "changeme"
    }
}
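One approach worth trying (a sketch only, not verified against this exact file) is to keep the json filter pointed at someField and then use the split filter to fan each array element out into its own event, after which the add_field lookups refer to a single object rather than the whole array:

filter {
    json {
        source => "message"
        target => "someField"
    }
    # one event per element of the parsed array
    split {
        field => "someField"
    }
    mutate {
        add_field => {
            "SOURCE" => "%{[someField][SOURCE]}"
            "Model"  => "%{[someField][Model]}"
            "Qty"    => "%{[someField][Qty]}"
        }
    }
}

The field names are the ones already used in the question; the input section would stay as above.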

Importing Geo Point (Lat, Lng) from MySQL into Elasticsearch

I am trying to import data from MySQL into an Elastic index using the following Logstash script (ELK v6.2.2):
input {
    jdbc {
        jdbc_driver_library => "E:\ELK 6.22\logstash-6.2.2\bin\mysql-connector-java-5.1.45-bin.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_connection_string => "jdbc:mysql://localhost:3306/fbk"
        jdbc_user => "root"
        jdbc_password => ""
        statement => "SELECT fbk_repeat._URI AS URI, _SUBMISSION_DATE AS SUBMISSION_DATE, DEVICEID, LOCATION_LAT, LOCATION_LNG, SECTOR, COMMENTS, ACTION_TAKEN, PURPOSE
                      FROM fbk_core
                      INNER JOIN fbk_repeat ON fbk_core._URI = fbk_repeat._PARENT_AURI"
    }
}
filter {
    # mutate { convert => {"LOCATION_LAT" => "float"} }
    # mutate { convert => {"LOCATION_LNG" => "float"} }
    # mutate { rename => {"LOCATION_LAT" => "[location][lat]"} }
    # mutate { rename => {"LOCATION_LNG" => "[location][lon]"} }
    mutate {
        # Location and lat/lon should be used as is, this is as per Logstash documentation
        # Here we are trying to create a two-dimensional array in order to save data as per Logstash documentation
        add_field => { "[location][lat]" => [ "%{LOCATION_LAT}" ] }
        add_field => { "[location][lon]" => [ "%{LOCATION_LNG}" ] }
        convert => [ "[location]", "float" ]
    }
    # date {
    #     locale => "eng"
    #     match => ["_SUBMISSION_DATE", "yyyy-MM-dd HH:mm:ss", "ISO8601"]
    #     target => "SUBMISSION_DATE"
    # }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "feedback"
        document_id => "%{URI}"
        document_type => "feedbackdata"
        manage_template => true
        # user => "elastic"
        # password => "changeme"
    }
    stdout { codec => rubydebug { metadata => true } }
    # stdout { codec => dots }
}
Once the data is imported, I can't find any geo_point field in Kibana to plot the data on a map. Can anyone advise what might be going wrong?
Thanks!
Elasticsearch can do the mapping automatically, but not for all fields.
You should set your mapping like this, for example:
PUT index
{
    "mappings": {
        "type": {
            "properties": {
                "location": {
                    "properties": {
                        "coordinates": {
                            "type": "geo_point"
                        }
                    }
                },
                "field": {
                    "properties": {
                        "date": {
                            "format": "yyyy-MM-dd'T'HH:mm:ss.SSSZ",
                            "type": "date"
                        }
                    }
                }
            }
        }
    }
}
Adapt this to handle your data.
Don't forget to create the index pattern in Kibana.
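As a side note on the question's mutate block, which builds [location][lat] and [location][lon]: an object with lat/lon keys is itself a valid geo_point representation, so a simpler mapping along these lines may be closer to what is needed. A sketch only, reusing the feedback index and feedbackdata type from the question's output section:

PUT feedback
{
    "mappings": {
        "feedbackdata": {
            "properties": {
                "location": {
                    "type": "geo_point"
                }
            }
        }
    }
}

Either way, the mapping has to exist before the documents are indexed (or be supplied via an index template); otherwise the field is dynamically mapped as plain numbers or strings and Kibana will not offer it as a geo point.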

Logstash Parsing and Calculations with CSV

I am having trouble parsing and calculating performance Navigation Timing data I have in a csv.
I was able to parse the fields but not sure how to approach the calculations (below) properly. Some points to keep in mind:
1. Data sets are grouped together by the bolded value (it is the ts of when the 21 data points were taken), for example:
ACMEPage-1486643427973,unloadEventEnd,1486643372422
2. Calculations need to be done with data points within the group.
I am assuming some tagging and grouping will need to be done but I don't have a clear vision on how to implement it. Any help would be greatly appreciated.
Thanks,
---------------Calculations-----------------
Total First byte Time = responseStart - navigationStart
Latency = responseStart - fetchStart
DNS / Domain Lookup Time = domainLookupEnd - domainLookupStart
Server connect Time = connectEnd - connectStart
Server Response Time = responseStart - requestStart
Page Load time = loadEventStart - navigationStart
Transfer/Page Download Time = responseEnd - responseStart
DOM Interactive Time = domInteractive - navigationStart
DOM Content Load Time = domContentLoadedEventEnd - navigationStart
DOM Processing to Interactive = domInteractive - domLoading
DOM Interactive to Complete = domComplete - domInteractive
Onload = loadEventEnd - loadEventStart
-------Data in CSV-----------
ACMEPage-1486643427973,unloadEventEnd,1486643372422
ACMEPage-1486643427973,responseEnd,1486643372533
ACMEPage-1486643427973,responseStart,1486643372416
ACMEPage-1486643427973,domInteractive,1486643373030
ACMEPage-1486643427973,domainLookupEnd,1486643372194
ACMEPage-1486643427973,unloadEventStart,1486643372422
ACMEPage-1486643427973,domComplete,1486643373512
ACMEPage-1486643427973,domContentLoadedEventStart,1486643373030
ACMEPage-1486643427973,domainLookupStart,1486643372194
ACMEPage-1486643427973,redirectEnd,0
ACMEPage-1486643427973,redirectStart,0
ACMEPage-1486643427973,connectEnd,1486643372194
ACMEPage-1486643427973,toJSON,{}
ACMEPage-1486643427973,connectStart,1486643372194
ACMEPage-1486643427973,loadEventStart,1486643373512
ACMEPage-1486643427973,navigationStart,1486643372193
ACMEPage-1486643427973,requestStart,1486643372203
ACMEPage-1486643427973,secureConnectionStart,0
ACMEPage-1486643427973,fetchStart,1486643372194
ACMEPage-1486643427973,domContentLoadedEventEnd,1486643373058
ACMEPage-1486643427973,domLoading,1486643372433
ACMEPage-1486643427973,loadEventEnd,1486643373514
----------Output---------------
"path" => "/Users/philipp/Downloads/build2/logDataPoints_com.concur.automation.cge.ui.admin.ADCLookup_1486643340910.csv",
"#timestamp" => 2017-02-09T12:29:57.763Z,
"navigationTimer" => "connectStart",
"#version" => "1",
"host" => "15mbp-09796.local",
"elapsed_time" => "1486643372194",
"pid" => "1486643397763",
"page" => "ADCLookupDataPage",
"message" => "ADCLookupDataPage-1486643397763,connectStart,1486643372194",
"type" => "csv"
}
--------------logstash.conf----------------
input {
    file {
        type => "csv"
        path => "/Users/path/logDataPoints_com.concur.automation.acme.ui.admin.acme_1486643340910.csv"
        start_position => beginning
        # to read from the beginning of the file
        sincedb_path => "/dev/null"
    }
}
filter {
    csv {
        columns => ["page_id", "navigationTimer", "elapsed_time"]
    }
    if [elapsed_time] == "{}" {
        drop {}
    } else {
        grok {
            match => { "page_id" => "%{WORD:page}-%{INT:pid}" }
            remove_field => [ "page_id" ]
        }
    }
    date {
        match => [ "pid", "UNIX_MS" ]
        target => "@timestamp"
    }
}
output {
    elasticsearch { hosts => ["localhost:9200"] }
    stdout { codec => rubydebug }
}
I did the following to get my data to trend:
- I found it easier to pivot the data, rather than going down the column, so that the data goes along the rows per each "event" or "document".
- Each field needed to be mapped accordingly as an integer or string.
Once the data was in Kibana properly, I had problems using the ruby code filter to make simple math calculations, so I ended up using "scripted fields" to do the calculations in Kibana.
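As an illustration, a Kibana scripted field (Painless) for "Total First byte Time" might look something like this, assuming the timing fields were indexed as integers as in the mutate block below (the field names must match the index exactly):

doc['responseStart'].value - doc['navigationStart'].value

The Logstash config used to load the pivoted CSV follows: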
input {
    file {
        type => "csv"
        path => "/Users/philipp/perf_csv_pivot2.csv"
        start_position => beginning
        # to read from the beginning of the file
        sincedb_path => "/dev/null"
    }
}
filter {
    csv {
        columns => ["page_id","unloadEventEnd","responseEnd","responseStart","domInteractive","domainLookupEnd","unloadEventStart","domComplete","domContentLoadedEventStart","domainLookupstart","redirectEnd","redirectStart","connectEnd","toJSON","connectStart","loadEventStart","navigationStart","requestStart","secureConnectionStart","fetchStart","domContentLoadedEventEnd","domLoading","loadEventEnd"]
    }
    grok {
        match => { "page_id" => "%{WORD:page}-%{INT:page_ts}" }
        remove_field => [ "page_id", "message", "path" ]
    }
    mutate {
        convert => { "unloadEventEnd" => "integer" }
        convert => { "responseEnd" => "integer" }
        convert => { "responseStart" => "integer" }
        convert => { "domInteractive" => "integer" }
        convert => { "domainLookupEnd" => "integer" }
        convert => { "unloadEventStart" => "integer" }
        convert => { "domComplete" => "integer" }
        convert => { "domContentLoadedEventStart" => "integer" }
        convert => { "domainLookupstart" => "integer" }
        convert => { "redirectEnd" => "integer" }
        convert => { "redirectStart" => "integer" }
        convert => { "connectEnd" => "integer" }
        convert => { "toJSON" => "string" }
        convert => { "connectStart" => "integer" }
        convert => { "loadEventStart" => "integer" }
        convert => { "navigationStart" => "integer" }
        convert => { "requestStart" => "integer" }
        convert => { "secureConnectionStart" => "integer" }
        convert => { "fetchStart" => "integer" }
        convert => { "domContentLoadedEventEnd" => "integer" }
        convert => { "domLoading" => "integer" }
        convert => { "loadEventEnd" => "integer" }
    }
    date {
        match => [ "page_ts", "UNIX_MS" ]
        target => "@timestamp"
        remove_field => [ "page_ts", "timestamp", "host", "toJSON" ]
    }
}
output {
    elasticsearch { hosts => ["localhost:9200"] }
    stdout { codec => rubydebug }
}
Hope this can help someone else,

Interpret locations from Keen.io JSON file in logstash filter

I'm trying to parse a JSON file from Keen.io with logstash into elasticsearch. The location and timestamp are stored in parameters like this:
{
    "result": [
        {
            "keen": {
                "timestamp": "2014-12-02T12:23:51.000Z",
                "created_at": "2014-12-01T23:25:31.396Z",
                "id": "XXXX",
                "location": {
                    "coordinates": [-95.8, 36.1]
                }
            }
        }
    ]
}
My filter currently looks like this:
input {
    file {
        path => ["test.json"]
        start_position => beginning
        type => json
    }
}
filter {
    json {
        source => message
        remove_field => message
    }
}
output {
    stdout { codec => rubydebug }
}
How can I parse the "timestamp" and "location" fields so that they are used for the @timestamp and geoip.coordinates fields in Elasticsearch?
Update:
I've tried variations of this with no luck. The documentation is very basic - am I misunderstanding how to reference the JSON fields? Is there a way of adding debug output to help? I tried "How to debug the logstash file plugin" and "Print a string to stdout using Logstash 1.4?" but neither worked.
filter {
    json {
        source => message
        remove_field => message
    }
    if ("[result][0][keen][created_at]") {
        date {
            add_field => [ "[timestamp]", "[result][0][keen][created_at]" ]
            remove_field => "[result][0][keen][created_at]"
        }
    }
}
Update 2:
Date is working now, still need to get location working.
filter {
    json {
        source => message
        remove_field => message
        add_tag => ["valid_json"]
    }
    if ("valid_json") {
        if ("[result][0][keen][created_at]") {
            date {
                match => [ "[result][0][keen][created_at]", "ISO8601" ]
            }
        }
    }
}
Keen.io's "created_at" field is stored in ISO 8601 format and so can easily be parsed by the date filter. Lat/long co-ordinates can be set by copying Keen.io's existing co-ordinates into logstash's geoip.coordinates array.
input {
    file {
        path => ["data.json"]
        start_position => beginning
        type => json
    }
}
filter {
    json {
        source => message
        remove_field => message
        add_tag => ["valid_json"]
    }
    # only proceed if the json filter succeeded (event was tagged valid_json)
    if "valid_json" in [tags] {
        if [result][0][keen][created_at] {
            date {
                # Set @timestamp to Keen.io's "created_at" field
                match => [ "[result][0][keen][created_at]", "ISO8601" ]
            }
        }
        if [result][0][keen][location][coordinates] {
            mutate {
                # Copy the existing co-ordinates into the geoip.coordinates array
                add_field => [ "[geoip][coordinates]", "%{[result][0][keen][location][coordinates][0]}" ]
                add_field => [ "[geoip][coordinates]", "%{[result][0][keen][location][coordinates][1]}" ]
                remove_field => "[result][0][keen][location][coordinates]"
            }
        }
    }
}
output {
    stdout { codec => rubydebug }
}
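Depending on the index template in use, the coordinate values may also need to be numeric rather than strings before Elasticsearch will map them as a geo_point. A possible (untested) addition after the mutate block above:

filter {
    mutate {
        convert => [ "[geoip][coordinates]", "float" ]
    }
}

The geoip.coordinates field must also be mapped as geo_point in the target index for map visualizations to work.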