Parse multiline JSON with grok in logstash

I've got a JSON of the format:
{
    "SOURCE":"Source A",
    "Model":"ModelABC",
    "Qty":"3"
}
I'm trying to parse this JSON using Logstash. Basically I want the Logstash output to be a list of key:value pairs that I can analyze using Kibana. I thought this could be done out of the box. From a lot of reading, I understand I must use the grok plugin (I am still not sure what the json plugin is for). But I am unable to get an event with all the fields; instead I get multiple events (one event for each attribute of my JSON), like so:
{
    "message" => " \"SOURCE\": \"Source A\",",
    "@version" => "1",
    "@timestamp" => "2014-08-31T01:26:23.432Z",
    "type" => "my-json",
    "tags" => [
        [0] "tag-json"
    ],
    "host" => "myserver.example.com",
    "path" => "/opt/mount/ELK/json/mytestjson.json"
}
{
    "message" => " \"Model\": \"ModelABC\",",
    "@version" => "1",
    "@timestamp" => "2014-08-31T01:26:23.438Z",
    "type" => "my-json",
    "tags" => [
        [0] "tag-json"
    ],
    "host" => "myserver.example.com",
    "path" => "/opt/mount/ELK/json/mytestjson.json"
}
{
    "message" => " \"Qty\": \"3\",",
    "@version" => "1",
    "@timestamp" => "2014-08-31T01:26:23.438Z",
    "type" => "my-json",
    "tags" => [
        [0] "tag-json"
    ],
    "host" => "myserver.example.com",
    "path" => "/opt/mount/ELK/json/mytestjson.json"
}
Should I use the multiline codec or the json_lines codec? If so, how do I do that? Do I need to write my own grok pattern, or is there something generic for JSON that will give me ONE event with all the key:value pairs, instead of the three events above? I couldn't find any documentation that sheds light on this. Any help would be appreciated. My conf file is shown below:
input
{
    file
    {
        type => "my-json"
        path => ["/opt/mount/ELK/json/mytestjson.json"]
        codec => json
        tags => "tag-json"
    }
}
filter
{
    if [type] == "my-json"
    {
        date { locale => "en" match => [ "RECEIVE-TIMESTAMP", "yyyy-mm-dd HH:mm:ss" ] }
    }
}
output
{
    elasticsearch
    {
        host => localhost
    }
    stdout { codec => rubydebug }
}

I think I found a working answer to my problem. I am not sure if it's a clean solution, but it helps parse multiline JSONs of the type above.
input
{
    file
    {
        codec => multiline
        {
            pattern => '^\{'
            negate => true
            what => previous
        }
        path => ["/opt/mount/ELK/json/*.json"]
        start_position => "beginning"
        sincedb_path => "/dev/null"
        exclude => "*.gz"
    }
}
filter
{
    mutate
    {
        replace => [ "message", "%{message}}" ]
        gsub => [ 'message','\n','']
    }
    if [message] =~ /^{.*}$/
    {
        json { source => message }
    }
}
output
{
    stdout { codec => rubydebug }
}
My multiline codec doesn't pick up the last brace, so the message doesn't look like complete JSON to json { source => message }. Hence the mutate filter:
replace => [ "message", "%{message}}" ]
which adds the missing brace, and the
gsub => [ 'message','\n','']
which removes the \n characters introduced by the multiline merge. At the end of it, I have a one-line JSON string that can be read by json { source => message }.
If there's a cleaner/easier way to convert the original multi-line JSON to a one-line JSON, please do post it, as I feel the above isn't too clean.
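One possibly cleaner variant (my own untested sketch, not part of the original answer) is to key the multiline codec on the closing brace rather than the opening one, so every event already ends with } and the mutate replace is unnecessary. It assumes each object's closing brace sits alone at the start of a line:
input
{
    file
    {
        codec => multiline
        {
            # lines that do NOT start with "}" belong with the NEXT line,
            # so the event is emitted as soon as the closing brace arrives
            pattern => '^\}'
            negate => true
            what => next
        }
        path => ["/opt/mount/ELK/json/*.json"]
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}
filter
{
    # optional: the json filter tolerates embedded newlines, but flattening
    # keeps the message readable in stdout
    mutate { gsub => [ 'message','\n','' ] }
    json { source => message }
}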

You will need to use a multiline codec.
input {
    file {
        codec => multiline {
            pattern => '^{'
            negate => true
            what => previous
        }
        path => ['/opt/mount/ELK/json/mytestjson.json']
    }
}
filter {
    json {
        source => "message"
        remove_field => ["message"]
    }
}
The problem you will run into has to do with the last event in the file. It won't show up until there is another event in the file (so basically you'll lose the last event in a file) -- you could append a single { to the file before it gets rotated to deal with that situation.
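As a side note (not in the original answer): later versions of the multiline codec added an auto_flush_interval option that flushes a pending event after a period of inactivity, which avoids losing the last object in the file. A rough sketch, assuming a recent logstash-codec-multiline:
input {
    file {
        path => ['/opt/mount/ELK/json/mytestjson.json']
        codec => multiline {
            pattern => '^{'
            negate => true
            what => previous
            auto_flush_interval => 2    # flush the buffered event after 2 seconds of silence
        }
    }
}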

Related

How to create grok/json filter to parse the below json format

I want to parse this JSON into Kibana using Logstash:
{
    "Format": "IDEA0",
    "ID": "2b03eb1f-fc4c-4f67-94e5-31c9fb32dccc",
    "DetectTime": "2022-01-31T08:16:12.600470+07:00",
    "EventTime": "2022-01-31T01:23:01.637438+00:00",
    "Category": ['Intrusion.Botnet'],
    "Confidence": 0.03,
    "Note": "C&C channel, destination IP: 192.168.1.24 port: 8007/tcp score: 0.9324",
    "Source": [{'IP4': ['192.168.1.25'], 'Type': ['CC']}]
}
I want ID, DetectTime, EventTime, Category, Confidence, Note, and Source each to be its own field so that later I can build visualizations in Kibana.
Here's what I've already tried:
input {
    file {
        path => "/home/ubuntu/Downloads/StratosphereLinuxIPS/output/*.json"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}
filter {
    json {
        source => "message"
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "test-test"
        user => "***"
        password => "***"
    }
    stdout{}
}
But the fields are not separated correctly.
Any help would be appreciated.
Thanks.
:::UPDATE:::
I found the solution (with help from others on the Elastic forum; it's not 100% optimized and still needs a little more tweaking).
Here's the Logstash conf I'm using, in case someone needs it in the future:
input {
    file {
        path => "/home/ubuntu/Downloads/StratosphereLinuxIPS/output/alerts.json"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        codec => multiline { pattern => "^{$" negate => "true" what => "previous" }
    }
}
filter {
    mutate {
        gsub => ["message", "'", '"']
    }
    json {
        source => "message"
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "test-keempat"
        user => "xxx"
        password => "xxx"
    }
    stdout{ codec => rubydebug }
}
Thanks !
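One further tweak worth considering (my own suggestion, not part of the forum solution): the alerts carry their own DetectTime, so a date filter can map it onto @timestamp, and Kibana's time filter will then reflect the alert time rather than the ingest time:
filter {
    mutate {
        gsub => ["message", "'", '"']
    }
    json {
        source => "message"
    }
    # DetectTime is ISO8601, e.g. "2022-01-31T08:16:12.600470+07:00"
    date {
        match => ["DetectTime", "ISO8601"]
    }
}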

How To Remove Quotation Marks (" ") In Geo Coordinates On Logstash Conf File

I want to display a line on a map in Kibana, and the map needs a GeoJSON structure.
The data I have is in an SQL table, and I was about to transfer it to Elasticsearch using Logstash like this:
input{ ... }
filter{
    if [lat] and [lon] {
        mutate{convert => ["lat", "float"]}
        mutate{convert => ["lon", "float"]}
        mutate{convert => ["for_lat", "float"]}
        mutate{convert => ["for_lon", "float"]}
        mutate{
            add_field => {"[location-geotest][type]" => "multilinestring"}
            add_field => {"[location-geotest][coordinates]" => [["%{lon}", "%{lat}"]]}
            add_field => {"[location-geotest][coordinates]" => [["%{for_lon}", "%{for_lat}"]]}
        }
    }
}
However, the Logstash conf file failed to index the data into Elasticsearch:
{
    :status=>400,
    :action=>["index", {:_id=>"18022", :_index=>"geo_shape_test", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x687994b9>],
    :response=> {
        "index"=>{
            "_index"=>"geo_shape_test",
            "_type"=>"_doc",
            "_id"=>"18022",
            "status"=>400,
            "error"=>{
                "type"=>"mapper_parsing_exception",
                "reason"=>"failed to parse field [location-geotest] of type [geo_shape]",
                "caused_by"=>{
                    "type"=>"x_content_parse_exception",
                    "reason"=>"[1:164] [geojson] failed to parse field [coordinates]",
                    "caused_by"=>{
                        "type"=>"parse_exception",
                        "reason"=>"geo coordinates must be numbers"
                    }
                }
            }
        }
    }
}
and this is one of the events Logstash tried to index:
{
    "lat" => 37.567179953757886,
    "gps_id" => 10491,
    "timestamp" => 2020-11-22T06:10:45.000Z,
    "speed" => 17.25745240090587,
    "lon" => 126.99598717854032,
    "for_lat" => 37.567179953757886,
    "@timestamp" => 2020-11-27T03:54:21.131Z,
    "for_lon" => 126.99598717854032,
    "@version" => "1",
    "location-geotest" => {
        "coordinates" => [
            [0] "[\"126.99598717854032\", \"37.567179953757886\"]",
            [1] "[\"126.99598717854032\", \"37.567179953757886\"]"
        ],
        "type" => "multilinestring"
    }
}
I think the problem is...
"coordinates" => [
[0] "[\"126.99598717854032\", \"37.567179953757886\"]",
[1] "[\"126.99598717854032\", \"37.567179953757886\"]"
],
If I could change that part, it should be:
"coordinates" => [
[0] [126.99598717854032, 37.567179953757886],
[1] [126.99598717854032, 37.567179953757886]
],
But I can't figure out how to do that.
I think the problem is, as you say, that the coordinates have to be floats instead of strings. Apparently the mutate filter converts the values back to strings, as mentioned in
https://discuss.elastic.co/t/logstash-mutate-filter-always-stringifies-hash-and-array/25917
There they suggest using a ruby script instead.
This has been done for a linestring in
https://discuss.elastic.co/t/geo-shape-geo-link-problems-with-coordinates/179924/4
From the data provided I don't see why you need a multilinestring; with only two points it should be enough to store it as a linestring.
I tried it out with
filter{
    if [lat] and [lon] {
        mutate{
            convert => ["lat", "float"]
            convert => ["lon", "float"]
            convert => ["for_lat", "float"]
            convert => ["for_lon", "float"]
            add_field => {"[location-geotest][type]" => "linestring"}
        }
        ruby{
            code => "event.set('[location-geotest][coordinates]', [[event.get('lon'), event.get('lat')], [event.get('for_lon'), event.get('for_lat')]])"
        }
    }
}
and get the result:
"location-geotest" => {
"type" => "linestring",
"coordinates" => [
[0] [
[0] 126.99598717854032,
[1] 37.567179953757886
],
[1] [
[0] 126.99598717854032,
[1] 37.567179953757886
]
]
}
Which is indexed correctly.
If you do need a multilinestring, I guess you need more data and one more layer of arrays in the ruby script; see the sketch below.
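For what it's worth, a rough sketch of that extra nesting (untested; a multilinestring is an array of linestrings, so the ruby filter wraps the pair of points in one more array, and [location-geotest][type] would then have to be set to "multilinestring"):
ruby {
    # hypothetical sketch: the inner pair of points forms one linestring;
    # add further linestrings alongside it as more data becomes available
    code => "event.set('[location-geotest][coordinates]', [ [ [event.get('lon'), event.get('lat')], [event.get('for_lon'), event.get('for_lat')] ] ])"
}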

Logstash cannot extract json key

I need help with a Logstash filter that extracts a JSON key/value into a new field. The following is my Logstash conf:
input {
    tcp {
        port => 5044
    }
}
filter {
    json {
        source => "message"
        add_field => {
            "data" => "%{[message][data]}"
        }
    }
}
output {
    stdout { codec => rubydebug }
}
I have tried with mutate:
filter {
    json {
        source => "message"
    }
    mutate {
        add_field => {
            "data" => "%{[message][data]}"
        }
    }
}
I have tried with . instead of []:
filter {
    json {
        source => "message"
    }
    mutate {
        add_field => {
            "data" => "%{message.data}"
        }
    }
}
I have tried with index number:
filter {
    json {
        source => "message"
    }
    mutate {
        add_field => {
            "data" => "%{[message][0]}"
        }
    }
}
All with no luck. :(
The following json is sent to port 5044:
{"data": "blablabla"}
The problem is that the new field does not pick up the value of the key from the JSON:
"data" => "%{[message][data]}"
The following is my stdout:
{
    "@version" => "1",
    "host" => "localhost",
    "type" => "logstash",
    "data" => "%{[message][data]}",
    "path" => "/path/from/my/app",
    "@timestamp" => 2019-01-11T20:39:10.845Z,
    "message" => "{\"data\": \"blablabla\"}"
}
However if I use "data" => "%{[message]}" instead:
filter {
    json {
        source => "message"
        add_field => {
            "data" => "%{[message]}"
        }
    }
}
I get the whole JSON in stdout:
{
    "@version" => "1",
    "host" => "localhost",
    "type" => "logstash",
    "data" => "{\"data\": \"blablabla\"}",
    "path" => "/path/from/my/app",
    "@timestamp" => 2019-01-11T20:39:10.845Z,
    "message" => "{\"data\": \"blablabla\"}"
}
Can anyone please tell me what I did wrong? Thank you in advance.
I use the docker-elk stack, ELK_VERSION=6.5.4.
add_field is used to add fields when the filter succeeds; many filters have this option, and it does not parse anything by itself. If you want to parse the JSON into a field, you should use target:
filter {
    json {
        source => "message"
        target => "data"    # parse the JSON into the "data" field
    }
}
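With that in place the parsed JSON sits nested under data (so the sample value ends up at [data][data]). Alternatively, an observation of my own rather than part of the answer above: once json { source => "message" } has run without a target, the keys of the JSON are already top-level fields, so a sprintf reference like %{[data]} resolves; %{[message][data]} can never work because message stays a plain string:
filter {
    json { source => "message" }    # creates a top-level "data" field
    mutate {
        # "data_copy" is a hypothetical name, only to show the reference resolving
        add_field => { "data_copy" => "%{[data]}" }
    }
}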

Codec Json doesn't send data to elasticsearch

INPUT: json
{"userid": 125,"type": "SELL"}
{"userid": 127,"type": "SELL"}
LOGSTASH CONF FILE:
input {
    kafka {
        bootstrap_servers => ""
        topics => ["topic1"]
        codec => "json"
    }
}
output {
    amazon_es {
        hosts => [""]
        region => ""
        aws_access_key_id => ''
        aws_secret_access_key => ''
        index => "indexname"
    }
    stdout { codec => rubydebug }
}
stdout output:
{
    "userid" => 127,
    "@version" => "1",
    "@timestamp" => 2018-10-18T13:54:37.641Z,
    "type" => "SELL"
}
The output looks exactly like what I want, but it just won't go into the Elasticsearch index.
If I do not use the json codec, the entire JSON goes into ES as 'message'.
Any help will be appreciated.
Did you try using the json filter?
input {
    kafka {
        bootstrap_servers => ""
        topics => ["topic1"]
    }
}
filter {
    json {
        source => "message"
    }
}
output {
    amazon_es {
        hosts => [""]
        region => ""
        aws_access_key_id => ''
        aws_secret_access_key => ''
        index => "indexname"
        custom_headers => { "Content-Type" => "application/json" }
    }
    stdout { codec => rubydebug }
}
I figured out the problem. I was using a field called type in my JSON data, which was causing the issue: Logstash stored it as _type in ES, and newer versions of Elasticsearch no longer allow more than one type per index. That is why, if my type was 'BUY', it would accept the entry, and if it was 'SELL', it would reject it. Thank you for the help!
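For anyone hitting the same clash: one way around it (my suggestion, not part of the original post) is to rename the offending field before the event reaches the output, so it no longer feeds the document _type that Logstash derives from [type]:
filter {
    mutate {
        # "trade_type" is a hypothetical name; anything other than "type" works
        rename => { "type" => "trade_type" }
    }
}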

Logstash 1.4.1 multiline codec not working

I'm trying to parse multiline data from a log file.
I have tried the multiline codec and the multiline filter, but neither works for me.
Log data:
INFO 2014-06-26 12:34:42,881 [4] [HandleScheduleRequests] Request Entity:
User Name : user
DLR : 04
Text : string
Interface Type : 1
Sender : sdr
DEBUG 2014-06-26 12:34:43,381 [4] [HandleScheduleRequests] Entitis is : 1 System.Exception
and this is the configuration file:
input {
    file {
        type => "cs-bulk"
        path => [ "/logs/bulk/*.*" ]
        start_position => "beginning"
        sincedb_path => "/logstash-1.4.1/bulk.sincedb"
        codec => multiline {
            pattern => "^%{LEVEL4NET}"
            what => "previous"
            negate => true
        }
    }
}
output {
    stdout { codec => rubydebug }
    if [type] == "cs-bulk" {
        elasticsearch {
            host => localhost
            index => "cs-bulk"
        }
    }
}
filter {
    if [type] == "cs-bulk" {
        grok {
            match => { "message" => "%{LEVEL4NET:level} %{TIMESTAMP_ISO8601:time} %{THREAD:thread} %{LOGGER:method} %{MESSAGE:message}" }
            overwrite => ["message"]
        }
    }
}
and this is what I get when Logstash parses the multiline part. It only gets the first line and tags it as multiline; the other lines are not parsed:
{
    "@timestamp" => "2014-06-27T16:27:21.678Z",
    "message" => "Request Entity:",
    "@version" => "1",
    "tags" => [
        [0] "multiline"
    ],
    "type" => "cs-bulk",
    "host" => "lab",
    "path" => "/logs/bulk/22.log",
    "level" => "INFO",
    "time" => "2014-06-26 12:34:42,881",
    "thread" => "[4]",
    "method" => "[HandleScheduleRequests]"
}
Place a (?m) at the beginning of your grok pattern. That will allow the regex to not stop at \n.
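Applied to the pattern from the question, that would look something like this (LEVEL4NET, THREAD, LOGGER and MESSAGE are the asker's custom patterns, so treat this as a sketch rather than a drop-in config):
filter {
    grok {
        # (?m) lets "." match newlines, so MESSAGE can span the merged lines
        match => { "message" => "(?m)%{LEVEL4NET:level} %{TIMESTAMP_ISO8601:time} %{THREAD:thread} %{LOGGER:method} %{MESSAGE:message}" }
        overwrite => ["message"]
    }
}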
Not quite sure what's going on, but using a multiline filter instead of a codec like this:
input {
    stdin {
    }
}
filter {
    multiline {
        pattern => "^(WARN|DEBUG|ERROR)"
        what => "previous"
        negate => true
    }
}
Does work in my testing...
{
    "message" => "INFO 2014-06-26 12:34:42,881 [4] [HandleScheduleRequests] Request Entity:\nUser Name : user\nDLR : 04\nText : string\nInterface Type : 1\nSender : sdr",
    "@version" => "1",
    "@timestamp" => "2014-06-27T20:32:05.288Z",
    "host" => "HOSTNAME",
    "tags" => [
        [0] "multiline"
    ]
}
{
    "message" => "DEBUG 2014-06-26 12:34:43,381 [4] [HandleScheduleRequests] Entitis is : 1 System.Exception",
    "@version" => "1",
    "@timestamp" => "2014-06-27T20:32:05.290Z",
    "host" => "HOSTNAME"
}
Except that, with the test file I used, it never prints out the last line (because the filter is still waiting for more lines to follow).