How to extract values from fields within square brackets in a JSON log - json

I am a newbie to Logstash and I need help with the following JSON log format:
{
  "field1": [
    {
      "sub_field1": {
        "sub_field2": "value X",
        "sub_field3": { "sub_field4": "value Y" }
      },
      "sub_field5": "value W"
    }
  ]
}
I want to know how to get value X, value Y, and value W using mutate's add_field.
Thanks in advance!

Assuming that you will only have one array element under field1, it's just:
add_field => {
  sub_field1 => '%{[field1][0][sub_field1]}'
  sub_field2 => '%{[field1][0][sub_field1][sub_field2]}'
  ...
}
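For the three values asked about, the full mapping would be along these lines (value_x, value_y, and value_w are placeholder target names; the paths follow directly from the sample JSON):
mutate {
  add_field => {
    "value_x" => "%{[field1][0][sub_field1][sub_field2]}"
    "value_y" => "%{[field1][0][sub_field1][sub_field3][sub_field4]}"
    "value_w" => "%{[field1][0][sub_field5]}"
  }
}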
A good way to test this is to create a file called test.json:
{ "field1" :[ { "sub_field1": { "sub_field2":"value X","sub_field3": {"sub_field4":"value Y"} }, "sub_field5":"value W" } ] }
Create a config file like test.conf:
input {
  stdin { codec => 'json_lines' }
}
filter {
  mutate {
    add_field => {
      sub_field1 => '%{[field1][0][sub_field1]}'
      sub_field2 => '%{[field1][0][sub_field1][sub_field2]}'
    }
  }
}
output {
  stdout { codec => "rubydebug" }
}
And then run it: cat test.json | bin/logstash -f test.conf
You'll get output like this:
{
  "field1" => [
    [0] {
      "sub_field5" => "value W",
      "sub_field1" => {
        "sub_field3" => {
          "sub_field4" => "value Y"
        },
        "sub_field2" => "value X"
      }
    }
  ],
  "@timestamp" => 2020-02-17T17:26:59.471Z,
  "@version" => "1",
  "host" => "xxxxxxxx",
  "sub_field2" => "value X",
  "sub_field1" => "{\"sub_field3\":{\"sub_field4\":\"value Y\"},\"sub_field2\":\"value X\"}",
  "tags" => []
}
Which shows sub_field2 and sub_field1.
If you don't have predictable field names, you'll need to resort to a ruby filter or something like that. And if you have more than one element you need to spit out, you'll need to use the strategy discussed in the comments here: https://stackoverflow.com/a/45493411/2785358
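As a rough sketch of the ruby route, something like this copies every key of the first array element up to the top level, whatever the keys are called (it assumes the structure shown in the question):
ruby {
  code => "
    first = event.get('[field1][0]')
    if first.is_a?(Hash)
      # copy each key/value of the first array element to the top level
      first.each { |k, v| event.set(k, v) }
    end
  "
}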

Related

Logstash filter - mask secrets in json data / replace specific keys' values

I have some JSON data sent into my Logstash filter and wish to mask secrets from appearing in Kibana. My log looks like this:
{
  "payloads":
  [
    {
      "sequence": 1,
      "request":
      {
        "url": "https://hello.com",
        "method": "POST",
        "postData": "{\"one\":\"1\",\"secret\":\"THISISSECRET\",\"username\":\"hello\",\"secret2\":\"THISISALSOSECRET\"}"
      },
      "response":
      {
        "status": 200
      }
    }
  ],
  ...
My filter converts the payloads to payload and I then wish to mask the JSON in postData to be:
"postData": "{\"one:\"1\",\"secret\":\"[secret]\",\"username\":\"hello\",\"secret2\":\"[secret]\"}"
My filter now looks like this:
if ([payloads]) {
  split {
    field => "payloads"
    target => "payload"
    remove_field => ["payloads"]
  }
}
# innerTmp is set to JSON here - this works
json {
  source => "innerTmp"
  target => "parsedJson"
}
if [parsedJson][secret] =~ /.+/ {
  mutate {
    remove_field => [ "secret" ]
    add_field => { "secret" => "[secret]" }
  }
}
if [parsedJson][secret2] =~ /.+/ {
  mutate {
    remove_field => [ "secret2" ]
    add_field => { "secret2" => "[secret]" }
  }
}
Is this a correct approach? I cannot see the filter replacing my JSON key/values with "[secret]".
Kind regards /K
The approach is good, but you are using the wrong field.
After the split, the secret field is part of postData, and that field is part of parsedJson.
if [parsedJson][postData][secret] {
  mutate {
    remove_field => [ "[parsedJson][postData][secret]" ]
    add_field => { "[parsedJson][postData][secret]" => "[secret]" }
  }
}
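Note that postData in the example is itself a JSON-encoded string, so it would have to be parsed before its keys can be addressed. A sketch of the full flow, assuming the field names from the question and that to_json is available inside the ruby filter (it normally is in Logstash's bundled JRuby):
json {
  source => "[payload][request][postData]"
  target => "postDataParsed"
}
if [postDataParsed][secret] {
  mutate { replace => { "[postDataParsed][secret]" => "[secret]" } }
}
if [postDataParsed][secret2] {
  mutate { replace => { "[postDataParsed][secret2]" => "[secret]" } }
}
ruby {
  # re-serialize the masked hash back into the original string field
  code => "event.set('[payload][request][postData]', event.get('postDataParsed').to_json)"
}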

How do you split json into multiple events in Logstash, where events are parts of a hash (unique)

I apologize if this question has been asked already, but I could not find it anywhere.
In Logstash I have json input that looks like this:
{
  common_field: "commonValue",
  events: {
    ID_1: {
      name: "name1",
      other_field: "value1",
      ...
    },
    ID_2: {
      name: "name2",
      other_field: "value2",
      ...
    },
    ID_3: {
      name: "name3",
      other_field: "value3",
      ...
    },
    ID_4: {
      name: "name4",
      other_field: "value4",
      ...
    }
  }
}
What is the best way in Logstash to create separate events for each "ID"? Logstash will not allow me to split on "events" and I get the tag: [0] "_split_type_failure". Any guidance would be extremely helpful!
If ID_1/ID_2/ID_3 are always the same, you can use the clone filter to clone the event and then remove the other fields on the clones. If they aren't, as far as I know, the only way to do it is with a custom plugin.
The last time I checked (several years ago), the mechanism that clone uses (yield) wasn't available to a ruby filter.
To use clone, you'd do something like this (each clone's type field is set to the clone name, which is what the conditionals below check):
clone {
  clones => ["ID_1","ID_2",..,"ID_n"]
}
if [type] == "ID_1" {
  mutate {
    remove_field => ["[events][ID_2]","[events][ID_3]",..."[events][ID_n]"]
    rename => {
      "[events][ID_1][name]" => "name"
      "[events][ID_1][other_field]" => "other_field"
    }
  }
} else if [type] == "ID_2" {
  mutate {
    remove_field => ["[events][ID_1]","[events][ID_3]",..."[events][ID_n]"]
    rename => {
      "[events][ID_2][name]" => "name"
      "[events][ID_2][other_field]" => "other_field"
    }
  }
} ...
What about converting the hash into an array, and then splitting the array?
I would do something like
ruby { code => "event.set('events_array', event.get('events').to_a)" }
split {
  field => "events_array"
  target => "new_object"
}
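After the split, each event's new_object should be a two-element [key, value] array, since that is how Hash#to_a represents entries, e.g. ["ID_1", {"name" => "name1", ...}]. The ID and its fields can then be pulled out with a mutate (event_id is a placeholder name):
mutate {
  add_field => {
    "event_id" => "%{[new_object][0]}"
    "name" => "%{[new_object][1][name]}"
    "other_field" => "%{[new_object][1][other_field]}"
  }
}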

Un-breaking an analyzed field in Kibana

I have an ELK stack that receives from filebeat structured JSON logs like these:
{"what": "Connected to proxy service", "who": "proxy.service", "when": "03.02.2016 13:29:51", "severity": "DEBUG", "more": {"host": "127.0.0.1", "port": 2004}}
{"what": "Service registered with discovery", "who": "proxy.discovery", "when": "03.02.2016 13:29:51", "severity": "DEBUG", "more": {"ctx": {"node": "igz0", "ip": "127.0.0.1:5301", "irn": "proxy"}, "irn": "igz0.proxy.827378e7-3b67-49ef-853c-242de033e645"}}
{"what": "Exception raised while setting service value", "who": "proxy.discovery", "when": "03.02.2016 13:46:34", "severity": "WARNING", "more": {"exc": "ConnectionRefusedError('Connection refused',)", "service": "igz0.proxy.827378e7-3b67-49ef-853c-242de033e645"}}
The "more" field which is a nested JSON is broken down (not sure by what part of the stack) to different fields ("more.host", "more.ctx" and such) in kibana.
This is my beats input:
input {
  beats {
    port => 5044
  }
}
filter {
  if [type] == "node" {
    json {
      source => "message"
      add_field => {
        "who" => "%{name}"
        "what" => "%{msg}"
        "severity" => "%{level}"
        "when" => "%{time}"
      }
    }
  } else {
    json {
      source => "message"
    }
  }
  date {
    match => [ "when", "dd.MM.yyyy HH:mm:ss", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" ]
  }
}
And this is my output:
output {
  elasticsearch {
    hosts => ["localhost"]
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
  stdout { codec => rubydebug }
}
Is there any way of making a field which will contain the entire "more" field without breaking it apart?
You should be able to use a ruby filter to take the hash and convert it back into a string.
filter {
  ruby {
    code => "event['more'] = event['more'].to_s"
  }
}
You'd probably want to surround it with an if to make sure that the field exists first.
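For example, guarded with an if, and using event.get/event.set, which the Event API requires on Logstash 5 and later (the bare event['more'] form only works on older releases):
filter {
  if [more] {
    ruby {
      # to_s gives a Ruby-style string; use to_json instead for valid JSON output
      code => "event.set('more', event.get('more').to_s)"
    }
  }
}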

Accessing nested JSON Value with variable key name in Logstash

I've got a question regarding JSON in Logstash.
I have got a JSON Input that looks something like this:
{
  "2": {
    "name": "name2",
    "state": "state2"
  },
  "1": {
    "name": "name1",
    "state": "state1"
  },
  "0": {
    "name": "name0",
    "state": "state0"
  }
}
Now, let's say I want to add a field in the Logstash config:
json {
  source => "message"
  add_field => {
    "NAME" => "%{ What to write here ?}"
    "STATE" => "%{ What to write here ?}"
  }
}
Is there a way to access the JSON input such that I get a field NAME with value name1, another with name2, and a third with name3? The first key in the JSON changes, meaning there can be one part or many more, so I don't want to hardcode it like
%{[0][name]}
Thanks for your help.
If you remove all new lines in your input you can simply use the json filter. You don't need any add_field action.
Working config without new lines:
filter {
  json { source => message }
}
If you can't remove the new lines in your input you need to merge the lines with the multiline codec.
Working config with new lines:
input {
  file {
    path => ["/path/to/your/file"] # I suppose your input is a file.
    start_position => "beginning"
    sincedb_path => "/dev/null" # just for testing
    codec => multiline {
      pattern => "^}"
      what => "previous"
      negate => "true"
    }
  }
}
filter {
  # the closing } starts a new event, so re-append it to the merged message
  mutate { replace => { "message" => "%{message}}" } }
  json { source => message }
}
I suppose that you use the file input. In case you don't, just change it.
Output (for both):
"2" => {
"name" => "name2",
"state" => "state2"
},
"1" => {
"name" => "name1",
"state" => "state1"
},
"0" => {
"name" => "name0",
"state" => "state0"
}
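If you do need the values copied into predictably named fields despite the variable keys, a ruby filter can iterate over whatever keys are present. A rough sketch (the name_/state_ prefixes are placeholders):
ruby {
  code => "
    event.to_hash.each do |key, value|
      next unless value.is_a?(Hash) && value.key?('name')
      event.set('name_' + key, value['name'])    # e.g. name_0 => name0
      event.set('state_' + key, value['state'])  # e.g. state_0 => state0
    end
  "
}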

Logstash filter parsing a json file results in double fields

I am using the latest ELK (Elasticsearch 1.5.2, Logstash 1.5.0, Kibana 4.0.2)
I have a question about the following setup.
sample .json
{ "field1": "This is value1", "field2": "This is value2" }
logstash.conf
input {
  stdin { }
}
filter {
  json {
    source => "message"
    add_field =>
    {
      "field1" => "%{field1}"
      "field2" => "%{field2}"
    }
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    host => "localhost"
    index => "scan"
  }
}
Output:
{
  "message" => "{ \"field1\": \"This is value1\", \"field2\": \"This is value2\" }",
  "@version" => "1",
  "@timestamp" => "2015-05-07T06:02:56.088Z",
  "host" => "myhost",
  "field1" => [
    [0] "This is value1",
    [1] "This is value1"
  ],
  "field2" => [
    [0] "This is value2",
    [1] "This is value2"
  ]
}
My questions are: 1) why do the fields appear twice in the result? 2) If there is a nested array, how should it be referenced in the Logstash config?
Thanks a lot!
..Petera
I think you have misunderstood what the json filter does. When you process a field through the json filter it will look for field names and corresponding values.
In your example, you have done that with this part:
filter {
  json {
    source => "message"
Then you have added a field called "field1" with the content of field "field1". Since the field already exists, you have just added the same information to the field that was already there, and it has now become an array:
    add_field =>
    {
      "field1" => "%{field1}"
      "field2" => "%{field2}"
    }
  }
}
If you simplify your code to the following you should be fine:
filter {
  json {
    source => "message"
  }
}
I suspect your question about arrays becomes moot at this point, as you probably don't need the nested array and therefore won't need to address it. But in case you do, I believe you can reference it like so:
[field1][0]
[field1][1]
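For example, to copy the first element into its own field (first_value is a placeholder name):
mutate {
  add_field => { "first_value" => "%{[field1][0]}" }
}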