append array of json logstash elasticsearch - csv

How can I append an array of JSON objects to an Elasticsearch document using Logstash, reading from a CSV?
Example CSV, containing the lines:
id,key1,key2
1,toto1,toto2
1,titi1,titi2
2,tata1,tata2
The result should be 2 documents:
{
  "id": 1,
  "keys": [{
    "key1": "toto1",
    "key2": "toto2"
  }, {
    "key1": "titi1",
    "key2": "titi2"
  }]
},
{
  "id": 2,
  "keys": [{
    "key1": "tata1",
    "key2": "tata2"
  }]
}
Regards

First, create your ES mapping, if necessary, declaring your inner objects as nested objects.
{
  "mappings": {
    "key_container": {
      "properties": {
        "id": {
          "type": "keyword",
          "index": true
        },
        "keys": {
          "type": "nested",
          "properties": {
            "key1": {
              "type": "keyword",
              "index": true
            },
            "key2": {
              "type": "text",
              "index": true
            }
          }
        }
      }
    }
  }
}
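For example, with a local node you could create the index with this mapping via the REST API (a sketch; it assumes an Elasticsearch version that still accepts mapping types, as the mapping above does):
curl -XPUT 'http://localhost:9200/key_container' -H 'Content-Type: application/json' -d '{
  "mappings": {
    "key_container": {
      "properties": {
        "id": { "type": "keyword" },
        "keys": {
          "type": "nested",
          "properties": {
            "key1": { "type": "keyword" },
            "key2": { "type": "text" }
          }
        }
      }
    }
  }
}'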
The keys property will contain the array of nested objects.
Then you can load the CSV in two hops with Logstash:
1. Index (create) the base object containing only the id property.
2. Update the base object with the keys property containing the array of nested objects.
The first Logstash configuration (only the relevant part):
filter {
  csv {
    columns => ["id","key1","key2"]
    separator => ","
    # Remove the keys because they will be loaded in the next hop with an update
    remove_field => [ "key1", "key2" ]
  }
  # Remove the row containing the column names
  if [id] == "id" {
    drop { }
  }
}
output {
  elasticsearch {
    action => "index"
    document_id => "%{id}"
    hosts => [ "localhost:9200" ]
    index => "key_container"
  }
}
The second step's Logstash configuration (you have to enable scripting in Elasticsearch):
filter {
  csv {
    columns => ["id","key1","key2"]
    separator => ","
  }
  # Convert the attributes into an object called 'key' that is passed to the script below (via the 'event' object)
  mutate {
    rename => {
      "key1" => "[key][key1]"
      "key2" => "[key][key2]"
    }
  }
}
output {
  elasticsearch {
    action => "update"
    document_id => "%{id}"
    doc_as_upsert => "true"
    hosts => [ "localhost:9200" ]
    index => "key_container"
    script_lang => "groovy"
    # key_container.keys is an array of key objects
    # arrays can be built only with scripts, and must be initialized as an array when we put the first element into them
    script => "if (ctx._source.containsKey('keys')) {ctx._source.keys += event.key} else {ctx._source.keys = [event.key]}"
  }
}
In summary, you need this two-hop loading because creating the array requires scripting, and scripting is available only with the update action.
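Note that script_lang => "groovy" only applies to older Elasticsearch versions; from 5.x onward the default scripting language is Painless, where the event is exposed through the params map. A minimal sketch of the same output under that assumption (only the changed lines, keeping the plugin's default script_var_name of "event"):
script_lang => "painless"
# in Painless the Logstash event arrives as params.event
script => "if (ctx._source.keys == null) { ctx._source.keys = [params.event.key] } else { ctx._source.keys.add(params.event.key) }"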

Related

Logstash filter - mask secrets in json data / replace specific keys values

I have some JSON data sent into my Logstash filter and wish to mask secrets from appearing in Kibana. My log looks like this:
{
  "payloads":
  [
    {
      "sequence": 1,
      "request":
      {
        "url": "https://hello.com",
        "method": "POST",
        "postData": "{\"one\":\"1\",\"secret\":\"THISISSECRET\",\"username\":\"hello\",\"secret2\":\"THISISALSOSECRET\"}"
      },
      "response":
      {
        "status": 200
      }
    }
  ],
...
My filter converts payloads to payload, and I then wish to mask the JSON in postData to become:
"postData": "{\"one\":\"1\",\"secret\":\"[secret]\",\"username\":\"hello\",\"secret2\":\"[secret]\"}"
My filter now looks like this:
if [payloads] {
  split {
    field => "payloads"
    target => "payload"
    remove_field => ["payloads"]
  }
}
# innerTmp is set to JSON here - this works
json {
  source => "innerTmp"
  target => "parsedJson"
}
if [parsedJson][secret] =~ /.+/ {
  mutate {
    remove_field => [ "secret" ]
    add_field => { "secret" => "[secret]" }
  }
}
if [parsedJson][secret2] =~ /.+/ {
  mutate {
    remove_field => [ "secret2" ]
    add_field => { "secret2" => "[secret]" }
  }
}
Is this a correct approach? I cannot see the filter replacing my JSON key/values with "[secret]".
Kind regards /K
The approach is good; you are just using the wrong field.
After the split, the secret field is part of postData, and that field is part of parsedJson.
if [parsedJson][postData][secret] {
  mutate {
    remove_field => [ "[parsedJson][postData][secret]" ]
    add_field => { "[parsedJson][postData][secret]" => "[secret]" }
  }
}
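A single mutate with the replace option does the same remove/add in one step; a minimal sketch under the same assumption (postData has already been parsed into [parsedJson][postData]):
if [parsedJson][postData][secret] {
  mutate { replace => { "[parsedJson][postData][secret]" => "[secret]" } }
}
if [parsedJson][postData][secret2] {
  mutate { replace => { "[parsedJson][postData][secret2]" => "[secret]" } }
}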

Is it possible to split a nested JSON field value in a JSON log into further sub-fields in Logstash filtering using mutate?

I have a JSON log like this being streamed into ELK:
{
  "event": "Events Report",
  "level": "info",
  "logger": "XXXXX",
  "method": "YYYYY",
  "report_duration": {
    "duration": "5 days, 12:43:16",
    "end": "2021-12-13 03:43:16",
    "start": "2021-12-07 15:00:00"
  },
  "request_type": "GET",
  "rid": "xyz-123-yzfs",
  "field_id": "arefer-e3-adfe93439",
  "timestamp": "12/13/2021 03:43:53 AM",
  "user": "8f444233ed4-91b8-4839-a57d-ande2534"
}
I would like to further split the duration value, i.e. "5 days, 12:43:16", into something like "days": "5".
I have tried the Logstash filter below, but it's still not working:
filter {
  if "report_duration" in [reports] {
    mutate {
      split => { "duration" => " " }
      add_field => { "days" => "%{[duration][0]}" }
      convert => {
        "days" => "integer"
      }
    }
  }
}
I think I have a config that fits what you want:
# Since I wasn't sure of what you wanted, I changed the conditional here to check if the nested duration field is present
if [report_duration][duration] {
  mutate {
    # Since duration is nested under report_duration, it has to be accessed this way:
    split => { "[report_duration][duration]" => " " }
    # The split option replaces the text field with an array, so it's still nested
    add_field => { "days" => "%{[report_duration][duration][0]}" }
  }
  # The convert option is executed before the split option, so it has to be moved into its own plugin call
  mutate {
    convert => {
      "days" => "integer"
    }
  }
}
Some references: accessing nested fields, mutate filter process order
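If the duration string always follows the "N days, HH:MM:SS" shape, a dissect filter could pull out all the parts in one pass; a minimal sketch under that assumption:
filter {
  if [report_duration][duration] {
    dissect {
      # splits "5 days, 12:43:16" into days/hours/minutes/seconds fields
      mapping => { "[report_duration][duration]" => "%{days} days, %{hours}:%{minutes}:%{seconds}" }
    }
    mutate { convert => { "days" => "integer" } }
  }
}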

Separate key/value pair in multiline JSON read with filebeat and input into logstash

I have a large JSON file I'm trying to ingest into Elasticsearch with Logstash and Filebeat. In the example below I don't care about "results" or the type levels, but I want all of the "keys" with their associated "values".
Filebeat Config
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - C:\path\to\file
    tags: ["myfile"]
    multiline.type: pattern
    multiline.pattern: '\['
    multiline.negate: true
    multiline.match: after
Logstash Config
input {
  beats {
    port => "5044"
  }
}
filter {
  if "myfile" in [tags] {
    json {
      source => "message"
    }
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "filebeat-%{+YYYY.MM.dd}"
  }
  stdout { codec => rubydebug }
}
Sample JSON
{
  "results": {
    "type1": {
      "key1": [
        "value1",
        "value2",
        "value3"
      ],
      "key2": [
        "value4",
        "value5"
      ]
    },
    "type2": {
      "key3": [
        "value6",
        "value7"
      ]
    }
  }
}
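Since the key names are arbitrary, one possible direction is a ruby filter that walks the parsed structure and promotes every key/value list to a top-level field. A minimal sketch, assuming the json filter above has parsed the message so that the data sits under [results]:
ruby {
  code => '
    results = event.get("results")
    if results.is_a?(Hash)
      # type1/type2 are ignored; every key keeps its list of values
      results.each do |type, keys|
        keys.each { |k, v| event.set(k, v) }
      end
      event.remove("results")
    end
  '
}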

Accessing nested JSON Value with variable key name in Logstash

I've got a question regarding JSON in Logstash.
I have a JSON input that looks something like this:
{
  "2": {
    "name": "name2",
    "state": "state2"
  },
  "1": {
    "name": "name1",
    "state": "state1"
  },
  "0": {
    "name": "name0",
    "state": "state0"
  }
}
Now, let's say I want to add a field in the Logstash config:
json {
  source => "message"
  add_field => {
    "NAME" => "%{ What to write here ?}"
    "STATE" => "%{ What to write here ?}"
  }
}
Is there a way to access the JSON input such that I get a field NAME with value name1, another with name2, and so on? The first key in the JSON changes, which means there can be just one part or many more. So I don't want to hardcode it like
%{[0][name]}
Thanks for your help.
If you remove all newlines in your input, you can simply use the json filter. You don't need any add_field action.
Working config without newlines:
filter {
  json { source => "message" }
}
If you can't remove the newlines in your input, you need to merge the lines with the multiline codec.
Working config with newlines:
input {
  file {
    path => ["/path/to/your/file"] # I suppose your input is a file.
    start_position => "beginning"
    sincedb_path => "/dev/null" # just for testing
    codec => multiline {
      pattern => "^}"
      what => "previous"
      negate => "true"
    }
  }
}
filter {
  mutate { replace => { "message" => "%{message}}" } }
  json { source => "message" }
}
I suppose that you use the file input. In case you don't, just change it.
Output (for both):
"2" => {
"name" => "name2",
"state" => "state2"
},
"1" => {
"name" => "name1",
"state" => "state1"
},
"0" => {
"name" => "name0",
"state" => "state0"
}
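If you do want fixed NAME/STATE fields despite the changing keys, the json filter alone can't do it; a ruby filter can iterate over the parsed hash. A minimal sketch (the "parsed" target name is an assumption):
filter {
  json { source => "message" target => "parsed" }
  ruby {
    code => '
      parsed = event.get("parsed")
      if parsed.is_a?(Hash)
        parsed.each do |idx, obj|
          # produces e.g. NAME_1 => name1, STATE_1 => state1
          event.set("NAME_#{idx}", obj["name"])
          event.set("STATE_#{idx}", obj["state"])
        end
      end
    '
  }
}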

Logstash: XML to JSON output from array to string

I am in the process of trying to use Logstash to convert XML into JSON for Elasticsearch. I am able to get the values read and sent to Elasticsearch. The issue is that all the values come out as arrays, and I would like them to come out as just strings. I know I can do a replace for each field individually, but then I run into an issue with nested fields being 3 levels deep.
XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<acs2:SubmitTestResult xmlns:acs2="http://tempuri.org/" xmlns:acs="http://schemas.sompleace.org" xmlns:acs1="http://schemas.someplace.org">
  <acs2:locationId>Location Id</acs2:locationId>
  <acs2:userId>User Id</acs2:userId>
  <acs2:TestResult>
    <acs1:CreatedBy>My Name</acs1:CreatedBy>
    <acs1:CreatedDate>2015-08-07</acs1:CreatedDate>
    <acs1:Output>10.5</acs1:Output>
  </acs2:TestResult>
</acs2:SubmitTestResult>
Logstash Config
input {
  file {
    path => "/var/log/logstash/test.xml"
  }
}
filter {
  multiline {
    pattern => "^\s\s(\s\s|\<\/acs2:SubmitTestResult\>)"
    what => "previous"
  }
  if "multiline" in [tags] {
    mutate {
      replace => ["message", '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>%{message}']
    }
    xml {
      target => "SubmitTestResult"
      source => "message"
    }
    mutate {
      remove_field => ["message", "@version", "host", "@timestamp", "path", "tags", "type"]
      remove_field => ["entry", "[SubmitTestResult][xmlns:acs2]", "[SubmitTestResult][xmlns:acs]", "[SubmitTestResult][xmlns:acs1]"]
      # This works
      replace => [ "[SubmitTestResult][locationId]", "%{[SubmitTestResult][locationId]}" ]
      # This does NOT work
      replace => [ "[SubmitTestResult][TestResult][CreatedBy]", "%{[SubmitTestResult][TestResult][CreatedBy]}" ]
    }
  }
}
output {
  stdout {
    codec => "rubydebug"
  }
  elasticsearch {
    index => "xmltest"
    cluster => "logstash"
  }
}
Example Output
{
  "_index": "xmltest",
  "_type": "logs",
  "_id": "AU8IZBURkkRvuur_3YDA",
  "_version": 1,
  "found": true,
  "_source": {
    "SubmitTestResult": {
      "locationId": "Location Id",
      "userId": [
        "User Id"
      ],
      "TestResult": [
        {
          "CreatedBy": [
            "My Name"
          ],
          "CreatedDate": [
            "2015-08-07"
          ],
          "Output": [
            "10.5"
          ]
        }
      ]
    }
  }
}
As you can see, the output is an array for each element (except for locationId, which I replaced). I am trying to avoid doing the replace for each element. Is there a way to adjust the config to make the output come out properly? If not, how do I get 3 levels deep in the replace?
--UPDATE--
I figured out how to get to the 3rd level in Test Results. The replace is:
replace => [ "[SubmitTestResult][TestResult][0][CreatedBy]", "%{[SubmitTestResult][TestResult][0][CreatedBy]}" ]
I figured it out. Here is the solution.
replace => [ "[SubmitTestResult][TestResult][0][CreatedBy]", "%{[SubmitTestResult][TestResult][0][CreatedBy]}" ]
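For what it's worth, newer versions of the xml filter expose a force_array option (it defaults to true, which is why everything arrives as arrays); setting it to false keeps single elements as plain strings and avoids the per-field replaces entirely:
xml {
  target => "SubmitTestResult"
  source => "message"
  # emit single XML elements as plain values instead of one-element arrays
  force_array => false
}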