Saving CSV data to Elasticsearch using Logstash - csv

I want to save CSV data in Elasticsearch using Logstash to receive the following result:
"my_field": [{"col1":"AAA", "col2": "BBB"},{"col1":"CCC", "col2": "DDD"}]
So, it's important that CSV data gets saved as the array [...] in a specific document.
However, I get this result:
"path": "path/to/csv",
"#timestamp": "2017-09-22T11:28:59.143Z",
"#version": "1",
"host": "GT-HYU",
"col2": "DDD",
"message": "CCC,DDD",
"col1": "CCC"
It looks like only the last CSV row gets saved, because each row overwrites the previous one. I tried using document_id => "1" in Logstash, but that is exactly what causes the overwriting. How can I save the data as an array?
Also, I don't understand how to specify that the data gets saved in my_field.
input {
  file {
    path => ["path/to/csv"]
    sincedb_path => "/dev/null"
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => ["col1","col2"]
    separator => ","
  }
  # Drop the header row
  if [col1] == "col1" {
    drop {}
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    action => "update"
    hosts => ["127.0.0.1:9200"]
    index => "my_index"
    document_type => "my_type"
    document_id => "1"
    workers => 1
  }
}
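
For reference, one way to collect every CSV row into a single my_field array inside one document is the aggregate filter. A rough sketch, assuming the logstash-filter-aggregate plugin is installed and the pipeline runs with a single worker (-w 1); the 5-second timeout is illustrative:

filter {
  csv {
    columns => ["col1","col2"]
    separator => ","
  }
  if [col1] == "col1" {
    drop {}
  }
  aggregate {
    # One aggregation bucket per input file ('path' is set by the file input)
    task_id => "%{path}"
    code => "
      map['my_field'] ||= []
      map['my_field'] << { 'col1' => event.get('col1'), 'col2' => event.get('col2') }
      event.cancel()
    "
    # Flush the accumulated map as a single new event once the file goes quiet
    push_map_as_event_on_timeout => true
    timeout => 5
  }
}

The flushed event then carries my_field as one array and can be indexed as a single document.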

Related

Logstash filter - mask secrets in json data / replace specific keys values

I have some JSON data sent to my Logstash filter and wish to mask secrets from appearing in Kibana. My log looks like this:
{
  "payloads":
  [
    {
      "sequence": 1,
      "request":
      {
        "url": "https://hello.com",
        "method": "POST",
        "postData": "{\"one\":\"1\",\"secret\":\"THISISSECRET\",\"username\":\"hello\",\"secret2\":\"THISISALSOSECRET\"}"
      },
      "response":
      {
        "status": 200
      }
    }
  ],
  ...
My filter converts the payloads to payload, and I then wish to mask the JSON in postData to become:
"postData": "{\"one\":\"1\",\"secret\":\"[secret]\",\"username\":\"hello\",\"secret2\":\"[secret]\"}"
My filter now looks like this:
if [payloads] {
  split {
    field => "payloads"
    target => "payload"
    remove_field => [ "payloads" ]
  }
}
# innerTmp is set to the JSON here - this works
json {
  source => "innerTmp"
  target => "parsedJson"
}
if [parsedJson][secret] =~ /.+/ {
  mutate {
    remove_field => [ "secret" ]
    add_field => { "secret" => "[secret]" }
  }
}
if [parsedJson][secret2] =~ /.+/ {
  mutate {
    remove_field => [ "secret2" ]
    add_field => { "secret2" => "[secret]" }
  }
}
Is this a correct approach? I cannot see the filter replacing my JSON key/values with "[secret]".
Kind regards /K
The approach is good, but you are using the wrong field.
After the split, the secret field is part of postData, and that field is part of parsedJson:
if [parsedJson][postData][secret] {
  mutate {
    # Overwrite the secret value in place
    replace => { "[parsedJson][postData][secret]" => "[secret]" }
  }
}
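
For completeness, a rough end-to-end sketch, assuming postData is still a JSON string that has to be parsed before its keys can be masked; the field paths here are assumptions, not taken from the original answer:

filter {
  split {
    field => "payloads"
    target => "payload"
    remove_field => [ "payloads" ]
  }
  # Parse the postData JSON string into a real object
  json {
    source => "[payload][request][postData]"
    target => "parsedJson"
  }
  if [parsedJson][secret] {
    mutate { replace => { "[parsedJson][secret]" => "[secret]" } }
  }
  if [parsedJson][secret2] {
    mutate { replace => { "[parsedJson][secret2]" => "[secret]" } }
  }
}

To write the masked object back into the original postData string, something like the json_encode plugin (or a ruby filter) would be needed afterwards.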

How to extract values within square brackets fields in a json log

I am a newbie in the use of Logstash and I need help with the following JSON log format:
{
  "field1": [
    {
      "sub_field1": {
        "sub_field2": "value X",
        "sub_field3": { "sub_field4": "value Y" }
      },
      "sub_field5": "value W"
    }
  ]
}
I want to know how to get value X, value Y and value W using mutate's add_field.
Thanks in advance!
Assuming that you will only have one array element under field1, it's just:
add_field => {
  sub_field1 => '%{[field1][0][sub_field1]}'
  sub_field2 => '%{[field1][0][sub_field1][sub_field2]}'
  ...
}
A good way to test this -- create a file called test.json
{ "field1" :[ { "sub_field1": { "sub_field2":"value X","sub_field3": {"sub_field4":"value Y"} }, "sub_field5":"value W" } ] }
Create a config file like test.conf:
input {
  stdin { codec => 'json_lines' }
}
filter {
  mutate {
    add_field => {
      sub_field1 => '%{[field1][0][sub_field1]}'
      sub_field2 => '%{[field1][0][sub_field1][sub_field2]}'
    }
  }
}
output {
  stdout { codec => "rubydebug" }
}
And then run it: cat test.json | bin/logstash -f test.conf
You'll get output like this:
{
    "field1" => [
        [0] {
            "sub_field5" => "value W",
            "sub_field1" => {
                "sub_field3" => {
                    "sub_field4" => "value Y"
                },
                "sub_field2" => "value X"
            }
        }
    ],
    "@timestamp" => 2020-02-17T17:26:59.471Z,
    "@version" => "1",
    "host" => "xxxxxxxx",
    "sub_field2" => "value X",
    "sub_field1" => "{\"sub_field3\":{\"sub_field4\":\"value Y\"},\"sub_field2\":\"value X\"}",
    "tags" => []
}
Which shows sub_field2 and sub_field1.
If you don't have predictable field names, you'll need to resort to a ruby filter or something like that. And if you have more than one array element to deal with, you'll need to use the strategy discussed in the comments here: https://stackoverflow.com/a/45493411/2785358
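
For the unpredictable-field-names case, a rough ruby filter sketch (illustrative only; it assumes every key under the first field1 element should be copied to the top level):

filter {
  ruby {
    code => "
      # Copy whatever keys exist under the first field1 element to the top level
      elem = event.get('[field1][0]')
      elem.each { |k, v| event.set(k, v) } if elem.is_a?(Hash)
    "
  }
}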

append array of json logstash elasticsearch

How can I append an array of JSON objects to a document in Elasticsearch, using Logstash to load a CSV?
Example CSV contents:
id,key1,key2
1,toto1,toto2
1,titi1,titi2
2,tata1,tata2
The result should be 2 documents:
{
  "id": 1,
  "keys": [{
    "key1": "toto1",
    "key2": "toto2"
  }, {
    "key1": "titi1",
    "key2": "titi2"
  }]
},
{
  "id": 2,
  "keys": [{
    "key1": "tata1",
    "key2": "tata2"
  }]
}
Cordially
First, create your ES mapping, if necessary, declaring your inner objects as nested objects:
{
  "mappings": {
    "key_container": {
      "properties": {
        "id": {
          "type": "keyword",
          "index": true
        },
        "keys": {
          "type": "nested",
          "properties": {
            "key1": {
              "type": "keyword",
              "index": true
            },
            "key2": {
              "type": "text",
              "index": true
            }
          }
        }
      }
    }
  }
}
The keys property will contain the array of nested objects.
Then you can load the CSV in two hops with Logstash:
1. Index (create) the base object containing only the id property.
2. Update the base object with a keys property containing the array of nested objects.
The first Logstash configuration (only the relevant part):
filter {
  csv {
    columns => ["id","key1","key2"]
    separator => ","
    # Remove the keys because they will be loaded in the next hop with update
    remove_field => [ "key1", "key2" ]
  }
  # Remove the row containing the column names
  if [id] == "id" {
    drop { }
  }
}
output {
  elasticsearch {
    action => "index"
    document_id => "%{id}"
    hosts => [ "localhost:9200" ]
    index => "key_container"
  }
}
The second step's Logstash configuration (you have to enable scripting in Elasticsearch):
filter {
  csv {
    columns => ["id","key1","key2"]
    separator => ","
  }
  # Convert the attributes into an object called 'key' that is passed to the script below (via the 'event' object)
  mutate {
    rename => {
      "key1" => "[key][key1]"
      "key2" => "[key][key2]"
    }
  }
}
output {
  elasticsearch {
    action => "update"
    document_id => "%{id}"
    doc_as_upsert => "true"
    hosts => [ "localhost:9200" ]
    index => "key_container"
    script_lang => "groovy"
    # key_container.keys is an array of key objects
    # arrays can be built only with scripts, and defined as an array when we put the first element into it
    script => "if (ctx._source.containsKey('keys')) {ctx._source.keys += event.key} else {ctx._source.keys = [event.key]}"
  }
}
In summary, you need this two-hop loading because array creation requires scripting, which is available only with the update action.
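
Note that script_lang => "groovy" targets the old Elasticsearch versions current when this was written; recent versions replaced Groovy with Painless. A hedged sketch of what the second output might look like on a newer stack (with inline scripts the event is exposed under params, named by the plugin's script_var_name option, which defaults to event):

output {
  elasticsearch {
    action => "update"
    document_id => "%{id}"
    doc_as_upsert => true
    hosts => [ "localhost:9200" ]
    index => "key_container"
    script_lang => "painless"
    script_type => "inline"
    # Same logic as above: create the array on first sight, append afterwards
    script => "if (ctx._source.keys == null) { ctx._source.keys = [params.event.key] } else { ctx._source.keys.add(params.event.key) }"
  }
}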

Accessing nested JSON Value with variable key name in Logstash

I've got a question regarding JSON in Logstash.
I have got a JSON Input that looks something like this:
{
  "2": {
    "name": "name2",
    "state": "state2"
  },
  "1": {
    "name": "name1",
    "state": "state1"
  },
  "0": {
    "name": "name0",
    "state": "state0"
  }
}
Now, let's say I want to add a field in the Logstash config:
json {
  source => "message"
  add_field => {
    "NAME" => "%{ What to write here ?}"
    "STATE" => "%{ What to write here ?}"
  }
}
Is there a way to access the JSON input such that I get a field NAME with value name1, another field with name2, and so on? The first key in the JSON changes, which means there can be one part or many more. So I don't want to hardcode it like
%{[0][name]}
Thanks for your help.
If you remove all new lines in your input you can simply use the json filter. You don't need any add_field action.
Working config without new lines:
filter {
  json { source => "message" }
}
If you can't remove the new lines in your input you need to merge the lines with the multiline codec.
Working config with new lines:
input {
  file {
    path => ["/path/to/your/file"] # I suppose your input is a file.
    start_position => "beginning"
    sincedb_path => "/dev/null" # just for testing
    codec => multiline {
      pattern => "^}"
      what => "previous"
      negate => "true"
    }
  }
}
filter {
  mutate { replace => { "message" => "%{message}}" } }
  json { source => "message" }
}
I suppose that you use the file input. In case you don't, just change it.
Output (for both):
"2" => {
"name" => "name2",
"state" => "state2"
},
"1" => {
"name" => "name1",
"state" => "state1"
},
"0" => {
"name" => "name0",
"state" => "state0"
}
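
If the NAME/STATE fields are still needed regardless of the numeric keys, a rough ruby filter sketch (illustrative only; it collects every name and state into arrays, whatever the outer keys are called):

filter {
  json { source => "message" }
  ruby {
    code => "
      names = []
      states = []
      # Walk the top-level fields and pick up every numeric-keyed object
      event.to_hash.each do |k, v|
        next unless k =~ /^[0-9]+$/ && v.is_a?(Hash)
        names  << v['name']
        states << v['state']
      end
      event.set('NAME', names) unless names.empty?
      event.set('STATE', states) unless states.empty?
    "
  }
}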

Logstash filter parse json file results in double fields

I am using the latest ELK (Elasticsearch 1.5.2 , Logstash 1.5.0, Kibana 4.0.2)
I have a question about the following.
Sample .json:
{ "field1": "This is value1", "field2": "This is value2" }
logstash.conf
input {
  stdin { }
}
filter {
  json {
    source => "message"
    add_field =>
    {
      "field1" => "%{field1}"
      "field2" => "%{field2}"
    }
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    host => "localhost"
    index => "scan"
  }
}
Output:
{
    "message" => "{ \"field1\": \"This is value1\", \"field2\": \"This is value2\" }",
    "@version" => "1",
    "@timestamp" => "2015-05-07T06:02:56.088Z",
    "host" => "myhost",
    "field1" => [
        [0] "This is value1",
        [1] "This is value1"
    ],
    "field2" => [
        [0] "This is value2",
        [1] "This is value2"
    ]
}
My questions: 1) Why does each field appear twice in the result? 2) If there is a nested array, how should it be referenced in the Logstash config?
Thanks a lot!
Petera
I think you have misunderstood what the json filter does. When you process a field through the json filter it will look for field names and corresponding values.
In your example, you have done that with this part:
filter {
  json {
    source => "message"
Then you have added a field called "field1" with the content of field "field1". Since the field already exists, you have just added the same information to the field that was already there, so it has now become an array:
    add_field =>
    {
      "field1" => "%{field1}"
      "field2" => "%{field2}"
    }
  }
}
If you simplify your code to the following you should be fine:
filter {
  json {
    source => "message"
  }
}
I suspect your question about arrays becomes moot at this point, as you probably don't need the nested array and therefore won't need to address it. But in case you do, I believe you can reference the elements like so:
[field1][0]
[field1][1]
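
For instance, a hedged usage sketch (first_value is an illustrative field name, not from the original answer):

filter {
  json {
    source => "message"
  }
  mutate {
    # Copy the first element of a field1 array into its own top-level field
    add_field => { "first_value" => "%{[field1][0]}" }
  }
}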