Logstash: XML to JSON output from array to string

I am in the process of trying to use Logstash to convert an XML into JSON for Elasticsearch. I am able to get the values read and sent to Elasticsearch. The issue is that all the values come out as arrays, and I would like them to come out as plain strings. I know I can do a replace for each field individually, but then I run into an issue with nested fields that are three levels deep.
XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<acs2:SubmitTestResult xmlns:acs2="http://tempuri.org/" xmlns:acs="http://schemas.sompleace.org" xmlns:acs1="http://schemas.someplace.org">
  <acs2:locationId>Location Id</acs2:locationId>
  <acs2:userId>User Id</acs2:userId>
  <acs2:TestResult>
    <acs1:CreatedBy>My Name</acs1:CreatedBy>
    <acs1:CreatedDate>2015-08-07</acs1:CreatedDate>
    <acs1:Output>10.5</acs1:Output>
  </acs2:TestResult>
</acs2:SubmitTestResult>
Logstash Config
input {
  file {
    path => "/var/log/logstash/test.xml"
  }
}
filter {
  multiline {
    pattern => "^\s\s(\s\s|\<\/acs2:SubmitTestResult\>)"
    what => "previous"
  }
  if "multiline" in [tags] {
    mutate {
      replace => ["message", '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>%{message}']
    }
    xml {
      target => "SubmitTestResult"
      source => "message"
    }
    mutate {
      remove_field => ["message", "@version", "host", "@timestamp", "path", "tags", "type"]
      remove_field => ["entry", "[SubmitTestResult][xmlns:acs2]", "[SubmitTestResult][xmlns:acs]", "[SubmitTestResult][xmlns:acs1]"]
      # This works
      replace => [ "[SubmitTestResult][locationId]", "%{[SubmitTestResult][locationId]}" ]
      # This does NOT work
      replace => [ "[SubmitTestResult][TestResult][CreatedBy]", "%{[SubmitTestResult][TestResult][CreatedBy]}" ]
    }
  }
}
output {
  stdout {
    codec => "rubydebug"
  }
  elasticsearch {
    index => "xmltest"
    cluster => "logstash"
  }
}
Example Output
{
  "_index": "xmltest",
  "_type": "logs",
  "_id": "AU8IZBURkkRvuur_3YDA",
  "_version": 1,
  "found": true,
  "_source": {
    "SubmitTestResult": {
      "locationId": "Location Id",
      "userId": [
        "User Id"
      ],
      "TestResult": [
        {
          "CreatedBy": [
            "My Name"
          ],
          "CreatedDate": [
            "2015-08-07"
          ],
          "Output": [
            "10.5"
          ]
        }
      ]
    }
  }
}
As you can see, the output is an array for each element (except for locationId, where I did the replace). I am trying not to have to do the replace for each element. Is there a way to adjust the config to make the output come out properly? If not, how do I get 3 levels deep in the replace?
--UPDATE--
I figured out how to get to the 3rd level in TestResult. The replace is:
replace => [ "[SubmitTestResult][TestResult][0][CreatedBy]", "%{[SubmitTestResult][TestResult][0][CreatedBy]}" ]

I figured it out. Here is the solution.
replace => [ "[SubmitTestResult][TestResult][0][CreatedBy]", "%{[SubmitTestResult][TestResult][0][CreatedBy]}" ]

Related

Logstash filter - mask secrets in json data / replace specific keys values

I have some JSON data sent in to my logstash filter and wish to mask secrets from appearing in Kibana. My log looks like this:
{
  "payloads":
  [
    {
      "sequence": 1,
      "request":
      {
        "url": "https://hello.com",
        "method": "POST",
        "postData": "{\"one\":\"1\",\"secret\":\"THISISSECRET\",\"username\":\"hello\",\"secret2\":\"THISISALSOSECRET\"}",
      },
      "response":
      {
        "status": 200,
      }
    }
  ],
  ...
My filter converts payloads into payload, and I then wish to mask the JSON in postData so that it becomes:
"postData": "{\"one:\"1\",\"secret\":\"[secret]\",\"username\":\"hello\",\"secret2\":\"[secret]\"}"
My filter now looks like this:
if ([payloads]) {
  split {
    field => "payloads"
    target => "payload"
    remove_field => ["payloads"]
  }
}
# innerTmp is set to JSON here - this works
json {
  source => "innerTmp"
  target => "parsedJson"
  if [parsedJson][secret] =~ /.+/ {
    remove_field => [ "secret" ]
    add_field => { "secret" => "[secret]" }
  }
  if [parsedJson][secret2] =~ /.+/ {
    remove_field => [ "secret2" ]
    add_field => { "secret2" => "[secret]" }
  }
}
Is this a correct approach? I cannot see the filter replacing my JSON key/values with "[secret]".
Kind regards /K
The approach is good, but you are using the wrong field.
After the split, the secret field is part of postData, and that field is part of parsedJson.
if [parsedJson][postData][secret] {
  mutate {
    remove_field => [ "[parsedJson][postData][secret]" ]
    add_field => { "[parsedJson][postData][secret]" => "[secret]" }
  }
}
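Note that in the sample event postData is itself a JSON string, so the nested secret fields only exist after that string has been parsed as well. A minimal sketch of that extra step, with the field names taken from the question and mutate replace used instead of the remove/add pair:
filter {
  # parse the postData string into a hash so its keys become addressable fields
  json {
    source => "[parsedJson][postData]"
    target => "[parsedJson][postData]"
  }
  # overwrite the sensitive values before the event is indexed
  if [parsedJson][postData][secret] {
    mutate { replace => { "[parsedJson][postData][secret]" => "[secret]" } }
  }
  if [parsedJson][postData][secret2] {
    mutate { replace => { "[parsedJson][postData][secret2]" => "[secret]" } }
  }
}
If postData has to remain a single string in the indexed document, it can be serialized back afterwards (for example with the json_encode filter plugin or a ruby filter); masking the parsed fields is the essential part.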

Parsing JSON from CouchDB to ElasticSearch via Logstash

I am not able to parse the JSON coming from CouchDB into an Elasticsearch index in the way I want.
My CouchDB data looks like this:
{
  "_id": "56161609157031561692637",
  "_rev": "4-4119e8df293a6354be4c9fd7e8b12e68",
  "deleteFlag": "N",
  "entryUser": "John",
  "parameter": "{\"id\":\"14188\",\"rcs_p\":null,\"rcs_e\":null,\"dep_p\":null,\"dep_e\":null,\"dep_place\":null,\"rcf_p\":null,\"rcf_e\":null,\"rcf_place\":null,\"dlv_p\":\"3810\",\"dlv_e\":\"1569\",\"seg_no\":null,\"trans_type\":\"incoming\",\"trans_service\":\"delivery\"}",
  "physicalId": "0",
  "recordDate": "2020-12-28T17:50:16+05:45",
  "tag": "CARGO",
  "uId": "56161609157031561692637",
  "~version": "CgMBKgA="
}
What I am trying to do is search on the nested fields of parameter in the above JSON.
When I put the data in ES index it is stored like this:
{
  "_index": "del3",
  "_type": "_doc",
  "_id": "XRCV9XYBx5PRwauO--qO",
  "_version": 1,
  "_score": 0,
  "_source": {
    "@version": "1",
    "doc_as_upsert": true,
    "doc": {
      "physicalId": "0",
      "recordDate": "2020-12-27T12:56:45+05:45",
      "tag": "CARGO",
      "~version": "CgMBGgA=",
      "uId": "48541609052212485430933",
      "_rev": "3-937bf92e6010afec13664b1d9d06844b",
      "deleteFlag": "N",
      "entryUser": "John",
      "parameter": "{\"id\":\"4038\",\"rcs_p\":null,\"rcs_e\":null,\"dep_p\":null,\"dep_e\":null,\"dep_place\":null,\"rcf_p\":null,\"rcf_e\":null,\"rcf_place\":null,\"dlv_p\":\"5070\",\"dlv_e\":\"2015\",\"seg_no\":null,\"trans_type\":\"incoming\",\"trans_service\":\"delivery\"}"
    },
    "@timestamp": "2021-01-12T07:53:33.978Z"
  },
  "fields": {
    "@timestamp": [
      "2021-01-12T07:53:33.978Z"
    ],
    "doc.recordDate": [
      "2020-12-27T07:11:45.000Z"
    ]
  }
}
I want to be able to access the fields inside the parameter (id, rcs_p, rcs_e, ..) in Elasticsearch.
Here is my logstash.conf file:
input {
  couchdb_changes {
    host => "<host_name>"
    port => 5984
    db => "mychannel_asset$management"
    keep_id => false
    keep_revision => true
    #initial_sequence => 0
    always_reconnect => true
    sequence_path => "/usr/share/logstash/config/seqfile"
  }
}
filter {
  json {
    source => "[parameter]"
    remove_field => ["[parameter]"]
  }
}
output {
  if([doc][tag] == "CARGO") {
    elasticsearch {
      hosts => ["http://elasticsearch:9200"]
      index => "del3"
      user => elastic
      password => changeme
    }
  }
}
How do I achieve my desired result? I also tried creating a custom template that defines a nested type for parameter, but no luck yet. Any help would be appreciated.
I think you did almost everything right. I'm not too sure about the actual structure, but one of these might work:
filter {
  json {
    source => "parameter"
    target => "parameter"
  }
}
filter {
  json {
    source => "[doc][parameter]"
    target => "[doc][parameter]"
  }
}
I don't know how the CouchDB input plugin works, but it seems to be putting everything under a doc object.
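A quick way to confirm which of the two variants applies is to dump one event with the rubydebug codec before it reaches Elasticsearch; the printed structure shows whether parameter sits at the top level or under [doc]:
output {
  # print the full event structure to the console for inspection
  stdout { codec => rubydebug }
}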

Un-breaking an analyzed field in kibana

I have an ELK stack that receives from filebeat structured JSON logs like these:
{"what": "Connected to proxy service", "who": "proxy.service", "when": "03.02.2016 13:29:51", "severity": "DEBUG", "more": {"host": "127.0.0.1", "port": 2004}}
{"what": "Service registered with discovery", "who": "proxy.discovery", "when": "03.02.2016 13:29:51", "severity": "DEBUG", "more": {"ctx": {"node": "igz0", "ip": "127.0.0.1:5301", "irn": "proxy"}, "irn": "igz0.proxy.827378e7-3b67-49ef-853c-242de033e645"}}
{"what": "Exception raised while setting service value", "who": "proxy.discovery", "when": "03.02.2016 13:46:34", "severity": "WARNING", "more": {"exc": "ConnectionRefusedError('Connection refused',)", "service": "igz0.proxy.827378e7-3b67-49ef-853c-242de033e645"}}
The "more" field which is a nested JSON is broken down (not sure by what part of the stack) to different fields ("more.host", "more.ctx" and such) in kibana.
This is my beats input:
input {
  beats {
    port => 5044
  }
}
filter {
  if [type] == "node" {
    json {
      source => "message"
      add_field => {
        "who" => "%{name}"
        "what" => "%{msg}"
        "severity" => "%{level}"
        "when" => "%{time}"
      }
    }
  } else {
    json {
      source => "message"
    }
  }
  date {
    match => [ "when" , "dd.MM.yyyy HH:mm:ss", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" ]
  }
}
And this is my output:
output {
  elasticsearch {
    hosts => ["localhost"]
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
  stdout { codec => rubydebug }
}
Is there any way of making a field which will contain the entire "more" field without breaking it apart?
You should be able to use a ruby filter to take the hash and convert it back into a string.
filter {
  ruby {
    code => "event['more'] = event['more'].to_s"
  }
}
You'd probably want to surround it with an if to make sure that the field exists first.
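A sketch of that guard, keeping the event['field'] API used above (newer Logstash versions would use event.get and event.set instead):
filter {
  if [more] {
    ruby {
      # serialize the nested hash back into a single string field
      code => "event['more'] = event['more'].to_s"
    }
  }
}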

logstash json filter not parsing fields getting _jsonparsefailure

Hi, I am trying to parse a JSON file. I have tried troubleshooting with suggestions from Stack Overflow (links at bottom), but none have worked for me. I am hoping someone has some insight on what is probably a silly mistake I am making.
I have tried using only the json codec, only the json filter, as well as both. For some reason I am still getting this _jsonparsefailure. What can I do to get this to work?
Thanks in advance!
My json file:
{
  "log": {
    "version": "1.2",
    "creator": {
      "name": "WebInspector",
      "version": "537.36"
    },
    "pages": [
      {
        "startedDateTime": "2015-10-13T20:28:46.081Z",
        "id": "page_1",
        "title": "https://demo.com",
        "pageTimings": {
          "onContentLoad": 377.8560000064317,
          "onLoad": 377.66200001351535
        }
      },
      {
        "startedDateTime": "2015-10-13T20:29:01.734Z",
        "id": "page_2",
        "title": "https://demo.com",
        "pageTimings": {
          "onContentLoad": 1444.0670000039972,
          "onLoad": 2279.20100002666
        }
      },
      {
        "startedDateTime": "2015-10-13T20:29:04.014Z",
        "id": "page_3",
        "title": "https://demo.com",
        "pageTimings": {
          "onContentLoad": 1802.0240000041667,
          "onLoad": 2242.4060000048485
        }
      },
      {
        "startedDateTime": "2015-10-13T20:29:09.224Z",
        "id": "page_4",
        "title": "https://demo.com",
        "pageTimings": {
          "onContentLoad": 274.82699998654425,
          "onLoad": 1453.034000005573
        }
      }
    ]
  }
}
My logstash conf:
input {
  file {
    type => "json"
    path => "/Users/anonymous/Documents/demo.json"
    start_position => beginning
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch { host => localhost protocol => "http" port => "9200" }
  stdout { codec => rubydebug }
}
Output I am getting from logstash hopefully with clues:
Trouble parsing json {:source=>"message", :raw=>" \"startedDateTime\": \"2015-10-19T18:05:37.887Z\",", :exception=>#<TypeError: can't convert String into Hash>, :level=>:warn}
{
  "message" => " {",
  "@version" => "1",
  "@timestamp" => "2015-10-26T20:05:53.096Z",
  "host" => "15mbp-09796.local",
  "path" => "/Users/anonymous/Documents/demo.json",
  "type" => "json",
  "tags" => [
    [0] "_jsonparsefailure"
  ]
}
Decompose Logstash json message into fields
How to use logstash's json filter?
I test my JSON at JSONLint; perhaps this will solve your problem. The error I get there is that it is expecting a string.
It seems that you have an unnecessary comma (',') at the end. Either remove it or add another JSON value after it.
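Separately, the rubydebug output above shows the file input emitting each line of the pretty-printed file as its own event ("message" => " {"), so the json filter only ever sees fragments. A rough sketch of merging the whole document into one event first, borrowing the multiline codec idea from the last answer on this page (the pattern anchoring on the top-level closing brace is an assumption about your file layout):
input {
  file {
    path => "/Users/anonymous/Documents/demo.json"
    start_position => "beginning"
    # merge all lines up to the top-level closing brace into a single event
    codec => multiline {
      pattern => "^}"
      what => "previous"
      negate => "true"
    }
  }
}
filter {
  # the closing brace starts a new event, so re-append it to the merged message before parsing
  mutate { replace => { "message" => "%{message}}" } }
  json { source => "message" }
}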

Accessing nested JSON Value with variable key name in Logstash

I've got a question regarding JSON in Logstash.
I have got a JSON Input that looks something like this:
{
  "2": {
    "name": "name2",
    "state": "state2"
  },
  "1": {
    "name": "name1",
    "state": "state1"
  },
  "0": {
    "name": "name0",
    "state": "state0"
  }
}
Now, let's say I want to add a field in the logstash config
json {
  source => "message"
  add_field => {
    "NAME" => "%{ What to write here ?}"
    "STATE" => "%{ What to write here ?}"
  }
}
Is there a way to access the JSON input such that I get a field NAME with value name1, another field with name2, and so on? The first key in the JSON changes, which means there can be just one part or many more. So I don't want to hardcode it like
%{[0][name]}
Thanks for your help.
If you remove all new lines in your input you can simply use the json filter. You don't need any add_field action.
Working config without new lines:
filter {
  json { source => message }
}
If you can't remove the new lines in your input you need to merge the lines with the multiline codec.
Working config with new lines:
input {
  file {
    path => ["/path/to/your/file"] # I suppose your input is a file.
    start_position => "beginning"
    sincedb_path => "/dev/null" # just for testing
    codec => multiline {
      pattern => "^}"
      what => "previous"
      negate => "true"
    }
  }
}
filter {
  mutate { replace => { "message" => "%{message}}" } }
  json { source => message }
}
I suppose that you use the file input. In case you don't, just change it.
Output (for both):
"2" => {
"name" => "name2",
"state" => "state2"
},
"1" => {
"name" => "name1",
"state" => "state1"
},
"0" => {
"name" => "name0",
"state" => "state0"
}
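If flat NAME/STATE style fields are still needed no matter which numeric key shows up, one option is a ruby filter that walks the parsed top-level keys. This is only a sketch, assuming the numeric keys end up at the top level of the event as in the sample, and it uses the older event['field'] API that appears elsewhere on this page:
filter {
  ruby {
    code => "
      # collect name/state from every top-level numeric key, then copy them onto the event
      extra = {}
      event.to_hash.each do |key, value|
        next unless key =~ /^[0-9]+$/ && value.is_a?(Hash)
        extra['name_' + key]  = value['name']
        extra['state_' + key] = value['state']
      end
      extra.each { |k, v| event[k] = v }
    "
  }
}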