I am in the process of trying to use Logstash to convert XML into JSON for ElasticSearch. I am able to get the values read and sent to ElasticSearch. The issue is that all the values come out as arrays. I would like to make them come out as just strings. I know I can do a replace for each field individually, but then I run into an issue with nested fields that are 3 levels deep.
XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<acs2:SubmitTestResult xmlns:acs2="http://tempuri.org/" xmlns:acs="http://schemas.sompleace.org" xmlns:acs1="http://schemas.someplace.org">
  <acs2:locationId>Location Id</acs2:locationId>
  <acs2:userId>User Id</acs2:userId>
  <acs2:TestResult>
    <acs1:CreatedBy>My Name</acs1:CreatedBy>
    <acs1:CreatedDate>2015-08-07</acs1:CreatedDate>
    <acs1:Output>10.5</acs1:Output>
  </acs2:TestResult>
</acs2:SubmitTestResult>
Logstash Config
input {
  file {
    path => "/var/log/logstash/test.xml"
  }
}
filter {
  multiline {
    pattern => "^\s\s(\s\s|\<\/acs2:SubmitTestResult\>)"
    what => "previous"
  }
  if "multiline" in [tags] {
    mutate {
      replace => ["message", '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>%{message}']
    }
    xml {
      target => "SubmitTestResult"
      source => "message"
    }
    mutate {
      remove_field => ["message", "@version", "host", "@timestamp", "path", "tags", "type"]
      remove_field => ["entry", "[SubmitTestResult][xmlns:acs2]", "[SubmitTestResult][xmlns:acs]", "[SubmitTestResult][xmlns:acs1]"]
      # This works
      replace => [ "[SubmitTestResult][locationId]", "%{[SubmitTestResult][locationId]}" ]
      # This does NOT work
      replace => [ "[SubmitTestResult][TestResult][CreatedBy]", "%{[SubmitTestResult][TestResult][CreatedBy]}" ]
    }
  }
}
output {
  stdout {
    codec => "rubydebug"
  }
  elasticsearch {
    index => "xmltest"
    cluster => "logstash"
  }
}
Example Output
{
  "_index": "xmltest",
  "_type": "logs",
  "_id": "AU8IZBURkkRvuur_3YDA",
  "_version": 1,
  "found": true,
  "_source": {
    "SubmitTestResult": {
      "locationId": "Location Id",
      "userId": [
        "User Id"
      ],
      "TestResult": [
        {
          "CreatedBy": [
            "My Name"
          ],
          "CreatedDate": [
            "2015-08-07"
          ],
          "Output": [
            "10.5"
          ]
        }
      ]
    }
  }
}
As you can see, the output is an array for each element (except for the locationId I replaced). I am trying to avoid doing the replace for each element. Is there a way to adjust the config to make the output come out properly? If not, how do I get 3 levels deep in the replace?
--UPDATE--
I figured out how to get to the 3rd level in TestResult. The replace is:
replace => [ "[SubmitTestResult][TestResult][0][CreatedBy]", "%{[SubmitTestResult][TestResult][0][CreatedBy]}" ]
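For what it's worth, newer versions of the xml filter also have a force_array option; if your plugin version supports it, setting it to false keeps single elements as plain strings, so none of the replace calls are needed at all:
xml {
  target => "SubmitTestResult"
  source => "message"
  # assumes a logstash-filter-xml version that supports force_array
  force_array => false
}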
Related
I have some JSON data sent in to my logstash filter and wish to mask secrets from appearing in Kibana. My log looks like this:
{
  "payloads":
  [
    {
      "sequence": 1,
      "request":
      {
        "url": "https://hello.com",
        "method": "POST",
        "postData": "{\"one\":\"1\",\"secret\":\"THISISSECRET\",\"username\":\"hello\",\"secret2\":\"THISISALSOSECRET\"}"
      },
      "response":
      {
        "status": 200
      }
    }
  ],
  ...
My filter splits payloads into individual payload events, and I then wish to mask the JSON in postData so that it becomes:
"postData": "{\"one\":\"1\",\"secret\":\"[secret]\",\"username\":\"hello\",\"secret2\":\"[secret]\"}"
My filter now looks like this:
if [payloads] {
  split {
    field => "payloads"
    target => "payload"
    remove_field => ["payloads"]
  }
}
# innerTmp is set to JSON here - this works
json {
  source => "innerTmp"
  target => "parsedJson"
}
if [parsedJson][secret] =~ /.+/ {
  mutate {
    remove_field => [ "secret" ]
    add_field => { "secret" => "[secret]" }
  }
}
if [parsedJson][secret2] =~ /.+/ {
  mutate {
    remove_field => [ "secret2" ]
    add_field => { "secret2" => "[secret]" }
  }
}
Is this a correct approach? I cannot see the filter replacing my JSON key/values with "[secret]".
Kind regards /K
The approach is good, but you are using the wrong field.
After the split, the secret field is part of postData, and that field is part of parsedJson.
if [parsedJson][postData][secret] {
  mutate {
    remove_field => [ "[parsedJson][postData][secret]" ]
    add_field => { "[parsedJson][postData][secret]" => "[secret]" }
  }
}
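One thing to watch: in the question, postData is itself a JSON string, so the nested fields above only exist after postData has been parsed too. A minimal sketch of that extra pass, reusing the question's field names (mutate's replace does the remove/add in one step):
json {
  source => "[parsedJson][postData]"
  target => "[parsedJson][postData]"
}
if [parsedJson][postData][secret] {
  mutate {
    # overwrite the real secret with the mask
    replace => { "[parsedJson][postData][secret]" => "[secret]" }
  }
}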
I am not able to parse JSON from CouchDB into an Elasticsearch index in the desired way.
My CouchDB data looks like this:
{
  "_id": "56161609157031561692637",
  "_rev": "4-4119e8df293a6354be4c9fd7e8b12e68",
  "deleteFlag": "N",
  "entryUser": "John",
  "parameter": "{\"id\":\"14188\",\"rcs_p\":null,\"rcs_e\":null,\"dep_p\":null,\"dep_e\":null,\"dep_place\":null,\"rcf_p\":null,\"rcf_e\":null,\"rcf_place\":null,\"dlv_p\":\"3810\",\"dlv_e\":\"1569\",\"seg_no\":null,\"trans_type\":\"incoming\",\"trans_service\":\"delivery\"}",
  "physicalId": "0",
  "recordDate": "2020-12-28T17:50:16+05:45",
  "tag": "CARGO",
  "uId": "56161609157031561692637",
  "~version": "CgMBKgA="
}
What I am trying to do is make the nested fields of parameter in the above JSON searchable.
When I put the data into the ES index, it is stored like this:
{
  "_index": "del3",
  "_type": "_doc",
  "_id": "XRCV9XYBx5PRwauO--qO",
  "_version": 1,
  "_score": 0,
  "_source": {
    "@version": "1",
    "doc_as_upsert": true,
    "doc": {
      "physicalId": "0",
      "recordDate": "2020-12-27T12:56:45+05:45",
      "tag": "CARGO",
      "~version": "CgMBGgA=",
      "uId": "48541609052212485430933",
      "_rev": "3-937bf92e6010afec13664b1d9d06844b",
      "deleteFlag": "N",
      "entryUser": "John",
      "parameter": "{\"id\":\"4038\",\"rcs_p\":null,\"rcs_e\":null,\"dep_p\":null,\"dep_e\":null,\"dep_place\":null,\"rcf_p\":null,\"rcf_e\":null,\"rcf_place\":null,\"dlv_p\":\"5070\",\"dlv_e\":\"2015\",\"seg_no\":null,\"trans_type\":\"incoming\",\"trans_service\":\"delivery\"}"
    },
    "@timestamp": "2021-01-12T07:53:33.978Z"
  },
  "fields": {
    "@timestamp": [
      "2021-01-12T07:53:33.978Z"
    ],
    "doc.recordDate": [
      "2020-12-27T07:11:45.000Z"
    ]
  }
}
I want to be able to access the fields inside the parameter (id, rcs_p, rcs_e, ..) in Elasticsearch.
Here is my logstash.conf file:
input {
  couchdb_changes {
    host => "<host_name>"
    port => 5984
    db => "mychannel_asset$management"
    keep_id => false
    keep_revision => true
    #initial_sequence => 0
    always_reconnect => true
    sequence_path => "/usr/share/logstash/config/seqfile"
  }
}
filter {
  json {
    source => "[parameter]"
    remove_field => ["[parameter]"]
  }
}
output {
  if [doc][tag] == "CARGO" {
    elasticsearch {
      hosts => ["http://elasticsearch:9200"]
      index => "del3"
      user => "elastic"
      password => "changeme"
    }
  }
}
How do I achieve my desired result? I also tried creating a custom template that defines a nested type for parameter, but no luck yet. Any help would be appreciated.
I think you did almost everything right. I'm not too sure about the actual structure, but one of these might work:
filter {
  json {
    source => "parameter"
    target => "parameter"
  }
}
filter {
  json {
    source => "[doc][parameter]"
    target => "[doc][parameter]"
  }
}
I don't know how the CouchDB input plugin works, but it seems to put everything under a doc object.
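Either way, the quickest check is to dump the whole event and see where parameter actually ends up; the rubydebug codec prints the complete event structure:
output {
  stdout {
    # print every event, including nested fields, to the console
    codec => rubydebug
  }
}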
I have an ELK stack that receives from filebeat structured JSON logs like these:
{"what": "Connected to proxy service", "who": "proxy.service", "when": "03.02.2016 13:29:51", "severity": "DEBUG", "more": {"host": "127.0.0.1", "port": 2004}}
{"what": "Service registered with discovery", "who": "proxy.discovery", "when": "03.02.2016 13:29:51", "severity": "DEBUG", "more": {"ctx": {"node": "igz0", "ip": "127.0.0.1:5301", "irn": "proxy"}, "irn": "igz0.proxy.827378e7-3b67-49ef-853c-242de033e645"}}
{"what": "Exception raised while setting service value", "who": "proxy.discovery", "when": "03.02.2016 13:46:34", "severity": "WARNING", "more": {"exc": "ConnectionRefusedError('Connection refused',)", "service": "igz0.proxy.827378e7-3b67-49ef-853c-242de033e645"}}
The "more" field, which is nested JSON, is broken down (not sure by what part of the stack) into different fields ("more.host", "more.ctx", and such) in Kibana.
This is my beats input and filter:
input {
  beats {
    port => 5044
  }
}
filter {
  if [type] == "node" {
    json {
      source => "message"
      add_field => {
        "who" => "%{name}"
        "what" => "%{msg}"
        "severity" => "%{level}"
        "when" => "%{time}"
      }
    }
  } else {
    json {
      source => "message"
    }
  }
  date {
    match => [ "when" , "dd.MM.yyyy HH:mm:ss", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" ]
  }
}
And this is my output:
output {
  elasticsearch {
    hosts => ["localhost"]
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
  stdout { codec => rubydebug }
}
Is there any way of making a field which will contain the entire "more" field without breaking it apart?
You should be able to use a ruby filter to take the hash and convert it back into a string.
filter {
  ruby {
    # flatten the nested hash back into a single string field
    code => "event['more'] = event['more'].to_s"
  }
}
You'd probably want to surround it with an if to make sure that the field exists first.
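For example (this uses the same pre-5.x event API as the snippet above; on Logstash 5+ the equivalent would be event.set('more', event.get('more').to_s)):
filter {
  if [more] {
    ruby {
      # only runs when the "more" field exists on the event
      code => "event['more'] = event['more'].to_s"
    }
  }
}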
Hi, I am trying to parse a JSON file. I have tried troubleshooting with suggestions from Stack Overflow (links at bottom) but none have worked for me. I am hoping someone has some insight on a probably silly mistake I am making.
I have tried using only the json codec, only the json filter, as well as both. For some reason I am still getting a _jsonparsefailure. What can I do to get this to work?
Thanks in advance!
My json file:
{
  "log": {
    "version": "1.2",
    "creator": {
      "name": "WebInspector",
      "version": "537.36"
    },
    "pages": [
      {
        "startedDateTime": "2015-10-13T20:28:46.081Z",
        "id": "page_1",
        "title": "https://demo.com",
        "pageTimings": {
          "onContentLoad": 377.8560000064317,
          "onLoad": 377.66200001351535
        }
      },
      {
        "startedDateTime": "2015-10-13T20:29:01.734Z",
        "id": "page_2",
        "title": "https://demo.com",
        "pageTimings": {
          "onContentLoad": 1444.0670000039972,
          "onLoad": 2279.20100002666
        }
      },
      {
        "startedDateTime": "2015-10-13T20:29:04.014Z",
        "id": "page_3",
        "title": "https://demo.com",
        "pageTimings": {
          "onContentLoad": 1802.0240000041667,
          "onLoad": 2242.4060000048485
        }
      },
      {
        "startedDateTime": "2015-10-13T20:29:09.224Z",
        "id": "page_4",
        "title": "https://demo.com",
        "pageTimings": {
          "onContentLoad": 274.82699998654425,
          "onLoad": 1453.034000005573
        }
      }
    ]
  }
}
My logstash conf:
input {
  file {
    type => "json"
    path => "/Users/anonymous/Documents/demo.json"
    start_position => "beginning"
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch { host => "localhost" protocol => "http" port => "9200" }
  stdout { codec => rubydebug }
}
Output I am getting from Logstash, hopefully with clues:
Trouble parsing json {:source=>"message", :raw=>" \"startedDateTime\": \"2015-10-19T18:05:37.887Z\",", :exception=>#<TypeError: can't convert String into Hash>, :level=>:warn}
{
  "message" => " {",
  "@version" => "1",
  "@timestamp" => "2015-10-26T20:05:53.096Z",
  "host" => "15mbp-09796.local",
  "path" => "/Users/anonymous/Documents/demo.json",
  "type" => "json",
  "tags" => [
    [0] "_jsonparsefailure"
  ]
}
Decompose Logstash json message into fields
How to use logstash's json filter?
I tested my JSON with JSONLint. Perhaps this will solve your problem. The error I am getting there is that it is expecting a string.
It seems that you have an unnecessary comma (',') at the end. Either remove it or add another JSON member after it.
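For illustration, a made-up fragment of the same shape; the first line fails to parse because of the trailing comma, while the second is valid:
{"pageTimings": {"onLoad": 1453.03,}}
{"pageTimings": {"onLoad": 1453.03}}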
I've got a question regarding JSON in Logstash.
I have got a JSON Input that looks something like this:
{
  "2": {
    "name": "name2",
    "state": "state2"
  },
  "1": {
    "name": "name1",
    "state": "state1"
  },
  "0": {
    "name": "name0",
    "state": "state0"
  }
}
Now, let's say I want to add fields in the Logstash config:
json {
  source => "message"
  add_field => {
    "NAME" => "%{ What to write here ?}"
    "STATE" => "%{ What to write here ?}"
  }
}
Is there a way to access the JSON input such that I get a field NAME with value name0, another with value name1, and a third with value name2? The first key in the JSON changes, meaning there can be one or many more entries, so I don't want to hardcode it like
%{[0][name]}
Thanks for your help.
If you remove all new lines in your input, you can simply use the json filter. You don't need any add_field action.
Working config without new lines:
filter {
  json { source => "message" }
}
If you can't remove the new lines in your input, you need to merge the lines with the multiline codec.
Working config with new lines:
input {
  file {
    path => ["/path/to/your/file"] # I suppose your input is a file.
    start_position => "beginning"
    sincedb_path => "/dev/null" # just for testing
    codec => multiline {
      pattern => "^}"
      what => "previous"
      negate => "true"
    }
  }
}
filter {
  # the closing "}" line starts a new multiline event, so add it back here
  mutate { replace => { "message" => "%{message}}" } }
  json { source => "message" }
}
I suppose that you use the file input. In case you don't, just change it.
Output (for both):
"2" => {
  "name" => "name2",
  "state" => "state2"
},
"1" => {
  "name" => "name1",
  "state" => "state1"
},
"0" => {
  "name" => "name0",
  "state" => "state0"
}