Parsing JSON from CouchDB to ElasticSearch via Logstash - json

I am not able to parse JSON from CouchDB to Elasticsearch index in the desired way.
My CouchDB data looks like this:
{
"_id": "56161609157031561692637",
"_rev": "4-4119e8df293a6354be4c9fd7e8b12e68",
"deleteFlag": "N",
"entryUser": "John",
"parameter": "{\"id\":\"14188\",\"rcs_p\":null,\"rcs_e\":null,\"dep_p\":null,\"dep_e\":null,\"dep_place\":null,\"rcf_p\":null,\"rcf_e\":null,\"rcf_place\":null,\"dlv_p\":\"3810\",\"dlv_e\":\"1569\",\"seg_no\":null,\"trans_type\":\"incoming\",\"trans_service\":\"delivery\"}",
"physicalId": "0",
"recordDate": "2020-12-28T17:50:16+05:45",
"tag": "CARGO",
"uId": "56161609157031561692637",
"~version": "CgMBKgA="
}
What I am trying to do is to be able to search on the nested fields inside parameter in the JSON above.
When I put the data into the ES index, it is stored like this:
{
"_index": "del3",
"_type": "_doc",
"_id": "XRCV9XYBx5PRwauO--qO",
"_version": 1,
"_score": 0,
"_source": {
"#version": "1",
"doc_as_upsert": true,
"doc": {
"physicalId": "0",
"recordDate": "2020-12-27T12:56:45+05:45",
"tag": "CARGO",
"~version": "CgMBGgA=",
"uId": "48541609052212485430933",
"_rev": "3-937bf92e6010afec13664b1d9d06844b",
"deleteFlag": "N",
"entryUser": "John",
"parameter": "{\"id\":\"4038\",\"rcs_p\":null,\"rcs_e\":null,\"dep_p\":null,\"dep_e\":null,\"dep_place\":null,\"rcf_p\":null,\"rcf_e\":null,\"rcf_place\":null,\"dlv_p\":\"5070\",\"dlv_e\":\"2015\",\"seg_no\":null,\"trans_type\":\"incoming\",\"trans_service\":\"delivery\"}"
},
"#timestamp": "2021-01-12T07:53:33.978Z"
},
"fields": {
"#timestamp": [
"2021-01-12T07:53:33.978Z"
],
"doc.recordDate": [
"2020-12-27T07:11:45.000Z"
]
}
}
I want to be able to access the fields inside the parameter (id, rcs_p, rcs_e, ..) in Elasticsearch.
Here is my logstash.conf file:
input {
couchdb_changes {
host => "<host_name>"
port => 5984
db => "mychannel_asset$management"
keep_id => false
keep_revision => true
#initial_sequence => 0
always_reconnect => true
sequence_path => "/usr/share/logstash/config/seqfile"
}
}
filter {
json {
source => "[parameter]"
remove_field => ["[parameter]"]
}
}
output {
if([doc][tag] == "CARGO") {
elasticsearch {
hosts => ["http://elasticsearch:9200"]
index => "del3"
user => elastic
password => changeme
}
}
}
How do I achieve my desired result? I also tried creating a custom index template that defines a nested type for parameter, but no luck yet. Any help would be appreciated.

I think you did almost everything right. I'm not too sure about the actual structure, but one of these might work:
filter {
json {
source => "parameter"
target => "parameter"
}
}
filter {
json {
source => "[doc][parameter]"
target => "[doc][parameter]"
}
}
I don't know exactly how the couchdb_changes input plugin works, but it seems to put everything under a doc object.
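If it isn't obvious which path the parameter string ends up under, a temporary rubydebug output will show the event structure. A minimal sketch, assuming the field sits at [doc][parameter] as in the indexed document (switch source/target to "parameter" if it turns out to be top level):

filter {
  # parse the escaped JSON string into a nested object in place
  json {
    source => "[doc][parameter]"
    target => "[doc][parameter]"
  }
}
output {
  # temporary output to inspect how the event is actually structured
  stdout { codec => rubydebug }

  if [doc][tag] == "CARGO" {
    elasticsearch {
      hosts    => ["http://elasticsearch:9200"]
      index    => "del3"
      user     => "elastic"
      password => "changeme"
    }
  }
}

Once parsed this way, the nested values should be addressable as [doc][parameter][id], [doc][parameter][trans_type], and so on, and Elasticsearch can index them as an object rather than a single string.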

Related

Access nested JSON in React table

I want to display nested JSON data in a react-table.
I tried it like this:
render() {
const columns = [{
//Not Displaying
Header: 'Owner ID',
id: 'ownerid',
accessor: '_links.customer.href.ownerid', // <- I think this is wrong
Cell: this.renderEditable
},
{
//Displaying
Header: 'Price',
accessor: 'price',
Cell: this.renderEditable
}, {
The data I am getting back and have bound to the table is structured as follows:
[
{
"id": 1,
"date": "20.07.2019",
"price": 3.2,
"customer": {
"ownerid": 1,
"firstname": "John",
"lastname": "Johnson"
}
}
]
Here I am using the columns array:
import ReactTable from "react-table";
<ReactTable data={this.state.offers} columns={columns}
filterable={true} pageSize={10}/>
Binding the data:
fetchOffers = () => {
const token = sessionStorage.getItem("jwt");
fetch(SERVER_URL + 'api/offers',
{
headers : {'Authorization':token}
})
.then((response) => response.json())
.then((responseData) => {
this.setState({
offers: responseData._embedded.offers,
});
console.log(this.state);
})
.catch(err=> console.error(err));
}
The data I am using after binding is shown in a console screenshot (not reproduced here).
Check the Accessors documentation. It has several examples for complex data structures.
I don't see _links or href in your sample data, so I think you just need:
accessor: 'customer.ownerid'
The data structure in the console screenshot doesn't match your sample data, and it doesn't seem to contain ownerid. Try accessor: '_links.customer.href' to check whether it outputs anything to the table.
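For the sample data shown in the question, the column definition would then look roughly like this (a sketch reusing your renderEditable cell renderer):

const columns = [
  {
    Header: 'Owner ID',
    id: 'ownerid',
    // path into the nested customer object from the sample data
    accessor: 'customer.ownerid',
    Cell: this.renderEditable
  },
  {
    Header: 'Price',
    accessor: 'price',
    Cell: this.renderEditable
  }
];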
I figured it out.
I called the endpoint "localhost:8080/api/offers" and saved the following response:
"offers": [
{
"date": "20.07.2019",
"price": 3.2,
"_links": {
"self": {
"href": "http://localhost:8080/api/offers/1"
},
"offer": {
"href": "http://localhost:8080/api/offers/1"
},
"customer": {
"href": "http://localhost:8080/api/offers/1/customer"
}
}
}
]
There is no customer object.
But when I call "localhost:8080/offers" I get:
[
{
"id": 1,
"date": "20.07.2019",
"price": 3.2,
"customer": {
"ownerid": 1,
"firstname": "John",
"lastname": "Johnson"
}
}
]
I changed the URI in my project and now the number is displaying.
I still don't know why I get that data from "../api/offers", but I will research it.
I had to access a nested object and display it with some styling, and this ended up working for me:
(Note: I was using TypeScript, so some of the typing might not be necessary.)
{
Header: 'Thing To Display',
accessor: 'nested.thing.to.display',
Cell: ({ row }: { row: Row }) => (
<p>{row.values['nested.thing.to.display']}</p>
),
}

Logstash - import nested JSON into Elasticsearch

I have a large number (~40,000) of nested JSON objects that I want to insert into an Elasticsearch index.
The JSON objects are structured like this:
{
"customerid": "10932"
"date": "16.08.2006",
"bez": "xyz",
"birthdate": "21.05.1990",
"clientid": "2",
"address": [
{
"addressid": "1",
"tile": "Mr",
"street": "main str",
"valid_to": "21.05.1990",
"valid_from": "21.05.1990",
},
{
"addressid": "2",
"title": "Mr",
"street": "melrose place",
"valid_to": "21.05.1990",
"valid_from": "21.05.1990",
}
]
}
So a JSON field (address in this example) can have an array of JSON objects.
What would a Logstash config look like to import JSON files/objects like this into Elasticsearch? The Elasticsearch mapping for this index should simply mirror the structure of the JSON, and the document id should be set to customerid.
input {
stdin {
id => "JSON_TEST"
}
}
filter {
json{
source => "customerid"
....
....
}
}
output {
stdout{}
elasticsearch {
hosts => "https://localhost:9200/"
index => "customers"
document_id => "%{customerid}"
}
}
If you have control over what's being generated, the easiest thing to do is to format your input as single-line JSON and then use the json_lines codec.
Just change your stdin to:
stdin { codec => "json_lines" }
and then it'll just work:
cat input_file.json | logstash -f json_input.conf
where input_file.json has lines like:
{"customerid":1,"nested": {"json":"here"}}
{"customerid":2,"nested": {"json":"there"}}
and then you won't need the json filter.
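Putting the pieces together, a minimal config along these lines should work (hosts, index, and document_id are taken from the question; treat this as an untested sketch):

input {
  # one JSON object per line on stdin
  stdin { codec => "json_lines" }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts       => "https://localhost:9200/"
    index       => "customers"
    # use the customerid field as the Elasticsearch document id
    document_id => "%{customerid}"
  }
}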

Parse JSON message in Logstash

I am sending my Jenkins logs to Logstash with the following config:
redis {
host => "localhost"
key => "logstash"
data_type => "list"
codec => json
}
This works as smoothly as expected, and now I see the following message in Kibana:
{
"_index": "logstash-2015.12.18",
"_type": "logs",
"_id": "AVG1BN5LXZBIbp7HE4xN",
"_score": null,
"_source": {
"data": {
"id": "965",
"projectName": "NicePJ",
"displayName": "#965",
"fullDisplayName": "NicePJ",
"url": "job/NIcePJ/965/",
"buildHost": "Jenkins",
"buildLabel": "master",
"buildNum": 965,
"buildDuration": 1,
"rootProjectName": "NicePJ",
"rootProjectDisplayName": "#965",
"rootBuildNum": 965,
"buildVariables": {
"target_SUT": "0201",
"report_warnings": "false",
"product": "Ours",
"testsuite": "Exciting_stuff5",
"qft_version": "current",
"target_task": "t324",
"branch": "test",
"testcase": "",
"revision": "HEAD",
"node": "hsqs960",
"client": "Desktop",
"run_specific_test": "false",
"user": "xxxxx"
}
},
"message": [
"A This is a message XYZ"
],
"source": "jenkins",
"source_host": "http://serverXL:8080/",
"#timestamp": "2015-12-18T12:16:02.000Z",
"#version": 1
},
"fields": {
"#timestamp": [
1450440962000
]
},
"sort": [
1450440962000
]
}
Now I want to filter the message field for certain messages, but I can't get it to work. How can I filter the message field, and how can I access the buildHost field to use it in an if statement in the pipeline?
I tried the following after looking at many examples:
if[data][buildHost]== "jenkins"
{
grok
{
match => { "message[0]" => "\[exec\]\s*\<%{GREEDYDATA:test}\s*\[%{GREEDYDATA:result}\]" }
}
}
But this is just not working at all, please help me out.
Conditional
The == operator does a simple, case-sensitive string comparison, so "jenkins" will not match, as your data shows ("buildHost": "Jenkins"):
if [data][buildHost] == "jenkins"
But the following does:
if [data][buildHost] == "Jenkins"
If you need to match both, you can either use || or a regex with =~.
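For example (both forms are sketches of the usual Logstash conditional syntax):

if [data][buildHost] == "Jenkins" || [data][buildHost] == "jenkins" {
  # ...
}

if [data][buildHost] =~ /^[Jj]enkins$/ {
  # matches either capitalization via a character class
  # ...
}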
Grok
The grok filter parses a message against regex patterns. You can test your pattern with the online Grok Debugger or with the Grok Debugger in Kibana Dev Tools.
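As for the original snippet: message is an array in your document, so "message[0]" is not a field reference Logstash understands. One way around that is to copy the first element into a plain string field first and grok that; a sketch, where message_text is just an illustrative field name:

filter {
  if [data][buildHost] == "Jenkins" {
    mutate {
      # sprintf can address the first array element as [message][0]
      replace => { "message_text" => "%{[message][0]}" }
    }
    grok {
      match => { "message_text" => "\[exec\]\s*\<%{GREEDYDATA:test}\s*\[%{GREEDYDATA:result}\]" }
    }
  }
}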

logstash json filter not parsing fields getting _jsonparsefailure

Hi, I am trying to parse a JSON file. I have tried troubleshooting with suggestions from Stack Overflow (links at the bottom), but none have worked for me. I am hoping someone has some insight into what is probably a silly mistake I am making.
I have tried using only the json codec, only the json filter, and both together. For some reason I am still getting this _jsonparsefailure. What can I do to get this to work?
Thanks in advance!
My json file:
{
"log": {
"version": "1.2",
"creator": {
"name": "WebInspector",
"version": "537.36"
},
"pages": [
{
"startedDateTime": "2015-10-13T20:28:46.081Z",
"id": "page_1",
"title": "https://demo.com",
"pageTimings": {
"onContentLoad": 377.8560000064317,
"onLoad": 377.66200001351535
}
},
{
"startedDateTime": "2015-10-13T20:29:01.734Z",
"id": "page_2",
"title": "https://demo.com",
"pageTimings": {
"onContentLoad": 1444.0670000039972,
"onLoad": 2279.20100002666
}
},
{
"startedDateTime": "2015-10-13T20:29:04.014Z",
"id": "page_3",
"title": "https://demo.com",
"pageTimings": {
"onContentLoad": 1802.0240000041667,
"onLoad": 2242.4060000048485
}
},
{
"startedDateTime": "2015-10-13T20:29:09.224Z",
"id": "page_4",
"title": "https://demo.com",
"pageTimings": {
"onContentLoad": 274.82699998654425,
"onLoad": 1453.034000005573
}
}
]
}
}
My logstash conf:
input {
file {
type => "json"
path => "/Users/anonymous/Documents/demo.json"
start_position => beginning
}
}
filter{
json{
source => "message"
}
}
output {
elasticsearch { host => localhost protocol => "http" port => "9200" }
stdout { codec => rubydebug }
}
Output I am getting from Logstash, hopefully with clues:
Trouble parsing json {:source=>"message", :raw=>" \"startedDateTime\": \"2015-10-19T18:05:37.887Z\",", :exception=>#<TypeError: can't convert String into Hash>, :level=>:warn}
{
"message" => " {",
"#version" => "1",
"#timestamp" => "2015-10-26T20:05:53.096Z",
"host" => "15mbp-09796.local",
"path" => "/Users/anonymous/Documents/demo.json",
"type" => "json",
"tags" => [
[0] "_jsonparsefailure"
]
}
Decompose Logstash json message into fields
How to use logstash's json filter?
I tested my JSON with JSONLint. Perhaps this will solve your problem. The error I am getting is that it is expecting a string.
It seems that you have an unnecessary comma (',') at the end. Either remove it or add another JSON member after it.
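To illustrate the kind of trailing comma the answer is pointing at (a made-up fragment, not taken from your file):

{ "onContentLoad": 377.85, "onLoad": 377.66, }   <- invalid: comma after the last member
{ "onContentLoad": 377.85, "onLoad": 377.66 }    <- valid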

Logstash: XML to JSON output from array to string

I am in the process of trying to use Logstash to convert an XML document into JSON for Elasticsearch. I am able to get the values read and sent to Elasticsearch. The issue is that all the values come out as arrays, and I would like them to come out as plain strings. I know I can do a replace for each field individually, but then I run into an issue with nested fields being three levels deep.
XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<acs2:SubmitTestResult xmlns:acs2="http://tempuri.org/" xmlns:acs="http://schemas.sompleace.org" xmlns:acs1="http://schemas.someplace.org">
<acs2:locationId>Location Id</acs2:locationId>
<acs2:userId>User Id</acs2:userId>
<acs2:TestResult>
<acs1:CreatedBy>My Name</acs1:CreatedBy>
<acs1:CreatedDate>2015-08-07</acs1:CreatedDate>
<acs1:Output>10.5</acs1:Output>
</acs2:TestResult>
</acs2:SubmitTestResult>
Logstash Config
input {
file {
path => "/var/log/logstash/test.xml"
}
}
filter {
multiline {
pattern => "^\s\s(\s\s|\<\/acs2:SubmitTestResult\>)"
what => "previous"
}
if "multiline" in [tags] {
mutate {
replace => ["message", '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>%{message}']
}
xml {
target => "SubmitTestResult"
source => "message"
}
mutate {
remove_field => ["message", "#version", "host", "#timestamp", "path", "tags", "type"]
remove_field => ["entry", "[SubmitTestResult][xmlns:acs2]", "[SubmitTestResult][xmlns:acs]", "[SubmitTestResult][xmlns:acs1]"]
# This works
replace => [ "[SubmitTestResult][locationId]", "%{[SubmitTestResult][locationId]}" ]
# This does NOT work
replace => [ "[SubmitTestResult][TestResult][CreatedBy]", "%{[SubmitTestResult][TestResult][CreatedBy]}" ]
}
}
}
output {
stdout {
codec => "rubydebug"
}
elasticsearch {
index => "xmltest"
cluster => "logstash"
}
}
Example Output
{
"_index": "xmltest",
"_type": "logs",
"_id": "AU8IZBURkkRvuur_3YDA",
"_version": 1,
"found": true,
"_source": {
"SubmitTestResult": {
"locationId": "Location Id",
"userId": [
"User Id"
],
"TestResult": [
{
"CreatedBy": [
"My Name"
],
"CreatedDate": [
"2015-08-07"
],
"Output": [
"10.5"
]
}
]
}
}
}
As you can see, the output is an array for each element (except for locationId, which I replaced). I am trying not to have to do the replace for each element. Is there a way to adjust the config to make the output come out properly? If not, how do I get three levels deep in the replace?
--UPDATE--
I figured out how to get to the third level in TestResult. The replace is:
replace => [ "[SubmitTestResult][TestResult][0][CreatedBy]", "%{[SubmitTestResult][TestResult][0][CreatedBy]}" ]
I figured it out. Here is the solution.
replace => [ "[SubmitTestResult][TestResult][0][CreatedBy]", "%{[SubmitTestResult][TestResult][0][CreatedBy]}" ]