I have logs like the following:
{
"log": {
"header": {
"key": "value",
"nested": "{\"key1\":\"value\",\"key2\":\"value\"}",
"dateTime": "2019-05-08T20:58:06+00:00"
},
"body": {
"path": "/request/path/",
"method": "POST",
"ua": "curl/7.54.0",
"resp": 200
}
}
}
I'm trying to aggregate logs using fluentd and I want the entire record to be JSON. The specific problem is the "$.log.header.nested" field, which is a JSON string. How can I parse and replace that string with its contents?
For clarity, I'd like the logs output by fluentd to look like this:
{
"log": {
"header": {
"key": "value",
"nested": {
"key1": "value",
"key2": "value"
},
"dateTime": "2019-05-08T20:58:06+00:00"
},
"body": {
"path": "/request/path/",
"method": "POST",
"ua": "curl/7.54.0",
"resp": 200
}
}
}
I've found a way to parse the nested field as JSON, but it isn't clear how to store the result back to the same key it was parsed from. It doesn't seem like hash_value_field supports storing to a nested key. Is there some other way to accomplish this?
The following config seems to accomplish what I want. However, I'm not sure it's the best way; I assume the ruby step is far less performant than native filters. Any improvements are welcome.
<filter logs>
@type parser
key_name "$.log.header.nested"
hash_value_field "parsed_nested"
reserve_data true
remove_key_name_field true
<parse>
@type json
</parse>
</filter>
<filter logs>
@type record_transformer
enable_ruby true
<record>
parsed_nested ${record["log"]["header"]["nested"] = record["parsed_nested"]}
</record>
remove_keys parsed_nested
</filter>
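One possible simplification is to collapse the two filters into a single record_transformer that parses the string in place. This is an untested sketch, and it still relies on enable_ruby, so it doesn't address the performance concern:
<filter logs>
  @type record_transformer
  enable_ruby true
  <record>
    # Parse the nested JSON string in place, then re-emit the whole "log" hash
    log ${record["log"]["header"]["nested"] = JSON.parse(record["log"]["header"]["nested"]); record["log"]}
  </record>
</filter>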
I am able to get each log line into Kibana as a single JSON object by having this in the filebeat.yml file:
output.elasticsearch:
hosts: ["localhost:9200"]
How can I get at the individual elements in the JSON string? Say I wanted to compare all the "pseudorange" fields of all my JSON objects. How would I:
Select the "pseudorange" field from all my JSON messages so I can compare them.
Compare them visually in Kibana. At the moment I can't even find the message, let alone the individual fields, in the visualisation tab...
I have heard of people using Logstash to parse the string somehow, but is there no way of doing this simply with Filebeat? If there isn't, what do I do with Logstash to filter out the individual fields in the JSON, instead of having my message as just one big JSON string that I cannot interact with?
I get the following output from output.console (note I am putting some information in <> to hide it):
{
"@timestamp": "2021-03-23T09:37:21.941Z",
"@metadata": {
"beat": "filebeat",
"type": "doc",
"version": "6.8.14",
"truncated": false
},
"message": "{\n\t\"Signal_data\" : \n\t{\n\t\t\"antenna type:\" : \"GPS\",\n\t\t\"frequency type:\" : \"GPS\",\n\t\t\"position x:\" : 0.0,\n\t\t\"position y:\" : 0.0,\n\t\t\"position z:\" : 0.0,\n\t\t\"pseudorange:\" : 20280317.359730639,\n\t\t\"pseudorange_error:\" : 0.0,\n\t\t\"pseudorange_rate:\" : -152.02620448094211,\n\t\t\"svid\" : 18\n\t}\n}\u0000",
"source": <ip address>,
"log": {
"source": {
"address": <ip address>
}
},
"input": {
"type": "udp"
},
"prospector": {
"type": "udp"
},
"beat": {
"name": <ip address>,
"hostname": "ip-<ip address>",
"version": "6.8.14"
},
"host": {
"name": "ip-<ip address>",
"os": {
<ubuntu info>
},
"id": <id>,
"containerized": false,
"architecture": "x86_64"
},
"meta": {
"cloud": {
<cloud info>
}
}
}
In Filebeat, you can leverage the decode_json_fields processor in order to decode a JSON string and add the decoded fields into the root object:
processors:
- decode_json_fields:
fields: ["message"]
process_array: false
max_depth: 2
target: ""
overwrite_keys: true
add_error_key: false
Credit to Val for this. His answer worked; however, as he suggested, my JSON string had a \u0000 at the end, which stops it being valid JSON and prevented the decode_json_fields processor from working as it should...
Upgrading to version 7.12 of Filebeat (also ensure version 7.12 of Elasticsearch and Kibana, because mismatched versions between them can cause issues) allows us to use the script processor: https://www.elastic.co/guide/en/beats/filebeat/current/processor-script.html.
Credit to Val here again, this script removed the null terminator:
- script:
lang: javascript
id: trim
source: >
function process(event) {
event.Put("message", event.Get("message").trim());
}
After the null terminator was removed, the decode_json_fields processor did its job as Val suggested, and I was able to extract the individual elements of the JSON field, which let Kibana visualisations look at the elements I wanted!
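For reference, here is what the full processors section looks like with the two steps chained; order matters, since the script must trim the message before decode_json_fields sees it. This is a sketch assuming the same field names and options as above:
processors:
  # Strip the trailing null terminator first so the message is valid JSON
  - script:
      lang: javascript
      id: trim
      source: >
        function process(event) {
          event.Put("message", event.Get("message").trim());
        }
  # Then decode the cleaned JSON string into fields at the event root
  - decode_json_fields:
      fields: ["message"]
      process_array: false
      max_depth: 2
      target: ""
      overwrite_keys: true
      add_error_key: false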
Wondering if I can create a "dynamic mapping" within an Elasticsearch index. The problem I am trying to solve is the following: I have a schema with an attribute containing an object that can differ greatly between records. I would like to mirror this data within Elasticsearch if possible, but believe that automatic mapping may get in the way.
Imagine a scenario where I have a schema like the following:
{
name: string
origin: string
payload: object // can be of any type / schema
}
Is it possible to create a mapping that supports this? I do not need to query the records by this payload attribute, but it would be great if I could.
Note that I have checked the documentation but am confused about whether what Elastic calls dynamic mapping is what I am looking for.
It's certainly possible to specify which queryable fields you expect the payload to contain and what those fields' mappings should be.
Let's say each doc will include the fields payload.livemode and payload.created_at. If these are the only two fields you'll want to perform queries on, and you'd like to disable dynamic, index-time mappings autogenerated by Elasticsearch for the rest of the fields, you can use dynamic templates like so:
PUT my-payload-index
{
"mappings": {
"dynamic_templates": [
{
"variable_payload": {
"path_match": "payload",
"mapping": {
"type": "object",
"dynamic": false,
"properties": {
"created_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"livemode": {
"type": "boolean"
}
}
}
}
}
],
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"origin": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Then, as you ingest your docs:
POST my-payload-index/_doc
{
"name": "abc",
"origin": "web.dev",
"payload": {
"created_at": "2021-04-05 08:00:00",
"livemode": false,
"abc":"def"
}
}
POST my-payload-index/_doc
{
"name": "abc",
"origin": "web.dev",
"payload": {
"created_at": "2021-04-05 08:00:00",
"livemode": true,
"modified_at": "2021-04-05 09:00:00"
}
}
and verify with
GET my-payload-index/_mapping
no new mappings will be generated for the fields payload.abc or payload.modified_at.
Not only that — the new fields will also be ignored, as per the documentation:
These fields will not be indexed or searchable, but will still appear in the _source field of returned hits.
Side note: fields that are neither indexed nor searchable like this are effectively disabled; they behave as if enabled: false had been set on them.
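To illustrate that side note: if you never needed to query any part of the payload, you could skip the dynamic template entirely and disable the object wholesale. A minimal sketch, using a hypothetical index name:
PUT my-payload-index-disabled
{
  "mappings": {
    "properties": {
      "payload": {
        "type": "object",
        "enabled": false
      }
    }
  }
}
The payload contents would still be returned in _source, but none of its fields would be indexed or searchable.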
The Big Picture
Working with variable contents of a single, top-level object is quite standard. Take for instance the Stripe event object: each event has an id, an api_version, and a few other shared params. Then there's the data object, which is analogous to your payload field.
Now, all is fine until you need to aggregate on the contents of your payload. Since the content is variable, so are the data paths / accessors. But wildcards in aggregation paths don't work in Elasticsearch. Scripts do, but they are onerous to maintain.
Back to Stripe. They partially solved it through what they call polymorphic, typed hashes, as discussed in their blog on API design. A pretty neat approach that's worth emulating.
P.S. I discuss dynamic templates in more detail in the chapter "Mapping Automation" of my ES Handbook.
We are sending messages to a Service Bus using a Logic App. These messages will later be consumed by another service; that service expects the message content to be a string, essentially a stringified JSON object with escape characters.
We are not able to find a method to stringify a JSON object in Logic Apps. Even if we explicitly provide an escaped string, the Logic App itself detects that it's stringified JSON, unescapes it, and then sends it as a JSON object. We don't want that; we simply want it to send the string as it is. We have already tried changing the content type to text/plain, but it does not work. The Logic App always sends the unescaped string as JSON.
This post on MSDN: https://social.msdn.microsoft.com/Forums/office/en-US/e5dee958-09a7-4784-b1bf-facdd6b8a568/post-json-from-logic-app-how-to-escape-data?forum=azurelogicapps is of no help, because doing this would violate the request contract of the message-consuming service.
Do you need the stringified message to include opening and closing double quotes?
I've tried this and it worked for me.
I have my JSON object as the output of a Compose action.
Then, I initialised a variable with the Base64-encoded value of the escaped, stringified JSON (you need to add ALL the proper escaping required; mine was just a PoC).
Then, you send the variable, already in Base64, to Service Bus (you need to remove the encoding on that action).
"actions": {
"Compose_JSON_Object": {
"inputs": {
"message": "I want this as a string"
},
"runAfter": {},
"type": "Compose"
},
"Initialise_Variable_with_Stringified_JSON_Base64_Encoded": {
"inputs": {
"variables": [
{
"name": "jsonAsStringBase64",
"type": "String",
"value": "#base64(concat('\"', replace(string(outputs('Compose_JSON_Object')), '\"', '\\\"'), '\"'))"
}
]
},
"runAfter": {
"Compose_JSON_Object": [
"Succeeded"
]
},
"type": "InitializeVariable"
},
"Send_message": {
"inputs": {
"body": {
"ContentData": "#variables('jsonAsStringBase64')",
"ContentType": "text/plain"
},
"host": {
"connection": {
"name": "#parameters('$connections')['servicebus']['connectionId']"
}
},
"method": "post",
"path": "/#{encodeURIComponent(encodeURIComponent('temp'))}/messages",
"queries": {
"systemProperties": "None"
}
},
"runAfter": {
"Initialise_Variable_with_Stringified_JSON_Base64_Encoded": [
"Succeeded"
]
},
"type": "ApiConnection"
}
},
This way, I got the message stringified.
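To make the transformation concrete, here is how the variable's expression evaluates for the Compose output above, worked through by hand (so treat it as illustrative):
string(outputs('Compose_JSON_Object'))  -> {"message":"I want this as a string"}
after replace(..., '"', '\"')           -> {\"message\":\"I want this as a string\"}
after concat('"', ..., '"')             -> "{\"message\":\"I want this as a string\"}"
base64(...) of that last value is what goes into ContentData, and Service Bus hands the consumer the decoded string.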
HTH
In a JSON Schema we can specify what type of entity we are expecting, such as a string:
"Name": {
"type": [
"string",
"null"
]
}
Can we do something same for expecting images as input?
JSON Hyper-Schema defines a media keyword that allows you to specify an image as an input. Most JSON Schema validators don't support Hyper-Schema, but if you have one that does, this could be useful.
{
"type": "string",
"media": {
"binaryEncoding": "base64",
"type": "image/png"
}
}
http://json-schema.org/latest/json-schema-hypermedia.html#anchor10
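Note also that newer drafts of core JSON Schema (draft-07 onwards) cover the same ground with the contentEncoding and contentMediaType keywords, which some mainstream validators do understand:
{
  "type": "string",
  "contentEncoding": "base64",
  "contentMediaType": "image/png"
}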
I'm trying to insert a Java POJO into the Couchbase store, and the JSON, captured just before the cas call, looks like this:
{
"key": "sampleKey",
"myMap": {
"Messages": [
{
"field": "f1",
"label": "l1"
},
{
"field": "f2",
"label": "l2"
},
{
"field": "f3",
"label": "l3"
},
{
"field": "f4",
"label": "l4"
}
],
"Orders": [
{
"field": "f1",
"label": "l1"
},
{
"field": "f2",
"label": "l2"
},
{
"field": "f3",
"label": "l3"
},
{
"field": "f4",
"label": "l4"
},
{
"field": "f5",
"label": "l5"
}
]
}
}
I have verified that this is valid JSON, yet it's still being inserted as a binary object: when I look up this document via the Couchbase GUI, it shows up as a base64-encoded string. A couple of other documents are fine, though. I am wondering if this is happening only for the cas method and not for set.
The relevant Java code is this:
String myJson = objectMapper.writeValueAsString(cacheObject);
CASResponse response = couchbaseClient.cas(cacheObject.getKey(), casValue.getCas(), myJson, PersistTo.MASTER);
// Java pojo
public class CacheObject
{
private String key;
private Map<String, List<FieldLabel>> myMap = new HashMap<String, List<FieldLabel>>();
// setters and getters
}
Any pointers on why this could be happening will be appreciated.
Update 1: I'm using Couchbase Java client version 1.4.4; the server is 2.5.
Update 2: I don't think this has to do with my code or JSON. I tried replacing my JSON with a large, valid JSON document and saw the same result in the Couchbase GUI. I think this is happening because the size of the document goes over 2.5KB; the JSON I pasted above has the actual field and label values replaced, as the real ones are slightly longer strings.
Strangely, when I modify my document, documents below roughly 960 characters generally show up as JSON, while ones slightly above that are shown as binary.
If the size of the document is above 2.5KB, the document will not be editable in the console; this threshold can be changed in a file called documents.js.