JOLT Spec to build arrays from group of similarly named keys - json

I'm trying to build a JOLT spec that puts similarly named keys into an array and strips off the leading 6 characters. In the example below, all keys starting with "fd1lk1" would go into the array "Link1", keys starting with "fd1lk2" would go into the array "Link2", and so on.
Thanks for any help!
Source JSON:
{
"EventName": "WidgetFeedImpression",
"WidgetName": "_blah_2019.08.17",
"WidgetID": "5d56ef313db7c300018d9c66",
"WidgetVariationName": "_blah_2019.08.17",
"WidgetVariationID": "5b5f524eb1932300014d0928",
"WidgetTemplate": "blah-six-grid-wth-image",
"fd1lk1Title": "link 1 title",
"fd1lk1Image": "link 1 image",
"fd1lk1TargetURL": "link 1 url",
"fd1lk1Position": "1",
"fd1lk1Id": "fd5da8ce0f8701a3000190efbdlk1",
"fd1lk2Title": "link 2 title",
"fd1lk2Image": "link 2 image",
"fd1lk2TargetURL": "link 2 url",
"fd1lk2Position": "2",
"fd1lk2Id": "fd5da8ce0f8701a3000190efbdlk2",
"gtmcb": "1878625665",
"CreatedAtUtc": "2019-10-24T16:57:01.5274702Z"
}
Desired Output:
{
"EventName": "WidgetFeedImpression",
"WidgetName": "_blah_2019.08.17",
"WidgetID": "5d56ef313db7c300018d9c66",
"WidgetVariationName": "_blah_2019.08.17",
"WidgetVariationID": "5b5f524eb1932300014d0928",
"WidgetTemplate": "blah-six-grid-wth-image",
"Link1" : [ {
"Title" : "link 1 title",
"Image" : "link 1 image",
"TargetURL" : "link 1 url",
"Position" : "1",
"Id" : "fd5da8ce0f8701a3000190efbdlk1"
} ],
"Link2" : [ {
"Title" : "link 2 title",
"Image" : "link 2 image",
"TargetURL" : "link 2 url",
"Position" : "2",
"Id" : "fd5da8ce0f8701a3000190efbdlk2"
} ],
"gtmcb": "1878625665",
"CreatedAtUtc": "2019-10-24T16:57:01.5274702Z"
}

The part matched by the * in fd1lk*Title is captured and referenced with &(0,1): for the input key fd1lk2Title the wildcard captures "2", so that value is written to Link2.[0].Title. The full spec:
[
{
"operation": "shift",
"spec": {
"fd1lk*Title": "Link&(0,1).[0].Title",
"fd1lk*Image": "Link&(0,1).[0].Image",
"fd1lk*TargetURL": "Link&(0,1).[0].TargetURL",
"fd1lk*Position": "Link&(0,1).[0].Position",
"fd1lk*Id": "Link&(0,1).[0].Id",
"*": {
"#": "&"
}
}
}
]
See here for the basic concept.
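For what it's worth, an equivalent shift that puts the wildcard on the field-name side instead should also work (a sketch of the same idea; one spec line per link prefix rather than one per field):
[
{
"operation": "shift",
"spec": {
"fd1lk1*": "Link1.[0].&(0,1)",
"fd1lk2*": "Link2.[0].&(0,1)",
"*": "&"
}
}
]
Here &(0,1) resolves to whatever follows the fd1lk1/fd1lk2 prefix (Title, Image, and so on).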

Related

how to use $ and * at the same level in a spec

I am new to JOLT, and whilst I like a lot of it, one thing that's really hurting me right now is how to use * and $ at the same level in a spec. I have the following input and desired output, but try as I might I cannot seem to both transform the list of action ids (the "1" and "2") into attribute values and move the list of action data associated with each id into a sub-attribute.
Input
{
"Attr1": "Attr1_data",
"Actions": {
"1": [
"Action data 1 line 1",
"Action data 1 line 2",
"Action data 1 line 3"
],
"2": [
"Action data 2 line 1",
"Action data 2 line 2",
"Action data 2 line 3"
]
},
"Attr2": "Attr2_data"
}
Desired
{
"Attr1": "Attr1_data",
"Action": [
{
"id" : "1",
"data" : [
"Action data 1 line 1",
"Action data 1 line 2",
"Action data 1 line 3"
]
},
{
"id" : "2",
"data" : [
"Action data 2 line 1",
"Action data 2 line 2",
"Action data 2 line 3"
]
}
],
"Attr2": "Attr2_data"
}
using the following spec
[
{
"operation" : "shift",
"spec": {
"Actions": {
"*" : {
"$": "Action[].id"
}
},
"*": "&"
}
}
]
I can generate
{
"Attr1": "Attr1_data",
"Action": [
{
"id": "1"
},
{
"id": "2"
}
],
"Attr2": "Attr2_data"
}
But try as I might, I cannot seem to copy the data lines into a new data attribute.
Can anyone please point me in the right direction?
You can convert yours to the following transformation spec
[
{
"operation": "shift",
"spec": {
"*s": { // represents a tag(of an object/array/attribute) with a trailing letter "s". The reason of this reform is to be able use "Action" as the key of the inner array without repeatedly writing it.
"*": {
"$": "&(2,1)[#2].id", // "$" looks one level up and copies the tag name, &(2,1) goes two levels up the tree and pick the first piece separated by asterisk, [#2] goes two level up in order to reach the level of "Actions" array to combine the subelements distributed from that level in arrayic manner with "Action"(&(2,1)) label, and the leaf node "id" stands for tag of the current attribute
"*": "&(2,1)[#2].data"
}
},
"*": "&" // the rest of the attributes(/objects/arrays) other than "Actions"
}
}
]
You can try it out on the demo site http://jolt-demo.appspot.com/.
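For comparison, an equivalent spec that names the key literally instead of using the "*s" shorthand (a sketch of the same idea) would be:
[
{
"operation": "shift",
"spec": {
"Actions": {
"*": {
"$": "Action[#2].id",
"*": "Action[#2].data"
}
},
"*": "&"
}
}
]
The only difference is that "Action" is written out on the right-hand side rather than being captured via &(2,1).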

converted json result structure not same as source

I have a test case that compares a result against the source kept in a Kafka message.
I noticed the structure is not the same: no fields are missing, but the name/value pairs are not arranged in the same order.
How do I make the converted result match the source structure?
Here is the code that retrieves the message, decodes the Base64 value, and pretty-prints the result:
import groovy.json.JsonSlurper
import groovy.json.JsonOutput

// parse the consumer response, then Base64-decode the first record's value
def responseList = new JsonSlurper().parseText(consumeMessage.getResponseText())
println('response text: \n' + JsonOutput.prettyPrint(JsonOutput.toJson(responseList)))
def decoded = new JsonSlurper().parseText(new String(responseList[0].value.decodeBase64()))
println('response decoded text: \n' + JsonOutput.prettyPrint(JsonOutput.toJson(decoded)))
Below is the result printed to the console:
2019-11-20 16:36:44.934 DEBUG oingDRToAllocationVerification-DynamicID - 10: decoded = JsonSlurper().parseText(new java.lang.String(responseList[0].value.decodeBase64()))
2019-11-20 16:36:44.945 DEBUG oingDRToAllocationVerification-DynamicID - 11: println("response decoded text:
" + JsonOutput.prettyPrint(JsonOutput.toJson(decoded)))
response decoded text:
{
"contexts": [
{
"activityId": "c2884e63-d30d-48a3-965c-0b33202885c2",
"incomingTimestamp": "2019-11-20T08:36:29.0829958Z",
"sourceName": "DispenseOrderService",
"timestamp": "2019-11-20T08:36:29.0829958+00:00",
"userId": "unknown"
}
],
"dispenseOrder": [
{
"dispenseRequestType": "DISPENSEORDER",
"id": "6320112019043628",
"items": [
{
"administrationInstructions": "drug intake information test 123",
"dispenseAsWritten": false,
"id": "cda92ec7-3191-4b7b-a972-7f4545146db4",
"itemId": "Augmentn",
"quantity": 100
},
{
"administrationInstructions": "drug intake information test 234",
"dispenseAsWritten": false,
"id": "19e00776-b08d-47c8-930b-76ddc01f0ff4",
"itemId": "Clopidogrl",
"quantity": 200
},
{
"administrationInstructions": "drug intake information test 456",
"dispenseAsWritten": true,
"id": "0a5b0f4a-366d-4fa7-a0b8-2e8c83f4af13",
"itemId": "Adenosine",
"quantity": 300
}
],
"locationId": "Pharmacy Jewel East",
"piiIdentifiers": {
"doctorId": "b502f046-fb1e-4fcf-8135-a7a13cfb47f6",
"patientId": "fe49b461-8eeb-46d5-b995-a31cdaaa35f3",
"pharmacistId": "b502f046-fb1e-4fcf-8135-a7a13cfb47f6"
},
"priority": 4,
"state": "NEW",
"type": "Test ingest type"
}
],
"messageClass": "DispenseRequestV1",
"messageId": "83e94dac-dfb6-49d7-8ca0-219d155fecce",
"notifications": [
],
"operation": "Add",
"timestamp": "2019-11-20T08:36:29.0952632+00:00"
}
Below is the source. The result after conversion is not the same as the source, in that the fields are not arranged in the same order.
{
"operation" : "Add",
"dispenseOrder" : [ {
"id" : "6320112019043628",
"locationId" : "Pharmacy Jewel East",
"piiIdentifiers" : {
"patientId" : "fe49b461-8eeb-46d5-b995-a31cdaaa35f3",
"doctorId" : "b502f046-fb1e-4fcf-8135-a7a13cfb47f6",
"pharmacistId" : "b502f046-fb1e-4fcf-8135-a7a13cfb47f6"
},
"priority" : 4,
"state" : "NEW",
"type" : "Test ingest type",
"dispenseRequestType" : "DISPENSEORDER",
"items" : [ {
"id" : "cda92ec7-3191-4b7b-a972-7f4545146db4",
"itemId" : "Augmentn",
"quantity" : 100,
"dispenseAsWritten" : false,
"administrationInstructions" : "drug intake information test 123"
}, {
"id" : "19e00776-b08d-47c8-930b-76ddc01f0ff4",
"itemId" : "Clopidogrl",
"quantity" : 200,
"dispenseAsWritten" : false,
"administrationInstructions" : "drug intake information test 234"
}, {
"id" : "0a5b0f4a-366d-4fa7-a0b8-2e8c83f4af13",
"itemId" : "Adenosine",
"quantity" : 300,
"dispenseAsWritten" : true,
"administrationInstructions" : "drug intake information test 456"
} ]
} ],
"messageId" : "83e94dac-dfb6-49d7-8ca0-219d155fecce",
"timestamp" : "2019-11-20T08:36:29.0952632+00:00",
"messageClass" : "DispenseRequestV1",
"contexts" : [ {
"userId" : "unknown",
"timestamp" : "2019-11-20T08:36:29.0829958+00:00",
"activityId" : "c2884e63-d30d-48a3-965c-0b33202885c2",
"incomingTimestamp" : "2019-11-20T08:36:29.0829958Z",
"sourceName" : "DispenseOrderService"
} ],
"notifications" : [ ]
}
As json.org says:
An object is an unordered set of name/value pairs.
So, different JSON methods/libraries might order them in a different way. You shouldn't rely on order of name/value pairs when working with JSON.
(If order is very important to you, you might try the solution suggested in this post.)
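If the goal is just to verify that the decoded message matches the source, one option is to compare the parsed structures rather than the pretty-printed text, since Groovy Map and List equality ignores key order. A minimal sketch, assuming the original source JSON is available as a string in a hypothetical variable sourceJsonText:
import groovy.json.JsonSlurper

def slurper = new JsonSlurper()
def expected = slurper.parseText(sourceJsonText) // sourceJsonText: assumed to hold the original source message
def decoded = slurper.parseText(new String(responseList[0].value.decodeBase64()))

// Maps and Lists compare by content, so differing key order does not fail the assertion
assert decoded == expected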

1-1 Mapping (with no unique key identifier) of JSON object in Jolt

I have a list of JSON objects, converted from the result of a SQL query. The JSON looks like this:
[ {
"CREATE_DATE_TIME" : "2018-02-04 11:00:03.0",
"EXTERNAL_ID" : "1111",
"CERT_NUMBER" : "123",
"DESCRIPTION" : "DESC 1",
"SOURCE_SYSTEM" : "WOULDIWAS"
}, {
"CREATE_DATE_TIME" : "2018-03-01 11:25:03.0",
"EXTERNAL_ID" : "2222",
"CERT_NUMBER" : "456",
"DESCRIPTION" : "DESC 2",
"SOURCE_SYSTEM" : "SHOOKSPEARE"
},
...
]
The output after JSON transform should be something like this:
{
"Jobs": [
{
"Notification": {
"ActivityDate" : "2018-02-04 11:00:03.0",
"ExternalId" : "1111",
"CertNum" : "123",
"Description" : "DESC 1",
"SourceSystem" : "WOULDIWAS",
"RecordType" : "Task Notification"
}
}, {
"Notification": {
"ActivityDate" : "2018-03-01 11:25:03.0",
"ExternalId" : "2222",
"CertNum" : "456",
"Description" : "DESC 2",
"SourceSystem" : "SHOOKSPEARE",
"RecordType" : "Task Notification"
}
},
...
]
}
(The RecordType is a literal string, not derived from the input JSON)
Each row / entry (JSON object enclosed in {}) in the input JSON is guaranteed to be unique but there is no key here that would indicate that. The row / entry in the input should correspond 1-1 to { Notification: {...} } in the output. How should I construct my Jolt Spec to do this?
Not to sound offensive or anything, but you should have posted what you've already tried.
Anyway, here's a spec to get your intended output format:
[
{
"operation": "shift",
"spec": {
"*": "Jobs[].Notification"
}
}
]
I would suggest you try renaming the fields yourself, because practicing JOLT is the best way to learn.
If you still need help, I'll complete the answer for you.
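For reference, a sketch of what such a completed spec could look like (untested; the renamed fields and the literal RecordType value are taken from the desired output above):
[
{
"operation": "shift",
"spec": {
"*": {
"CREATE_DATE_TIME": "Jobs[&1].Notification.ActivityDate",
"EXTERNAL_ID": "Jobs[&1].Notification.ExternalId",
"CERT_NUMBER": "Jobs[&1].Notification.CertNum",
"DESCRIPTION": "Jobs[&1].Notification.Description",
"SOURCE_SYSTEM": "Jobs[&1].Notification.SourceSystem",
"#Task Notification": "Jobs[&1].Notification.RecordType"
}
}
}
]
Here &1 refers to the array index matched by the outer *, and the LHS key starting with # writes the hardcoded string "Task Notification" into RecordType.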
Here's some reading material: the Documentation and the Slide deck.
And you can learn a lot from the issues page, where Milo Simpson has already answered queries for most of your questions.

jq: group and key by property

I have a list of objects that look like this:
[
{
"ip": "1.1.1.1",
"component": "name1"
},
{
"ip": "1.1.1.2",
"component": "name1"
},
{
"ip": "1.1.1.3",
"component": "name2"
},
{
"ip": "1.1.1.4",
"component": "name2"
}
]
Now I'd like to group and key that by the component and assign a list of ips to each of the components:
{
"name1": [
"1.1.1.1",
"1.1.1.2"
]
},{
"name2": [
"1.1.1.3",
"1.1.1.4"
]
}
I figured it out myself. I first group by .component and then just create new lists of ips that are indexed by the component of the first object of each group:
jq ' group_by(.component)[] | {(.[0].component): [.[] | .ip]}'
The accepted answer doesn't produce valid JSON; instead it produces:
{
"name1": [
"1.1.1.1",
"1.1.1.2"
]
}
{
"name2": [
"1.1.1.3",
"1.1.1.4"
]
}
The name1 and name2 objects are each valid JSON, but the output as a whole isn't.
The following jq statement results in the desired output as specified in the question:
group_by(.component) | map({ key: (.[0].component), value: [.[] | .ip] }) | from_entries
Output:
{
"name1": [
"1.1.1.1",
"1.1.1.2"
],
"name2": [
"1.1.1.3",
"1.1.1.4"
]
}
Suggestions for simpler approaches are welcome.
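One slightly shorter variant of the same idea merges the per-component objects with add, which also yields a single valid JSON object:
jq 'group_by(.component) | map({(.[0].component): [.[] | .ip]}) | add'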
If human readability is preferred over valid json, I'd suggest something like ...
jq -r 'group_by(.component)[] | "IPs for " + .[0].component + ": " + (map(.ip) | tostring)'
... which results in ...
IPs for name1: ["1.1.1.1","1.1.1.2"]
IPs for name2: ["1.1.1.3","1.1.1.4"]
As a further example of #replay's technique, after many failures using other methods, I finally built a filter that condenses this Wazuh report (excerpted for brevity):
{
"took" : 228,
"timed_out" : false,
"hits" : {
"total" : {
"value" : 2806,
"relation" : "eq"
},
"hits" : [
{
"_source" : {
"agent" : {
"name" : "100360xx"
},
"data" : {
"vulnerability" : {
"severity" : "High",
"package" : {
"condition" : "less than 78.0",
"name" : "Mozilla Firefox 68.11.0 ESR (x64 en-US)"
}
}
}
}
},
{
"_source" : {
"agent" : {
"name" : "100360xx"
},
"data" : {
"vulnerability" : {
"severity" : "High",
"package" : {
"condition" : "less than 78.0",
"name" : "Mozilla Firefox 68.11.0 ESR (x64 en-US)"
}
}
}
}
},
...
Here is the jq filter I use to provide an array of objects, each consisting of an agent name followed by an array of names of the agent's vulnerable packages:
jq ' .hits.hits |= unique_by(._source.agent.name, ._source.data.vulnerability.package.name) | .hits.hits | group_by(._source.agent.name)[] | { (.[0]._source.agent.name): [.[]._source.data.vulnerability.package | .name ]}'
Here is an excerpt of the output produced by the filter:
{
"100360xx": [
"Mozilla Firefox 68.11.0 ESR (x64 en-US)",
"VLC media player",
"Windows 10"
]
}
{
"WIN-KD5C4xxx": [
"Windows Server 2019"
]
}
{
"fridxxx": [
"java-1.8.0-openjdk",
"kernel",
"kernel-headers",
"kernel-tools",
"kernel-tools-libs",
"python-perf"
]
}
{
"mcd-xxx-xxx": [
"dbus",
"fribidi",
"gnupg2",
"graphite2",
...

Custom analyzer appearing in type mapping but not working in Elasticsearch

I'm trying to add a custom analyzer to my index while also mapping that analyzer to a property on a type. Here is my JSON object for doing this:
{ "settings" : {
"analysis" : {
"analyzer" : {
"test_analyzer" : {
"type" : "custom",
"tokenizer": "standard",
"filter" : ["lowercase", "asciifolding"],
"char_filter": ["html_strip"]
}
}
}
},
"mappings" : {
"test" : {
"properties" : {
"checkanalyzer" : {
"type" : "string",
"analyzer" : "test_analyzer"
}
}
}
}
}
I know this analyzer works because I've tested it using /wp2/_analyze?analyzer=test_analyzer -d '<p>Testing analyzer.</p>' and also it shows up as the analyzer for the checkanalyzer property when I check /wp2/test/_mapping. However, if I add a document like {"checkanalyzer": "<p>The tags should not show up</p>"}, the HTML tags don't get stripped out when I retrieve the document using the _search endpoint. Am I misunderstanding how the mapping works or is there something wrong with my JSON object? I'm dynamically creating the wp2 index and also the test type when I make this call to Elasticsearch, not sure if that matters.
The HTML doesn't get removed from the stored _source; it gets removed from the terms generated from that source. You can see this if you use a terms aggregation:
POST /test_index/_search
{
"aggs": {
"checkanalyzer_field_terms": {
"terms": {
"field": "checkanalyzer"
}
}
}
}
{
"took": 77,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "test",
"_id": "1",
"_score": 1,
"_source": {
"checkanalyzer": "<p>The tags should not show up</p>"
}
}
]
},
"aggregations": {
"checkanalyzer_field_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "not",
"doc_count": 1
},
{
"key": "should",
"doc_count": 1
},
{
"key": "show",
"doc_count": 1
},
{
"key": "tags",
"doc_count": 1
},
{
"key": "the",
"doc_count": 1
},
{
"key": "up",
"doc_count": 1
}
]
}
}
}
Here's some code I used to test it:
http://sense.qbox.io/gist/2971767aa0f5949510fa0669dad6729bbcdf8570
Now if you want to completely strip out the HTML prior to indexing and store the content as is, you can use the mapper attachment plugin: when you define the mapping, you can set the content_type to "html".
The mapper attachment is useful for many things, especially if you are handling multiple document types, but most notably, I believe using it just to strip out the HTML tags is sufficient (which you cannot do with the html_strip char filter).
Just a forewarning though - NONE of the html tags will be stored. So if you do need those tags somehow, I would suggest defining another field to store the original content. Another note: You cannot specify multifields for mapper attachment documents, so you would need to store that outside of the mapper attachment document. See my working example below.
You'll want to end up with this mapping:
{
"html5-es" : {
"aliases" : { },
"mappings" : {
"document" : {
"properties" : {
"delete" : {
"type" : "boolean"
},
"file" : {
"type" : "attachment",
"fields" : {
"content" : {
"type" : "string",
"store" : true,
"term_vector" : "with_positions_offsets",
"analyzer" : "autocomplete"
},
"author" : {
"type" : "string",
"store" : true,
"term_vector" : "with_positions_offsets"
},
"title" : {
"type" : "string",
"store" : true,
"term_vector" : "with_positions_offsets",
"analyzer" : "autocomplete"
},
"name" : {
"type" : "string"
},
"date" : {
"type" : "date",
"format" : "strict_date_optional_time||epoch_millis"
},
"keywords" : {
"type" : "string"
},
"content_type" : {
"type" : "string"
},
"content_length" : {
"type" : "integer"
},
"language" : {
"type" : "string"
}
}
},
"hash_id" : {
"type" : "string"
},
"path" : {
"type" : "string"
},
"raw_content" : {
"type" : "string",
"store" : true,
"term_vector" : "with_positions_offsets",
"analyzer" : "raw"
},
"title" : {
"type" : "string"
}
}
}
},
"settings" : { //insert your own settings here },
"warmers" : { }
}
}
Then, in NEST, I assemble the content like this:
Attachment attachment = new Attachment();
attachment.Content = Convert.ToBase64String(File.ReadAllBytes("path/to/document"));
attachment.ContentType = "html";
Document document = new Document();
document.File = attachment;
document.RawContent = InsertRawContentFromString(originalText);
I have tested this in Sense - results are as follows:
"file": {
"_content": "PGh0bWwgeG1sbnM6TWFkQ2FwPSJodHRwOi8vd3d3Lm1hZGNhcHNvZnR3YXJlLmNvbS9TY2hlbWFzL01hZENhcC54c2QiPg0KICA8aGVhZCAvPg0KICA8Ym9keT4NCiAgICA8aDE+VG9waWMxMDwvaDE+DQogICAgPHA+RGVsZXRlIHRoaXMgdGV4dCBhbmQgcmVwbGFjZSBpdCB3aXRoIHlvdXIgb3duIGNvbnRlbnQuIENoZWNrIHlvdXIgbWFpbGJveC48L3A+DQogICAgPHA+wqA8L3A+DQogICAgPHA+YXNkZjwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD4xMDwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD5MYXZlbmRlci48L3A+DQogICAgPHA+wqA8L3A+DQogICAgPHA+MTAvNiAxMjowMzwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD41IDA5PC9wPg0KICAgIDxwPsKgPC9wPg0KICAgIDxwPjExIDQ3PC9wPg0KICAgIDxwPsKgPC9wPg0KICAgIDxwPkhhbGxvd2VlbiBpcyBpbiBPY3RvYmVyLjwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD5qb2c8L3A+DQogIDwvYm9keT4NCjwvaHRtbD4=",
"_content_length": 0,
"_content_type": "html",
"_date": "0001-01-01T00:00:00",
"_title": "Topic10"
},
"delete": false,
"raw_content": "<h1>Topic10</h1><p>Delete this text and replace it with your own content. Check your mailbox.</p><p> </p><p>asdf</p><p> </p><p>10</p><p> </p><p>Lavender.</p><p> </p><p>10/6 12:03</p><p> </p><p>5 09</p><p> </p><p>11 47</p><p> </p><p>Halloween is in October.</p><p> </p><p>jog</p>"
},
"highlight": {
"file.content": [
"\n <em>Topic10</em>\n\n Delete this text and replace it with your own content. Check your mailbox.\n\n  \n\n asdf\n\n  \n\n 10\n\n  \n\n Lavender.\n\n  \n\n 10/6 12:03\n\n  \n\n 5 09\n\n  \n\n 11 47\n\n  \n\n Halloween is in October.\n\n  \n\n jog\n\n "
]
}