Search an XML attribute and group it - JSON

I need to search for an XML attribute value, group the matches, and build a corresponding JSON element.
For example, Label is my attribute ID and I have 2 attributes with that ID.
(In practice it can occur any number of times.)
The 2 Labels need to be grouped into a single JSON element.
VendorClass, VendorDivision and VendorDept should all go inside the vendor JSON element.
The VendorClass attribute value should be mapped to classCode inside vendor, and the other 2 codes should be null.
Likewise, the VendorDept attribute value should be mapped to departmentCode, and the VendorDivision attribute value to divisionCode.
Output JSON (mapping):
type - always upper case - mapped to the attribute ID,
name - lower case - mapped to "attributeId_attributeValue",
id/code - the attribute value
Below is my input XML:
<CurrentRule Currency="USD"
CurrentStatus="ACTIVE"
Priority="0"
RuleCategory="Current"
RuleType="COMBINATION"
>
<CurrentRuleTargetAttributeValueList/>
<CurrentRuleAttributeValueList>
<CurrentRuleAttributeValue
TriggerAttributeID="Label"
TriggerAttributeValue="10"/>
<CurrentRuleAttributeValue
TriggerAttributeID="Label"
TriggerAttributeValue="1003"/>
<CurrentRuleAttributeValue
TriggerAttributeID="ABCDCode"
TriggerAttributeValue="AC"/>
<CurrentRuleAttributeValue
TriggerAttributeID="ABCDCode"
TriggerAttributeValue="FD"/>
<CurrentRuleAttributeValue
TriggerAttributeID="VendorClass"
TriggerAttributeValue="00N"/>
<CurrentRuleAttributeValue
TriggerAttributeID="VendorDept"
TriggerAttributeValue="100"/>
<CurrentRuleAttributeValue
TriggerAttributeID="VendorDivision"
TriggerAttributeValue="10"/>
<CurrentRuleAttributeValue
TriggerAttributeID="VendorMarket"
TriggerAttributeValue="QVC"/>
<CurrentRuleAttributeValue
TriggerAttributeID="PriceCode"
TriggerAttributeValue="FP"/>
<CurrentRuleAttributeValue
TriggerAttributeID="ProductNumber"
TriggerAttributeValue="A0000"/>
<CurrentRuleAttributeValue
TriggerAttributeID="Trader"
TriggerAttributeValue="1010"/>
<CurrentRuleAttributeValue
TriggerAttributeID="Trader"
TriggerAttributeValue="1046"/>
</CurrentRuleAttributeValueList>
</CurrentRule>
Expected JSON output:
{
"header": {
"type": "ORDER",
"name": "merch-attribute-example",
"createUser": "admin"
},
"filters": {
"Labels": [
{
"type": "LABEL",
"name": "Label_10",
"LabelId": 10
},
{
"type": "LABEL",
"name": "Label_1003",
"LabelId": 1003
}
],
"vendor": [
{
"type": "VENDOR",
"name": "vendorclass_00n",
"divisionCode": null,
"departmentCode": null,
"classCode": "00N"
},
{
"type": "VENDOR",
"name": "vendordept_100",
"divisionCode": null,
"departmentCode": "100",
"classCode": null
},
{
"type": "VENDOR",
"name": "vendordivision_10",
"divisionCode": "10",
"departmentCode": null,
"classCode": null
}
],
"abcd": [
{
"type": "ABCD",
"name": "abcdcode_ac",
"abcdCode": "AC"
},
{
"type": "ABCD",
"name": "abcdcode_fd",
"abcdCode": "FD"
}
],
"priceCodes": [
{
"type": "PRICE_CODE",
"name": "pricecode_fp",
"code": "FP"
}
],
"products": [
{
"type": "PRODUCT",
"name": "productnumber_a0000",
"productNumbers": [
"A0000"
]
}
],
"traders": [
{
"type": "TRADER",
"name": "trader_1010",
"vendorCode": "1010"
},
{
"type": "TRADER",
"name": "trader_1046",
"vendorCode": "1046"
}
]
}
}

Based on the rules described, this is the best approximation. I'll leave it to you to decide whether a value is an id or a code, and to handle the parts of the output that are not clearly explained.
%dw 2.0
output application/json
---
{
header: {
"type": "ORDER",
"name": "merch-attribute-example",
"createUser": "admin"
},
filters: payload.CurrentRule.CurrentRuleAttributeValueList
mapObject ((value, key, index) ->
{
attr: key.@ // attributes of the element (TriggerAttributeID, TriggerAttributeValue)
}
)
pluck (($$):$)
groupBy ((item, index) -> item.attr.TriggerAttributeID)
mapObject ((value1, key1, index1) ->
(key1): value1 map
{
"type": upper(key1),
name: lower(key1) ++ "_" ++ $.attr.TriggerAttributeValue,
id: $.attr.TriggerAttributeValue
}
)
}
Output:
{
"header": {
"type": "ORDER",
"name": "merch-attribute-example",
"createUser": "admin"
},
"filters": {
"Label": [
{
"type": "LABEL",
"name": "label_10",
"id": "10"
},
{
"type": "LABEL",
"name": "label_1003",
"id": "1003"
}
],
"ABCDCode": [
{
"type": "ABCDCODE",
"name": "abcdcode_AC",
"id": "AC"
},
{
"type": "ABCDCODE",
"name": "abcdcode_FD",
"id": "FD"
}
],
"VendorClass": [
{
"type": "VENDORCLASS",
"name": "vendorclass_00N",
"id": "00N"
}
],
"VendorDept": [
{
"type": "VENDORDEPT",
"name": "vendordept_100",
"id": "100"
}
],
"VendorDivision": [
{
"type": "VENDORDIVISION",
"name": "vendordivision_10",
"id": "10"
}
],
"VendorMarket": [
{
"type": "VENDORMARKET",
"name": "vendormarket_QVC",
"id": "QVC"
}
],
"PriceCode": [
{
"type": "PRICECODE",
"name": "pricecode_FP",
"id": "FP"
}
],
"ProductNumber": [
{
"type": "PRODUCTNUMBER",
"name": "productnumber_A0000",
"id": "A0000"
}
],
"Trader": [
{
"type": "TRADER",
"name": "trader_1010",
"id": "1010"
},
{
"type": "TRADER",
"name": "trader_1046",
"id": "1046"
}
]
}
}
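For comparison, the same group-by-attribute idea can be sketched outside DataWeave. The Python below is only an illustration under assumptions: the XML is a shortened copy of the question's input, `group_filters` is a hypothetical helper name, and the `id` field name follows the answer above rather than the per-group names in the expected output.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened copy of the question's input XML (same element and attribute names).
XML = """
<CurrentRule Currency="USD">
  <CurrentRuleAttributeValueList>
    <CurrentRuleAttributeValue TriggerAttributeID="Label" TriggerAttributeValue="10"/>
    <CurrentRuleAttributeValue TriggerAttributeID="Label" TriggerAttributeValue="1003"/>
    <CurrentRuleAttributeValue TriggerAttributeID="PriceCode" TriggerAttributeValue="FP"/>
  </CurrentRuleAttributeValueList>
</CurrentRule>
"""


def group_filters(xml_text):
    """Group CurrentRuleAttributeValue elements by TriggerAttributeID (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for elem in root.iter("CurrentRuleAttributeValue"):
        attr_id = elem.get("TriggerAttributeID")
        value = elem.get("TriggerAttributeValue")
        groups[attr_id].append({
            "type": attr_id.upper(),              # upper-case attribute ID
            "name": f"{attr_id.lower()}_{value}",  # lower-case ID + "_" + value
            "id": value,
        })
    return dict(groups)


filters = group_filters(XML)
```

Merging VendorClass, VendorDept and VendorDivision into one vendor group would need an extra mapping from attribute ID to target group, which this sketch leaves out.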

Related

find on id and append value to json parameter

I have the following data frame, df1:
A B C
123 B1 C1
456 B2 C2
And data frame df2:
A
[
{
"id": "123",
"details": {
"id": "123",
"color": null,
"param_1": {
"name": "mike"
},
"location": "US",
"items": [
{
"item_1": "#227858",
"offer_id": null,
"item_details": {
"detials_1": [{ "notes": "other:", "quantity": 1 }]
}
}
],
"version": 1
}
}
]
[
{
"id": "456",
"details": {
"id": "456",
"color": null,
"param_1": {
"name": "james"
},
"location": "KR",
"items": [
{
"item_1": "#2221",
"offer_id": null,
"item_details": {
"detials_1": [{ "notes": "other", "quantity": 1 }]
}
}
],
"version": 2
}
}
]
I want to look up every value of df1[A] in the JSON stored in df2[A], matching on the first instance of the id parameter. Once a match is found, I want to replace the null value of the color parameter with df1[B] and the null value of offer_id with df1[C].
The output should create a new column with the appended values:
df2[B]:
[
{
"id": "123",
"details": {
"id": "123",
"color": "B1",
"param_1": {
"name": "mike"
},
"location": "US",
"items": [
{
"item_1": "#227858",
"offer_id": "C1",
"item_details": {
"detials_1": [{ "notes": "other:", "quantity": 1 }]
}
}
],
"version": 1
}
}
]
[
{
"id": "456",
"details": {
"id": "456",
"color": "B2",
"param_1": {
"name": "james"
},
"location": "KR",
"items": [
{
"item_1": "#2221",
"offer_id": "C2",
"item_details": {
"detials_1": [{ "notes": "other", "quantity": 1 }]
}
}
],
"version": 2
}
}
]
I just started researching how to approach this, but I need guidance on the most efficient way. Any insight would be greatly appreciated.
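One possible starting point, sketched with the standard library only: parse each JSON string, look up the record's id, and fill in the null fields. The names `lookup`, `raw` and `fill_nulls` are hypothetical, the sample string is a trimmed version of the data above, and ids are assumed to be compared as strings.

```python
import json

# Hypothetical stand-in for df1: column A -> (column B, column C).
lookup = {"123": ("B1", "C1"), "456": ("B2", "C2")}

# Trimmed stand-in for one JSON string stored in df2["A"].
raw = (
    '[{"id": "123", "details": {"id": "123", "color": null, '
    '"items": [{"item_1": "#227858", "offer_id": null}]}}]'
)


def fill_nulls(json_text, lookup):
    """Replace null color / offer_id values using the id -> (B, C) lookup."""
    records = json.loads(json_text)
    for record in records:
        match = lookup.get(record.get("id"))
        if match is None:
            continue  # id not present in df1: leave the record unchanged
        color, offer_id = match
        details = record["details"]
        if details.get("color") is None:
            details["color"] = color
        for item in details.get("items", []):
            if item.get("offer_id") is None:
                item["offer_id"] = offer_id
    return json.dumps(records)


filled = json.loads(fill_nulls(raw, lookup))
```

With pandas, the same function could be applied per row, for example through `df2["A"].apply(...)`, after building `lookup` from df1. This assumes each stored string is valid JSON, since `json.loads` rejects trailing commas.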

Cannot get jq to query json object [duplicate]

I have a JSON file that I am trying to query with jq, but I am unable to retrieve the observations. I am trying to retrieve each of the "observations" using the following command, without success:
cat sample3.json | jq .dataSets[0].series.0:0:0:0:0.observations.0[0]
I am able to retrieve everything up to the series using:
cat sample3.json | jq .dataSets[0].series
But once I try to drill down further I am getting a compile error:
$ cat sample3.json | jq .dataSets[0].series.0:0:0:0:0
jq: error: syntax error, unexpected LITERAL, expecting end of file (Unix shell quoting issues?) at <top-level>, line 1:
.dataSets[0].series.0:0:0:0:0
jq: 1 compile error
I am not sure what I am doing wrong here....
The input file is:
{
"header": {
"id": "b8be2cd5-33bf-4687-9e81-eb032f6f8a71",
"test": false,
"prepared": "2022-09-01T13:30:57.013+02:00",
"sender": {
"id": "ECB"
}
},
"dataSets": [
{
"action": "Replace",
"validFrom": "2022-09-01T13:30:57.013+02:00",
"series": {
"0:0:0:0:0": {
"attributes": [
0,
null,
0,
null,
null,
null,
null,
null,
null,
null,
null,
null,
0,
null,
0,
null,
0,
0,
0,
0
],
"observations": {
"0": [
1.4529,
0,
0,
null,
null
],
"1": [
1.4472,
0,
0,
null,
null
],
"2": [
1.4591,
0,
0,
null,
null
]
}
}
}
}
],
"structure": {
"links": [
{
"title": "Exchange Rates",
"rel": "dataflow",
"href": "https://sdw-wsrest.ecb.europa.eu:443/service/dataflow/ECB/EXR/1.0"
}
],
"name": "Exchange Rates",
"dimensions": {
"series": [
{
"id": "FREQ",
"name": "Frequency",
"values": [
{
"id": "D",
"name": "Daily"
}
]
},
{
"id": "CURRENCY",
"name": "Currency",
"values": [
{
"id": "AUD",
"name": "Australian dollar"
}
]
},
{
"id": "CURRENCY_DENOM",
"name": "Currency denominator",
"values": [
{
"id": "EUR",
"name": "Euro"
}
]
},
{
"id": "EXR_TYPE",
"name": "Exchange rate type",
"values": [
{
"id": "SP00",
"name": "Spot"
}
]
},
{
"id": "EXR_SUFFIX",
"name": "Series variation - EXR context",
"values": [
{
"id": "A",
"name": "Average"
}
]
}
],
"observation": [
{
"id": "TIME_PERIOD",
"name": "Time period or range",
"role": "time",
"values": [
{
"id": "2022-08-29",
"name": "2022-08-29",
"start": "2022-08-29T00:00:00.000+02:00",
"end": "2022-08-29T23:59:59.999+02:00"
},
{
"id": "2022-08-30",
"name": "2022-08-30",
"start": "2022-08-30T00:00:00.000+02:00",
"end": "2022-08-30T23:59:59.999+02:00"
},
{
"id": "2022-08-31",
"name": "2022-08-31",
"start": "2022-08-31T00:00:00.000+02:00",
"end": "2022-08-31T23:59:59.999+02:00"
}
]
}
]
},
"attributes": {
"series": [
{
"id": "TIME_FORMAT",
"name": "Time format code",
"values": [
{
"name": "P1D"
}
]
},
{
"id": "BREAKS",
"name": "Breaks",
"values": []
},
{
"id": "COLLECTION",
"name": "Collection indicator",
"values": [
{
"id": "A",
"name": "Average of observations through period"
}
]
},
{
"id": "COMPILING_ORG",
"name": "Compiling organisation",
"values": []
},
{
"id": "DISS_ORG",
"name": "Data dissemination organisation",
"values": []
},
{
"id": "DOM_SER_IDS",
"name": "Domestic series ids",
"values": []
},
{
"id": "PUBL_ECB",
"name": "Source publication (ECB only)",
"values": []
},
{
"id": "PUBL_MU",
"name": "Source publication (Euro area only)",
"values": []
},
{
"id": "PUBL_PUBLIC",
"name": "Source publication (public)",
"values": []
},
{
"id": "UNIT_INDEX_BASE",
"name": "Unit index base",
"values": []
},
{
"id": "COMPILATION",
"name": "Compilation",
"values": []
},
{
"id": "COVERAGE",
"name": "Coverage",
"values": []
},
{
"id": "DECIMALS",
"name": "Decimals",
"values": [
{
"id": "4",
"name": "Four"
}
]
},
{
"id": "NAT_TITLE",
"name": "National language title",
"values": []
},
{
"id": "SOURCE_AGENCY",
"name": "Source agency",
"values": [
{
"id": "4F0",
"name": "European Central Bank (ECB)"
}
]
},
{
"id": "SOURCE_PUB",
"name": "Publication source",
"values": []
},
{
"id": "TITLE",
"name": "Title",
"values": [
{
"name": "Australian dollar/Euro"
}
]
},
{
"id": "TITLE_COMPL",
"name": "Title complement",
"values": [
{
"name": "ECB reference exchange rate, Australian dollar/Euro, 2:15 pm (C.E.T.)"
}
]
},
{
"id": "UNIT",
"name": "Unit",
"values": [
{
"id": "AUD",
"name": "Australian dollar"
}
]
},
{
"id": "UNIT_MULT",
"name": "Unit multiplier",
"values": [
{
"id": "0",
"name": "Units"
}
]
}
],
"observation": [
{
"id": "OBS_STATUS",
"name": "Observation status",
"values": [
{
"id": "A",
"name": "Normal value"
}
]
},
{
"id": "OBS_CONF",
"name": "Observation confidentiality",
"values": [
{
"id": "F",
"name": "Free"
}
]
},
{
"id": "OBS_PRE_BREAK",
"name": "Pre-break observation value",
"values": []
},
{
"id": "OBS_COM",
"name": "Observation comment",
"values": []
}
]
}
}
}
The .foo syntax cannot be used if the key name has anything but alphanumeric characters or the underscore, or if the first character of the key name is numeric.
In a sufficiently recent version of jq, you can always use the form ."foo", which is an abbreviation of the basic form, .["foo"].
Your query could therefore begin with:
.dataSets[0].series."0:0:0:0:0"
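If you pass the jq query on a command line, you may have to protect the double quotes from the shell, e.g. in bash by enclosing the whole query in single quotes. The same rule applies to the numeric key "0" under observations, so the complete command would be:
jq '.dataSets[0].series."0:0:0:0:0".observations."0"[0]' sample3.json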
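To cross-check what such a query should return, the same lookup can be sketched in Python, where every object key is a plain string and the colons or leading digits need no special syntax. The document below is a trimmed copy of the question's input, and `first_values` is just an illustrative name.

```python
import json

# Trimmed copy of the question's input, keeping only the path being queried.
doc = json.loads("""
{"dataSets": [{"series": {"0:0:0:0:0": {"observations": {
    "0": [1.4529, 0, 0, null, null],
    "1": [1.4472, 0, 0, null, null]
}}}}]}
""")

# Keys such as "0:0:0:0:0" and "0" are ordinary dictionary keys here.
series = doc["dataSets"][0]["series"]["0:0:0:0:0"]

# First element of every observation array, keyed by observation index.
first_values = {key: obs[0] for key, obs in series["observations"].items()}
```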

JSON Schema with Nested Objects with different properties

The entire JSON file is rather large so I've only taken out the subsection I've had an issue with.
{
"diagrams": {
"5f759d15cd046720c28531dd": {
"_id": "5f759d15cd046720c28531dd",
"offsetX": 320,
"offsetY": 42,
"zoom": 80,
"modified": 1604279356,
"nodes": {
"5f9f5c3ccd046720c28531e4": {
"nodeID": "5f9f5c3ccd046720c28531e4",
"type": "start",
"coords": [
360,
120
],
"data": {
"name": "Start",
"color": "standard",
"ports": [
{
"type": "",
"target": "5f9f5c3ccd046720c28531e6"
}
],
"steps": []
}
},
"5f9f5c3ccd046720c28531e5": {
"nodeID": "5f9f5c3ccd046720c28531e5",
"type": "block",
"coords": [
760,
120
],
"data": {
"name": "Help Message",
"color": "standard",
"steps": [
"5f9f5c3ccd046720c28531e6",
"5f9f5c3ccd046720c28531e7"
]
}
},
"5f9f5c3ccd046720c28531e6": {
"nodeID": "5f9f5c3ccd046720c28531e6",
"type": "speak",
"data": {
"randomize": false,
"dialogs": [
{
"voice": "Alexa",
"content": "You said help. Do you want to continue?"
}
],
"ports": [
{
"type": "",
"target": "5f9f5c3ccd046720c28531e7"
}
]
}
},
"5f9f5c3ccd046720c28531e7": {
"nodeID": "5f9f5c3ccd046720c28531e7",
"type": "interaction",
"data": {
"name": "Choice",
"else": {
"type": "path",
"randomize": false,
"reprompts": []
},
"choices": [
{
"intent": "",
"mappings": []
},
{
"intent": "",
"mappings": []
}
],
"reprompt": null,
"ports": [
{
"type": "else",
"target": null
},
{
"type": "",
"target": null
},
{
"type": "",
"target": "5f9f5c3ccd046720c28531e9"
}
]
}
},
"5f9f5c3ccd046720c28531e8": {
"nodeID": "5f9f5c3ccd046720c28531e8",
"type": "block",
"coords": [
1170,
260
],
"data": {
"name": "Exit",
"color": "standard",
"steps": [
"5f9f5c3ccd046720c28531e9"
]
}
},
"5f9f5c3ccd046720c28531e9": {
"nodeID": "5f9f5c3ccd046720c28531e9",
"type": "exit",
"data": {
"ports": []
}
}
},
"children": [],
"creatorID": 42661,
"variables": [],
"name": "Help Flow",
"versionID": "5f759d15cd046720c28531db"
}
}
}
The current JSON Schema definition I have is:
{
"$schema":"http://json-schema.org/schema#",
"type":"object",
"properties":{
"diagrams":{
"type":"object"
}
},
"required":[
"diagrams"
]
}
The problem is that diagrams contains multiple objects, each with a random string as its name, e.g. "5f759d15cd046720c28531dd".
Each of those objects has properties such as _id and offsetX, which I want to express as well, plus a nodes object that again contains multiple objects with arbitrary names (e.g. "5f9f5c3ccd046720c28531e4", "5f9f5c3ccd046720c28531e5", ...). Each of these is a node definition, and some nodes have different properties from others (nodeID, type, data versus nodeID, type, data, coords).
My question: given the arbitrary names and the properties that vary per node, how do I write one JSON Schema definition that covers every way a diagram or node can be built?
You can do this with additionalProperties or patternProperties.
additionalProperties applies to any property that isn't declared in properties or patternProperties.
{
"type": "object",
"additionalProperties": {
"type": "object",
"properties": {
"_id": { ... },
"offsetX": { ... },
...
}
}
}
Your property names appear to always be hex numbers. If you want to enforce that those property names are always hex numbers, you can use patternProperties. Any property that matches the regex must conform to that schema.
{
"type": "object",
"patternProperties": {
"^[0-9a-f]{24}$": {
"type": "object",
"properties": {
"_id": { ... },
"offsetX": { ... },
...
}
}
},
"additionalProperties": false
}
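To see which property names the pattern above would accept, the regex can be tried on its own. This is a minimal sketch using only Python's `re` module, not a schema validator; `unmatched_keys` and the sample `diagrams` dictionary are hypothetical.

```python
import re

# Same pattern as in the patternProperties example: 24 lower-case hex digits.
KEY_PATTERN = re.compile(r"^[0-9a-f]{24}$")


def unmatched_keys(obj):
    """Return property names that the pattern does not cover (hypothetical helper)."""
    # JSON Schema patterns are not implicitly anchored, hence search() plus ^...$.
    return [key for key in obj if not KEY_PATTERN.search(key)]


# One key taken from the question, one deliberately non-matching key.
diagrams = {"5f759d15cd046720c28531dd": {}, "Help Flow": {}}
bad = unmatched_keys(diagrams)
```

With "additionalProperties": false, any name reported here is one the schema would reject.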

jq sort by value of key

Given the following JSON (oversimplified for the sake of the example), I need to order the keys by their value. In this case, the order should be id > name > type.
{
"link": [{
"attributes": [{
"value": "ConfigurationElement",
"name": "type"
}, {
"value": "NAME1",
"name": "name"
}, {
"value": "0026a8b4-ced6-410e-9213-e3fcb28b3aab",
"name": "id"
}
],
"href": "href1",
"rel": "down"
}, {
"attributes": [{
"value": "0026a8b4-ced6-410e-9213-k23g15h2u1l5",
"name": "id"
}, {
"value": "ConfigurationElement",
"name": "type"
}, {
"value": "NAME2",
"name": "name"
}
],
"href": "href2",
"rel": "down"
}
],
"total": 2
}
EXPECTED RESULT:
{
"link": [{
"attributes": [{
"value": "0026a8b4-ced6-410e-9213-e3fcb28b3aab",
"name": "id"
}, {
"value": "NAME1",
"name": "name"
}, {
"value": "ConfigurationElement",
"name": "type"
}
],
"href": "href1",
"rel": "down"
}, {
"attributes": [{
"value": "0026a8b4-ced6-410e-9213-k23g15h2u1l5",
"name": "id"
}, {
"value": "NAME2",
"name": "name"
}, {
"value": "ConfigurationElement",
"name": "type"
}
],
"href": "href2",
"rel": "down"
}
],
"total": 2
}
I would be very grateful if anyone could help me out. I tried jq with -S and -s with sort_by(), but this example is way too complex for me to figure it out with my current experience with jq. Thank you a lot!
You can do:
jq '.link[].attributes|=sort_by(.name)'
The |= takes all the paths matched by .link[].attributes, i.e. each "attributes" array, and applies the filter sort_by(.name) to each of them, leaving everything else unchanged.
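As a rough cross-check of what the update does, the same transformation can be sketched in Python. The data is a trimmed copy of the question's first link, and unlike jq, which emits a new document, this version modifies the dictionary in place.

```python
# Trimmed copy of the question's input (first link only).
doc = {
    "link": [{
        "attributes": [
            {"value": "ConfigurationElement", "name": "type"},
            {"value": "NAME1", "name": "name"},
            {"value": "0026a8b4-ced6-410e-9213-e3fcb28b3aab", "name": "id"},
        ],
        "href": "href1",
        "rel": "down",
    }],
    "total": 2,
}

# Equivalent of .link[].attributes |= sort_by(.name): sort each array by "name",
# leaving every other key untouched.
for link in doc["link"]:
    link["attributes"] = sorted(link["attributes"], key=lambda attr: attr["name"])

names = [attr["name"] for attr in doc["link"][0]["attributes"]]
```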

JSON Schema for tree structure

I have to build a tree-like structure of JSON data. Each node has an id (an integer, required), a label (a string, optional), and an array of child nodes (optional). How do I write a JSON Schema for this data? Id must be required in child nodes as well.
{
"Id": 1,
"Label": "A",
"Child": [
{
"Id": 2,
"Label": "B",
"Child": [
{
"Id": 5,
"Label": "E"
}, {
"Id": 6,
"Label": "E"
}, {
"Id": 7,
"Label": "E"
}
]
}, {
"Id": 3,
"Label": "C"
}, {
"Id": 4,
"Label": "D",
"Child": [
{
"Id": 8,
"Label": "H"
}, {
"Id": 9,
"Label": "I"
}
]
}
]
}
A schema for this structure only needs a definition of a node and a reference to that node. The property Children (renamed from Child) references the node as well.
Here's the schema:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"$ref": "#/definitions/node",
"definitions": {
"node": {
"properties": {
"Id": {
"type": "integer"
},
"Label": {
"type": "string"
},
"Children": {
"type": "array",
"items": {
"$ref": "#/definitions/node"
}
}
},
"required": [
"Id"
]
}
}
}
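The recursion expressed by the "$ref" can be illustrated with a hand-written check in Python. This is a sketch that mirrors the rules of the schema above, not a JSON Schema validator; `node_errors` is a hypothetical helper, and it additionally assumes that every node is an object.

```python
def node_errors(node, path="root"):
    """Collect violations of the node rules: Id is a required integer,
    Label an optional string, Children an optional array of nodes."""
    if not isinstance(node, dict):
        return [f"{path}: expected an object"]
    errors = []
    if "Id" not in node:
        errors.append(f"{path}: missing Id")
    elif isinstance(node["Id"], bool) or not isinstance(node["Id"], int):
        # bool is a subclass of int in Python, so it is excluded explicitly.
        errors.append(f"{path}: Id must be an integer")
    if "Label" in node and not isinstance(node["Label"], str):
        errors.append(f"{path}: Label must be a string")
    children = node.get("Children", [])
    if not isinstance(children, list):
        errors.append(f"{path}: Children must be an array")
    else:
        # Each child is checked with the same rules, like the recursive $ref.
        for index, child in enumerate(children):
            errors.extend(node_errors(child, f"{path}.Children[{index}]"))
    return errors


tree = {"Id": 1, "Label": "A", "Children": [{"Id": 2, "Label": "B"}, {"Label": "C"}]}
errors = node_errors(tree)
```

Because each child goes through the same function, the required Id is enforced at every depth, which is what the question asks for.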