Flatten / normalize JSON array of objects with jq

I have a large json array of objects. Each object contains a foreignKeyId, a url, (optionally) a urlMirror1, and (optionally) a urlMirror2.
Here's a sample:
[
{
"foreignKeyId": 1,
"url": "https://1-url.com"
},
{
"foreignKeyId": 2,
"url": "https://2-url.com",
"urlMirror1": "https://2-url-mirror-1.com"
},
{
"foreignKeyId": 3,
"url": "https://3-url.com",
"urlMirror1": "https://3-url-mirror-1.com",
"urlMirror2": "https://3-url-mirror-2.com"
}
]
I want to normalize this json to something like below:
[
{
"foreignKeyId": 1,
"primariness": 1,
"url": "https://1-url.com"
},
{
"foreignKeyId": 2,
"primariness": 1,
"url": "https://2-url.com"
},
{
"foreignKeyId": 2,
"primariness": 2,
"url": "https://2-url-mirror-1.com"
},
{
"foreignKeyId": 3,
"primariness": 1,
"url": "https://3-url.com"
},
{
"foreignKeyId": 3,
"primariness": 2,
"url": "https://3-url-mirror-1.com"
},
{
"foreignKeyId": 3,
"primariness": 3,
"url": "https://3-url-mirror-2.com"
}
]
Is there a way to do something like this using jq? If not, any other suggestions to accomplish this quickly without writing too much custom code? This only needs to be run one time, so any kind of hacky one-off solution could work (bash script, etc.).
Thanks!
Update:
primariness should be derived from the key names (url => 1, urlMirror1 => 2, urlMirror2 => 3). The order of the keys inside any given object is insignificant. There is a fixed number of mirrors (e.g., there is never a urlMirror3).

Here is a simple script with a hardcoded number of mirrors and hardcoded primariness values. Hope it will do the trick.
jq '
map(
{ foreignKeyId } +
(
{ primariness: 1, url },
(.urlMirror1 // empty | { primariness: 2, url: . }),
(.urlMirror2 // empty | { primariness: 3, url: . })
)
)
' input.json

Here's a general solution, that is, it will handle arbitrarily many urlMirrors.
For the sake of clarity, let's begin by defining a helper function that emits a stream of {foreignKeyId, primariness, url} objects for a single input object:
def primarinesses:
{foreignKeyId} +
({primariness:1, url},
(to_entries[]
| (.key | capture( "^urlMirror(?<n>[0-9]+)")) as $n
| {primariness: ($n.n | tonumber + 1), url : .value } )) ;
The solution is then simply:
[.[] | primarinesses]
which can also be written with less punctuation as:
map(primarinesses)
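For readers who would rather not use jq, the same idea can be sketched in Python. This is only an illustration of the approach above, not a drop-in replacement: the names MIRROR_KEY, primarinesses and normalize are made up for the example, and the sample data is the array from the question.

```python
import json
import re

# Matches keys such as "urlMirror1", "urlMirror2", ... and captures the number.
MIRROR_KEY = re.compile(r"^urlMirror([0-9]+)")

def primarinesses(record):
    """Return the {foreignKeyId, primariness, url} rows for one input object."""
    key_id = record["foreignKeyId"]
    rows = [{"foreignKeyId": key_id, "primariness": 1, "url": record["url"]}]
    for key, value in record.items():
        match = MIRROR_KEY.match(key)
        if match:
            rows.append({
                "foreignKeyId": key_id,
                "primariness": int(match.group(1)) + 1,  # urlMirror1 => 2, ...
                "url": value,
            })
    # Sort so the result does not depend on key order inside the object.
    return sorted(rows, key=lambda row: row["primariness"])

def normalize(records):
    """Flatten the whole array, keeping the input order of the records."""
    return [row for record in records for row in primarinesses(record)]

sample = [
    {"foreignKeyId": 1, "url": "https://1-url.com"},
    {"foreignKeyId": 2, "url": "https://2-url.com",
     "urlMirror1": "https://2-url-mirror-1.com"},
    {"foreignKeyId": 3, "url": "https://3-url.com",
     "urlMirror1": "https://3-url-mirror-1.com",
     "urlMirror2": "https://3-url-mirror-2.com"},
]

print(json.dumps(normalize(sample), indent=2))
```

In a one-off script the sample list would presumably be replaced by json.load on the real input file.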

Given that the OP has narrowed the question from a generic case down to more specific criteria, the answer provided by @luciole75w is most probably the best one; refer to that.
Now, for @oguzismail, this is a generic jtc approach (which will handle an arbitrary number of "urlMirror"s) made of 3 JSON transformation steps (updated solution):
<file.json jtc -w'<foreignKeyId>l:<f>v[-1]<urlM>L:<u>v[^0]' \
-i'{"url":{{u}},"foreignKeyId":{f}}' /\
-w'[foreignKeyId]:<f>q:<p:0>v[^0][foreignKeyId]:<f>s:[-1]<p>I1' \
-i'{"primariness":{{p}}}' /\
-pw'<urlM>L:' -tc
[
{ "foreignKeyId": 1, "primariness": 1, "url": "https://1-url.com" },
{ "foreignKeyId": 2, "primariness": 1, "url": "https://2-url.com" },
{ "foreignKeyId": 3, "primariness": 1, "url": "https://3-url.com" },
{ "foreignKeyId": 2, "primariness": 2, "url": "https://2-url-mirror-1.com" },
{ "foreignKeyId": 3, "primariness": 2, "url": "https://3-url-mirror-1.com" },
{ "foreignKeyId": 3, "primariness": 3, "url": "https://3-url-mirror-2.com" }
]
Explanation and visualization:
- all the 3 steps can be observed in a "slow-mo":
1. for each found "foreignKeyId" and each "urlMirror" found within the same record extend (insert into) the array with {"url":... , "foreignKeyId": ...}:
<file.json jtc -w'<foreignKeyId>l:<f>v[-1]<urlM>L:<u>v[^0]' \
-i'{"url":{{u}},"foreignKeyId":{f}}' -tc
[
{ "foreignKeyId": 1, "url": "https://1-url.com" },
{ "foreignKeyId": 2, "url": "https://2-url.com", "urlMirror1": "https://2-url-mirror-1.com" },
{ "foreignKeyId": 3, "url": "https://3-url.com", "urlMirror1": "https://3-url-mirror-1.com", "urlMirror2": "https://3-url-mirror-2.com" },
{ "foreignKeyId": 2, "url": "https://2-url-mirror-1.com" },
{ "foreignKeyId": 3, "url": "https://3-url-mirror-1.com" },
{ "foreignKeyId": 3, "url": "https://3-url-mirror-2.com" }
]
2. now insert "primariness": N records based on the index of the occurrence of the foreignKeyId:
<file.json jtc -w'<foreignKeyId>l:<f>v[-1]<urlM>L:<u>v[^0]' \
-i'{"url":{{u}},"foreignKeyId":{f}}' /\
-w'[foreignKeyId]:<f>q:<p:0>v[^0][foreignKeyId]:<f>s:[-1]<p>I1' \
-i'{"primariness":{{p}}}' -tc
[
{ "foreignKeyId": 1, "primariness": 1, "url": "https://1-url.com" },
{ "foreignKeyId": 2, "primariness": 1, "url": "https://2-url.com", "urlMirror1": "https://2-url-mirror-1.com" },
{ "foreignKeyId": 3, "primariness": 1, "url": "https://3-url.com", "urlMirror1": "https://3-url-mirror-1.com", "urlMirror2": "https://3-url-mirror-2.com" },
{ "foreignKeyId": 2, "primariness": 2, "url": "https://2-url-mirror-1.com" },
{ "foreignKeyId": 3, "primariness": 2, "url": "https://3-url-mirror-1.com" },
{ "foreignKeyId": 3, "primariness": 3, "url": "https://3-url-mirror-2.com" }
]
3. the final step (-pw'<urlM>L:') gets rid of all the redundant "urlMirror" records.
Optionally: if there's a requirement to sort all the records within the top array as per the OP's example, then this additional step will do: -jw'[foreignKeyId]:<>g:[-1]'
PS. It so happens that I'm also the developer of the jtc Unix tool.

Related

How to add a deeply nested JSON-formatted dictionary into a pandas DataFrame

How would I get the dictionaries under the second key named 'intervals' into my DataFrame from this JSON file?
{
"system_id": 3212644,
"total_devices": 1,
"intervals": [
{
"end_at": 1656504900,
"devices_reporting": 1,
"wh_del": 0
},
{
"end_at": 1656505800,
"devices_reporting": 1,
"wh_del": 0
}
],
"meta": {
"status": "normal",
"last_report_at": 1656588634,
"last_energy_at": 1656588600,
"operational_at": 1655953200
},
"meter_intervals": [
{
"meter_serial_number": "122147019814EIM1",
"envoy_serial_number": "122147019814",
"intervals": [ ## <<-- I want the dictionaries in below here
{
"channel": 1,
"wh_del": 0.0,
"curr_w": -2,
"end_at": 1656504900
},
{
"channel": 1,
"wh_del": 0.0,
"curr_w": -3,
"end_at": 1656505800
}
]
}
]
}
So far I've tried the following:
pd.json_normalize(converted, record_path='intervals') - but this only recognises the first 'intervals' key
df = pd.json_normalize(data) - this still groups the intervals under "meter_intervals"
So I tried referencing df['meter_intervals']. This got rid of the first "duplicate key, different depth" issue, but since the data is still deeply nested, I wanted to find a more elegant solution. I don't know whether the pandas library can help me here. Any suggestions would be much appreciated.
{
"0": [
{
"meter_serial_number": "122147019814EIM1",
"envoy_serial_number": "122147019814",
"intervals": [
{
"channel": 1,
"wh_del": 0.0,
"curr_w": -2,
"end_at": 1656504900
},
{
"channel": 1,
"wh_del": 0.0,
"curr_w": -3,
"end_at": 1656505800
}
]
}
]
}
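One possible way to reach the inner 'intervals' lists is to walk the two levels explicitly and build one flat row per interval. The sketch below uses plain Python only; interval_rows is a hypothetical helper name, and the payload is a trimmed copy of the sample above. The resulting list of dicts can then be handed to pd.DataFrame(rows). pandas' json_normalize also accepts a list as record_path, so pd.json_normalize(data, record_path=['meter_intervals', 'intervals']) may be the more direct route; that call is not exercised here.

```python
def interval_rows(payload):
    """Return one flat dict per entry of meter_intervals[*].intervals."""
    rows = []
    for meter in payload.get("meter_intervals", []):
        for interval in meter.get("intervals", []):
            # Carry the meter identifiers along so each row stays traceable.
            row = {
                "meter_serial_number": meter.get("meter_serial_number"),
                "envoy_serial_number": meter.get("envoy_serial_number"),
            }
            row.update(interval)
            rows.append(row)
    return rows

# Trimmed copy of the sample document from the question.
payload = {
    "system_id": 3212644,
    "intervals": [{"end_at": 1656504900, "devices_reporting": 1, "wh_del": 0}],
    "meter_intervals": [
        {
            "meter_serial_number": "122147019814EIM1",
            "envoy_serial_number": "122147019814",
            "intervals": [
                {"channel": 1, "wh_del": 0.0, "curr_w": -2, "end_at": 1656504900},
                {"channel": 1, "wh_del": 0.0, "curr_w": -3, "end_at": 1656505800},
            ],
        }
    ],
}

rows = interval_rows(payload)
for row in rows:
    print(row)
```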

Format the results in Kusto

example
In the following example we need to summarize the user logs.
datatable(user:string, dt:datetime,page: string, value:int)
[
'chmod', datetime(2019-07-15), "page1", 1,
'ls', datetime(2019-07-02), "page2", 2,
'dir', datetime(2019-07-22), "page3", 3,
'mkdir', datetime(2019-07-14), "page4", 4,
'rm', datetime(2019-07-27), "page5", 5,
'pwd', datetime(2019-07-25), "page6", 6,
'rm', datetime(2019-07-23), "page7", 7,
'pwd', datetime(2019-07-25), "page8", 8,
]
| summarize commands_details = make_list(pack('dt', dt, 'page', page, "value", value)) by user
results
the results in the last example query will be like
"user": pwd,
"commands_details": [
{
"dt": "2019-07-25T00:00:00.0000000Z",
"page": "page6",
"value": 6
},
{
"dt": "2019-07-25T00:00:00.0000000Z",
"page": "page8",
"value": 8
}
],
expected results
But I need the results to be like the following:
"user": pwd,
"commands_details": [
{
"dt": "2019-07-25T00:00:00.0000000Z",
"data":[
{"page": "page6", "value": 6},
{"page": "page8", "value": 8}
]
}
],
question
Is there any way in Kusto to format the results as shown in the expected results section?
You can use the following query to achieve this:
datatable(user:string, dt:datetime,page: string, value:int)
[
'chmod', datetime(2019-07-15), "page1", 1,
'ls', datetime(2019-07-02), "page2", 2,
'dir', datetime(2019-07-22), "page3", 3,
'mkdir', datetime(2019-07-14), "page4", 4,
'rm', datetime(2019-07-27), "page5", 5,
'pwd', datetime(2019-07-25), "page6", 6,
'rm', datetime(2019-07-23), "page7", 7,
'pwd', datetime(2019-07-25), "page8", 8,
]
| summarize commands_details = make_list(pack('page', page, "value", value)) by user, dt
| project result = pack('user', user, 'data', pack('dt', dt, 'data', commands_details))
A result (for 'pwd'):
{
"user": "pwd",
"data": {
"dt": "2019-07-25T00:00:00Z",
"data": [
{
"page": "page6",
"value": 6
},
{
"page": "page8",
"value": 8
}
]
}
}
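To make the two-level grouping in the query easier to follow, here is a rough Python sketch of the same reshaping: group by (user, dt), collect the page/value pairs, then wrap each group. It is not Kusto and only illustrates the logic; group_by_user_and_dt is a hypothetical name and the dates are kept as plain strings for brevity.

```python
from collections import defaultdict

# Same rows as the datatable above, with dt kept as a plain string.
rows = [
    ("chmod", "2019-07-15", "page1", 1),
    ("ls", "2019-07-02", "page2", 2),
    ("dir", "2019-07-22", "page3", 3),
    ("mkdir", "2019-07-14", "page4", 4),
    ("rm", "2019-07-27", "page5", 5),
    ("pwd", "2019-07-25", "page6", 6),
    ("rm", "2019-07-23", "page7", 7),
    ("pwd", "2019-07-25", "page8", 8),
]

def group_by_user_and_dt(rows):
    """Mimic 'summarize make_list(...) by user, dt' followed by the project step."""
    grouped = defaultdict(list)
    for user, dt, page, value in rows:
        grouped[(user, dt)].append({"page": page, "value": value})
    return [
        {"user": user, "data": {"dt": dt, "data": details}}
        for (user, dt), details in grouped.items()
    ]

for item in group_by_user_and_dt(rows):
    if item["user"] == "pwd":
        print(item)
```

Because dt is part of the grouping key, a user with entries on two different dates (such as 'rm' here) ends up with two separate result rows.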

How do I make a facet aggregation output into true key:value json?

I wrote a script to aggregate some data, but the output isn't in true json.
I tried modifying the $project part of the aggregation pipeline, but I don't think I'm doing it right.
pipeline = [
{
"$match": {
"manu": {"$ne": "randomized"},
}},
{
"$match": {
"rssi": {"$lt": "-65db"}
}
},
{"$sort": {"time": -1}},
{
"$group": {"_id": "$mac",
"lastSeen": {"$first": "$time"},
"firstSeen": {"$last": "$time"},
}
},
{
"$project":
{
"_id": 1,
"lastSeen": 1,
"firstSeen": 1,
"minutes":
{
"$trunc":
{
"$divide": [{"$subtract": ["$lastSeen", "$firstSeen"]}, 60000]
}
},
}
},
{
"$facet": {
"0-5": [
{"$match": {"minutes": {"$gte": 1, "$lte": 5}}},
{"$count": "0-5"},
],
"5-10": [
{"$match": {"minutes": {"$gte": 5, "$lte": 10}}},
{"$count": "5-10"},
],
"10-20": [
{"$match": {"minutes": {"$gte": 10, "$lte": 20}}},
{"$count": "10-20"},
],
}
},
{"$project": {
"0-5": {"$arrayElemAt": ["$0-5.0-5", 0]},
"5-10": {"$arrayElemAt": ["$5-10.5-10", 0]},
"10-20": {"$arrayElemAt": ["$10-20.10-20", 0]},
}},
{"$sort": SON([("_id", -1)])}
]
data = list(collection.aggregate(pipeline, allowDiskUse=True))
So I basically get the output as {'0-5': 2914, '5-10': 1384, '10-20': 1295} - which cannot be used to iterate through.
Ideally it should be something like
{'timeframe': '0-5', 'count': 262}
Any suggestions?
Thanks in advance.
You can try below aggregation (replacing your current $facet and below stages):
db.col.aggregate([{
"$facet": {
"0-5": [
{"$match": {"minutes": {"$gte": 1, "$lte": 5}}},
{"$count": "total"},
],
"5-10": [
{"$match": {"minutes": {"$gte": 5, "$lte": 10}}},
{"$count": "total"},
],
"10-20": [
{"$match": {"minutes": {"$gte": 10, "$lte": 20}}},
{"$count": "total"},
]
},
},
{
$project: {
result: { $objectToArray: "$$ROOT" }
}
},
{
$unwind: "$result"
},
{
$unwind: "$result.v"
},
{
$project: {
timeframe: "$result.k",
count: "$result.v.total"
}
}
])
$facet returns a single document that contains three fields (the results of the sub-aggregations). You can use $objectToArray to reshape it into an array of k and v fields, and then use $unwind to get a single document per key.
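Since the question uses pymongo, the reshaping stages can also be written as Python dicts (keys quoted), and, as an alternative, the single facet document can be reshaped on the client. The sketch below is an assumption-laden illustration: reshape_stages and facet_doc_to_rows are names invented here, and the facet document is the sample output quoted in the question rather than a live query result.

```python
# The stages after $facet from the answer above, in pymongo-friendly syntax.
# They would be appended to the pipeline in place of the final $project/$sort.
reshape_stages = [
    {"$project": {"result": {"$objectToArray": "$$ROOT"}}},
    {"$unwind": "$result"},
    {"$unwind": "$result.v"},
    {"$project": {"timeframe": "$result.k", "count": "$result.v.total"}},
]

def facet_doc_to_rows(doc):
    """Client-side alternative: turn {'0-5': 2914, ...} into a list of rows."""
    return [
        {"timeframe": timeframe, "count": count}
        for timeframe, count in doc.items()
        if timeframe != "_id"  # defensive: skip an _id field if one is present
    ]

# Output shape reported in the question for the original pipeline.
facet_doc = {"0-5": 2914, "5-10": 1384, "10-20": 1295}

for row in facet_doc_to_rows(facet_doc):
    print(row)
```

One thing to keep in mind with the server-side version: $count emits no document when a bucket is empty, so $unwind on "$result.v" drops that bucket entirely instead of reporting a count of 0.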

How to parse this file with jq?

I just started using jq and json files, and I'm trying to parse a specific file.
I'm trying to do it with jq on the command line, but if there's any other way to do it properly, I'm willing to give it a try.
The file itself looks like this :
{
"Status": "ok",
"Code": 200,
"Message": "",
"Result": [
{
"ID": 123456,
"Activity": 27,
"Name": "Example1",
"Coordinate": {
"Galaxy": 1,
"System": 22,
"Position": 3
},
"Administrator": false,
"Inactive": false,
"Vacation": false,
"HonorableTarget": false,
"Debris": {
"Metal": 0,
"Crystal": 0,
"RecyclersNeeded": 0
},
"Moon": null,
"Player": {
"ID": 111111,
"Name": "foo",
"Rank": 4
},
"Alliance": null
},
{
"ID": 223344,
"Activity": 17,
"Name": "Example2",
"Coordinate": {
"Galaxy": 3,
"System": 44,
"Position": 5
},
"Administrator": false,
"Inactive": false,
"Vacation": false,
"StrongPlayer": false,
"HonorableTarget": false,
"Debris": {
"Metal": 0,
"Crystal": 0,
"RecyclersNeeded": 0
},
"Moon": null,
"Player": {
"ID": 765432,
"Name": "Player 2",
"Rank": 3
},
"Alliance": null
},
(...)
]
}
I would need to extract information based on the galaxy/system/position.
For example, having a script with the proper filters in it and execute something like that :
./parser --galaxy=1 --system=22 --position=3
And it would give me :
ID : 123456
Name : Example1
Activity : 27
...
I tried to do that with curl to grab my json file and jq to parse my file, but I have no idea how I can make that kind of request.
The following should be sufficient to get you on your way.
First, let's assume the JSON is in a file named galaxy.json; second, let's assume the file galaxy.jq contains the following:
.Result[]
| select(.Coordinate | (.Galaxy==$galaxy and .System==$system and .Position==$position))
Then the invocation:
jq -f galaxy.jq --argjson galaxy 1 --argjson system 22 --argjson position 3 galaxy.json
would yield the corresponding object:
{
"ID": 123456,
"Activity": 27,
"Name": "Example1",
"Coordinate": {
"Galaxy": 1,
"System": 22,
"Position": 3
},
"Administrator": false,
"Inactive": false,
"Vacation": false,
"HonorableTarget": false,
"Debris": {
"Metal": 0,
"Crystal": 0,
"RecyclersNeeded": 0
},
"Moon": null,
"Player": {
"ID": 111111,
"Name": "foo",
"Rank": 4
},
"Alliance": null
}
Key: Value format
If you want the output to be in key: value format, simply add -r to the command-line options, and append the following to the jq filter:
| to_entries[]
| "\(.key): \(.value)"
Output
ID: 123456
Activity: 27
Name: Example1
Coordinate: {"Galaxy":1,"System":22,"Position":3}
Administrator: false
Inactive: false
Vacation: false
HonorableTarget: false
Debris: {"Metal":0,"Crystal":0,"RecyclersNeeded":0}
Moon: null
Player: {"ID":111111,"Name":"foo","Rank":4}
Alliance: null
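Since the question is open to approaches other than jq, here is a minimal Python sketch of the ./parser --galaxy=1 --system=22 --position=3 idea. The helper names (build_parser, find_entries, format_entry) are invented for the example, the document is a trimmed inline copy of the sample, and the argument list is passed explicitly so the snippet runs as is; a real script would presumably parse sys.argv and load the JSON from a file or from the curl output.

```python
import argparse
import json

def build_parser():
    """Command-line options mirroring --galaxy/--system/--position."""
    parser = argparse.ArgumentParser(description="Look up an entry by coordinates")
    for name in ("galaxy", "system", "position"):
        parser.add_argument(f"--{name}", type=int, required=True)
    return parser

def find_entries(document, galaxy, system, position):
    """Return every entry of Result whose Coordinate matches all three values."""
    matches = []
    for entry in document.get("Result", []):
        coordinate = entry.get("Coordinate") or {}
        if (coordinate.get("Galaxy") == galaxy
                and coordinate.get("System") == system
                and coordinate.get("Position") == position):
            matches.append(entry)
    return matches

def format_entry(entry):
    """Render 'key: value' lines; non-string values are printed as compact JSON."""
    lines = []
    for key, value in entry.items():
        text = value if isinstance(value, str) else json.dumps(value, separators=(",", ":"))
        lines.append(f"{key}: {text}")
    return "\n".join(lines)

# Trimmed copy of the sample document from the question.
document = {
    "Status": "ok",
    "Result": [
        {"ID": 123456, "Activity": 27, "Name": "Example1",
         "Coordinate": {"Galaxy": 1, "System": 22, "Position": 3}, "Moon": None},
        {"ID": 223344, "Activity": 17, "Name": "Example2",
         "Coordinate": {"Galaxy": 3, "System": 44, "Position": 5}, "Moon": None},
    ],
}

args = build_parser().parse_args(["--galaxy=1", "--system=22", "--position=3"])
for entry in find_entries(document, args.galaxy, args.system, args.position):
    print(format_entry(entry))
```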

Delete element of Solr

I deleted an item that I no longer need from Solr, but it still appears in the Solr response.
The json:
{
"responseHeader": {
"status": 0,
"QTime": 1,
"params": {
"facet": "true",
"q": "*:*",
"facet.limit": "-1",
"facet.field": "manufacturer",
"wt": "json",
"rows": "0"
}
},
"response": {
"numFound": 84,
"start": 0,
"docs": []
},
"facet_counts": {
"facet_queries": {},
"facet_fields": {
"manufacturer": [
"Chevrolet",
0,
"abarth",
1,
"audi",
7,
"austin",
1,
"bmw",
2,
"daewoo",
2,
"ford",
1,
"fso",
1,
"honda",
1,
"hyundai",
1,
"jaguar",
3,
"lexus",
1,
"mazda",
1,
"mitsubishi",
1,
"nissan",
1,
"pontiac",
1,
"seat",
1
]
},
"facet_dates": {},
"facet_ranges": {}
}
}
The deleted item is "Chevrolet". Its count is now 0, but it still appears:
"manufacturer":["Chevrolet",0,
I would like to remove the item completely. Is that possible? Thanks.
Here is a two-step approach I would follow:
1. Make sure the change (the deletion) is committed. You may need to issue a commit.
2. If it still shows facets with a zero count, append &facet.mincount=1 to your query.
&facet.mincount=1 will make sure facets with zero count do not show up.
For more details, please refer to: http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount
In your case this is probably because of the uninverted index created by Solr.
Pass facet.mincount=1 in your query to get rid of this problem.
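As a small illustration of the suggestion above, the sketch below builds the query string with facet.mincount=1 added and, as a client-side fallback, pairs up the flat [name, count, name, count, ...] facet list and drops zero counts. The parameter values are the ones shown in the question's responseHeader; the helper names facet_pairs and nonzero_facets are made up here, and no Solr host or URL is assumed.

```python
from urllib.parse import urlencode

# Parameters from the question's request, plus facet.mincount=1.
params = {
    "q": "*:*",
    "rows": 0,
    "wt": "json",
    "facet": "true",
    "facet.field": "manufacturer",
    "facet.limit": -1,
    "facet.mincount": 1,
}
query_string = urlencode(params)
print(query_string)

def facet_pairs(flat):
    """Turn Solr's flat [name, count, name, count, ...] list into (name, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))

def nonzero_facets(flat):
    """Client-side fallback: keep only facets whose count is greater than zero."""
    return [(name, count) for name, count in facet_pairs(flat) if count > 0]

# Shortened version of the manufacturer facet list from the response above.
manufacturer = ["Chevrolet", 0, "abarth", 1, "audi", 7, "austin", 1]
print(nonzero_facets(manufacturer))
```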