How to add a deeply nested JSON-formatted dictionary into a pandas DataFrame - json

How would I get the dictionaries under the second key named 'intervals' into my DataFrame from this JSON file?
{
  "system_id": 3212644,
  "total_devices": 1,
  "intervals": [
    {
      "end_at": 1656504900,
      "devices_reporting": 1,
      "wh_del": 0
    },
    {
      "end_at": 1656505800,
      "devices_reporting": 1,
      "wh_del": 0
    }
  ],
  "meta": {
    "status": "normal",
    "last_report_at": 1656588634,
    "last_energy_at": 1656588600,
    "operational_at": 1655953200
  },
  "meter_intervals": [
    {
      "meter_serial_number": "122147019814EIM1",
      "envoy_serial_number": "122147019814",
      "intervals": [  ## <<-- I want the dictionaries below here
        {
          "channel": 1,
          "wh_del": 0.0,
          "curr_w": -2,
          "end_at": 1656504900
        },
        {
          "channel": 1,
          "wh_del": 0.0,
          "curr_w": -3,
          "end_at": 1656505800
        }
      ]
    }
  ]
}
So far I've tried the following:
pd.json_normalize(converted, record_path='intervals') - but this only recognises the first 'intervals' key.
df = pd.json_normalize(data) - this still groups the intervals under "meter_intervals".
So I tried referencing df['meter_intervals'] - this got rid of the first "duplicate key, different depth" issue, but since the result is still deeply nested, I wanted to find a more elegant solution. I don't know whether the pandas library can help me here. Any suggestions would be much appreciated. For reference, df['meter_intervals'] currently looks like this:
{
  "0": [
    {
      "meter_serial_number": "122147019814EIM1",
      "envoy_serial_number": "122147019814",
      "intervals": [
        {
          "channel": 1,
          "wh_del": 0.0,
          "curr_w": -2,
          "end_at": 1656504900
        },
        {
          "channel": 1,
          "wh_del": 0.0,
          "curr_w": -3,
          "end_at": 1656505800
        }
      ]
    }
  ]
}
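For what it's worth, pd.json_normalize accepts a nested record_path, which reaches the inner 'intervals' list directly; a minimal sketch (the file name system.json is an assumption):

import json
import pandas as pd

# Assumption: the JSON shown above is saved as system.json.
with open("system.json") as f:
    data = json.load(f)

# record_path walks meter_intervals -> intervals; meta carries identifying
# fields from the outer levels into each inner-interval row.
df = pd.json_normalize(
    data,
    record_path=["meter_intervals", "intervals"],
    meta=["system_id", ["meter_intervals", "meter_serial_number"]],
)
print(df)

This yields one row per inner interval, with columns channel, wh_del, curr_w, and end_at, plus the system_id and meter serial number carried along as extra columns.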

Related

delete object from nested document in mongodb

I have the JSON below with _id, an insertion date, and two objects: sessionData and metaData. I want to delete the metaData object from every document where "userId": 123456.
{
  "_id": {
    "$oid": "60feedd4b3aefff2629b93b7"
  },
  "insertionDate": {
    "$date": "2021-07-26T18:15:57.564Z"
  },
  "sessionData": {
    "time": [1364, 1374, 1384],
    "yaw": [0.15, 0.3, 0.45],
    "pitch": [0.36, 0.76, 1.08],
    "roll": [-0.13, -0.25, -0.35],
    "heading": [-3.24, -3.25, -3.17],
    "ax": [-0.42, -0.41, -0.41],
    "ay": [-0.15, -0.13, -0.1],
    "az": [0.9, 0.91, 1],
    "gx": [0, 0, 0],
    "gy": [-0.01, 0, -0.01],
    "gz": [0.02, 0.02, 0.02],
    "mx": [0.26, 0.26, 0.26],
    "my": [0.01, 0.01, 0.01],
    "mz": [-0.04, -0.04, -0.07]
  },
  "metaData": {
    "userId": 123456,
    "gender": "M",
    "ageGroup": "SENIOR",
    "weightKg": 70,
    "heightCm": 175,
    "poolSizeM": 50
  }
}
The code I have:
@app.route('/data/anonymousData/<int:l>', methods=['POST'])
def makeanonymous(l):
    result = collection.update_many({}, {"$pull": {"metaData": {"$in": {"userId": l}}}}, multi=True)
    t = result.deleted_count
    return f'Deleted {t} documents.'
You can use $unset to remove the object like this:
collection.update_many(
  { "metaData.userId": 123456 },
  { "$unset": { "metaData": "" } }
)
Check this example.
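Since the question's route handler uses pymongo, the same $unset translates roughly as follows (a sketch; collection is assumed to be the pymongo collection from the question):

result = collection.update_many(
    {"metaData.userId": 123456},   # match documents whose metaData has this userId
    {"$unset": {"metaData": ""}},  # remove the whole metaData object
)
print(f"Modified {result.modified_count} documents.")

Note that update_many returns an UpdateResult, which exposes modified_count rather than deleted_count (the documents are modified, not deleted).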

Flatten / normalize json array of objects with jq

I have a large json array of objects. Each object contains a foreignKeyId, a url, (optionally) a urlMirror1, and (optionally) a urlMirror2.
Here's a sample:
[
  {
    "foreignKeyId": 1,
    "url": "https://1-url.com"
  },
  {
    "foreignKeyId": 2,
    "url": "https://2-url.com",
    "urlMirror1": "https://2-url-mirror-1.com"
  },
  {
    "foreignKeyId": 3,
    "url": "https://3-url.com",
    "urlMirror1": "https://3-url-mirror-1.com",
    "urlMirror2": "https://3-url-mirror-2.com"
  }
]
I want to normalize this json to something like below:
[
  {
    "foreignKeyId": 1,
    "primariness": 1,
    "url": "https://1-url.com"
  },
  {
    "foreignKeyId": 2,
    "primariness": 1,
    "url": "https://2-url.com"
  },
  {
    "foreignKeyId": 2,
    "primariness": 2,
    "url": "https://2-url-mirror-1.com"
  },
  {
    "foreignKeyId": 3,
    "primariness": 1,
    "url": "https://3-url.com"
  },
  {
    "foreignKeyId": 3,
    "primariness": 2,
    "url": "https://3-url-mirror-1.com"
  },
  {
    "foreignKeyId": 3,
    "primariness": 3,
    "url": "https://3-url-mirror-2.com"
  }
]
Is there a way to do something like this using jq? If not, any other suggestions to accomplish this quickly without writing too much custom code? This only needs to be run one time, so any kind of hacky one-off solution could work (bash script, etc.).
Thanks!
Update:
primariness should be derived from the key names (url => 1, urlMirror1 => 2, urlMirror2 => 3). Order of the keys inside any given object is insignificant. There is a fixed number of mirrors (e.g., there is never a urlMirror3).
Here is a simple script with a hardcoded number of mirrors and hardcoded primariness values. Hope it will do the trick.
jq '
  map(
    { foreignKeyId } +
    (
      { primariness: 1, url },
      (.urlMirror1 // empty | { primariness: 2, url: . }),
      (.urlMirror2 // empty | { primariness: 3, url: . })
    )
  )
' input.json
Here's a general solution, that is, it will handle arbitrarily many urlMirrors.
For the sake of clarity, let's begin by defining a helper function that emits a stream of {foreignKeyId, primariness, url} objects for a single input object:
def primarinesses:
  {foreignKeyId} +
  ({primariness: 1, url},
   (to_entries[]
    | (.key | capture("^urlMirror(?<n>[0-9]+)")) as $n
    | {primariness: ($n.n | tonumber + 1), url: .value})) ;
The solution is then simply:
[.[] | primarinesses]
which can also be written with less punctuation as:
map(primarinesses)
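For anyone who would rather not use jq at all, a rough Python equivalent of the same flattening (a sketch, assuming the array is stored in a file named input.json):

import json

with open("input.json") as f:
    records = json.load(f)

flattened = []
for rec in records:
    for key, value in rec.items():
        # url => primariness 1, urlMirrorN => primariness N + 1
        if key == "url":
            rank = 1
        elif key.startswith("urlMirror"):
            rank = int(key[len("urlMirror"):]) + 1
        else:
            continue
        flattened.append({
            "foreignKeyId": rec["foreignKeyId"],
            "primariness": rank,
            "url": value,
        })

print(json.dumps(flattened, indent=2))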
Given that the OP has narrowed the question from the generic case down to more specific criteria, the answer provided by @luciole75w is most probably the best; refer to that one.
Now, for @oguzismail, this is a generic jtc approach (which will handle an arbitrary number of "urlMirror"s) made of 3 JSON transformation steps (updated solution):
<file.json jtc -w'<foreignKeyId>l:<f>v[-1]<urlM>L:<u>v[^0]' \
               -i'{"url":{{u}},"foreignKeyId":{f}}' /\
               -w'[foreignKeyId]:<f>q:<p:0>v[^0][foreignKeyId]:<f>s:[-1]<p>I1' \
               -i'{"primariness":{{p}}}' /\
               -pw'<urlM>L:' -tc
[
   { "foreignKeyId": 1, "primariness": 1, "url": "https://1-url.com" },
   { "foreignKeyId": 2, "primariness": 1, "url": "https://2-url.com" },
   { "foreignKeyId": 3, "primariness": 1, "url": "https://3-url.com" },
   { "foreignKeyId": 2, "primariness": 2, "url": "https://2-url-mirror-1.com" },
   { "foreignKeyId": 3, "primariness": 2, "url": "https://3-url-mirror-1.com" },
   { "foreignKeyId": 3, "primariness": 3, "url": "https://3-url-mirror-2.com" }
]
Explanation and visualization:
- all 3 steps can be observed in "slow-mo":
1. for each "foreignKeyId" found, and for each "urlMirror" within the same record, extend (insert into) the array with {"url": ..., "foreignKeyId": ...}:
<file.json jtc -w'<foreignKeyId>l:<f>v[-1]<urlM>L:<u>v[^0]' \
               -i'{"url":{{u}},"foreignKeyId":{f}}' -tc
[
   { "foreignKeyId": 1, "url": "https://1-url.com" },
   { "foreignKeyId": 2, "url": "https://2-url.com", "urlMirror1": "https://2-url-mirror-1.com" },
   { "foreignKeyId": 3, "url": "https://3-url.com", "urlMirror1": "https://3-url-mirror-1.com", "urlMirror2": "https://3-url-mirror-2.com" },
   { "foreignKeyId": 2, "url": "https://2-url-mirror-1.com" },
   { "foreignKeyId": 3, "url": "https://3-url-mirror-1.com" },
   { "foreignKeyId": 3, "url": "https://3-url-mirror-2.com" }
]
2. now insert "primariness": N records based on the index of the occurrence of the foreignKeyId:
<file.json jtc -w'<foreignKeyId>l:<f>v[-1]<urlM>L:<u>v[^0]' \
               -i'{"url":{{u}},"foreignKeyId":{f}}' /\
               -w'[foreignKeyId]:<f>q:<p:0>v[^0][foreignKeyId]:<f>s:[-1]<p>I1' \
               -i'{"primariness":{{p}}}' -tc
[
   { "foreignKeyId": 1, "primariness": 1, "url": "https://1-url.com" },
   { "foreignKeyId": 2, "primariness": 1, "url": "https://2-url.com", "urlMirror1": "https://2-url-mirror-1.com" },
   { "foreignKeyId": 3, "primariness": 1, "url": "https://3-url.com", "urlMirror1": "https://3-url-mirror-1.com", "urlMirror2": "https://3-url-mirror-2.com" },
   { "foreignKeyId": 2, "primariness": 2, "url": "https://2-url-mirror-1.com" },
   { "foreignKeyId": 3, "primariness": 2, "url": "https://3-url-mirror-1.com" },
   { "foreignKeyId": 3, "primariness": 3, "url": "https://3-url-mirror-2.com" }
]
3. and the final step (-pw'<urlM>L:') - get rid of all the redundant "urlMirror" records.
Optionally: if there's a requirement to sort all the records within the top array as per the OP's example, then this additional step will do: -jw'[foreignKeyId]:<>g:[-1]'
PS. It so happens that I'm also the developer of the jtc unix tool.

How do I make a facet aggregation output into true key:value json?

I wrote a script to aggregate some data, but the output isn't in true json.
I tried modifying the $project part of the aggregation pipeline, but I don't think I'm doing it right.
from bson.son import SON  # needed for the ordered sort document

pipeline = [
    {
        "$match": {
            "manu": {"$ne": "randomized"},
        }
    },
    {
        "$match": {
            "rssi": {"$lt": "-65db"}
        }
    },
    {"$sort": {"time": -1}},
    {
        "$group": {
            "_id": "$mac",
            "lastSeen": {"$first": "$time"},
            "firstSeen": {"$last": "$time"},
        }
    },
    {
        "$project": {
            "_id": 1,
            "lastSeen": 1,
            "firstSeen": 1,
            "minutes": {
                "$trunc": {
                    "$divide": [{"$subtract": ["$lastSeen", "$firstSeen"]}, 60000]
                }
            },
        }
    },
    {
        "$facet": {
            "0-5": [
                {"$match": {"minutes": {"$gte": 1, "$lte": 5}}},
                {"$count": "0-5"},
            ],
            "5-10": [
                {"$match": {"minutes": {"$gte": 5, "$lte": 10}}},
                {"$count": "5-10"},
            ],
            "10-20": [
                {"$match": {"minutes": {"$gte": 10, "$lte": 20}}},
                {"$count": "10-20"},
            ],
        }
    },
    {"$project": {
        "0-5": {"$arrayElemAt": ["$0-5.0-5", 0]},
        "5-10": {"$arrayElemAt": ["$5-10.5-10", 0]},
        "10-20": {"$arrayElemAt": ["$10-20.10-20", 0]},
    }},
    {"$sort": SON([("_id", -1)])},
]
data = list(collection.aggregate(pipeline, allowDiskUse=True))
So I basically get the output as {'0-5': 2914, '5-10': 1384, '10-20': 1295} - which isn't in a shape I can iterate over.
Ideally it should be something like
{'timeframe': '0-5', 'count': 262}
Any suggestions?
Thanks in advance.
You can try the aggregation below (replacing your current $facet and subsequent stages):
db.col.aggregate([
    {
        "$facet": {
            "0-5": [
                {"$match": {"minutes": {"$gte": 1, "$lte": 5}}},
                {"$count": "total"},
            ],
            "5-10": [
                {"$match": {"minutes": {"$gte": 5, "$lte": 10}}},
                {"$count": "total"},
            ],
            "10-20": [
                {"$match": {"minutes": {"$gte": 10, "$lte": 20}}},
                {"$count": "total"},
            ]
        }
    },
    { "$project": { "result": { "$objectToArray": "$$ROOT" } } },
    { "$unwind": "$result" },
    { "$unwind": "$result.v" },
    {
        "$project": {
            "timeframe": "$result.k",
            "count": "$result.v.total"
        }
    }
])
$facet returns a single document that contains three fields (the results of the sub-aggregations). You can use $objectToArray to get it into the shape of k and v fields, and then use $unwind to get a single document per key.
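Alternatively, if you would rather keep your pipeline as it is, the same reshaping can be done on the client side in Python; a minimal sketch, based on the single-document output shown in the question:

# data is the list returned by collection.aggregate(pipeline, allowDiskUse=True);
# with the original pipeline it holds one dict like
# {'0-5': 2914, '5-10': 1384, '10-20': 1295}
facets = data[0]
rows = [{"timeframe": k, "count": v} for k, v in facets.items() if k != "_id"]
# rows -> [{'timeframe': '0-5', 'count': 2914}, {'timeframe': '5-10', 'count': 1384}, ...]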

parse json output for primary and secondary hosts from replSetGetStatus

I've used pymongo to connect to a mongo replica set and print the status of the replica set using a json dump. I want to parse this output and display "name" and "stateStr" in a list or array so the user can pick a particular host. Here is my json dump output:
{
  "replSetGetStatus": {
    "date": "2016-10-07T14:21:25",
    "members": [
      {
        "_id": 0,
        "health": 1.0,
        "name": "xxxxxxxxxxx:27017",
        "optime": null,
        "optimeDate": "2016-10-07T13:50:11",
        "self": true,
        "state": 1,
        "stateStr": "PRIMARY",
        "uptime": 32521
      },
      {
        "_id": 1,
        "health": 1.0,
        "lastHeartbeat": "2016-10-07T14:21:24",
        "lastHeartbeatRecv": "2016-10-07T14:21:24",
        "name": "xxxxxxxxxxxx:27017",
        "optime": null,
        "optimeDate": "2016-10-07T13:50:11",
        "pingMs": 0,
        "state": 2,
        "stateStr": "SECONDARY",
        "syncingTo": "xxxxxxxxxxxx:27017",
        "uptime": 27297
      },
      {
        "_id": 2,
        "health": 1.0,
        "lastHeartbeat": "2016-10-07T14:21:24",
        "lastHeartbeatRecv": "2016-10-07T14:21:24",
        "name": "xxxxxxxxxxxxx:27020",
        "pingMs": 0,
        "state": 7,
        "stateStr": "ARBITER",
        "uptime": 32517
      }
    ],
    "myState": 1,
    "ok": 1.0,
    "set": "replica1"
  }
}
Please try the Javascript code below. It worked for me.
use admin;
var result = rs.status();
var length = result.members.length;
for (var i = 0; i < length; i++) {
    print("Server Name - " + result.members[i].name);
    print("Server State - " + result.members[i].stateStr);
}
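Since the question mentions pymongo, the same loop can also be written in Python by running the replSetGetStatus admin command; a minimal sketch (the connection string is a placeholder):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
status = client.admin.command("replSetGetStatus")

# Collect (name, stateStr) pairs so a user can pick a host.
hosts = [(m["name"], m["stateStr"]) for m in status["members"]]
for name, state in hosts:
    print("Server Name -", name)
    print("Server State -", state)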

Delete element of Solr

I deleted an item that is no longer needed from Solr, but it still appears in the Solr response.
The json:
{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "facet": "true",
      "q": "*:*",
      "facet.limit": "-1",
      "facet.field": "manufacturer",
      "wt": "json",
      "rows": "0"
    }
  },
  "response": {
    "numFound": 84,
    "start": 0,
    "docs": []
  },
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {
      "manufacturer": [
        "Chevrolet", 0,
        "abarth", 1,
        "audi", 7,
        "austin", 1,
        "bmw", 2,
        "daewoo", 2,
        "ford", 1,
        "fso", 1,
        "honda", 1,
        "hyundai", 1,
        "jaguar", 3,
        "lexus", 1,
        "mazda", 1,
        "mitsubishi", 1,
        "nissan", 1,
        "pontiac", 1,
        "seat", 1
      ]
    },
    "facet_dates": {},
    "facet_ranges": {}
  }
}
The deleted item is "Chevrolet"; its count is now 0, but it still appears:
"manufacturer": ["Chevrolet", 0,
I wish I could remove the item completely - is that possible? Thanks.
Here is a two-step approach I would follow:
1. Make sure the change (deletion) is committed. You may issue a commit.
2. If it still shows facets with a zero count, you may append &facet.mincount=1 to your query.
&facet.mincount=1 will make sure facets with a zero count do not show up.
For more details, please refer to: http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount
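For example, the original query with the extra parameter appended would look something like this (the host and core name are placeholders):

http://localhost:8983/solr/<core>/select?q=*:*&rows=0&wt=json&facet=true&facet.field=manufacturer&facet.limit=-1&facet.mincount=1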
In your case it is probably because of the uninverted index created by Solr.
Pass facet.mincount=1 in your query to get rid of this problem.