Extract data from JSON into Pandas (Python)

I'm trying to extract data into a dataframe. My attempts with pd.json_normalize did not work; I must be doing something wrong.
Example:
{
"data": [
{
"date": {
"01_07_2020": [
{
"customerId": "977869f4e181e656d",
"data": [
{
"_id": "5e1c75498de14f0bb5d",
"sensorType": "FLAT",
"external": 0.0,
"stats": {
"min": 19.5,
"max": 20.75,
"avg": 20.0714285714,
"diff": -7.9478021978,
"last": 19.75
}
},
...
]
},
{
"customerId": "5efaf52b0b26e2ae31816",
"data": [
{
"_id": "5efb44604bd91a7cde4c",
"sensorType": "FLAT",
"external": 0.0,
"stats": {
"min": 23.0,
"max": 23.0,
"avg": 23.0,
"diff": null,
"last": 23.0
}
},
{
"_id": "5efb44604bd9126e2de4d",
"sensorType": "FLAT",
"external": 0.0,
"stats": {
"min": 17.75,
"max": 19.75,
"avg": 18.5833333333,
"diff": null,
"last": 17.75
}
}
]
}
]
},
"year": 2020
},
{
"date": {
"01_07_2021":
etc...
Expected result:
                 _id sensorType  external    min    max            avg           diff   last
 5e1c75498de14f0bb5d       FLAT       0.0  19.50  20.75  20.0714285714  -7.9478021978  19.75
I'm not showing my results because I'm very far from getting what I want.

You can try:
import json
import pandas as pd
json_data = r"""{
"data": [
{
"date": {
"01_07_2020": [
{
"customerId": "977869f4e181e656d",
"data": [
{
"_id": "5e1c75498de14f0bb5d",
"sensorType": "FLAT",
"external": 0.0,
"stats": {
"min": 19.5,
"max": 20.75,
"avg": 20.0714285714,
"diff": -7.9478021978,
"last": 19.75
}
}
]
},
{
"customerId": "5efaf52b0b26e2ae31816",
"data": [
{
"_id": "5efb44604bd91a7cde4c",
"sensorType": "FLAT",
"external": 0.0,
"stats": {
"min": 23.0,
"max": 23.0,
"avg": 23.0,
"diff": null,
"last": 23.0
}
},
{
"_id": "5efb44604bd9126e2de4d",
"sensorType": "FLAT",
"external": 0.0,
"stats": {
"min": 17.75,
"max": 19.75,
"avg": 18.5833333333,
"diff": null,
"last": 17.75
}
}
]
}
]
}
}
]
}"""
def get_data(o):
    # Recursively walk the JSON and yield every dict that has both "_id" and "stats".
    if isinstance(o, dict):
        if "_id" in o and "stats" in o:
            yield o
        else:
            for v in o.values():
                yield from get_data(v)
    elif isinstance(o, list):
        for v in o:
            yield from get_data(v)
data = json.loads(json_data)
all_data = []
for d in get_data(data):
    all_data.append(
        {"_id": d["_id"], "sensorType": d["sensorType"], **d["stats"]}
    )
df = pd.DataFrame(all_data)
print(df)
Prints:
_id sensorType min max avg diff last
0 5e1c75498de14f0bb5d FLAT 19.50 20.75 20.071429 -7.947802 19.75
1 5efb44604bd91a7cde4c FLAT 23.00 23.00 23.000000 NaN 23.00
2 5efb44604bd9126e2de4d FLAT 17.75 19.75 18.583333 NaN 17.75
EDIT: Different method to create the dataframe (with customerId and date):
data = json.loads(json_data)
all_data = []
for d in data["data"]:
    for dt, dd in d["date"].items():
        for ddd in dd:
            customer_id = ddd["customerId"]
            for dddd in ddd["data"]:
                all_data.append(
                    {
                        "date": dt,
                        "customerId": customer_id,
                        "_id": dddd["_id"],
                        "sensorType": dddd["sensorType"],
                        **dddd["stats"],
                    }
                )
df = pd.DataFrame(all_data)
print(df)
Prints:
date customerId _id sensorType min max avg diff last
0 01_07_2020 977869f4e181e656d 5e1c75498de14f0bb5d FLAT 19.50 20.75 20.071429 -7.947802 19.75
1 01_07_2020 5efaf52b0b26e2ae31816 5efb44604bd91a7cde4c FLAT 23.00 23.00 23.000000 NaN 23.00
2 01_07_2020 5efaf52b0b26e2ae31816 5efb44604bd9126e2de4d FLAT 17.75 19.75 18.583333 NaN 17.75
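Since the question mentions pd.json_normalize, here is a rough sketch of the same flattening done with it: loop only over the dates and let json_normalize handle the rest (this reuses the json_data string defined above):
import json
import pandas as pd

data = json.loads(json_data)

frames = []
for d in data["data"]:
    for dt, customers in d["date"].items():
        # json_normalize flattens the nested "stats" dict and keeps customerId as metadata
        frame = pd.json_normalize(customers, record_path="data", meta="customerId")
        frame["date"] = dt
        frames.append(frame)

df = pd.concat(frames, ignore_index=True)
# the stats columns come out as "stats.min", "stats.max", ...; strip the prefix
df.columns = [c.replace("stats.", "") for c in df.columns]
print(df)
This should give the same rows as the loops above, with date and customerId columns included.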

Related

Getting JSON data in Groovy

I need to get some data from JSON; I managed to transform it into a String. For example, I need to get the amount value when the team role id is 4 (the last scope in the JSON). When I run the code below, the "result" output is
{id=1, effectiveDate=2003-01-01, currencyCode=USD, rates=[{id=1, rateTable={id=1, effectiveDate=2003-01-01, currencyCode=USD, name=Tempo Default Price Table, defaultTable=false}, amount=0.0, link={type=DEFAULT_RATE}}], name=Tempo Default Price Table, defaultTable=true}
How can I get the whole data?
Thanks.
http.request(Method.GET) {
    response.success = { resp, json ->
        arrayDen = JsonOutput.toJson(json).substring(1, JsonOutput.toJson(json).length() - 1)
    }
}
def slurper = new groovy.json.JsonSlurper()
def result = slurper.parseText(arrayDen)
log.warn(result)
[
{
"id": 1,
"rateTable": {
"id": 1,
"effectiveDate": "2003-01-01",
"currencyCode": "USD",
"name": "Tempo Default Price Table",
"defaultTable": false
},
"amount": 0.0,
"link": {
"type": "DEFAULT_RATE"
}
},
{
"id": 2,
"rateTable": {
"id": 3,
"effectiveDate": "2022-03-21",
"currencyCode": "USD",
"name": "Rate",
"defaultTable": false
},
"amount": 0.0,
"link": {
"type": "DEFAULT_RATE"
}
},
{
"id": 3,
"rateTable": {
"id": 3,
"effectiveDate": "2022-03-21",
"currencyCode": "USD",
"name": "Rate",
"defaultTable": false
},
"amount": 200.0,
"link": {
"type": "TEAM_ROLE",
"id": 8
}
},
{
"id": 4,
"rateTable": {
"id": 3,
"effectiveDate": "2022-03-21",
"currencyCode": "USD",
"name": "Rate",
"defaultTable": false
},
"amount": 500.0,
"link": {
"type": "TEAM_ROLE",
"id": 5
}
},
{
"id": 5,
"rateTable": {
"id": 3,
"effectiveDate": "2022-03-21",
"currencyCode": "USD",
"name": "Rate",
"defaultTable": false
},
"amount": 1000.0,
"link": {
"type": "TEAM_ROLE",
"id": 4
}
}
]
Hmmm. The following worked fine for me:
def arrayDen = '<YOUR_URL_GOES_HERE>'.toURL().text
def slurper = new groovy.json.JsonSlurper()
def result = slurper.parseText(arrayDen)
def desiredData = result.find { it.id == 4 }
println desiredData.amount
You might give it a try.
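If you ever need the same lookup outside Groovy, a minimal Python sketch may help; it assumes the array above is saved in a hypothetical rates.json file and that "team role id is 4" means the entry whose link has type TEAM_ROLE and id 4:
import json

with open("rates.json") as f:      # hypothetical file holding the array above
    rates = json.load(f)

# pick the entry whose link points at TEAM_ROLE 4 and read its amount
match = next(r for r in rates if r.get("link", {}).get("type") == "TEAM_ROLE" and r["link"].get("id") == 4)
print(match["amount"])             # 1000.0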

Convert nested json to multiple dataframes - dynamically

I have a question:
Is it possible to read a JSON file and convert it to dataframes dynamically?
My example is the code below.
Given this JSON file, I need 3 table dataframes:
{
"date_time": "01-03-2022, 15:18:32",
"regions": {
"Home Region": "Madrid",
"Primary Region": "Barcelona",
"Secondary Region": "Rio"
},
"customers": [
{
"name": "campo santo",
"address": "rua trebal 1",
"phone": 987456321,
"parking": true
},
{
"name": "santo da silva",
"address": "rua sama 6",
"phone": 654321987,
"parking": false
},
{
"name": "roger campos",
"address": "av casal 10",
"phone": 684426654,
"parking": true
}
],
"office": [
{
"location": "madrid",
"co_working_spaces": 25,
"kitchen": false,
"food_track": 2,
"internal_staff": [
{
"id": 123,
"name": "pablo"
},
{
"id": 874,
"name": "saul"
},
{
"id": 741,
"name": "maria"
}
]
},
{
"location": "rio",
"co_working_spaces": 55,
"kitchen": true,
"food_track": 4,
"internal_staff": [
{
"id": 784,
"name": "raquel"
},
{
"id": 874,
"name": "pedro"
},
{
"id": 145,
"name": "maria"
},
{
"id": 365,
"name": "rocio"
}
]
},
{
"location": "barcelona",
"co_working_spaces": 5,
"kitchen": false,
"food_track": 1,
"internal_staff": [
]
},
{
"location": "la",
"co_working_spaces": 5,
"kitchen": true,
"food_track": 4,
"internal_staff": [
{
"id": 852,
"name": "maria"
},
{
"id": 748,
"name": "sara"
}
]
}
]
}
This is my Python code:
import pandas as pd
# from pandas.io.json import json_normalize
import json
with open('offices.json') as f:
    dt = json.load(f)
# df = pd.json_normalize(dt)
df1 = pd.json_normalize(dt, 'customers', 'date_time')[['name', 'address', 'phone', 'parking', 'date_time']]
print(df1)
df2 = pd.json_normalize(dt, 'office', 'date_time')[['location', 'co_working_spaces', 'kitchen', 'food_track']]
print(df2)
df3 = pd.json_normalize(dt['office'], 'internal_staff', 'location')
print(df3)
With this code, I get my 3 table dataframes, but I have to know the JSON structure to create them.
So, is it possible to do it dynamically?
Regards
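As a sketch of what "dynamically" could look like, one rough approach is to walk the JSON tree and build one DataFrame for every list of dicts it finds. This is only an illustration (it does not carry parent context such as the office location into internal_staff) and it reuses the offices.json file from the code above:
import json
import pandas as pd

def collect_tables(node, name="root", tables=None):
    # Recursively walk the JSON and build one DataFrame per list of dicts.
    # Nested lists (office -> internal_staff) are split out into their own table.
    if tables is None:
        tables = {}
    if isinstance(node, dict):
        for key, value in node.items():
            collect_tables(value, name=key, tables=tables)
    elif isinstance(node, list) and node and all(isinstance(x, dict) for x in node):
        rows = []
        for item in node:
            row = {}
            for key, value in item.items():
                if isinstance(value, (list, dict)):
                    collect_tables(value, name=key, tables=tables)
                else:
                    row[key] = value
            rows.append(row)
        frame = pd.DataFrame(rows)
        # the same table name can appear under several parents; append in that case
        tables[name] = pd.concat([tables[name], frame], ignore_index=True) if name in tables else frame
    return tables

with open('offices.json') as f:
    tables = collect_tables(json.load(f))

for name, frame in tables.items():
    print(name)
    print(frame)
With the file above this yields customers, office and internal_staff tables; the regions dict of plain strings is skipped.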

JsonDecodingException on valid Json with Ktor/Kotlinx

Why do I get the following error at offset 6 with the following code? It makes an HTTP request, gets JSON back, and should go through the JSON and create the IMDBInfo object. The JSON is valid and is clearly being processed, and the data class is as simple as it could be, but I don't understand the error:
Error:
Exception in thread "main" kotlinx.serialization.json.internal.JsonDecodingException: Unexpected JSON token at offset 6: Expected beginning of the string, but got {
JSON input: {"d":[{"i":{"height":741,"imageUrl":.....
at kotlinx.serialization.json.internal.JsonExceptionsKt.JsonDecodingException(JsonExceptions.kt:24)
...
Code:
class StreamingAvailability() {
    var IMDBName: String = ""

    fun findOriginalTitle(title: String) = runBlocking {
        val client = HttpClient(Apache) {
            install(JsonFeature) {
                serializer = KotlinxSerializer(kotlinx.serialization.json.Json {
                    prettyPrint = true
                    isLenient = true
                    ignoreUnknownKeys = true
                    coerceInputValues = true
                    allowStructuredMapKeys = true
                })
            }
        }
        val result: IMDBInfo = client.get {
            url {
                protocol = URLProtocol.HTTPS
                encodedPath = "auto-complete"
                host = "imdb8.p.rapidapi.com"
            }
            parameter("q", title)
            headers {
                append(HttpHeaders.Accept, "application/json")
                append(HttpHeaders.ContentType, ContentType.Application.Json)
                append(HttpHeaders.UserAgent, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36")
                append(HttpHeaders.Authorization, API_KEY)
                append("X-Rapidapi-Key", API_KEY)
                append("X-Rapidapi-Host", "imdb8.p.rapidapi.com")
            }
        }
        println(result.d)
    }
}
Data Classes:
import kotlinx.serialization.Serializable
@Serializable
data class D(
    val l: String
)

@Serializable
data class IMDBInfo(
    val d: List<D>,
)
JSON which is valid and processed in the response:
{
"d": [{
"i": {
"height": 800,
"imageUrl": "https://m.media-amazon.com/images/M/MV5BMTMzNDkzMTcyOV5BMl5BanBnXkFtZTcwNDIzMjM2MQ##._V1_.jpg",
"width": 550
},
"id": "tt1080016",
"l": "Ice Age: Dawn of the Dinosaurs",
"q": "feature",
"rank": 7936,
"s": "Ray Romano, John Leguizamo",
"v": [{
"i": {
"height": 360,
"imageUrl": "https://m.media-amazon.com/images/M/MV5BMTMyODMyMDY3MF5BMl5BanBnXkFtZTcwMTg2MTM0Mg##._V1_.jpg",
"width": 480
},
"id": "vi3380019993",
"l": "Ice Age: Dawn of the Dinosaurs -- Trailer #2",
"s": "2:30"
}, {
"i": {
"height": 360,
"imageUrl": "https://m.media-amazon.com/images/M/MV5BMjFkMjY3NzYtNTkzOS00ZWM4LThhN2MtZTk0MTczMGRjZmNiXkEyXkFqcGdeQXVyNzU1NzE3NTg#._V1_.jpg",
"width": 480
},
"id": "vi64291353",
"l": "Ice Age: Dawn of the Dinosaurs",
"s": "0:59"
}, {
"i": {
"height": 360,
"imageUrl": "https://m.media-amazon.com/images/M/MV5BMTg1NTgwNzg5M15BMl5BanBnXkFtZTgwOTc4NzkxMzE#._V1_.jpg",
"width": 480
},
"id": "vi2023162649",
"l": "Ice Age: Dawn of the Dinosaurs -- Trailer #1",
"s": "2:34"
}],
"vt": 5,
"y": 2009
}, {
"i": {
"height": 500,
"imageUrl": "https://m.media-amazon.com/images/M/MV5BMjE1NTEwMTEwOF5BMl5BanBnXkFtZTcwMDA2MDQyOQ##._V1_.jpg",
"width": 357
},
"id": "tt1907779",
"l": "The Dinosaur Project",
"q": "feature",
"rank": 39963,
"s": "Richard Dillane, Peter Brooke",
"v": [{
"i": {
"height": 480,
"imageUrl": "https://m.media-amazon.com/images/M/MV5BMTg3ODAxMTg4OF5BMl5BanBnXkFtZTcwNTg0OTI4OA##._V1_.jpg",
"width": 640
},
"id": "vi3951666969",
"l": "The Dinosaur Project Trailer",
"s": "2:11"
}],
"vt": 1,
"y": 2012
}, {
"i": {
"height": 789,
"imageUrl": "https://m.media-amazon.com/images/M/MV5BMTk0MTI1NTI1MF5BMl5BanBnXkFtZTcwMDg2Mzc4OQ##._V1_.jpg",
"width": 603
},
"id": "tt2303110",
"l": "Rise of the Dinosaurs",
"q": "feature",
"rank": 46988,
"s": "Gary Stretch, Corin Nemec",
"y": 2013
}, {
"i": {
"height": 1285,
"imageUrl": "https://m.media-amazon.com/images/M/MV5BMWI4ZjZmYTktOWIxNS00MmMyLTk5YzctNGQ4ZDg3MmIxYmZkXkEyXkFqcGdeQXVyODg1Njg2Njc#._V1_.jpg",
"width": 900
},
"id": "tt7818384",
"l": "Dino the Dinosaur",
"q": "TV series",
"rank": 65868,
"s": "June Yoon",
"y": 2016,
"yr": "2016-2019"
}, {
"i": {
"height": 475,
"imageUrl": "https://m.media-amazon.com/images/M/MV5BMTU2NDkyODcxM15BMl5BanBnXkFtZTcwNTA0MzQyMQ##._V1_.jpg",
"width": 301
},
"id": "tt0136639",
"l": "Extreme Dinosaurs",
"q": "TV series",
"rank": 83132,
"s": "Scott McNeil, Cusse Mankuma",
"y": 1997,
"yr": "1997-1997"
}, {
"i": {
"height": 475,
"imageUrl": "https://m.media-amazon.com/images/M/MV5BMTIzODM4NTYyMV5BMl5BanBnXkFtZTcwODYzMzAzMQ##._V1_.jpg",
"width": 253
},
"id": "tt0103400",
"l": "The Dinosaurs!",
"q": "TV series",
"rank": 189562,
"s": "Barbara Feldon, Robert Bakker",
"y": 1992,
"yr": "1992-"
}, {
"i": {
"height": 2048,
"imageUrl": "https://m.media-amazon.com/images/M/MV5BNGFlNDczMjMtNmQ1OS00MTJjLTk4NDQtNWU2OGY4Y2M2NDdlXkEyXkFqcGdeQXVyNjE4OTE4OTc#._V1_.jpg",
"width": 1418
},
"id": "tt14162824",
"l": "The Dinosaur",
"q": "feature",
"rank": 211135,
"s": "Veikko Aaltonen, Rauni Mollberg",
"y": 2021
}, {
"i": {
"height": 720,
"imageUrl": "https://m.media-amazon.com/images/M/MV5BYjQ3MTU3ZjgtNDA5Zi00N2EwLWExNjctZWJhMzFlMGVjOGJjXkEyXkFqcGdeQXVyNTg0NTkzNTk#._V1_.jpg",
"width": 1280
},
"id": "tt6877360",
"l": "The Day the Dinosaurs Died",
"q": "TV movie",
"rank": 212976,
"s": "Alice Roberts, Ben Garrod",
"y": 2017
}],
"q": "diedinos",
"v": 1
}
Just rename the class D to something that has more than one character. I've created an issue for this really weird behavior.

jq filter to aggregate into an object's property

I need to transform JSON to update a firebase firestore.
My incoming json looks like this:
[{
"ItemID": 1,
"Size": 10,
"Price": 5
},
{
"ItemID": 1,
"Size": 11,
"Price": 7
},
{
"ItemID": 1,
"Size": 12,
"Price": 10
},
{
"ItemID": 2,
"Size": 11,
"Price": 15
},
{
"ItemID": 2,
"Size": 12,
"Price": 20
}]
And I need JSON to look like this:
[{
"ItemID": 1,
"Price": {
"10": 5,
"11": 7,
"12": 10
}
},
{
"ItemID": 2,
"Price": {
"11": 15,
"12": 20
}
}]
What jq filter do I need to do that, please?
One alternative is to loop through it in JavaScript; however, I want to make this extendable so the pattern can be added, saved and run.
Another alternative is not to store values as keys, but to do something like:
[
{
"ItemID":1,
"Prices":[
{"Size":10, "Price":5}
]
}
]
After fixing the JSON, the following filter produces the output as shown below:
group_by(.ItemID)
| map( reduce .[] as $x ( .[0] | {ItemID};
.Price += ($x | {(.Size|tostring): .Price}) ) )
Output:
[
{
"ItemID": 1,
"Price": {
"10": 5,
"11": 7,
"12": 10
}
},
{
"ItemID": 2,
"Price": {
"11": 15,
"12": 20
}
}
]
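For comparison, the same grouping written as a plain loop — a Python sketch rather than the JavaScript loop the question mentions, assuming the input array is saved in a hypothetical items.json:
import json

with open("items.json") as f:      # hypothetical file holding the input array
    items = json.load(f)

grouped = {}
for item in items:
    # one output object per ItemID, with sizes as keys of the Price map
    entry = grouped.setdefault(item["ItemID"], {"ItemID": item["ItemID"], "Price": {}})
    entry["Price"][str(item["Size"])] = item["Price"]

print(json.dumps(list(grouped.values()), indent=2))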

jq: group objects by string

I have some json from eurostat, which looks like this:
{
"version": "2.0",
"label": "Principaux agrégats des administrations publiques, y compris recettes et dépenses",
"href": "http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/fr/gov_10a_main?unit=PC_GDP&na_item=TE&sector=S13&time=2008&time=2009&time=2010&time=2011&time=2012&time=2013&time=2014&time=2015&time=2016&time=2017&geo=DE&geo=AT&geo=BE&geo=BG&geo=CY&geo=HR&geo=FI",
"source": "Eurostat",
"updated": "2018-10-26",
"status": {
"57": "b"
},
"extension": {
"datasetId": "gov_10a_main",
"lang": "FR",
"description": null,
"subTitle": null,
"status": {
"label": {
"b": "rupture de série"
}
}
},
"class": "dataset",
"value": {
"0": 49.9,
"1": 54.1,
"2": 52.8,
"3": 50.9,
"4": 51.2,
"5": 51.6,
"6": 52.4,
"7": 51.1,
"8": 50.3,
"9": 49.2,
"10": 50.3,
"11": 54.2,
"12": 53.3,
"13": 54.5,
"14": 55.9,
"15": 55.8,
"16": 55.3,
"17": 53.7,
"18": 53,
"19": 52.2,
"20": 37.1,
"21": 39.4,
"22": 36.2,
"23": 33.8,
"24": 34.5,
"25": 37.7,
"26": 43.1,
"27": 40.5,
"28": 35.1,
"29": 35.1,
"30": 38.4,
"31": 42.1,
"32": 42,
"33": 42.3,
"34": 41.9,
"35": 41.9,
"36": 48.8,
"37": 40.6,
"38": 38,
"39": 37.5,
"40": 43.6,
"41": 47.6,
"42": 47.3,
"43": 44.7,
"44": 44.3,
"45": 44.7,
"46": 44,
"47": 43.7,
"48": 43.9,
"49": 43.9,
"50": 48.3,
"51": 54.8,
"52": 54.8,
"53": 54.4,
"54": 56.2,
"55": 57.5,
"56": 58.1,
"57": 57.1,
"58": 55.9,
"59": 54,
"60": 45.3,
"61": 48.3,
"62": 48,
"63": 48.5,
"64": 47.8,
"65": 47.6,
"66": 48.1,
"67": 48.3,
"68": 46.9,
"69": 45
},
"dimension": {
"unit": {
"label": "unit",
"category": {
"index": {
"PC_GDP": 0
},
"label": {
"PC_GDP": "Pourcentage du produit intérieur brut (PIB)"
}
}
},
"sector": {
"label": "sector",
"category": {
"index": {
"S13": 0
},
"label": {
"S13": "Administrations publiques"
}
}
},
"na_item": {
"label": "na_item",
"category": {
"index": {
"TE": 0
},
"label": {
"TE": "Total des dépenses des administrations publiques"
}
}
},
"geo": {
"label": "geo",
"category": {
"index": {
"AT": 0,
"BE": 1,
"BG": 2,
"CY": 3,
"DE": 4,
"FI": 5,
"HR": 6
},
"label": {
"AT": "Autriche",
"BE": "Belgique",
"BG": "Bulgarie",
"CY": "Chypre",
"DE": "Allemagne (jusqu'en 1990, ancien territoire de la RFA)",
"FI": "Finlande",
"HR": "Croatie"
}
}
},
"time": {
"label": "time",
"category": {
"index": {
"2008": 0,
"2009": 1,
"2010": 2,
"2011": 3,
"2012": 4,
"2013": 5,
"2014": 6,
"2015": 7,
"2016": 8,
"2017": 9
},
"label": {
"2008": "2008",
"2009": "2009",
"2010": "2010",
"2011": "2011",
"2012": "2012",
"2013": "2013",
"2014": "2014",
"2015": "2015",
"2016": "2016",
"2017": "2017"
}
}
}
},
"id": [
"unit",
"sector",
"na_item",
"geo",
"time"
],
"size": [
1,
1,
1,
7,
10
]
}
I would like to produce a CSV file.
First, I need to join .status with .value by key (sorry for my poor JSON knowledge) --> "status": {"57": "b"} with "value": {"57": 57.1}.
Second, I need to produce the same table as the original one (downloaded from Eurostat).
I have tried many jq commands, like:
.status, .value | to_entries
but I'm far from finding a solution.
Any help? I think map or map_values/group_by are needed, but I don't really understand these functions.
EDIT:
I download the data from Eurostat.
I use their web service, where I can download data in JSON format.
I would like to reproduce in the shell, with jq, the same table as the original. In my example, it should look like:
GEO/TIME,2010,2011,2012,2013,2014,2015,2016,2017
Belgique,"53,3","54,5","55,9","55,8","55,3","53,7","53,0","52,2"
Bulgarie,"36,2","33,8","34,5","37,7","43,1","40,5","35,1","35,1"
"Allemagne (jusqu'en 1990, ancien territoire de la RFA)","47,3","44,7","44,3","44,7","44,0","43,7","43,9","43,9"
Croatie,"48,0","48,5","47,8","47,6","48,1","48,3","46,9","45,0"
Chypre,"42,0","42,3","41,9","41,9","48,8","40,6","38,0","37,5"
Finlande,"54,8","54,4","56,2","57,5","58,1","57,1","55,9","54,0"
But the JSON contains metadata, and Finlande must have the value 57,1b.
I hope it's clearer with this edit.
And many thanks for your help.
Your question doesn't indicate very precisely what output you want, but hopefully you'll be able to adapt the following:
.value as $dict
| .status
| to_entries
| map( [.key, .value, $dict[.key]] )
| .[]
@csv
With your input, and invoking jq with the -r option, this produces:
"57","b",57.1