Edit: sample JSON of the details column:
{6591: '[]',
8112: "[{'name': 'start', 'time': 1659453223851}, {'name': 'arrival', 'time': 1659454209024, 'location': [-73.7895605, 40.6869539]}, {'name': 'departure', 'time': 1659453289013, 'location': [-73.8124575, 40.7091602]}]",
5674: '[]',
4236: '[]',
3148: "[{'name': 'start', 'time': 1659121571280}, {'name': 'arrival', 'time': 1659122768105, 'location': [-74.220351348, 40.748419051]}, {'name': 'departure', 'time': 1659121605076, 'location': [-74.189452444, 40.715865856]}]",
3408: "[{'name': 'start', 'time': 1659113772531}, {'name': 'arrival', 'time': 1659114170204, 'location': [-73.9469142, 40.671488]}, {'name': 'departure', 'time': 1659113832693, 'location': [-73.956379, 40.6669802]}]",
1438: '[]',
3634: '[]',
5060: "[{'name': 'start', 'time': 1659190337964}, {'name': 'arrival', 'time': 1659190367182, 'location': [-76.614058283, 39.292697049]}, {'name': 'departure', 'time': 1659190345722, 'location': [-76.614058283, 39.292697049]}]",
6614: '[]',
7313: '[]',
7653: '[]',
9446: '[]',
1237: '[]',
6974: "[{'name': 'start', 'time': 1659383554887}, {'name': 'adminCompletion', 'time': 1659386192031, 'data': {'adminId': 'ZFQCAL6aeS', 'sendNotificationFromAdminComplete': False}}, {'name': 'arrival', 'time': 1659385764198, 'location': [-73.943001009, 40.705886527]}, {'name': 'departure', 'time': 1659383653199, 'location': [-73.94038015, 40.814893186]}]",
762: '[]',
4843: '[]',
8682: '[]',
7271: '[]',
4672: "[{'name': 'start', 'time': 1659131562088}, {'name': 'arrival', 'time': 1659131937387, 'location': [-87.62621, 41.9015626]}, {'name': 'departure', 'time': 1659131637316, 'location': [-87.6263294, 41.9094856]}]"}
I have a dataframe with columns 'details' and 'id'. It looks like this, and I want to completely flatten the details column.
details id
[{'name': 'start', 'time': 1659479418}, {'name': 'arrival', 'time': 1659452651073, 'location': [-75.040536278, 40.034055]}, {'name': 'departure', 'time': 1659451650, 'location': [-75.1609003, 39.947729034]}] 1
[] 2
[] 3
[{'name': 'start', 'time': 1659126581459}, {'name': 'arrival', 'time': 1659128206850, 'location': [-80.3165751, 25.8625698]}, {'name': 'departure', 'time': 1659126641679, 'location': [-80.2511886, 25.921769]}] 4
[{'name': 'start', 'time': 1659120813100}, {'name': 'arrival', 'time': 1659121980125, 'location': [-76.642292, 39.307895253]}, {'name': 'departure', 'time': 1659120903093, 'location': [-76.741190426, 39.34240617]}] 5
[] 6
[] 7
[{'name': 'start', 'time': 1659217203753}, {'name': 'adminCompletion', 'time': 1659217336224, 'data': {'adminId': '~R~WZt7bKO979BRTqHyarS2p', 'sendNotification': False}}, {'name': 'arrival', 'time': 1659217308939, 'location': [-73.941830752, 40.702405857]}, {'name': 'departure', 'time': 1659217288936, 'location': [-73.941830752, 40.702405857]}] 8
[{'name': 'start', 'time': 1659189824814}, {'name': 'arrival', 'time': 1659191937100, 'location': [-76.406627, 39.984]}, {'name': 'departure', 'time': 1659189915191, 'location': [-76.614515552, 39.292407218]}] 9
[] 10
The expected output is:
start_time | admincompletiontime | adminId | sendnotification | arrival_time | arrival_location | departure_time | departure_location | id
1659479418 | | | | 1.65945E+12 | [-75.040536278, 40.034055] | 1659451650 | [-75.1609003, 39.947729034] | 1
| | | | | | | | 2
| | | | | | | | 3
1.65913E+12 | | | | 1.65913E+12 | [-80.3165751, 25.8625698] | 1.65913E+12 | [-80.2511886, 25.921769] | 4
1.65922E+12 | 1.65922E+12 | ~R~WZt7bKO979BRTqHyarS2p | FALSE | 1.65922E+12 | [-73.941830752, 40.702405857] | 1.65922E+12 | [-73.941830752, 40.702405857] | 8
I want to extract all the columns that are passed as values. pd.json_normalize() did not work for me in this case. Please suggest an approach.
Your data is pretty scuffed and needs cleaning up, but following a pattern like this should start you in the right direction:
from ast import literal_eval

import pandas as pd

# data is the sample dict from your edit: {id: stringified list of event dicts}
data = {key: literal_eval(value) for key, value in data.items()}
# drop empty rows and re-key each event by its name
data = [[{y['name']: {'time': y['time'], 'location': y.get('location')}} for y in x]
        for x in data.values() if x]
df = pd.concat([pd.json_normalize(x) for x in data])
df = (df.dropna(how='all', axis=1)
        .bfill()
        .dropna()
        .drop_duplicates('start.time')
        .reset_index(drop=True))
print(df)
Output:
start.time arrival.time arrival.location departure.time departure.location adminCompletion.time
0 1.659453e+12 1.659454e+12 [-73.7895605, 40.6869539] 1.659453e+12 [-73.8124575, 40.7091602] 1.659386e+12
1 1.659122e+12 1.659454e+12 [-73.7895605, 40.6869539] 1.659453e+12 [-73.8124575, 40.7091602] 1.659386e+12
2 1.659114e+12 1.659123e+12 [-74.220351348, 40.748419051] 1.659122e+12 [-74.189452444, 40.715865856] 1.659386e+12
3 1.659190e+12 1.659114e+12 [-73.9469142, 40.671488] 1.659114e+12 [-73.956379, 40.6669802] 1.659386e+12
4 1.659384e+12 1.659190e+12 [-76.614058283, 39.292697049] 1.659190e+12 [-76.614058283, 39.292697049] 1.659386e+12
5 1.659132e+12 1.659386e+12 [-73.943001009, 40.705886527] 1.659384e+12 [-73.94038015, 40.814893186] 1.659386e+12
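If you prefer to keep the id column and produce the prefixed column names from your expected output, here is a minimal sketch of a per-row flatten. The column names such as start_time and arrival_location are assumptions taken from the question, and the details column is assumed to hold the stringified lists shown in the edit:

import pandas as pd
from ast import literal_eval

def flatten_details(details_str):
    # Parse the stringified list of event dicts; '[]' yields an empty row
    events = literal_eval(details_str)
    flat = {}
    for event in events:
        name = event['name']
        flat['{}_time'.format(name)] = event.get('time')
        if 'location' in event:
            flat['{}_location'.format(name)] = event['location']
        # nested 'data' dicts (e.g. adminCompletion) are spread into their own columns
        for key, value in event.get('data', {}).items():
            flat[key] = value
    return pd.Series(flat)

# df has the columns 'details' and 'id' from the question
flat_df = pd.concat([df['details'].apply(flatten_details), df['id']], axis=1)
print(flat_df)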
I have a JSON string stored in a field in BigQuery which has this structure:
{'language': 'Eng', 'date_started': '2021-02-08 16: 56: 55 GMT', 'link_id': '111', 'url_variables': {'touchpoint': {'key': 'touchpoint', 'value': 'phone', 'type': 'url'
}, 'interaction_id': {'key': 'interaction_id', 'value': '111', 'type': 'url'
}
}, 'ip_address': None, 'referer': '', 'user_agent': None, 'response_time': 111, 'data_quality': [], 'longitude': '', 'latitude': '', 'country': '', 'city': '', 'region': '', 'postal': '', 'dma': '', 'survey_data': {'25': {'id': 25, 'type': 'TEXTBOX', 'question': 'feedback_source', 'section_id': 1, 'shown': False
}, '229': {'id': 229, 'type': 'TEXTBOX', 'question': 'recruitment_method', 'section_id': 1, 'shown': False
}, '227': {'id': 227, 'type': 'TEXTBOX', 'question': 'meeting_point', 'section_id': 1, 'answer': 'phone', 'shown': True
}, '221': {'id': 221, 'type': 'TEXTBOX', 'question': 'interaction_id', 'section_id': 1, 'answer': '222', 'shown': True
}, '217': {'id': 217, 'type': 'TEXTBOX', 'question': 'session_id', 'section_id': 1, 'answer': '333', 'shown': True
}, '231': {'id': 231, 'type': 'ESSAY', 'question': 'BlaBla question 4', 'section_id': 3, 'answer': 'Bla Bla answer', 'shown': True
}, '255': {'id': 255, 'type': 'TEXTBOX', 'question': 'tz_offset', 'section_id': 3, 'answer': '-120', 'shown': True
}, '77': {'id': 77, 'type': 'parent', 'question': 'Bla Bla 1', 'section_id': 35, 'options': {'10395': {'id': 10395, 'option': 'Neutraal', 'answer': '3'
}
}, 'shown': True
}, '250': {'id': 250, 'type': 'RADIO', 'question': 'Bla Bla?', 'section_id': 66, 'original_answer': '1', 'answer': '1', 'answer_id': 10860, 'shown': True
}, '251': {'id': 251, 'type': 'RADIO', 'question': 'Bla Bla', 'section_id': 66, 'original_answer': '0', 'answer': '0', 'answer_id': 10863, 'shown': True
}
}
}
I'm able to extract some of the values with the query below, but I cannot extract response_time or any of the values inside the survey_data structure.
They always come out as null.
DECLARE resp STRING
DEFAULT "{'id': '111', 'contact_id': '', 'status': 'Complete', 'is_test_data': '0', 'date_submitted': '2021-07-08 17: 02: 16 GMT', 'session_id': '111', 'language': 'Eng', 'date_started': '2021-02-08 16: 56: 55 GMT', 'link_id': '111', 'url_variables': {'touchpoint': {'key': 'touchpoint', 'value': 'phone', 'type': 'url' }, 'interaction_id': {'key': 'interaction_id', 'value': '111', 'type': 'url' } }, 'ip_address': None, 'referer': '', 'user_agent': None, 'response_time': 111, 'data_quality': [], 'longitude': '', 'latitude': '', 'country': '', 'city': '', 'region': '', 'postal': '', 'dma': '', 'survey_data': {'25': {'id': 25, 'type': 'TEXTBOX', 'question': 'feedback_source', 'section_id': 1, 'shown': False }, '229': {'id': 229, 'type': 'TEXTBOX', 'question': 'recruitment_method', 'section_id': 1, 'shown': False }, '227': {'id': 227, 'type': 'TEXTBOX', 'question': 'meeting_point', 'section_id': 1, 'answer': 'phone', 'shown': True }, '221': {'id': 221, 'type': 'TEXTBOX', 'question': 'interaction_id', 'section_id': 1, 'answer': '222', 'shown': True }, '217': {'id': 217, 'type': 'TEXTBOX', 'question': 'session_id', 'section_id': 1, 'answer': '333', 'shown': True }, '231': {'id': 231, 'type': 'ESSAY', 'question': 'BlaBla question 4', 'section_id': 3, 'answer': 'Bla Bla answer', 'shown': True }, '255': {'id': 255, 'type': 'TEXTBOX', 'question': 'tz_offset', 'section_id': 3, 'answer': '-120', 'shown': True }, '77': {'id': 77, 'type': 'parent', 'question': 'Bla Bla 1', 'section_id': 35, 'options': {'10395': {'id': 10395, 'option': 'Neutraal', 'answer': '3' } }, 'shown': True }, '250': {'id': 250, 'type': 'RADIO', 'question': 'Bla Bla?', 'section_id': 66, 'original_answer': '1', 'answer': '1', 'answer_id': 10860, 'shown': True }, '251': {'id': 251, 'type': 'RADIO', 'question': 'Bla Bla', 'section_id': 66, 'original_answer': '0', 'answer': '0', 'answer_id': 10863, 'shown': True } } }";
SELECT
JSON_VALUE( resp, '$.url_variables.interaction_id.value') as url_interaction_id_value ,
JSON_VALUE( resp, '$.url_variables.interaction_id.type') as url_interaction_id_type,
JSON_VALUE( resp, '$.language') as language,
JSON_QUERY( resp, '$.response_time') as response_time, -- NOT WORKING
JSON_QUERY( resp, '$.survey_data') as survey_data -- NOT WORKING
I tried with jq in bash from the CLI, and it complains that some of the None values are not quoted.
Question:
Does this mean that BigQuery extracts values from the JSON string as far as it can, until it encounters something that is not well formed (e.g. the unquoted None values), after which it cannot parse any further and returns nulls?
NB: In another app, I have been able to parse this JSON in Python and extract values from inside the string.
Looks like you have a few formatting issues with your resp field which you can fix with a few REPLACEs, as in the example below:
SELECT
JSON_VALUE( resp, '$.url_variables.interaction_id.value') as url_interaction_id_value ,
JSON_VALUE( resp, '$.url_variables.interaction_id.type') as url_interaction_id_type,
JSON_VALUE( resp, '$.language') as language,
JSON_QUERY( resp, '$.response_time') as response_time, -- WORKING NOW
JSON_QUERY( resp, '$.survey_data') as survey_data -- WORKING NOW
FROM (
SELECT REPLACE(REPLACE(REPLACE(resp, "None,", "'None',"), "True", "true"), "False", "false") as resp
FROM `project.dataset.table`
)
If applied to the sample data in your question, it now gets you everything you need.
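As to the question itself: the string is a Python dict repr rather than valid JSON (unquoted None / True / False and single quotes), which is consistent with your hypothesis that parsing simply stops once a malformed token is reached. Outside BigQuery, a quick sketch in Python shows this and produces strict JSON (resp here is assumed to hold the same string as the DECLARE above):

import ast
import json

# ast.literal_eval accepts the Python-repr style string (None, True, False, single quotes)
parsed = ast.literal_eval(resp)
# json.dumps emits strict JSON: null / true / false and double quotes
valid_json = json.dumps(parsed)

print(json.loads(valid_json)['response_time'])                   # 111
print(json.loads(valid_json)['survey_data']['25']['question'])   # feedback_source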
I have a JSON string that returns device info, and if devices are found they are listed as device0, device1, device2, etc. In the simple code below, how can I discover all devices present in the JSON and then print the info below for each device? I currently look up each device statically, and I want the discovery to be dynamic, printing the results for each device found.
import requests

r1 = requests.get(url=url_api, params=PARAMS)
devicedata = r1.json()

if 'device0' in devicedata:
    print('')
    device0Name = devicedata['device0']['device_name']
    print(device0Name)
    print('Temp: {}'.format(devicedata['device0']['obs'][0]['ambient_temp']))
    print('Probe Temp: {}'.format(devicedata['device0']['obs'][0]['probe_temp']))
    print('Humidity: {}%'.format(devicedata['device0']['obs'][0]['humidity']))
    print('')
# JSON info looks like this...
{'device0': {'success': True, 'device_type': 'TX60', 'obs': [{'device_id': '1111', 'device_type': 'TX60', 'u_timestamp': '1580361017', 'ambient_temp': '45.7', 'probe_temp': '45.5', 'humidity': '82', 'linkquality': '100', 'lowbattery': '0', 'success': '9', 's_interval': '99', 'timestamp': '1/29/2020 11:10 PM', 'utctime': 1580361017}], 'alerts': {'miss': {'id': '520831', 'alert_type': 'miss', 's_id': '1111', 'max': '-100', 'min': '30', 'wet': '0', 'alert_id': '1', 'phone': 'yes', 'email': '', 'state': None}, 'batt': {'id': '520832', 'alert_type': 'batt', 's_id': '1111', 'max': '-100', 'min': '-100', 'wet': '0', 'alert_id': '1', 'phone': 'yes', 'email': '', 'state': None}}, 'ispws': 0, 'unit': {'temp': '°F', 'temp2': '°F', 'rh': '%'}, 'device_id': '1111', 'expired': '0', 'interval': '30', 'reg_date': '2020-01-17 22:06:48', 'create_date': 1579298808, 'device_name': 'Back Yard', 'assocGateway': '1', 'problem': False}, 'device1': {'success': True, 'device_type': 'TX60', 'obs': [{'device_id': '2222', 'device_type': 'TX60', 'u_timestamp': '1580360303', 'ambient_temp': '63.6', 'probe_temp': 'N/C', 'humidity': '64', 'linkquality': '100', 'lowbattery': '0', 'success': '9', 's_interval': '99', 'timestamp': '1/29/2020 10:58 PM', 'utctime': 1580360303}], 'alerts': {'miss': {'id': '520220', 'alert_type': 'miss', 's_id': '2222', 'max': '-100', 'min': '30', 'wet': '0', 'alert_id': '1', 'phone': 'yes', 'email': '', 'state': None}, 'batt': {'id': '520221', 'alert_type': 'batt', 's_id': '2222', 'max': '-100', 'min': '-100', 'wet': '0', 'alert_id': '1', 'phone': 'yes', 'email': '', 'state': None}}, 'ispws': 0, 'unit': {'temp': '°F', 'temp2': '°F', 'rh': '%'}, 'device_id': '3333', 'expired': '1', 'interval': '30', 'reg_date': '2016-03-19 01:45:04', 'create_date': 1500868369, 'device_name': 'Crawl Space', 'assocGateway': '1', 'problem': False}, 'device2': {'success': True, 'device_type': 'TX60', 'obs': [{'device_id': '3333', 'device_type': 'TX60', 'u_timestamp': '1580360195', 'ambient_temp': '70.2', 'probe_temp': 'N/C', 'humidity': '48', 'linkquality': '100', 'lowbattery': '0', 'success': '9', 's_interval': '99', 'timestamp': '1/29/2020 10:56 PM', 'utctime': 1580360195}], 'alerts': None, 'ispws': 0, 'unit': {'temp': '°F', 'temp2': '°F', 'rh': '%'}, 'device_id': '3333', 'expired': '0', 'interval': '15', 'reg_date': '2020-01-30 04:34:00', 'create_date': 1580358840, 'device_name': 'Basement', 'assocGateway': '2', 'problem': False}, 'tz': 'America/Chicago'}
The output for each device looks like this:
Back Yard
Temp: 50.9
Probe Temp: 51.2
Humidity: 92%
Crawl Space
Temp: 65.4
Probe Temp: N/C
Humidity: 55%
Basement
Temp: 70
Probe Temp: N/C
Humidity: 48%
Found it.
for devKey in devicedata.keys():
    if "device" in devKey:
        dev = devicedata[devKey]
        name = dev["device_name"]
        obs = dev["obs"][0]
        temp = obs["ambient_temp"]
        probeTemp = obs["probe_temp"]
        humidity = obs["humidity"]

        print(name)
        print('Temp: {}'.format(temp))
        print('Probe Temp: {}'.format(probeTemp))
        print('Humidity: {}%'.format(humidity))
        print('')
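A slightly more defensive variant, sketched under the assumption that device keys always begin with the literal prefix "device": using startswith means a key that merely contains the substring elsewhere can never match, and sorting the keys keeps the devices in order.

for devKey in sorted(devicedata):
    if not devKey.startswith('device'):
        continue  # skips unrelated keys such as 'tz'
    dev = devicedata[devKey]
    obs = dev['obs'][0]
    print(dev['device_name'])
    print('Temp: {}'.format(obs['ambient_temp']))
    print('Probe Temp: {}'.format(obs['probe_temp']))
    print('Humidity: {}%'.format(obs['humidity']))
    print('')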
I'm trying to convert a .csv to JSON/dict. The data in its current form looks like this:
cat1,cat2,cat3,name
1,2,3,a
4,5,6,b
7,8,9,c
I'm currently using something like this (I also import it with pandas, because it will be used for graphing from the JSON file):
import csv

with open('Data.csv') as f:
    reader = csv.DictReader(f)
    rows = list(reader)

print(rows)
[{'cat1': '1', 'name': 'a', 'cat3': '3', 'cat2': '2'},
{'cat1': '4', 'name': 'b', 'cat3': '6', 'cat2': '5'},
{'cat1': '7', 'name': 'c', 'cat3': '9', 'cat2': '8'}]
and I want it to look like this in json/dict format:
{"data: [{"all_cats": {"cat1": 1}, {"cat2": 2}, {"cat3": 3}}, "name": a},
{"all_cats": {"cat1": 4}, {"cat2": 5}, {"cat3": 6}}, "name": b},
{"all_cats": {"cat1": 7}, {"cat2": 8}, {"cat3": 8}}, "name": c}]}
Importing directly doesn't allow me to include 'cat1', 'cat2', 'cat3' under 'all_cats' and keep 'name' separate.
Any help would be appreciated.
Since it's space separated and not comma separated, you have to add delimiter=" ". Additionally, since some of your rows have leading whitespace, you also have to add skipinitialspace=True.
reader = csv.DictReader(f, delimiter=" ", skipinitialspace=True)
rows = list(dict(row) for row in reader)
Thus if you now do:
for row in rows:
    print(row)
The output will be:
{'cat1': '1', 'cat2': '2', 'cat3': '3', 'name': 'a'}
{'cat1': '4', 'cat2': '5', 'cat3': '6', 'name': 'b'}
{'cat1': '7', 'cat2': '8', 'cat3': '9', 'name': 'c'}
As already mentioned in the other answer, what you specify as the desired output is not valid JSON. You can check whether a string contains valid JSON using the json.loads(jsonDATAstring) function:
import json
jsonDATAstring_1 = """
{"data: [{"all_cats": {"cat1": 1}, {"cat2": 2}, {"cat3": 3}}, "name": a},
{"all_cats": {"cat1": 4}, {"cat2": 5}, {"cat3": 6}}, "name": b},
{"all_cats": {"cat1": 7}, {"cat2": 8}, {"cat3": 8}}, "name": c}]}
"""
json.loads(jsonDATAstring_1)
which, for the expected JSON format you specified, results in:
json.decoder.JSONDecodeError: Expecting ':' delimiter: line 2 column 12 (char 12)
From your question I assume that the JSON string you actually want is the following:
jsonDATAstring_2 = """
{"data": [{"all_cats": {"cat1": 1, "cat2": 2, "cat3": 3}, "name": "a"},
{"all_cats": {"cat1": 4, "cat2": 5, "cat3": 6}, "name": "b"},
{"all_cats": {"cat1": 7, "cat2": 8, "cat3": 8}, "name": "c"}]}
"""
json.loads(jsonDATAstring_2)
This second string loads OK, so assuming:
rows = [{'cat1': '1', 'name': 'a', 'cat3': '3', 'cat2': '2'},
{'cat1': '4', 'name': 'b', 'cat3': '6', 'cat2': '5'},
{'cat1': '7', 'name': 'c', 'cat3': '9', 'cat2': '8'}]
you can get what you want as follows:
dctData = {"data": []}
lstCats = ['cat1', 'cat2', 'cat3']
for row in rows:
dctAllCats = {"all_cats":{}, "name":"?"}
for cat in lstCats:
dctAllCats["all_cats"][cat] = row[cat]
dctAllCats["name"] = row["name"]
dctData["data"].append(dctAllCats)
import pprint
pp = pprint.PrettyPrinter()
pp.pprint(dctData)
which gives:
{'data': [{'all_cats': {'cat1': '1', 'cat2': '2', 'cat3': '3'}, 'name': 'a'},
{'all_cats': {'cat1': '4', 'cat2': '5', 'cat3': '6'}, 'name': 'b'},
{'all_cats': {'cat1': '7', 'cat2': '8', 'cat3': '9'}, 'name': 'c'}]}
Now it is possible to serialize the Python dictionary object to a JSON string (or file):
jsonString = json.dumps(dctData)
print(jsonString)
which gives:
{"data": [{"all_cats": {"cat1": "1", "cat2": "2", "cat3": "3"}, "name": "a"}, {"all_cats": {"cat1": "4", "cat2": "5", "cat3": "6"}, "name": "b"}, {"all_cats": {"cat1": "7", "cat2": "8", "cat3": "9"}, "name": "c"}]}
I have a list like this:
all_chords = [['C', 'C', 'E', 'G'],
['CM7', 'C', 'E', 'G', 'B'],
['C7', 'C', 'E', 'G', 'Bb'],
['Cm7', 'C', 'Eb', 'G', 'Bb'],
['Cm7b5', 'C', 'Eb', 'Gb', 'Bb'],
['Cdim7', 'C', 'Eb', 'Gb', 'Bbb(A)'],
['Caug7', 'C', 'E', 'G#', 'Bb'],
['C6', 'C', 'E', 'G', 'A'],
['Cm6', 'C', 'Eb', 'G', 'A'],
]
I want to print it out to a CSV file, something like this:
C_chords.csv
C;C,E,G
CM7;C,E,G,B
C7;C,E,G,Bb
Cm7;C,Eb,G,Bb
Cm7b5;C,Eb,Gb,Bb
Cdim7;C,Eb,Gb,Bbb(A)
Caug7;C,E,G#,Bb
C6;C,E,G,A
Cm6;C,Eb,G,A
It has two fields, which are separated by a semicolon (not by a comma).
I used the csv module, like this:
import csv

myfile = open('C_chords.csv','w')
wr = csv.writer(myfile, quotechar=None)
wr.writerows(all_chords)
myfile.close()
The result is:
C,C,E,G
CM7,C,E,G,B
C7,C,E,G,Bb
Cm7,C,Eb,G,Bb
Cm7b5,C,Eb,Gb,Bb
Cdim7,C,Eb,Gb,Bbb(A)
Caug7,C,E,G#,Bb
C6,C,E,G,A
Cm6,C,Eb,G,A
Should I modify the list? Something like this?
[['C',';', 'C', 'E', 'G'],.......]
Or do you guys have any other brilliant ideas?
Thanks in advance.
You're writing each list element as its own column, not two columns; if you want the remaining list elements to be one single column, you need to join them manually first.
And if you want the CSV semicolon-separated, you need to change the delimiter, not the quote character:
import csv
all_chords = [['C', 'C', 'E', 'G'],
['CM7', 'C', 'E', 'G', 'B'],
['C7', 'C', 'E', 'G', 'Bb'],
['Cm7', 'C', 'Eb', 'G', 'Bb'],
['Cm7b5', 'C', 'Eb', 'Gb', 'Bb'],
['Cdim7', 'C', 'Eb', 'Gb', 'Bbb(A)'],
['Caug7', 'C', 'E', 'G#', 'Bb'],
['C6', 'C', 'E', 'G', 'A'],
['Cm6', 'C', 'Eb', 'G', 'A'],
]
myfile = open('C_chords.csv','w')
wr = csv.writer(myfile, delimiter=';')
wr.writerows([c[0], ','.join(c[1:])] for c in all_chords)
myfile.close()
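One small refinement (a sketch following the csv module docs): opening the file with newline='' avoids extra blank lines between rows on Windows, and a with block closes the file automatically:

import csv

with open('C_chords.csv', 'w', newline='') as myfile:
    wr = csv.writer(myfile, delimiter=';')
    # first element is the chord name; the remaining notes become one comma-joined field
    wr.writerows([c[0], ','.join(c[1:])] for c in all_chords)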
I think it's easier to do it without the csv module:
with open('C_chords.csv','w') as out_file:
    for row in all_chords:
        print('{};{}'.format(row[0], ','.join(row[1:])), file=out_file)