Splitting Json array values to k/v and sequencing - json

So I managed to split the following data to k/v pairs
"tags": [
"category--Cola",
"sugar--3.000000",
"barcode--cola001",
"barcode--cola001_1",
"language--en",
"sku--cola_classic",
"sku--cola_cherry",
],
like so...
t = product['tags']
t_filtered = [k for k in t if '--' in k]
product['tags'] = dict(s.split('--') for s in t_filtered)
I want the output to be something like this
{
"category": [Cola],
"sugar":[3.0],
"barcode":[cola001,cola001_1],
"language":[en],
"sku": [cola_classic, cola_cherry],
}
so I tried this... (ref: https://docs.python.org/3/library/collections.html#collections.defaultdict)
product['tags'] = dict(s.split('--') for s in t_filtered)
s = product['tags']
d = {}
for k, v in s:
d.setdefault(k, []).append(v)
print(d)
but getting this error:
ValueError: too many values to unpack (expected 2)
Also, just to verify s is a <classic 'dict'> so I can't figure out the issue.

Related

Pulling specific Parent/Child JSON data with Python

I'm having a difficult time figuring out how to pull specific information from a json file.
So far I have this:
# Import json library
import json
# Open json database file
with open('jsondatabase.json', 'r') as f:
data = json.load(f)
# assign variables from json data and convert to usable information
identifier = data['ID']
identifier = str(identifier)
name = data['name']
name = str(name)
# Collect data from user to compare with data in json file
print("Please enter your numerical identifier and name: ")
user_id = input("Numerical identifier: ")
user_name = input("Name: ")
if user_id == identifier and user_name == name:
print("Your inputs matched. Congrats.")
else:
print("Your inputs did not match our data. Please try again.")
And that works great for a simple JSON file like this:
{
"ID": "123",
"name": "Bobby"
}
But ideally I need to create a more complex JSON file and can't find deeper information on how to pull specific information from something like this:
{
"Parent": [
{
"Parent_1": [
{
"Name": "Bobby",
"ID": "123"
}
],
"Parent_2": [
{
"Name": "Linda",
"ID": "321"
}
]
}
]
}
Here is an example that you might be able to pick apart.
You could either:
Make a custom de-jsonify object_hook as shown below and do something with it. There is a good tutorial here.
Just gobble up the whole dictionary that you get without a custom de-jsonify and drill down into it and make a list or set of the results. (not shown)
Example:
import json
from collections import namedtuple
data = '''
{
"Parents":
[
{
"Name": "Bobby",
"ID": "123"
},
{
"Name": "Linda",
"ID": "321"
}
]
}
'''
Parent = namedtuple('Parent', ['name', 'id'])
def dejsonify(json_str: dict):
if json_str.get("Name"):
parent = Parent(json_str.get('Name'), int(json_str.get('ID')))
return parent
return json_str
res = json.loads(data, object_hook=dejsonify)
print(res)
# then we can do whatever... if you need lookups by name/id,
# we could put the result into a dictionary
all_parents = {(p.name, p.id) : p for p in res['Parents']}
lookup_from_input = ('Bobby', 123)
print(f'found match: {all_parents.get(lookup_from_input)}')
Result:
{'Parents': [Parent(name='Bobby', id=123), Parent(name='Linda', id=321)]}
found match: Parent(name='Bobby', id=123)

Auto-generate json paths from given json structure using Python

I am trying to generate auto json paths from given json structure but stuck in the programatic part. Can someone please help out with the idea to take it further?
Below is the code so far i have achieved.
def iterate_dict(dict_data, key, tmp_key):
for k, v in dict_data.items():
key = key + tmp_key + '.' + k
key = key.replace('$$', '$')
if type(v) is dict:
tmp_key = key
key = '$'
iterate_dict(v, key, tmp_key)
elif type(v) is list:
str_encountered = False
for i in v:
if type(i) is str:
str_encountered = True
tmp_key = key
break
tmp_key = key
key = '$'
iterate_dict(i, key, tmp_key)
if str_encountered:
print(key, v)
if tmp_key is not None:
tmp_key = str(tmp_key)[:-str(k).__len__() - 1]
key = '$'
else:
print(key, v)
key = '$'
import json
iterate_dict_new(dict(json.loads(d_data)), '$', '')
consider the below json structure
{
"id": "1",
"categories": [
{
"name": "author",
"book": "fiction",
"leaders": [
{
"ref": ["wiki", "google"],
"athlete": {
"$ref": "some data"
},
"data": {
"$data": "some other data"
}
}
]
},
{
"name": "dummy name"
}
]
}
Expected output out of python script:
$id = 1
$categories[0].name = author
$categories[0].book = fiction
$categories[0].leaders[0].ref[0] = wiki
$categories[0].leaders[0].ref[1] = google
$categories[0].leaders[0].athlete.$ref = some data
$categories[0].leaders[0].data.$data = some other data
$categories[1].name = dummy name
Current output with above python script:
$.id 1
$$.categories.name author
$$.categories.book fiction
$$$.categories.leaders.ref ["wiki", "google"]
$$$$$.categories.leaders.athlete.$ref some data
$$$$$$.categories.leaders.athlete.data.$data some other data
$$.name dummy name
The following recursive function is similar to yours, but instead of just displaying a dictionary, it can also take a list. This means that if you passed in a dictionary where one of the values was a nested list, then the output would still be correct (printing things like dict.key[3][4] = element).
def disp_paths(it, p='$'):
for k, v in (it.items() if type(it) is dict else enumerate(it)):
if type(v) is dict:
disp_paths(v, '{}.{}'.format(p, k))
elif type(v) is list:
for i, e in enumerate(v):
if type(e) is dict or type(e) is list:
disp_paths(e, '{}.{}[{}]'.format(p, k, i))
else:
print('{}.{}[{}] = {}'.format(p, k, i, e))
else:
f = '{}.{} = {}' if type(it) is dict else '{}[{}] = {}'
print(f.format(p, k, v))
which, when ran with your dictionary (disp_paths(d)), gives the expected output of:
$.categories[0].leaders[0].athlete.$ref = some data
$.categories[0].leaders[0].data.$data = some other data
$.categories[0].leaders[0].ref[0] = wiki
$.categories[0].leaders[0].ref[1] = google
$.categories[0].book = fiction
$.categories[0].name = author
$.categories[1].name = dummy name
$.id = 1
Note that this is unfortunately not ordered, but that is unavoidable as dictionaries have no inherent order (they are just sets of key:value pairs)
If you need help understanding my modifications, just drop a comment!

Livy Server: return a dataframe as JSON?

I am executing a statement in Livy Server using HTTP POST call to localhost:8998/sessions/0/statements, with the following body
{
"code": "spark.sql(\"select * from test_table limit 10\")"
}
I would like an answer in the following format
(...)
"data": {
"application/json": "[
{"id": "123", "init_date": 1481649345, ...},
{"id": "133", "init_date": 1481649333, ...},
{"id": "155", "init_date": 1481642153, ...},
]"
}
(...)
but what I'm getting is
(...)
"data": {
"text/plain": "res0: org.apache.spark.sql.DataFrame = [id: string, init_date: timestamp ... 64 more fields]"
}
(...)
Which is the toString() version of the dataframe.
Is there some way to return a dataframe as JSON using the Livy Server?
EDIT
Found a JIRA issue that addresses the problem: https://issues.cloudera.org/browse/LIVY-72
By the comments one can say that Livy does not and will not support such feature?
I recommend using the built-in (albeit hard to find documentation for) magics %json and %table:
%json
session_url = host + "/sessions/1"
statements_url = session_url + '/statements'
data = {
'code': textwrap.dedent("""\
val d = spark.sql("SELECT COUNT(DISTINCT food_item) FROM food_item_tbl")
val e = d.collect
%json e
""")}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
print r.json()
%table
session_url = host + "/sessions/21"
statements_url = session_url + '/statements'
data = {
'code': textwrap.dedent("""\
val x = List((1, "a", 0.12), (3, "b", 0.63))
%table x
""")}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
print r.json()
Related: Apache Livy: query Spark SQL via REST: possible?
I don't have a lot of experience with Livy, but as far as I know this endpoint is used as an interactive shell and the output will be a string with the actual result that would be shown by a shell. So, with that in mind, I can think of a way to emulate the result you want, but It may not be the best way to do it:
{
"code": "println(spark.sql(\"select * from test_table limit 10\").toJSON.collect.mkString(\"[\", \",\", \"]\"))"
}
Then, you will have a JSON wrapped in a string, so your client could parse it.
I think in general your best bet is to write your output to a database of some kind. If you write to a randomly named table, you could have your code read it after the script is done.

Scala 2.10: Array + JSON arrays to hashmap

After reading a JSON result from a web service response:
val jsonResult: JsValue = Json.parse(response.body)
Containing content something like:
{
result: [
["Name 1", "Row1 Val1", "Row1 Val2"],
["Name 2", "Row2 Val1", "Row2 Val2"]
]
}
How can I efficiently map the contents of the result array in the JSON with a list (or something similar) like:
val keys = List("Name", "Val1", "Val2")
To get an array of hashmaps?
Something like this ?
This solution is functional and handles None/Failure cases "properly" (by returning a None)
val j = JSON.parseFull( json ).asInstanceOf[ Option[ Map[ String, List[ List[ String ] ] ] ] ]
val res = j.map { m ⇒
val r = m get "result"
r.map { ll ⇒
ll.foldRight( List(): List[ Map[ String, String ] ] ) { ( l, acc ) ⇒
Map( ( "Name" -> l( 0 ) ), ( "Val1" -> l( 1 ) ), ( "Val2" -> l( 2 ) ) ) :: acc
}
}.getOrElse(None)
}.getOrElse(None)
Note 1: I had to put double quotes around result in the JSON String to get the JSON parser to work
Note 2: the code could look nicer using more "monadic" sugar such as for comprehensions or using applicative functors

Using RJSONIO and AsIs class

I am writing some helper functions to convert my R variables to JSON. I've come across this problem: I would like my values to be represented as JSON arrays, this can be done using the AsIs class according to the RJSONIO documentation.
x = "HELLO"
toJSON(list(x = I(x)), collapse="")
"{ \"x\": [ \"HELLO\" ] }"
But say we have a list
y = list(a = "HELLO", b = "WORLD")
toJSON(list(y = I(y)), collapse="")
"{ \"y\": {\n \"a\": \"HELLO\",\n\"b\": \"WORLD\" \n} }"
The value found in y -> a is NOT represented as an array. Ideally I would have
"{ \"y\": [{\n \"a\": \"HELLO\",\n\"b\": \"WORLD\" \n}] }"
Note the square brackets. Also I would like to get rid of all "\n"s, but collapse does not eliminate the line breaks in nested JSON. Any ideas?
try writing as
y = list(list(a = "HELLO", b = "WORLD"))
test<-toJSON(list(y = I(y)), collapse="")
when you write to file it appears as:
{ "y": [
{
"a": "HELLO",
"b": "WORLD"
}
] }
I guess you could remove the \n as
test<-gsub("\n","",test)
or use RJSON package
> rjson::toJSON(list(y = I(y)))
[1] "{\"y\":[{\"a\":\"HELLO\",\"b\":\"WORLD\"}]}"
The reason
> names(list(a = "HELLO", b = "WORLD"))
[1] "a" "b"
> names(list(list(a = "HELLO", b = "WORLD")))
NULL
examining the rjson::toJSON you will find this snippet of code
if (!is.null(names(x)))
return(toJSON(as.list(x)))
str = "["
so it would appear to need an unnamed list to treat it as a JSON array. Maybe RJSONIO is similar.