Convert plain JSON to SWI-Prolog Dict

I have a simple rule in SWI-Prolog which I want to implement in an AWS Lambda function.
I will receive an Event in the following json form:
{
  "key1": "value1",
  "key2": "value2",
  "key3": "value3"
}
My problem is that I can only read JSON from atom-like values or JSON files, but not plain JSON in compound form.
What I would like to do is something like this:
lambda_handler(Event, Context, Response) :-
    atom_json_dict(Event, Dict, []),
    my_simple_rule(Dict.key1, Dict.key2, Dict.key3),
    Response = '{"result": "yes"}'.
my_simple_rule is a condition which returns true or false depending on the values passed.
What I've tried so far does not work because SWI-Prolog expects either a Stream or a String when using atom_json_term/3, json_read/2,3 or json_read_dict/2,3.
I also tried to force the JSON into a string this way:
format(atom(A), "~w", {"key1": "value1", "key2": "value2", "key3":"value3"}).
Expecting this so that I can then convert it to a Term (Prolog dict):
{"key1": "value1", "key2": "value2", "key3":"value3"}
But the result is the following:
'{key1:value1,key2:value2,key3:value3}'
Which fails.
Does anyone know how I can use plain JSON within Prolog?

Event is already a structured term, so here is an ad hoc adapter.
Let's say we have a file j2d.pl containing
:- module(j2d,
          [ j2d/2
          ]).

% j2d(+Event, -Dict): convert a {"k1":V1, "k2":V2, ...} curly-bracket term to a dict
j2d(Event, Dict) :-
    Event = {CommaSequence},
    l2d(CommaSequence, _{}, Dict).

% walk the comma sequence, accumulating key/value pairs into the dict
l2d((A,R), D, U) :- !, updd(A, D, V), l2d(R, V, U).
l2d(A, D, U) :- updd(A, D, U).

% add one Key:Value pair; keys arrive as strings, dict keys must be atoms
updd(K:V, D, U) :- atom_string(A, K), put_dict(A, D, V, U).
then it's possible to test the code from the SWI-Prolog console:
?- use_module(j2d).
true.
?- Event = {
       "key1": "value1",
       "key2": "value2",
       "key3": "value3"
   }.
Event = {"key1":"value1", "key2":"value2", "key3":"value3"}.
?- j2d($Event,Dict).
Dict = _14542{key1:"value1", key2:"value2", key3:"value3"},
Event = {"key1":"value1", "key2":"value2", "key3":"value3"}.
The unusual $Event syntax is a utility of the console (a.k.a. the REPL) that replaces the variable Event with its last value (a.k.a. its binding).
Your code could become
:- use_module(j2d).

lambda_handler(Event, Context, Response) :-
    j2d(Event, Dict),
    my_simple_rule(Dict.key1, Dict.key2, Dict.key3),
    Response = '{"result": "yes"}'.

So... I was wrong about how SWI-Prolog handles the Event json in Lambda, so I will post my findings here in case someone encounters a similar challenge.
At first I thought that the Event JSON arrived like this: {"key1": 1, "key2": 2, "key3": 3}.
However, it looks more like this: json([key1=1, key2=2, key3=3]), which makes the parsing task different.
To solve it I used the following code, which I hope is self-explanatory:
:- use_module(library(http/json)).
:- use_module(library(http/json_convert)).

% Use this query to test locally. If it works this way, it should work in Lambda:
% ?- handler(json([key1=10, key2=2, key3=3]), _, Response).

% Function handler
handler(json(Event), _, Response) :-
    json_to_prolog(json(Event), Json_term),   % transform the event JSON into a Prolog term
    atom_json_term(A, Json_term, []),         % convert the JSON term into an atom
    atom_json_dict(A, Dicty, []),             % convert the atom into a dict
    my_simple_function(Dicty.key1, Dicty.key2, Dicty.key3, Result), % function evaluation
    json_to_prolog(json([result_key1="some_message", result_key2=Result]), Result_json), % build the result JSON term
    atom_json_term(Response, Result_json, []). % convert to an atom so that Lambda does not complain

my_simple_function(N1, N2, N3, Result) :-
    Result is N1 + N2 + N3.
The input needed (your test JSON in Lambda) would be:
{
  "key1": 1,
  "key2": 2,
  "key3": 3
}
While the output should look like this:
{
  "result_key1": "some_message",
  "result_key2": 6
}
I hope this works as a template to use SWI-Prolog on AWS.
By the way, I recommend you take a look at this repository to build your own custom Prolog runtime for Lambda.

Related

How to make sure that a jsonb object is not an encoded string?

I have found that sometimes a jsonb object:
{"a": 1, "b": 2}
will get re-encoded and stored as a jsonb string:
"{\"a\": 1, \"b\": 2}"
Is there a way to write a function that will reparse the string when the input is not a jsonb object?
The #>> operator (Extracts JSON sub-object at the specified path as text) does the job:
select ('"{\"a\": 1, \"b\": 2}"'::jsonb #>> '{}')::jsonb
This operator's behavior is not officially documented; it appears to be a side effect of its underlying function. Oddly enough, its twin operator #> doesn't work that way, though that would be even more logical. It's probably worth asking the Postgres developers to solve this, preferably by adding a new decoding function. While waiting for a systematic solution, you can define a simple SQL function to make queries clearer in cases where the problem occurs frequently.
create or replace function jsonb_unescape(text)
returns jsonb language sql immutable as $$
    select ($1::jsonb #>> '{}')::jsonb
$$;
Note that the function works well both on escaped and plain strings:
with my_data(str) as (
    values
        ('{"a": 1, "b": 2}'),
        ('"{\"a\": 1, \"b\": 2}"')
)
select str, jsonb_unescape(str)
from my_data;
          str           |  jsonb_unescape
------------------------+------------------
 {"a": 1, "b": 2}       | {"a": 1, "b": 2}
 "{\"a\": 1, \"b\": 2}" | {"a": 1, "b": 2}
(2 rows)

Reading array fields in Spark 2.2

Suppose you have a bunch of data whose rows look like this:
{
  'key': [
    {'key1': 'value11', 'key2': 'value21'},
    {'key1': 'value12', 'key2': 'value22'}
  ]
}
I would like to read this into a Spark Dataset. One way to do it is as follows:
case class ObjOfLists(k1: List[String], k2: List[String])
case class Data(k: ObjOfLists)
Then you can do:
sparkSession.read.json(pathToData)
  .select(struct($"key.key1" as "k1", $"key.key2" as "k2") as "k")
  .as[Data]
This works fine, but it kind of butchers the data a little bit; after all in the data 'key' points to a list of objects rather than an object of lists. In other words, what I really want is:
case class Obj(k1: String, k2: String)
case class DataOfList(k: List[Obj])
My question: is there some other syntax I can put in select which allows the resulting Dataframe to be converted to a Dataset[DataOfList]?
I tried using the same select syntax as above, and got:
Exception in thread "main" org.apache.spark.sql.AnalysisException: need an array field but got struct<k1:array<string>,k2:array<string>>;
So I also tried:
sparkSession.read.json(pathToData)
  .select(array(struct($"key.key1" as "k1", $"key.key2" as "k2")) as "k")
  .as[DataOfList]
This compiles and runs, but the data looks like this:
DataOfList(List(Obj(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@bb2a5516,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@bec5e4a7)))
Any other ideas?
Just recast the data to reflect the expected names:
case class Obj(k1: String, k2: String)
case class DataOfList(k: Seq[Obj])
val text = Seq("""{
  "key": [
    {"key1": "value11", "key2": "value21"},
    {"key1": "value12", "key2": "value22"}
  ]
}""").toDS
val df = spark.read.json(text)
df
  .select($"key".cast("array<struct<k1:string,k2:string>>").as("k"))
  .as[DataOfList]
  .first
DataOfList(List(Obj(value11,value21), Obj(value12,value22)))
With extraneous objects, you can define the schema on read:
import org.apache.spark.sql.types._

val textExtended = Seq("""{
  "key": [
    {"key0": "value01", "key1": "value11", "key2": "value21"},
    {"key1": "value12", "key2": "value22", "key3": "value32"}
  ]
}""").toDS

val schemaSubset = StructType(Seq(StructField("key", ArrayType(StructType(Seq(
  StructField("key1", StringType),
  StructField("key2", StringType))))
)))
val df = spark.read.schema(schemaSubset).json(textExtended)
and proceed as before.

Livy Server: return a dataframe as JSON?

I am executing a statement in Livy Server using HTTP POST call to localhost:8998/sessions/0/statements, with the following body
{
  "code": "spark.sql(\"select * from test_table limit 10\")"
}
I would like an answer in the following format
(...)
"data": {
"application/json": "[
{"id": "123", "init_date": 1481649345, ...},
{"id": "133", "init_date": 1481649333, ...},
{"id": "155", "init_date": 1481642153, ...},
]"
}
(...)
but what I'm getting is
(...)
"data": {
"text/plain": "res0: org.apache.spark.sql.DataFrame = [id: string, init_date: timestamp ... 64 more fields]"
}
(...)
Which is the toString() version of the dataframe.
Is there some way to return a dataframe as JSON using the Livy Server?
EDIT
Found a JIRA issue that addresses the problem: https://issues.cloudera.org/browse/LIVY-72
Judging by the comments, can one conclude that Livy does not and will not support such a feature?
I recommend using the built-in magics %json and %table (their documentation is admittedly hard to find):
%json
import json
import textwrap
import requests

# host and headers are assumed to be defined elsewhere
session_url = host + "/sessions/1"
statements_url = session_url + '/statements'
data = {
    'code': textwrap.dedent("""\
        val d = spark.sql("SELECT COUNT(DISTINCT food_item) FROM food_item_tbl")
        val e = d.collect
        %json e
        """)}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
print r.json()
%table
session_url = host + "/sessions/21"
statements_url = session_url + '/statements'
data = {
    'code': textwrap.dedent("""\
        val x = List((1, "a", 0.12), (3, "b", 0.63))
        %table x
        """)}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
print r.json()
Related: Apache Livy: query Spark SQL via REST: possible?
I don't have a lot of experience with Livy, but as far as I know this endpoint is used as an interactive shell, and the output will be a string with the actual result that would be shown by a shell. So, with that in mind, I can think of a way to emulate the result you want, but it may not be the best way to do it:
{
  "code": "println(spark.sql(\"select * from test_table limit 10\").toJSON.collect.mkString(\"[\", \",\", \"]\"))"
}
Then, you will have a JSON wrapped in a string, so your client could parse it.
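For instance, a minimal client-side sketch of that parsing step (assuming host and headers are defined as in the other answer, and a hypothetical statement id of 0; the exact layout of Livy's response may vary by version):

import json
import requests

# fetch the finished statement
statement_url = host + "/sessions/0/statements/0"
r = requests.get(statement_url, headers=headers)

# the println'ed JSON arrives as plain text; parse it back into a list of dicts
raw = r.json()["output"]["data"]["text/plain"]
rows = json.loads(raw)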
I think in general your best bet is to write your output to a database of some kind. If you write to a randomly named table, you could have your code read it after the script is done.
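A rough sketch of that idea, with a hypothetical table name and reusing statements_url and headers from the answer above (untested):

import json
import requests

# have the statement persist the result instead of printing it
code = 'spark.sql("select * from test_table limit 10").write.mode("overwrite").saveAsTable("livy_result_abc123")'
requests.post(statements_url, data=json.dumps({'code': code}), headers=headers)

# once the statement finishes, read livy_result_abc123 back through whatever
# SQL access you already have (JDBC, Hive, a follow-up Livy statement, etc.)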

Looking for a JSON example with all allowed combinations of structure at max depth 2 or 3

I've written a program which processes JSON objects. Now I want to verify whether I've missed something.
Is there a JSON example of all allowed JSON structure combinations? Something like this:
{
  "key1" : "value",
  "key2" : 1,
  "key3" : {"key1" : "value"},
  "key4" : [
    [
      "string1",
      "string2"
    ],
    [
      1,
      2
    ],
    ...
  ],
  "key5" : true,
  "key6" : false,
  "key7" : null,
  ...
}
As you can see at http://json.org/ on the right-hand side, the grammar of JSON isn't very difficult, but I got several exceptions because I had forgotten to handle some structure combinations that are possible. E.g. inside an array there can be "string, number, object, array, true, false, null", but my program couldn't handle arrays inside an array until I ran into an exception. So everything was fine until I got a valid JSON object with arrays inside an array.
I want to test my program with such a JSON object (which I'm looking for). After this test I want to feel certain that my program handles every possible valid JSON structure without an exception.
I don't need nesting to depth 5 or so; depth 2 or at most 3 is enough, with every base type nesting all the base types allowed inside it.
Have you thought of escaped characters and objects within an object?
{
  "key1" : {
    "key1" : "value",
    "key2" : [
      "String1",
      "String2"
    ]
  },
  "key2" : "\"This is a quote\"",
  "key3" : "This contains an escaped slash: \\",
  "key4" : "This contains accented characters: \u00eb \u00ef"
}
Note: \u00eb and \u00ef are the characters ë and ï, respectively.
Choose a programming language that supports JSON, and try to load your JSON; on failure, the exception's message is descriptive.
Example:
Python:
import json, sys
json.loads(open(sys.argv[1]).read())
Generate:
import random, json, string

def json_null(depth = 0):
    return None

def json_int(depth = 0):
    return random.randint(-999, 999)

def json_float(depth = 0):
    return random.uniform(-999, 999)

def json_string(depth = 0):
    return ''.join(random.sample(string.printable, random.randrange(10, 40)))

def json_bool(depth = 0):
    return random.randint(0, 1) == 1

def json_list(depth):
    lst = []
    if depth:
        for i in range(random.randrange(8)):
            lst.append(gen_json(random.randrange(depth)))
    return lst

def json_object(depth):
    obj = {}
    if depth:
        for i in range(random.randrange(8)):
            obj[json_string()] = gen_json(random.randrange(depth))
    return obj

def gen_json(depth = 8):
    # containers while depth remains, scalar leaves once it reaches zero
    if depth:
        return random.choice([json_list, json_object])(depth)
    else:
        return random.choice([json_null, json_int, json_float, json_string, json_bool])(depth)

print(json.dumps(gen_json(), indent = 2))

Find if nested key exists in json python

In the following JSON response, what's the proper way to check if the nested key "C" exists in python 2.7?
{
  "A": {
    "B": {
      "C": {"D": "yes"}
    }
  }
}
One-line JSON:
{ "A": { "B": { "C": {"D": "yes"} } } }
This is an old question with an accepted answer, but I would do this using nested if statements instead.
import json

data = json.loads('{ "A": { "B": { "C": {"D": "yes"} } } }')  # avoid shadowing the json module
if 'A' in data:
    if 'B' in data['A']:
        if 'C' in data['A']['B']:
            print(data['A']['B']['C'])  # or whatever you want to do
or if you know that you always have 'A' and 'B':
import json

data = json.loads('{ "A": { "B": { "C": {"D": "yes"} } } }')
if 'C' in data['A']['B']:
    print(data['A']['B']['C'])  # or whatever
Use the json module to parse the input. Then, within a try statement, try to retrieve key "A" from the parsed input, then key "B" from that result, and then key "C" from that. If an error gets thrown, the nested "C" does not exist.
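A minimal sketch of that approach (variable names are illustrative):

import json

data = json.loads('{ "A": { "B": { "C": {"D": "yes"} } } }')
try:
    value = data["A"]["B"]["C"]  # each missing key raises KeyError
except (KeyError, TypeError):    # TypeError covers a non-dict along the path
    print("C does not exist")
else:
    print("C exists:", value)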
A quite easy and comfortable way is to use the package python-benedict, which has full keypath support. Cast your existing dict d with the function benedict():
from benedict import benedict

d = benedict(d)
Now your dict has full key path support and you can check if the key exists in the pythonic way, using the in operator:
if 'mainsnak.datavalue.value.numeric-id' in d:
    # do something
Please find here the complete documentation.
I used a simple recursive solution:
def check_exists(exp, value):
    # For the case that we have an empty element
    if exp is None:
        return False
    # Check existence of the first key
    if value[0] in exp:
        # if this is the last key in the list, then no need to look further
        if len(value) == 1:
            return True
        else:
            next_value = value[1:]
            return check_exists(exp[value[0]], next_value)
    else:
        return False
To use this code, just pass the parsed JSON and the nested keys as a list of strings, for example:
rc = check_exists(data, ["A", "B", "C", "D"])