Supporting JSON paths in BigQuery - json

I was wondering if BigQuery has any additional support to JSON paths, as it seems like this is such a common way to work with nested data in BigQuery. For example, as a few years ago it seemed like the answer was: What JsonPath expressions are supported in BigQuery?, i.e., "Use a UDF".
However, it seems like using a path within an array, such as:
`$..Job'
Is such a common operation given BigQuery's repeated field, that about 70% of the times I've tried to use BigQuery's JSON_EXTRACT, I run into the limitation of having to iterate down an array.
Is this ability supported yet in BigQuery, or are there plans to support it, without having to do a UDF? As nice as something like the following works:
CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING)
RETURNS STRING
LANGUAGE js AS """
try { var parsed = JSON.parse(json);
return JSON.stringify(jsonPath(parsed, json_path));
} catch (e) { return null }
"""
OPTIONS (
library="gs://xx-bq/jsonpath-0.8.0.js"
);
SELECT CUSTOM_JSON_EXTRACT(to_json_string(Occupation), '$..Job'), to_json_string(MovieInfo), json_extract(MovieInfo, '$.Platform') FROM `xx-163219.bqtesting.xx` LIMIT 1000
It ends up taking anywhere between 4-6x longer than a normal JSON_EXTRACT function (2s vs. about 10s). Or, is there something that I'm missing with what you're able to do with JSON objects in BQ?

Currently, the support for JSONPath on BigQuery includes and is limited to $, ., and [], where the latter can be either a child operator or a subscript (array) operator.
Other syntax elements from JSONPath are not supported, but for future reference, there's a public feature request to support complete JSONPath syntax.

Related

How to marshal a predicate from JSON in Prolog?

In Python it is common to marshal objects from JSON. I am seeking similar functionality in Prolog, either swi-prolog or scryer.
For instance, if we have JSON stating
{'predicate':
{'mortal(X)', ':-', 'human(X)'}
}
I'm hoping to find something like load_predicates(j) and have that data immediately consulted. A version of json.dumps() and loads() would also be extremely useful.
EDIT: For clarity, this will allow interoperability with client applications which will be collecting rules from users. That application is probably not in Prolog, but something like React.js.
I agree with the commenters that it would be easier to convert the JSON data to a .pl file in the proper format first and then load that.
However, you can load the predicates from JSON directly, convert them to a representation that Prolog understands, and use assertz to add them to the knowledge base.
If indeed the data contains all the syntax needed for a predicate (as is the case in the example data in the question) then converting the representation is fairly simple as you just need to concatenate the elements of the list into a string and then create a term out of the string. Note that this assumption skips step 2 in the first comment by Guy Coder.
Note that the Prolog JSON library is rather strict in which format it accepts: only double quotes are valid as string delimiters, and lists with singleton values (i.e., not key-value pairs) need to use the notation [a,b,c] instead of {a,b,c}. So first the example data needs to be rewritten:
{"predicate":
["mortal(X)", ":-", "human(X)"]
}
Then you can load it in SWI-Prolog. Minimal working example:
:- use_module(library(http/json)).
% example fact for testing
human(aristotle).
load_predicate(J) :-
% open the file
open(J, read, JSONstream, []),
% parse the JSON data
json_read(JSONstream, json(L)),
% check for an occurrence of the predicate key with value L2
member(predicate=L2, L),
% concatenate the list into a string
atomics_to_string(L2, S),
% create a term from the string
term_string(T, S),
% add to knowledge base
assertz(T).
Example run:
?- consult('mwe.pl').
true.
?- load_predicate('example_predicate.json').
true.
?- mortal(X).
X = aristotle.
Detailed explanation:
The predicate json_read stores the data in the following form:
json([predicate=['mortal(X)', :-, 'human(X)']])
This is a list inside a json term with one element for each key-value pair. The element has the syntax key=value. In the call to json_read you can already strip the json() term and store the list directly in the variable L.
Then member/2 is used to search for the compound term predicate=L2. If you have more than one predicate in the JSON file then you should turn this into a foreach or in a recursive call to process all predicates in the list.
Since the list L2 already contains a syntactically well-formed Prolog predicate it can just be concatenated, turned into a term using term_string/2 and asserted. Note that in case the predicate is not yet in the required format, you can construct a predicate out of the various pieces using built-in predicate manipulation functionality, see https://www.swi-prolog.org/pldoc/doc_for?object=copy_predicate_clauses/2 for some pointers.

How to return an array of JSON objects rather the a collection of rows using JOOQ & PostgreSQL

Having read this post suggesting that it's sometimes a good trade-off to use JSON operators to return JSON directly from the database; I'm exploring this idea using PostgreSQL and JOOQ.
I'm able to write SQL queries returning a JSON array of JSON objects rather than rows using the following pattern:
select jsonb_pretty(array_to_json(array_agg(row_to_json(r)))::jsonb)
from (
select [subquery]
) as r;
However, I failed to translate this pattern in JOOQ.
Any help regarding how to translate a collection of rows (fields being of "usual" SQL type or already mapped as json(b)) using JOOQ would be appreciated.
SQL Server FOR JSON semantics
That's precisely what the SQL Server FOR JSON clause does, which jOOQ supports and which it can emulate for you on other dialects as well:
ctx.select(T.A, T.B)
.from(T)
.forJSON().path()
.fetch();
PostgreSQL native functionality
If you prefer using native functions directly, you will have to do with plain SQL templating for now, as some of these functions aren't yet supported by jOOQ, including:
JSONB_PRETTY (no plans of supporting it yet)
ARRAY_TO_JSON (https://github.com/jOOQ/jOOQ/issues/12841)
ROW_TO_JSON (https://github.com/jOOQ/jOOQ/issues/10685)
It seems quite simple to write a utility that does precisely this:
public static ResultQuery<Record1<JSONB>> json(Select<?> subquery) {
return ctx
.select(field(
"jsonb_pretty(array_to_json(array_agg(row_to_json(r)))::jsonb)",
JSONB.class
))
.from(subquery.asTable("r"))
}
And now, you can execute any query in your desired form:
JSONB result = ctx.fetchValue(json(select(T.A, T.B).from(T)));
Converting between PG arrays and JSON arrays
A note on performance. It seems that you're converting between data types a bit often. Specifically, I'd suggest you avoid aggregating a PostgreSQL array and turning that into a JSON array, but to use JSONB_AGG() directly. I haven't tested this, but it seems to me that the extra data structure seems unnecessary.

Parsing query string in Node to allow logical operators

I would like something similar to what node-odata offers, but I do not want to wrap it around my database (I am using Cassandra and already have an Express app set up with routes, etc).
Currently, I grab data from the database (which will ultimately return a JSON object to the user) and then using the values passed in the query string I modify the results with JavaScript and pass the modified JSON object on through to the user.
I cannot pass in a query string like this http://localhost:3001/getSomeData?name=jim&age=21||eyeColor=red which includes logical operators in the query string, and would grab all data and filter it where the name is "jim", the age is "21" OR eyeColor is "red". So this would give me all Jims that have either eyeColor red and/or age of 21. If I used this age=21&&eyeColor=red I would expect to get all Jims that have BOTH eye color of red and are 21 years old.
I was thinking of using a custom query string that can be passed in (i.e. inclusive=age&inclusive=eyeColor appended at the end of the query string) and in Node, I would modify the filter results to treat these properties (age and eyeColor) as if they were passed in with the || OR operator). However, this is quite verbose, and I was hoping there was a library or another simpler implementation out there that solves this problem, or somehow lets me pass in simple logical operators into the query string.
I ended up using this library to achieve what I wanted: https://www.npmjs.com/package/jspath
It's well document and worked perfectly for my situation.
npm i querystringify //or
https://cdnjs.cloudflare.com/ajax/libs/qs/6.7.0/qs.min.js
//it will will return an object
const myObject = Qs.parse(location.search, {ignoreQueryPrefix: true});
//you can use object destructuring.
const {age,eyeColor}=Qs.parse(location.search, {ignoreQueryPrefix: true})
By default, parsing will include "?" too.
{ignoreQueryPrefix: true} this option will omit "?".

MongoDB not returning a proper JSON

When I run db.collection.explain().find(), it gives the following error;
The last field in this json object has a double quote problem: `"totalChildMillis" : NumberLong(2)`.
When I parse this object, I got an exception saying that NumberLong(2) should be double quoted. Is there a way for MongoDB returns a standard JSON object?
{
"executionStages":{
"stage": "SINGLE_SHARD",
"nReturned": 10000,
"executionTimeMillis": 3,
"totalKeysExamined": 0,
"totalDocsExamined": 10000,
"totalChildMillis": NumberLong(2)
}
}
EDIT1
I am currently using Javascript NodeJS to create a sub-process of a mongo-shell. And send explain command to that process and listen on its output. Once I got the output, I need to parse it to a javascript object by JSON.parse() method. Based on this use case, what is the easier way for me to adapt mongo json extension to be a standard javascript object?
See the docs on MongoDB Extended JSON. Basically it comes down to the fact that MongoDB extends JSON to add additional datatypes that JSON does not support. In order to preserve that type information, various tools use either "strict" mode (which confirms to the JSON RFC) or "mongo shell" mode, which uses notations like NumberLong() and ISODate() to represent datatypes and is generally not parseable using a JSON parser.
Depending on what you're doing, you can use mongoexport which has an option to output in strict mode. But if you're trying to evaluate the explain plan of a query, I don't think that's going to work unless you insert the explain plan into a temp collection and then mongoexport it out.
Your best bet is to do whatever scripting you're trying to accomplish using a programming language (e.g. Java, Perl, Python, C#, etc.) and one of the corresponding MongoDB drivers. There you'll have much more flexibility and power with how you retrieve and parse data.
Since you mentioned in your edit that you're using Node.js, you can use the explain option to get the explain output directly from Node without having to spawn a sub-process.
Here's a very basic example:
var url = 'mongodb://localhost:27017/test';
var MongoClient = require('mongodb').MongoClient;
MongoClient.connect(url, function(err, db) {
assert.equal(null, err);
var collection = db.collection('test');
collection.find({}, {explain:true}).each(function(err, doc) {
if(doc != null)
console.dir(doc);
});
});

Is it valid to define functions in JSON results?

Part of a website's JSON response had this (... added for context):
{..., now:function(){return(new Date).getTime()}, ...}
Is adding anonymous functions to JSON valid? I would expect each time you access 'time' to return a different value.
No.
JSON is purely meant to be a data description language. As noted on http://www.json.org, it is a "lightweight data-interchange format." - not a programming language.
Per http://en.wikipedia.org/wiki/JSON, the "basic types" supported are:
Number (integer, real, or floating
point)
String (double-quoted Unicode
with backslash escaping)
Boolean
(true and false)
Array (an ordered
sequence of values, comma-separated
and enclosed in square brackets)
Object (collection of key:value
pairs, comma-separated and enclosed
in curly braces)
null
The problem is that JSON as a data definition language evolved out of JSON as a JavaScript Object Notation. Since Javascript supports eval on JSON, it is legitimate to put JSON code inside JSON (in that use-case). If you're using JSON to pass data remotely, then I would say it is bad practice to put methods in the JSON because you may not have modeled your client-server interaction well. And, further, when wishing to use JSON as a data description language I would say you could get yourself into trouble by embedding methods because some JSON parsers were written with only data description in mind and may not support method definitions in the structure.
Wikipedia JSON entry makes a good case for not including methods in JSON, citing security concerns:
Unless you absolutely trust the source of the text, and you have a need to parse and accept text that is not strictly JSON compliant, you should avoid eval() and use JSON.parse() or another JSON specific parser instead. A JSON parser will recognize only JSON text and will reject other text, which could contain malevolent JavaScript. In browsers that provide native JSON support, JSON parsers are also much faster than eval. It is expected that native JSON support will be included in the next ECMAScript standard.
Let's quote one of the spec's - https://www.rfc-editor.org/rfc/rfc7159#section-12
The The JavaScript Object Notation (JSON) Data Interchange Format Specification states:
JSON is a subset of JavaScript but excludes assignment and invocation.
Since JSON's syntax is borrowed from JavaScript, it is possible to
use that language's "eval()" function to parse JSON texts. This
generally constitutes an unacceptable security risk, since the text
could contain executable code along with data declarations. The same
consideration applies to the use of eval()-like functions in any
other programming language in which JSON texts conform to that
language's syntax.
So all answers which state, that functions are not part of the JSON standard are correct.
The official answer is: No, it is not valid to define functions in JSON results!
The answer could be yes, because "code is data" and "data is code".
Even if JSON is used as a language independent data serialization format, a tunneling of "code" through other types will work.
A JSON string might be used to pass a JS function to the client-side browser for execution.
[{"data":[["1","2"],["3","4"]],"aFunction":"function(){return \"foo bar\";}"}]
This leads to question's like: How to "https://stackoverflow.com/questions/939326/execute-javascript-code-stored-as-a-string".
Be prepared, to raise your "eval() is evil" flag and stick your "do not tunnel functions through JSON" flag next to it.
It is not standard as far as I know. A quick look at http://json.org/ confirms this.
Nope, definitely not.
If you use a decent JSON serializer, it won't let you serialize a function like that. It's a valid OBJECT, but not valid JSON. Whatever that website's intent, it's not sending valid JSON.
JSON explicitly excludes functions because it isn't meant to be a JavaScript-only data
structure (despite the JS in the name).
A short answer is NO...
JSON is a text format that is completely language independent but uses
conventions that are familiar to programmers of the C-family of
languages, including C, C++, C#, Java, JavaScript, Perl, Python, and
many others. These properties make JSON an ideal data-interchange
language.
Look at the reason why:
When exchanging data between a browser and a server, the data can only
be text.
JSON is text, and we can convert any JavaScript object into JSON, and
send JSON to the server.
We can also convert any JSON received from the server into JavaScript
objects.
This way we can work with the data as JavaScript objects, with no
complicated parsing and translations.
But wait...
There is still ways to store your function, it's widely not recommended to that, but still possible:
We said, you can save a string... how about converting your function to a string then?
const data = {func: '()=>"a FUNC"'};
Then you can stringify data using JSON.stringify(data) and then using JSON.parse to parse it (if this step needed)...
And eval to execute a string function (before doing that, just let you know using eval widely not recommended):
eval(data.func)(); //return "a FUNC"
Via using NodeJS (commonJS syntax) I was able to get this type of functionality working, I originally had just a JSON structure inside some external JS file, but I wanted that structure to be more of a Class, with methods that could be decided at run time.
The declaration of 'Executor' in myJSON is not required.
var myJSON = {
"Hello": "World",
"Executor": ""
}
module.exports = {
init: () => { return { ...myJSON, "Executor": (first, last) => { return first + last } } }
}
Function expressions in the JSON are completely possible, just do not forget to wrap it in double quotes. Here is an example taken from noSQL database design:
{
"_id": "_design/testdb",
"views": {
"byName": {
"map": "function(doc){if(doc.name){emit(doc.name,doc.code)}}"
}
}
}
although eval is not recommended, this works:
<!DOCTYPE html>
<html>
<body>
<h2>Convert a string written in JSON format, into a JavaScript function.</h2>
<p id="demo"></p>
<script>
function test(val){return val + " it's OK;}
var someVar = "yup";
var myObj = { "func": "test(someVar);" };
document.getElementById("demo").innerHTML = eval(myObj.func);
</script>
</body>
</html>
Leave the quotes off...
var a = {"b":function(){alert('hello world');} };
a.b();