Is XQuery 3.1 designed for advanced JSON editing?

XQuery 3.1 introduced several JSON functions. I was wondering if these functions were designed with advanced JSON editing in mind.
As far as I can tell, these functions only work on simple JSON objects, for instance...
let $json:={"a":1,"b":2} return map:put($json,"c",3)
{
"a": 1,
"b": 2,
"c": 3
}
and
let $json:={"a":1,"b":2,"c":3} return map:remove($json,"c")
{
"a": 1,
"b": 2
}
The moment the JSON gets a bit more complex:
let $json:={"a":{"x":1,"y":2},"b":2} return map:put($json?a,"z",3)
{
"x": 1,
"y": 2,
"z": 3
}
let $json:={"a":{"x":1,"y":2,"z":3},"b":2} return map:remove($json?a,"z")
{
"x": 1,
"y": 2
}
Obviously map:put() and map:remove() do exactly what you tell them to: select the "a"-object and add or remove an attribute.
However, when I want to edit a JSON document, I'd like to edit the entire document. And as far as I know that's not possible with the current implementation. Or is it? At least something like map:put($json,$json?a?z,3) or map:remove($json,$json?a?z) doesn't work.
For the removal of the "z"-attribute I did come up with a custom recursive function (which only works in this particular use-case)...
declare function local:remove($map, $key) {
  if ($map instance of object()) then
    map:merge(
      map:keys($map)[. != $key] ! map:entry(., local:remove($map(.), $key))
    )
  else
    $map
};
let $json:={"a":{"x":1,"y":2,"z":3},"b":2} return
local:remove($json,"z")
...with the expected output...
{
"a": {
"x": 1,
"y": 2
},
"b": 2
}
...but I wasn't able to create a custom "add"-function.
I imagine advanced JSON editing can be done with some pretty advanced custom functions, but instead I would very much like to see something like map:put($json,$json?a?z,3) work, or otherwise an extra option that lets map:put() return the entire JSON document, like map:put($json?a?z,3, <extra-option> ).
Or... I'd have to settle for the notion that XQuery isn't the right choice, of course.
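For comparison outside XQuery, both of the wished-for deep edits are short recursive functions in a general-purpose language. A Python sketch (the names deep_remove and deep_put are mine, not part of any XQuery proposal): recursion handles the removal, and a copy-then-walk handles a path-based put:

```python
import copy

def deep_remove(value, key):
    """Recursively drop every entry named `key` from nested dicts."""
    if isinstance(value, dict):
        return {k: deep_remove(v, key) for k, v in value.items() if k != key}
    if isinstance(value, list):
        return [deep_remove(v, key) for v in value]
    return value

def deep_put(value, path, new_value):
    """Return a copy of `value` with `new_value` stored at the key path,
    leaving the original untouched (XQuery maps are immutable, too)."""
    result = copy.deepcopy(value)
    target = result
    for step in path[:-1]:
        target = target[step]
    target[path[-1]] = new_value
    return result

doc = {"a": {"x": 1, "y": 2, "z": 3}, "b": 2}
print(deep_remove(doc, "z"))          # {'a': {'x': 1, 'y': 2}, 'b': 2}
print(deep_put(doc, ["a", "z"], 4))   # {'a': {'x': 1, 'y': 2, 'z': 4}, 'b': 2}
```

The same shape would translate back into a recursive XQuery function, which is essentially what the answers below suggest.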

You're correct that doing what I call a deep update of a map is quite difficult with XQuery 3.1 (and indeed XSLT 3.0) as currently defined, and it's not easy to define language constructs with clean semantics for it. I attempted to design such a construct as an XSLT extension instruction - see https://saxonica.com/documentation10/index.html#!extensions/instructions/deep-update -- but I don't think it's anywhere near a perfect solution.

I wanted the same thing, so I wrote my own surrogate XSLT functions, tan:map-put() and tan:map-remove(), which do deep map replacement and removal:
https://github.com/Arithmeticus/XML-Pantry/tree/master/maps-and-arrays
These can be incorporated in an XSLT workflow via xsl:include or xsl:import, or in an XQuery one via fn:transform(). Some of the other functions may be useful, too. If these functions don't do exactly what you want, they might catalyze your own variation.

In XQuery 3.1, you are supposed to write a recursive function for such things. You could put all your functions in a module file, and then load the module when you need them...
Besides that, Xidel has an object-editing extension that predates JSONiq and XPath 3.1. With a global mutable variable (no let), you can write:
$json := {"a":{"x":1,"y":2,"z":3},"b":2},
(($json).a).z := 4
or
$json := {"a":{"x":1,"y":2,"z":3},"b":2},
$json("a")("z") := 4

Referring to a comment by Christian Grün:
If updates are required, we tend to convert JSON data to XML.
I'm a Xidel user, and last week (with a little help from BeniBela) I had a look at whether this could be done with json-to-xml(), Xidel's own x:replace-nodes() and xml-to-json(). The answer is yes. Thanks for the hint.
For reference and for anyone interested, here's an example.
To change key "c" in {"x":{"a":1,"b":2,"c":3},"y":2} to "d":
$ xidel -s '{"x":{"a":1,"b":2,"c":3},"y":2}' -e '
  xml-to-json(
    x:replace-nodes(
      json-to-xml(
        serialize($json,{"method":"json"})
      )//fn:map[@key="x"]/fn:number[@key="c"]/@key,
      attribute key {"d"}
    )
  )
'
{"x":{"a":1,"b":2,"d":3},"y":2}
Xidel online tester.
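The same rename can also be done without the XML round-trip in a general-purpose language. A Python sketch (the name rename_key is mine) that simply walks the parsed JSON recursively:

```python
import json

def rename_key(value, old, new):
    """Recursively rename every dict key `old` to `new` in parsed JSON."""
    if isinstance(value, dict):
        return {(new if k == old else k): rename_key(v, old, new)
                for k, v in value.items()}
    if isinstance(value, list):
        return [rename_key(v, old, new) for v in value]
    return value

doc = json.loads('{"x":{"a":1,"b":2,"c":3},"y":2}')
print(json.dumps(rename_key(doc, "c", "d"), separators=(",", ":")))
# {"x":{"a":1,"b":2,"d":3},"y":2}
```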

Deserialize JSON without knowing full structure

I'm redoing the backend of a very basic framework that connects to a completely customizable frontend. It was originally in PHP, but for the refactor I have been plodding away in F#, although it seems like PHP might be the more suited language. But people keep telling me you can do everything in F#, I like the syntax and need to learn it, and this seemingly simple project has me stumped when it comes to JSON. This is a further fleshed-out version of my question yesterday, but it got a lot more complex than I thought.
Here goes.
The frontend is basically a collection of HTML files, which are simply loaded in PHP and preg_replace() is used to replace things like [var: varName] or [var: array|key] or the troublesome one: [lang: hello]. That needs to be replaced by a variable defined in a translation dictionary, which is stored as JSON which is also editable by a non-programmer.
I can't change the frontend or the JSON files, and both are designed to be edited by non-programmers so it is very likely that there will be errors, calls to language variables that don't exist etc.
So we might have 2 json files, english.json and french.json
english.json contains:
{
"hello":"Hello",
"bye":"Goodbye"
}
french.json:
{
"hello": "Bonjour",
"duck": "Canard"
//Plus users can add whatever else they want here and expect to be able to use it in a template
}
There is a template that contains
<b>[lang: hello]</b>
<span>Favourite Animal: [lang:duck]</span>
In this case, if the language is set to "english" and english.json is being loaded, that should read:
<b>Hello</b>
<span>Favourite Animal: </span>
Or in French:
<b>Bonjour</b>
<span>Favourite Animal: Canard</span>
We can assume that the json format key: value is always string:string but ideally I'd like to handle string: 'T as well but that might be beyond the scope of this question.
So I need to convert a JSON file to a dictionary or some other collection. The file is loaded by a dynamic name, which gave FSharp.Data an issue I couldn't solve last night: a type provider only accepts a static filename as its sample, and since these two files can differ from the sample and from each other, the type provider doesn't work.
Now inside the template parsing function I need to replace [lang: hello] with something like
let key = "duck"
(*Magic function to convert JSON to usable collection*)
let languageString = convertedJSONCollection.[key] (*And obviously check if containsKey first*)
Which means I need to call the key dynamically, and I couldn't figure out how to do that with the type that FSharp.Data provided.
I have played around with Thoth as well, with some promising results that ended up going nowhere. I avoided JSON.NET because I thought it was paid, but I just realised I was mistaken there, so that might be an avenue to explore.
For comparison, the PHP function looks something like this:
function loadLanguage($lang = 'english') {
    $json = file_get_contents("$lang.json");
    return json_decode($json, true);
}
$key = 'duck';
$langVars = loadLanguage();
$duck = $langVars[$key] ?? "";
Is there a clean way to do this in F#/.NET? JSON seems really painful to work with in comparison to PHP/Javascript and I'm starting to lose my mind. Am I going to have to write my own parser (which means probably going back to PHP)?
Cheers to all you F# geniuses who know the answer :p
open Thoth.Json.Net

let deserialiseDictionary (s: string) =
    s
    |> Decode.unsafeFromString (Decode.keyValuePairs Decode.string)
    |> Map.ofList

let printDictionary json =
    json
    |> deserialiseDictionary
    |> fun m -> printfn "%s" m.["hello"] // Hello
For the question about 'T, the question becomes: what can 'T be? For JSON it is very limited, it can only be one of a few things: string, JSON object, number, bool or JSON array. What should happen if it is a bool or a number?
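Whatever decoder is used, the runtime substitution the question describes (replace [lang: key] tokens, with missing keys becoming empty strings) is a small dictionary lookup plus a regex pass. A Python sketch of the idea (the names load_language and render are mine); the same shape ports to F# via Regex.Replace and Map.tryFind:

```python
import json
import re

def load_language(lang_json: str) -> dict:
    """Parse a string:string translation file into a dict."""
    return json.loads(lang_json)

def render(template: str, lang: dict) -> str:
    """Replace [lang: key] tokens; unknown keys become empty strings."""
    return re.sub(r"\[lang:\s*(\w+)\]",
                  lambda m: lang.get(m.group(1), ""),
                  template)

english = load_language('{"hello":"Hello","bye":"Goodbye"}')
print(render("<b>[lang: hello]</b> <span>[lang:duck]</span>", english))
# <b>Hello</b> <span></span>
```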

Powershell: json and dot including value variable access

I have a problem accessing JSON objects of predictable structure but unknown depth in PowerShell. The JSON objects contain information that can be connected by "and" and "or", and those connections can occur at several levels. As an example:
$ab = @"
{
"cond": "one",
"and": [
{"cond": "two"},
{"cond": "three"},
{"or": [{"cond": "four"},
{"cond": "five"}
]
}
]
}
"@ | ConvertFrom-Json
I need to be able to read/test something like
$test="and.or"
$ab.$test.cond
where $test is a combination of several "and"s and "or"s, like and.or.or.and.
The problem is that I can't figure out how my idea of $ab.$test.cond should be written in PowerShell to work. In theory I could test all possible combinations to a given depth by hand, but I'd prefer not to. Does anyone have an idea how this could work? Thanks a lot!
(Powershell Version 5)
I think you should define a proper set of classes for your conditional engine/descriptors, either using PowerShell classes or using C# to create an assembly so you can use the types within PowerShell.
But for a quick and dirty PowerShell solution, you could do this:
"`$ab.$test.cond" | Invoke-Expression
# or
'$ab.{0}.cond' -f $test | Invoke-Expression
This has no error checking, of course. Any other solution is likely going to be a separate recursive function if you want real checking and such, but it will be more fragile than using a well-defined set of objects.
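The recursive alternative is not much code. A Python sketch of the idea (the name resolve is mine): split the dotted path and walk it step by step, so no string-built expression is ever evaluated. Note that real PowerShell member access on an array maps over all elements; this simplification takes only the first matching element:

```python
import json

def resolve(obj, dotted_path):
    """Walk a dotted path like 'and.or' through parsed JSON,
    returning None if any step is missing."""
    current = obj
    for step in dotted_path.split("."):
        if isinstance(current, list):
            # PowerShell maps member access over array elements;
            # here we just take the first element that has the key.
            current = next((e[step] for e in current
                            if isinstance(e, dict) and step in e), None)
        elif isinstance(current, dict):
            current = current.get(step)
        else:
            return None
        if current is None:
            return None
    return current

ab = json.loads('''{"cond":"one","and":[{"cond":"two"},{"cond":"three"},
                    {"or":[{"cond":"four"},{"cond":"five"}]}]}''')
print(resolve(ab, "and.or"))  # [{'cond': 'four'}, {'cond': 'five'}]
```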

What programming language or technology to use for defining a rule set on JSON parsing?

I'm facing the following scenario. I am currently writing a validation mechanism for JSON files, to verify that several constraints are met. For this task I am wondering which might be the best tool to use, and I'd be happy about any fitting suggestions.
Now some of you might want to mention here, that the usage of a JSON-schema validator might be the thing I'm searching for, but I personally think it does not match my use case.
Let me give you an example:
{
"document" : {
"Type" : "A",
"Action" : [ "ActionA",
"ActionB",
"ActionC"
]
}
}
For the simplified document above, I would like to define validation rules which produce errors or hints in case they are not met.
Rules could (in an abstract manner) be described as something like this:
if document has type "A" and "ActionB" in Actions then throw error
if document has type "B" then throw error
if document has no type then throw error
if document has type "C" and "ActionC" is not in Actions then throw error
Such a "black-listing" approach for such a rule set would be a preferred solution to me, but also a "white-list"-based approach could fit the situation quite well. The sample rules I have described here can also be more complex and span across multiple levels in the JSON-hierarchy.
As far as I'm informed, JSON schema validation is not capable of completing the described task, as it merely validates that the syntax of a JSON file is correct. But I'll gladly admit my error, in case I was wrong.
I have constructed an architecture around that parsing mechanism based on Python, and initially wanted to implement the described validation with Python as well. But on second look, I had the feeling that there might be better-fitting tools for this task.
Tools like Yacc (in conjunction with lex) came to mind, as the whole situation suggests the need to define a grammar on JSON files that executes the rule set I'd like to implement. But unfortunately I am not very familiar with such tools and was therefore unable to evaluate whether they would be the right choice.
So, to repeat my question: which tool or programming language would be a good fit for my problem in a "clean" manner? (By "clean" I just mean that the tool should basically be intended for the desired purpose and not need a bunch of work-arounds, because otherwise any tool would fit.)
Here is an approach using jq. The basic idea is to express your checks as jq functions.
For example if you have a file bad.json with a bad document
{"document":{"Type":"A","Action":["ActionA","ActionB","ActionC"]}}
and a file good.json with a good document
{"document":{"Type":"Good"}}
this command
jq -Mnr '
def case1: if (.document.Type == "A") and (.document.Action|contains(["ActionB"])) then "case 1" else empty end;
def case2: if .document.Type == "B" then "case 2" else empty end;
def case3: if .document.Type == null then "case 3" else empty end;
def case4: if (.document.Type == "C") and (.document.Action|contains(["ActionC"])|not) then "case 4" else empty end;
def status:
first(case1, case2, case3, case4, "ok")
;
inputs
| "\(input_filename): \(status)"
' bad.json good.json
produces
bad.json: case 1
good.json: ok
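Since the question mentions an existing Python architecture, the same checks can also stay in Python as plain functions over the parsed document. A sketch (the function name is mine); note that case 4 here follows the question's stated rule, Type "C" without "ActionC":

```python
def validate(doc):
    """Return the first failed rule name, or 'ok'."""
    d = doc.get("document", {})
    typ = d.get("Type")
    actions = d.get("Action", [])
    if typ == "A" and "ActionB" in actions:
        return "case 1"
    if typ == "B":
        return "case 2"
    if typ is None:
        return "case 3"
    if typ == "C" and "ActionC" not in actions:
        return "case 4"
    return "ok"

bad = {"document": {"Type": "A", "Action": ["ActionA", "ActionB", "ActionC"]}}
good = {"document": {"Type": "Good"}}
print(validate(bad), validate(good))  # case 1 ok
```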

How is the tilde escaping in the JSON Patch RFC supposed to operate?

Referencing https://www.rfc-editor.org/rfc/rfc6902#appendix-A.14:
A.14. ~ Escape Ordering
An example target JSON document:
{
"/": 9,
"~1": 10
}
A JSON Patch document:
[
{"op": "test", "path": "/~01", "value": 10}
]
The resulting JSON document:
{
"/": 9,
"~1": 10
}
I'm writing an implementation of this RFC, and I'm stuck on this. What is this trying to achieve, and how is it supposed to work?
Assuming the answer to the first part is "Allowing json key names containing /s to be referenced," how would you do that?
The ~ character is an escape character in JSON Pointer. Hence, a literal ~ needs to be "encoded" as ~0. To quote jsonpatch.com,
If you need to refer to a key with ~ or / in its name, you must escape the characters with ~0 and ~1 respectively. For example, to get "baz" from { "foo/bar~": "baz" } you’d use the pointer /foo~1bar~0
So essentially,
[
{"op": "test", "path": "/~01", "value": 10}
]
when decoded yields
[
{"op": "test", "path": "/~1", "value": 10}
]
~0 expands to ~, so the path /~01 refers to the key named "~1" (and not to the key "/")
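The order matters when decoding: RFC 6901 says to translate ~1 first and ~0 last, precisely so that ~01 comes out as the literal key "~1" rather than being double-expanded into "/". A minimal Python sketch of a reference-token decoder:

```python
def unescape_token(token: str) -> str:
    """RFC 6901 decoding: replace ~1 with / first, then ~0 with ~.
    The reverse order would wrongly turn ~01 into ~1 and then /."""
    return token.replace("~1", "/").replace("~0", "~")

print(unescape_token("~01"))  # ~1  (the literal key name "~1")
print(unescape_token("~1"))   # /
```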
I guess they mean that you shouldn't "double expand": the expanded /~1 should not be expanded again to // and thus must not match the document's "/" key (which is what would happen if you double expanded). Neither should you expand literals in the source document, so the "~1" key is literally that and not equivalent to the expanded "/". But I repeat, that's my guess about the intention of this example; the real intention may be different.
The example is indeed really bad, in particular since it uses a "test" operation and doesn't specify the result of that operation. Other examples, like the next one at A.15, at least say their test operation must fail; A.14 doesn't tell you whether the operation should succeed or not. I assume they meant the operation should succeed, which implies /~01 should match the "~1" key. That's probably all there is to that example.
If I were to write an implementation I'd probably not worry too much about this example and just look at what other implementations do - to check if I'm compatible with them. It's also a good idea to look for test suites of other projects, for example I found one from http://jsonpatch.com/ at https://github.com/json-patch/json-patch-tests
I think the example provided in the RFC isn't exactly the best thought-out, especially since it tries to document a feature only through an example, which is vague at best, without providing any kind of commentary.
You might be interested in interpretation presented in following documents:
Documentation of Rackspace API
Documentation of OpenStack API
These seem awfully similar, and I think that's due to the nature of the relationship between Rackspace and OpenStack:
OpenStack began in 2010 as a joint project of Rackspace Hosting and NASA (...)
It actually provides some useful details including grammar it accepts and rationale behind introducing these tokens, as opposed to the RFC itself.
Edit: it seems that JSON Pointers have a separate RFC, RFC 6901, and the OpenStack and Rackspace specifications above are consistent with it.

Getting Sphider to output JSON

I've recently added the Sphider crawler to my site in order to add search functionality. But the default search.php that comes with the distribution of Sphider that I downloaded is too plain and doesn't integrate well with the rest of my site. I have a little navigation bar at the top of the site which has a search box in it, and I'd like to be able to access Sphider's search results through that search field using Ajax. To do this, I figure I need to get Sphider to return its results in JSON format.
The way I did that is I used a "theme" that outputs JSON (Sphider supports "theming" its output). I found that theme in this thread on Sphider's site. It seems to work, but stricter JSON parsers will not parse it. Here's some example JSON output:
{"result_report":"Displaying results 1 - 1 of 1 match (0 seconds) ", "results":[ { "idented":"false", "num":"1", "weight":"[100.00%]", "link":"http://www.avtainsys.com/articles/Triple_Contraints", "title":"Triple Contraints", "description":" on 01/06/12 Project triple constraints are time, cost, and quality. These are the three constraints that control the performance of the project. Think about this triple-constraint as a three-leg tripod. If one of the legs is elongated or", "link2":"http://www.avtainsys.com/articles/Triple_Contraints", "size":"3.3kb" }, { "num":"-1" } ], "other_pages":[ { "title":"1", "link":"search.php?query=constraints&start=1&search=1&results=10&type=and&domain=", "active":"true" }, ] }
The issue is that there is a trailing comma near the end. According to this, "trailing commas are not allowed" when using PHP's json_decode() function. This JSON also failed to parse using this online formatter. But when I took the comma out, it worked and I got this better-formatted JSON:
{
"result_report":"Displaying results 1 - 1 of 1 match (0 seconds) ",
"results":[
{
"idented":"false",
"num":"1",
"weight":"[100.00%]",
"link":"http://www.avtainsys.com/articles/Triple_Contraints",
"title":"Triple Contraints",
"description":" on 01/06/12 Project triple constraints are time, cost, and quality. These are the three constraints that control the performance of the project. Think about this triple-constraint as a three-leg tripod. If one of the legs is elongated or",
"link2":"http://www.avtainsys.com/articles/Triple_Contraints",
"size":"3.3kb"
},
{
"num":"-1"
}
],
"other_pages":[
{
"title":"1",
"link":"search.php?query=constraints&start=1&search=1&results=10&type=and&domain=",
"active":"true"
}
]
}
Now, how would I do this programmatically? And (perhaps more importantly), is there a more elegant way of accomplishing this? And you should know that PHP is the only language I can run on my shared hosting account, so a Java solution for example would not work for me.
In search_result.html, you can surround the , at the end of the foreach loop with a condition that only prints it when the index is strictly less than the number of pages - 1.
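On the "programmatically" part: if you'd rather not touch the theme, you can also sanitize the output before parsing it. A Python sketch of the regex idea (the same pattern works with PHP's preg_replace); note it is naive, since a comma-then-bracket sequence inside a string value would also be rewritten:

```python
import json
import re

def strip_trailing_commas(text: str) -> str:
    """Drop commas that directly precede a closing ] or }.
    Naive: does not protect such sequences inside string literals."""
    return re.sub(r",\s*([\]}])", r"\1", text)

almost_json = '{"other_pages":[{"title":"1","active":"true"},]}'
cleaned = strip_trailing_commas(almost_json)
print(json.loads(cleaned)["other_pages"][0]["title"])  # 1
```

Fixing the template itself, as suggested above, is the cleaner option; this is a fallback when the generator can't be changed.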