I'm currently working on improving the JSON querying capabilities of Brackit [1][2], an XQuery engine extended with arrays and "records". I'm now basically following the same XDM that JSONiq uses, though I'm sadly no XQuery expert. I've more or less taken over the project from Sebastian and have mainly added temporal enhancements.
Brackit uses a dereferencing operator, =>, on records/objects to get the value for a name.
Additionally, it uses [[expr()]] for array index lookups, I assume just as in the pure JSONiq specification.
I'm sure you have good reasons to use dynamic function calls instead, so I might have to change this. However, I think the dereferencing operator might work in all cases, and in my opinion it is a nicer syntax.
I think the vision of a query compiler for semi-structured data with proven optimizations for use in data stores is a great one: http://wwwlgis.informatik.uni-kl.de/cms/dbis/projects/brackit/mission/
One of the distinctive features of Brackit might be its pipelining of FLWOR expressions for set-oriented processing.
Kind regards,
Johannes
[1] https://github.com/sirixdb/brackit
[2] http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/publications/2013/Dissertation-Baechle.pdf
Thank you for your question, Johannes.
Array and object lookup with dynamic function calls was introduced in very early versions of JSONiq, which started as an extension of XQuery. It is common practice in language design to try to reuse existing machinery in early investigations, before extending the data model and syntax.
Since objects and arrays can be seen as "extensional functions" that explicitly list the input-output pairs, (ab)using dynamic function calls for object and array lookup is quite natural. This approach was also taken in XQuery 3.1.
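To illustrate the "extensional function" view for readers coming from general-purpose languages (an analogy of mine, not part of either specification): it amounts to treating a map or list as a function of its keys or indices. A minimal Java sketch with illustrative names:

    import java.util.List;
    import java.util.Map;
    import java.util.function.Function;

    public class ExtensionalLookup {
        public static void main(String[] args) {
            // An object is an "extensional function" from names to values.
            Map<String, Object> obj = Map.of("name", "Brackit", "stars", 100);
            Function<String, Object> objLookup = obj::get;

            // An array is an extensional function from indices to members.
            List<String> arr = List.of("a", "b", "c");
            Function<Integer, String> arrLookup = arr::get;

            // "Calling" them mirrors JSONiq's $obj("name") and $arr(2).
            System.out.println(objLookup.apply("name")); // Brackit
            System.out.println(arrLookup.apply(1));      // b (Java is 0-based; JSONiq is 1-based)
        }
    }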
Syntactic extensions came later. In the "pure JSONiq" syntax, we opted for .Expr for objects, and [] as well as [[Expr]] for arrays (double brackets so as not to confuse them with predicates). XQuery 3.1 also adds a convenient syntax with ? for both objects and arrays.
For FLWOR expressions I recommend looking into Rumble, which pretty much does pipelines in that way already. The paper is here.
At first glance, it seems like Unison may be homoiconic because "code is data", at least in the sense that Unison code is stored durably, addressed by cryptographic hashes. However, directly working with cryptographic hashes doesn't seem very practical, perhaps no more so than directly working with compiled bytecode for the JVM. So maybe it is best to break this down into two parts:
1. Is Unison currently homoiconic?
2. Could Unison be homoiconic, with additional code-generation and AST-manipulation features?
On 1, I'd say no. When Unison is eventually self-hosted, then sure, the compiler data structures could be made available as a library.
However, since Unison has builtins to convert any Unison value or code to a well-defined serialized form, you can write a library that parses that form into Unison data structures representing the code. That is actually what the in-progress Unison JIT compiler does. And the library that Dan developed for this will be something people could use for other purposes (for instance, I could imagine using it to write plugins that generate a JSON serializer for arbitrary Unison values).
Maybe some people would say the existence of such a library counts as homoiconicity already. Arguably, it doesn't really matter whether the compiler internally represents code as a Unison data structure, as long as you have a function for converting code to one.
Aside: I dislike the term homoiconicity. It's a fancy piece of jargon that isn't even particularly well-defined.
Is there any difference between RapidJSON and the JSON parser in the Boost Library (boost/property_tree/json_parser.hpp)?
Thanks.
I have compared 37 C/C++ JSON libraries in nativejson-benchmark for standard conformance and performance.
However, I failed to integrate Boost.PropertyTree (1.60) into the benchmark, because it parses numbers, true, false, and null as strings.
Edit: To answer the question more directly, Boost.PropertyTree cannot provide the JSON functionality most JSON libraries do. RapidJSON, on the other hand, is a JSON library with high conformance and performance. In addition to parsing/stringifying JSON, RapidJSON also provides a streaming-style API, JSON Pointer, and JSON Schema. These features are uncommon in open-source libraries.
Edit: the Boost Library seems to use only RapidXML, not RapidJSON.
It should be of no concern to you, because it's an implementation detail of the library anyway.
So the answer might be "no" (or, more likely, "yes"), and you stand to gain absolutely nothing from it, because you cannot depend on it.
Just pick your own XML library and use it where you need it: What XML parser should I use in C++?
IIRC, Boost mostly modified the namespace, so you won't have ODR clashes if you select RapidXML yourself.
I have a cookie value like:
"[{"index":"1","name":"TimePeriod","hidden":false},{"index":"2","name":"Enquiries","hidden":false},{"index":"3","name":"Online","hidden":false}]"
I would like to use this cookie value as an array in ColdFusion. What would be the best possible way to do this?
The normal answer would be use the built-in deserializeJson function, but since that function wasn't available in CFMX7 (it arrived in CF8), you will need to use a UDF to achieve the same thing.
There are two sites which contain resources of this type, cflib.org and riaforge.org, each of which has a different potential solution for MX7.
Searching CFlib provides JsonDecode. (CFLib has a specific filter for "Maximum Required CF Version", so you can ensure any results that appear will work for your version.)
Searching riaforge provides JSONUtil, which runs on MX7 (but also claims better type mapping than the newer built-in functions).
Since MX7 runs on Java, you can likely also make use of any of the numerous Java libraries listed on json.org, using createObject/java.
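For instance, assuming the org.json reference implementation from json.org is on the classpath (reachable from CFML via createObject("java", ...)), the Java-side parsing of the cookie value from the question might look like this sketch:

    import org.json.JSONArray;
    import org.json.JSONObject;

    public class CookieDecoder {
        public static void main(String[] args) throws Exception {
            // The cookie value from the question: a JSON array of objects.
            String cookie = "[{\"index\":\"1\",\"name\":\"TimePeriod\",\"hidden\":false},"
                          + "{\"index\":\"2\",\"name\":\"Enquiries\",\"hidden\":false},"
                          + "{\"index\":\"3\",\"name\":\"Online\",\"hidden\":false}]";

            JSONArray items = new JSONArray(cookie);
            for (int i = 0; i < items.length(); i++) {
                JSONObject item = items.getJSONObject(i);
                System.out.println(item.getString("name")
                        + " hidden=" + item.getBoolean("hidden"));
            }
        }
    }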
JSON serialization was added natively in CF8.
If you are on MX7, look on riaforge.org for a library that will deserialize JSON for you.
I am learning on my own about writing an interpreter for a programming language, and I have read about Abstract Syntax Trees. I have an idea of what they are, but I do not see their use.
Why are ASTs useful?
They represent the logic/syntax of the code, which is naturally a tree rather than a list of lines, without getting bogged down in concrete syntax issues such as where you place your asterisk.
The logic can then be manipulated in a manner more consistent and convenient from the backend's POV, which can be (and is, for everything but Lisps ;) very different from how we write the concrete syntax.
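To make that concrete, here is a minimal, purely illustrative Java sketch: the tree shape already encodes that multiplication binds tighter than addition, so the backend never re-derives precedence from the source text:

    // A minimal, illustrative AST for arithmetic expressions.
    interface Expr { int eval(); }
    record Num(int value) implements Expr {
        public int eval() { return value; }
    }
    record Add(Expr left, Expr right) implements Expr {
        public int eval() { return left.eval() + right.eval(); }
    }
    record Mul(Expr left, Expr right) implements Expr {
        public int eval() { return left.eval() * right.eval(); }
    }

    public class AstDemo {
        public static void main(String[] args) {
            // 1 + 2 * 3: precedence lives in the tree shape,
            // not in asterisks or parentheses.
            Expr e = new Add(new Num(1), new Mul(new Num(2), new Num(3)));
            System.out.println(e.eval()); // 7
        }
    }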
The main benefit of using an AST is that you separate the parsing and validation logic from the implementation. Interpreters built on ASTs really are easier to understand and maintain. If you have a problem parsing some strange syntax, you look at the AST parser; if a piece of code is not producing the expected results, you look at the code that interprets the AST.
The other great advantage is when your syntax requires "lookahead". For example, if your syntax allows a subroutine to be used before it is defined, it is trivial to validate the existence of a subroutine when you are using an AST; it's much more difficult with an "on the fly" parser.
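A sketch of that idea as two passes over the tree, with hypothetical node types for a tiny language: the first pass collects subroutine definitions, the second validates call sites, so definition order no longer matters:

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Hypothetical statement nodes for a language with subroutines.
    sealed interface Stmt permits Def, Call {}
    record Def(String name) implements Stmt {}
    record Call(String name) implements Stmt {}

    public class ForwardRefCheck {
        public static void main(String[] args) {
            // "helper" is called before it is defined; with the whole
            // tree in hand, a second pass makes that a non-issue.
            List<Stmt> program = List.of(new Call("helper"), new Def("helper"));

            Set<String> defined = new HashSet<>();
            for (Stmt s : program)                    // pass 1: collect definitions
                if (s instanceof Def d) defined.add(d.name());

            for (Stmt s : program)                    // pass 2: validate call sites
                if (s instanceof Call c && !defined.contains(c.name()))
                    System.out.println("undefined subroutine: " + c.name());
        }
    }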
You need "syntax trees" to represent the structure of most programming languages in order to carry out analysis or transformation on documents that contain programming-language text. (You can see some fancy examples of this via my bio.)
Whether that tree is abstract (AST) or concrete (CST) is a matter of taste, convenience, and engineering sweat. The term CST is usually used to describe the parse derivation tree produced when a grammar is used to deconstruct source code; it typically contains tree elements for lots of concrete syntax, such as statement-terminating semicolons. AST is used to mean "something simpler than the CST", e.g., leaving out semicolon nodes because they don't affect program analysis much, so that writing analyzers that process ASTs takes less conceptual and engineering effort than writing the same analyzers against a CST.

A good way to understand this is to realize that the AST is usually an isomorphic equivalent of the CST; that is, you should be able to regenerate the CST from it. If you want to transform the source text and regenerate it, the CST is often the better choice, as it loses less information from the original program (and my fancy example uses this approach).
I think you will find the SO discussion on abstract vs. concrete syntax trees pretty helpful.
In general you are going to parse your code into some form of AST; it may be more or less of a formal model. So I think what Kirk Woll was getting at in his comment above is that when you parse the language, you very often use the parser to create some sort of data model of the raw content of what you are reading, generally organized in a tree fashion. By that definition an AST is hard to avoid unless you are writing a very simple translator.
I use ANTLR often for parsing complex languages, and in that context there is a slightly more specific meaning of an AST. ANTLR has a handy way of generating an AST in the parser grammar using pretty simple actions. You then write a generally much simpler parser for this AST, which you can operate on like a much simpler version of the language you are processing. Whether the extra work of building two parsers is a net gain depends on the language's complexity and what you plan to do with it once it is parsed.
A good book on the subject that you may want to take a look at is "Language Implementation Patterns" by Terence Parr, the ANTLR author. He addresses this topic pretty thoroughly. That said, I didn't really get ASTs until I started using them, so that (as usual) is the best way to understand them.
Late to the question, but I thought I'd add something: you don't actually have to build an AST. It is possible to emit instructions directly as you parse the source code; in this case, the AST is implicit in the parsing grammar. For simple languages, especially dynamically typed ones, this is a perfectly fine strategy. For more complex languages, or where you need to further analyze the source code, an AST can be very useful. For example, if your language is statically typed, i.e., your variables are declared with fixed types, then the AST can be used to check that you're not assigning the wrong type to a variable. E.g., assigning a string to a variable declared to hold an integer would be wrong, and this is caught more conveniently with an AST.
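For instance, a hedged sketch of such a type check over hypothetical AST nodes, catching a string assigned to an integer variable:

    // Hypothetical AST fragment: checking an assignment against a declared type.
    enum Type { INT, STRING }
    record VarDecl(String name, Type declared) {}
    record Assign(VarDecl target, Type valueType) {}

    public class TypeCheckDemo {
        public static void main(String[] args) {
            // n is declared as an integer, but a string is assigned to it.
            VarDecl n = new VarDecl("n", Type.INT);
            Assign bad = new Assign(n, Type.STRING);

            if (bad.valueType() != bad.target().declared())
                System.out.println("type error: cannot assign " + bad.valueType()
                        + " to '" + bad.target().name() + "' declared as "
                        + bad.target().declared());
        }
    }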
Also, as others have mentioned, the AST offers a clean separation between syntax analysis and code generation and makes the code much more modular.
For an assignment dealing with access control lists, we were required to construct a doubly-linked list first, as Java doesn't include one in the Sun API. I can understand if the professor wanted us to create the doubly-linked list from scratch to understand how it works (like writing a sort program instead of using the built-in methods), but why exclude it from the library?
That got me to thinking, what determines if a data structure is included in the basic language library? For example, in Java, why is there a LinkedList class but not a DoublyLinkedList?
According to the Java Docs:
All of the operations perform as could be expected for a doubly-linked list. Operations that index into the list will traverse the list from the beginning or the end, whichever is closer to the specified index.
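In other words, java.util.LinkedList already is a doubly-linked list. A short demonstration of traversal in both directions, using the standard descendingIterator and listIterator methods:

    import java.util.Iterator;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.ListIterator;

    public class DoublyLinkedDemo {
        public static void main(String[] args) {
            LinkedList<String> list = new LinkedList<>(List.of("a", "b", "c"));

            for (String s : list)                    // forward, via next-links
                System.out.print(s + " ");           // a b c
            System.out.println();

            Iterator<String> back = list.descendingIterator();
            while (back.hasNext())                   // backward, via previous-links
                System.out.print(back.next() + " "); // c b a
            System.out.println();

            // A ListIterator can move both ways from any position.
            ListIterator<String> it = list.listIterator(list.size());
            while (it.hasPrevious())
                System.out.print(it.previous() + " "); // c b a
        }
    }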
I believe (and of course this is opinion) that the only data structures essentially all programming languages build in are the following: array, list, tree, graph, and bag. I think "list" and "bag" can be used pretty much interchangeably in terms of vocabulary. Keyword being "think".