Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
Improve this question
Is DOM the only way to parse JSON?
Some JSON parsers do offer incremental ("streaming") parser; for Java, at least following parsers from json.org page offer such an interface:
Jackson (pull interface)
Json-simple (SAX-style push interface)
(in addition to Software Monkey's parser referred to by another answer)
Actually, it is kind of odd that so many JSON parsers do NOT offer this simple low-level interface -- after all, they already need to implement low-level parsing, so why not expose it.
EDIT (June 2011): Gson too has its own streaming API (with gson 1.6)
By DOM, I assume you mean that the parser reads an entire document at once before you can work with it. Note that saying DOM tends to imply XML, these days, but IMO that is not really an accurate inference.
So, in answer to your questions - "Yes", there are streaming API's and "No", DOM is not the only way. That said, processing a JSON document as a stream is often problematic in that many objects are not simple field/value pairs, but contain other objects as values, which you need to parse to process, and this tends to end up a recursive thing. But for simple messages you can do useful things with a stream/event based parser.
I have written a pull-event parser for JSON (it was one class, about 700 lines). But most of the others I have seen are document oriented. One of the layers I have built on top of my parser is a document reader, which took about 30 LOC. I have only ever used my parser in practice as a document loader (for the above reason).
I am sure if you search the net you will find pull and push based parsers for JSON.
EDIT: I have posted the parser to my site for download. A working compilable class and a complete example are included.
EDIT2: You'll also want to look at the JSON website.
As stefanB mentioned, http://lloyd.github.com/yajl/ is a C library for stream parsing JSON. There are also many wrappers mentioned on that page for other languages:
yajl-ruby - ruby bindings for YAJL
yajl-objc - Objective-C bindings for YAJL
YAJL IO bindings (for the IO language)
Python bindings come in two flavors, py-yajl OR yajl-py
yajl-js - node.js bindings (mirrored to github).
lua-yajl - lua bindings
ooc-yajl - ooc bindings
yajl-tcl - tcl bindings
some of them may not allow streaming, but many of them certainly do.
If you want to use pure javascript and a library that runs both in node.js and in the browser you can try clarinet:
https://github.com/dscape/clarinet
The parser is event-based, and since it’s streaming it makes dealing with huge files possible. The API is very close to sax and the code is forked from sax-js.
Disclaimer: I'm suggesting my own project.
I maintain a streaming JSON parser in Javascript which combines some of the features of SAX and DOM:
Oboe.js website
The idea is to allow streaming parsing, but not require the programmer to listen to lots of different events like with raw SAX. I like SAX but it tends to be quite low level for what most people need. You can listen for any interesting node from the JSON stream by registering JSONPath patterns.
The code is on Github here:
Oboe.js Github page
LitJSON supports a streaming-style API. Quoting from the manual:
"An alternative interface to handling JSON data that might be familiar to some developers is through classes that make it possible to read and write data in a stream-like fashion. These classes are JsonReader and JsonWriter.
"These two types are in fact the foundation of this library, and the JsonMapper type is built on top of them, so in a way, the developer can think of the reader and writer classes as the low-level programming interface for LitJSON."
If you are looking specifically for Python, then ijson claims to support it. However, it is only a parser, so I didn't come across anything for Python that can generate json as a stream.
For C++ there is rapidjson that claims to support both parsing and generation in a streaming manner.
Here's a NodeJS NPM library for parsing and handling streams of JSON:
https://npmjs.org/package/JSONStream
For Python, an alternative (apparently lighter and more efficient) to ijson is jsaone (see that link for rough benchmarks, showing that jsaone is approximately 3x faster).
DISCLAIMER: I'm the author of jsaone, and the tests I made are very basic... I'll be happy to be proven wrong!
Answering the question title: YAJL a JSON parser library in C:
YAJL remembers all state required to
support restarting parsing. This
allows parsing to occur incrementally
as data is read off a disk or network.
So I guess using yajl to parse JSON can be considered as processing stream of data.
In reply to your 2nd question, no, many languages have JSON parsers. PHP, Java, C, Ruby and many others. Just Google for the language of your choice plus "JSON parser".
Related
Is there any difference between Rapid JSON and Json parser in Boost Library(Boost\property_Tree\Json_parser.hpp)
Thanks.
I have compared 37 C/C++ JSON libraries in nativejson-benchmark for standard conformance and performance.
However, I failed to integrate Boost.PropertyTree (1.60) in the benchmark, because it parses number, true, false, null types as strings.
Edit: To answer the question more directly, Boost.PropertyTree cannot provide JSON functionalities most JSON libraries do. On the other side, RapidJSON is a JSON library with high conformance and performance. BTW, in addition to parsing/stringifying JSON, RapidJSON also provides streaming-style API, JSON pointer and JSON schema. These features are uncommon in open source libraries.
EDIT - the Boost Library seems to only use RapidXML, not RapidJSON.
It should be of no concern to you because it's an implementation detail of the library anyways.
So the answer might be "no" (more likely, "yes") and you stand to gain absolutely nothing from it because you cannot depend on it.
Just pick your own XML library and use it where you need it: What XML parser should I use in C++?
IIRC Boost mostly modified the namespace, so you won't have ODR clashes when you select RapidXML
I'm looking for a library to parse HTML files in OCaml.
Basically the equivalent of Jsoup/Beautiful Soup.
The main requirement is being able to query the DOM with CSS selectors.
Something in the form of
page.fetch("http://www.url.com")
page.find("#tag")
I had a need for something like this recently, so after seeing this question and reading the recommendations in the comments, I wrote a library "Lambda Soup" over the weekend for fun.
You will want to use a library like ocurl or Cohttp to retrieve the actual HTML. After you have it, you can do
html |> parse $ "#tag"
to do what is asked in the question. For other possibilities and the full signature, see the documentation. You may want to look at the documentation postprocessor or tests for a fairly thorough demonstration of usage and capabilities, including CSS support and extensions.
Per comments, Lambda Soup uses Ocamlnet's HTML parser. Lambda Soup uses Markup.ml. Otherwise, it has no dependencies, except OUnit if you wish to run the tests. I'm happy for any feedback, including about modifying the interface (it is at an early stage) or discussions of adding an HTTP downloader to the library (which seems iffy because it greatly alters the scope of the library as it now is, but I am happy to hear arguments).
The license is BSD.
As the header says.
In general I like YAML more than JSON these days. I implemented a RESTful WS PoC back in the day using JSON. I was wondering if I can instead use YAML or not.
E.g. are there enough tools/libraries/support for doing that? Or would I end up doing quite a bit of mundane/tedious coding which I would've avoided if I were using JSON instead?
Also as I understood from WWW: REST doesn't restrict one from using YAML as the payload, is that correct?
Thanks!
Yes, if it's a goal that the data be especially readable by humans. REST itself isn't focused on protocols/formats so much as patterns.
There's not a lot to gain here for webservices however, which typically represent app to app communication. Computers don't care, and JSON can be pretty-printed to improve legibility somewhat.
YAML is well supported by mainstream languages, though not always included in standard libraries as JSON typically is. So you'll probably be looking at an additional library dependency.
Also, if the client is a browser, parsing will be slower, as you'll have to use a non-native external lib such as described here using: JavaScript YAML Parser . Make sure it gets compressed in transit or the extra indentation spaces will expand the size of the data.
Also, YAML has a lot of esoteric and downright potentially dangerous features. Whenever I'm using it I use the "safe" parser, and deactivate many if not most of its features besides data structures.
I could imagine some utility as a debug parameter however, perhaps url.yaml or …?fmt=yaml to assist during development. But, otherwise not much gain for all the trouble.
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
I am complete blank about JSON but I would like to be able to read some data from the URL http://stokercloud.dk/dev/getdriftjson.php?mac=oz8hp and be able to store them im a DB.
But I don't know where to start, so I thought I would ask here for hints and maybe some links to samples that I might learn from
I know that the output might look confusing, but I have a list of what each item is.
the file is runtime data from my pelletburner
The JSON specification is the first page to read. The standard is so simple it is easy to understand it from this page.
I found a wider tutorial, with illustrations and more resources. Nice to see.
Here is the conclusion of this web page:
JSON is a open,text based, light-weight data interchange format specified as RFC4627, came to the developer world in 2005 and it's
popularity is increased rapidly.
JSON uses Object and Array as data structures and strings, number, true, false and null as values. Objects and arrays can be nested
recursively.
Most (if not all) modern programming languages can be used to work with JSON.
NoSQL databases, which are evolved to get rid of the bottlenecks of the Relational Databases, are using JSON to store data.
JSON gives developers a power to choose between XML and JSON leading to more flexibility.
Besides NoSQL, AJAX, Package Management, and integration of APIs to the web application are the major areas where JSON is being used
extensively.
IMHO the main point with JSON is that it contains documents, or arrays of documents. There is less data types than with Delphi (e.g. no official date/time, and just one numeric type). It is an exchange format, which is widely used now, and, from my own experiment, easier to work with than XML, from both human and computer sides.
In Delphi, you have several libraries around, mainly:
SuperObject
XSuperObject
dwsJSON
lkJSON
DBXJSON which ships with newer versions of Delphi;
mORMot for Win32/Win64
SynCrossPlatformJSON
About performance, you can take a look at our blog article. DBXJSON (and the official JSON unit of Delphi) is by far the slowest, and somewhat difficult to work with. Some methods for easy access to the JSON document content are missing. Other libraries are much easier to work with. Our version shipped with mORMot is very fast, as is dwsJSON. SuperObject is slower than those, especially for huge content, and XSuperObject is slow (but cross-platform). Our SynCrossPlatformJSON unit is also cross-platform, very fast, and has a variant-based document access.
Some code using mORMot library:
uses
SynCrtSock,
SynCommons;
procedure test;
var json: RawUTF8;
jsondata: TDocVariantData;
i: integer;
begin
json := TWinHttp.Get('http://stokercloud.dk/dev/getdriftjson.php?mac=oz8hp');
jsondata := DocVariantData(_json(json).jsondata)^;
for i := 0 to jsondata.Count-1 do
writeln(jsondata.Values[i]); // here all items are converted back to JSON and written
end;
To learn JSON (JavaScript Object Notation), you'd read JSON on Wikipedia.
To download data from url, you can use TIdHttp, which is an Http client of Indy framework.
To parse JSON, I'd suggest use superobject. It includes great examples in demos directory.
JSON is an interchange form for sending data between anything that needs to have data sent to it. Its simplicity is its strength.
The text is valid javascript and so can be interpreted by any javascript compiler, but is now so popular that virtually every language now has a json parser built in or as a library ( see http://json.org/ scroll down to the bottom).
Basically JSON is a very simple structured text. If you google JSON Library Delphi you should get some solutions or for any other language you want to use.
I have chosen jeroMQ for building Asynchronous message channel for publishing content from multiple clients. On the other end server side workers processes request and notify client only if server wanted to notify client based on the message received.
On digging deep, looking for messaging library to marshal/un-marshal message. I found kvpmsg class which does the job for simple key-value.
Don't want to re-invent the wheel if some standard library exists, that can be applied for bigger objects
It seems like you are asking for data serialization libraries. Check Wikipedia for a list and a comparison of data serialization formats.
Also there is a relevant entry in ZeroMQ FAQ explaining why ZeroMQ doesn't include any serialization format:
Does ØMQ include APIs for serializing data to/from the wire representation?
No. This design decision adheres to the UNIX philosophy of "do one thing and do it well". In the case of ØMQ, that one thing is moving messages, not marshaling data to/from binary representations.
Some middleware products do provide their own serialization API. We believe that doing so leads to bloated wire-level specifications like CORBA (1055 pages). Instead, we've opted to use the simplest wire formats possible which ensure easy interoperability, efficiency and reduce the code (and bug) bloat.
If you wish to use a serialization library, there are plenty of them out there. See for example
Google Protocol Buffers
MessagePack
JSON-GLib
C++ BSON Library
Note that serialization implementations might not be as performant as you might expect. You may need to benchmark your workloads with several serialization formats and libraries in order to understand performance and which format/implementation is best for your use case (ease of development must also be considered).