CSV to JSON benchmarks - json

I'm working on a project that uses parallel methods to convert text from one form to another. We're going to implement a CSV to JSON converter to demonstrate the speedups that are possible using our parallel framework.
We want to benchmark our converter once it's finished. What are the fastest libraries/stand-alone programs/etc out there that are capable of doing CSV-JSON conversion? I found a list of potential candidates here:Large CSV to JSON/Object in Node.js, but I'm not sure how fast the listed options are. In the worst case I'll benchmark them myself, but if someone already knows what the "best in class" converters are it'd save me some time.

Looks like the maintainer of csvtojson has developed a benchmark application. I think I can add my csv to json converter to his benchmark project to test my converter.

if your project can consider in-browser apps, I suggest csvtojson as it is by far the speediest converter on the market as of 2017.
I created it myself so I may be a bit biaised, but I specifically developed it for a bigger project that required big csv to json crunching.
Tell me if it served.

Related

Can CouchDB natively convert plain text to json format?

I'm aware that there are python and powershell methods to convert plain text files, csv's etc.... into json format for upload into NoSQL DBs such as CouchDB.
But according to the CouchDB definitive guide, it makes it seems like there is a native built in way of doing this kind of conversion, without the need for a 3rd party tool.
This older thread appears to hint at this:
Filter and update functions in CouchDB?
This part in particular:
There are other design document functions that are being introduced at the >time of this writing, including _update and _filter that we aren’t covering in >depth here. Filter functions are covered in Chapter 20, Change Notifications. >Imagine a web service that POSTs an XML blob at a URL of your choosing when >particular events occur. PayPal’s instant payment notification is one of >these. With an _update handler, you can POST these directly in CouchDB and it >can parse the XML into a JSON document and save it. The same goes for CSV, >multi-part form, or any other format.
But when I dig deeper I don't find anything concrete.
The supporting wiki link is not clear to me (a beginner with json/NoSQL/curl stuff: http://wiki.apache.org/couchdb/Document_Update_Handlers
Hopefully this is a simple yes/no. And any links to help on this that is better than the above link also appreciated, thank you!
CouchDB supports transforming the internal documents/views into many other formats through the use of show and list functions. It's not a "native" transformation, as you define the transformation yourself, it's not magical.
That being said, there is not a similar mechanism for the reverse (ie: converting some arbitrary format into JSON documents) but you're much better off scripting that with a full-featured language/script and using the bulk docs API to do your imports in batch.

Convert Jupiter Tessellation(JT) files to JSON to render in THREE.js

How do I render .JT files in THREE.js? I checked following options and could not get anything to proceed with:
checked in Three.js on different loaders which are available - didn't get loader for JT files. Please let me know if there is anything already present which I am missing in three.js.
http://www.johannes-raida.de/jnetcad.htm - if I have to write my own conversion method, at this of point, not sure how to proceed with. any pointer will be helpful
Any help or pointers will be appriciated.
Thanks in advance,
Pradeep
JNetCAD won't help you a huge amount unless you write it as a web service. Its Java (server side code) while three.js is client side. If you wanted to go down this route though, you could create a small RESTful web service that took a .JT file as input and returned the converted JSON. This does require a fair amount of Java knowledge though.
Unless you specifically need on-the-fly transformation, you're probably going to find it a lot easier to just run the .JT file through the tool yourself and work directly from a supported format. This is what they have done on the JNetCAD web page to good effect.

Weka: Limitations on what one can output as source?

I was consulting several references to discover how I may output trained Weka models into Java source code so that I may use the classifiers I am training in actual code for research applications I have been developing.
As I was playing with Weka 3.7, I noticed that while it does output Java code to its main text buffer when use simpler classification (supervised in my case this time) methods such as J48 decision tree, it removes the option (rather, it voids it by removing the ability to checkmark it and fades the text) to output Java code for RandomTree and RandomForest (which are the ones that give me the best performance in my situation).
Note: I am clicking on the "More Options" button and checking "Output source code:".
Does Weka not allow you to output RandomTree or RandomForest as Java code? If so, why? Or if it does and just doesn't put it in the output buffer (since RF is multiple decision trees which I imagine it doesn't want to waste buffer space), how does one go digging up where in the file system Weka outputs java code by default?
Are there any tricks to get Weka to give me my trained RandomForest as Java code? Or is Serialization of the output *.model files my only hope when it comes to RF and RandomTree?
Thanks in advance to those who provide help.
NOTE: (As an addendum to the answer provided below) If you run across a similar situation (requiring you to use your trained classifier/ML model in your code), I recommend following the links posted in the answer that was provided in response to my question. If you do not specifically need the Java code for the RandomForest, as an example, de-serializing the model works quite nicely and fits into Java application code, fulfilling its task as a trained model/hardened algorithm meant to predict future unlabelled instances.
RandomTree and RandomForest can't be output as Java code. I'm not sure for the reasoning why, but they don't implement the "Sourceable" interface.
This explains a little about outputting a classifier as Java code: Link 1
This shows which classifiers can be output as Java code: Link 2
Unfortunately I think the easiest route will be Serialization, although, you could maybe try implementing "Sourceable" for other classifiers on your own.
Another, but perhaps inconvenient solution, would be to use Weka to build the classifier every time you use it. You wouldn't need to load the ".model" file, but you would need to load your training data and relearn the model. Here is a starters guide to building classifiers in your own java code http://weka.wikispaces.com/Use+WEKA+in+your+Java+code.
Solved the problem for myself by turning the output of WEKA's -printTrees option of the RandomForest classifier into Java source code.
http://pielot.org/2015/06/exporting-randomforest-models-to-java-source-code/
Since I am using classifiers with Android, all of the existing options had disadvantages:
shipping Android apps with serialized models didn't reliably work across devices
computing the model on the phone took too much resources
The final code will consist of three classes only: the class with the generated model + two classes to make the classification work.

reading json objects in hadoop map reduce for processing data

iam a beginner in hadoop,can any one help me in reading json in mapreduce job.
i have googled and found jaql is suitable for reading json.but i didnot find any documentaion on how it could be implemented in our map reduce job.
is there any other framework which supports reading json in map reduce?
any suggestions on this?
Thanks in Advance
I would rather trust the MapReduce framework itself to handle this. MapReduce allows us to write custom Inout/Output Formats to handle data which is not supported by it OOTB, like JSON. See this question for an example. I would prefer this as I won't require any third party stuff for this. It's just a matter of extending the MapReduce API(But it's just my choice. Other's may find something else more suitable).
But, the easiest way, IMHO, would be to use Hive or Pig to handle JSON data. You don't have to do much in order to make it work, as both these project have OOTB JSON support. See this for Hive-JSON SerDe and this for Pig's JsonLoader and JsonStorage.
HTH

How can I marshal JSON to/from a POJO for BlackBerry Java?

I'm writing a RIM BlackBerry client app. BlackBerry uses a simplified version of Java (no generics, no annotations, limited collections support, etc.; roughly a Java 1.3 dialect). My client will be speaking JSON to a server. We have a bunch of JAXB-generated POJOs, but they're heavily annotated, and they use various classes that aren't available on this platform (ArrayList, BigDecimal, XMLGregorianCalendar). We also have the XSD used by the JAXB-XJC compiler to generate those source files.
Being the lazy programmer that I am, I'd really rather not manually translate the existing source files to Java 1.3-compatible JSON-marshalling classes. I already tried JAXB 1.0.6 xjc. Unfortunately, it doesn't understand the XSD file well enough to emit proper classes.
Do you know of a tool that will take JAXB 2.0 XSD files and emit Java 1.3 classes? And do you know of a JSON marshalling library that works with old Java?
I think I am doomed because JSON arrived around 2006, and Java 5 was released in late 2004, meaning that people probably wouldn't be writing JSON-parsing code for old versions of Java.
However, it seems that there must be good JSON libraries for J2ME, which is why I'm holding out hope.
For the first part good luck but I really don't think you're going to find a better solution than to modify the code yourself. However, there is a good J2ME JSON library you can find a link to the mirror here.
I ended up using apt (annotation processing tool) to run over the 1.5 sources and emit new 1.3-friendly source. Actually turned out to be a pretty nice solution!
I still haven't figured out an elegant way to do the actual JSON marshalling, but the apt tool can probably help write the rote code that interfaces with a JSON library like the one Jonathan pointed out.