CF9 SerializeJSON gives "out of memory" error

I'm trying to serialize a query to JSON. The query returns about 300,000 records. When serializing, a 500 "out of memory" error occurs.
How can I solve this? Is there a way to stream the query directly to some file format?

300,000 records shouldn't be enough to overflow the JSON library...
How much memory does your server have available / assigned to CF?
Can you paste a stack trace?
We use a handy little library called javacsv.
It is marvelous at creating CSVs from arrays of strings. You simply add the .jar file to your classpath, create the Java CSV class, then call a bunch of methods to add columns or rows. It's good because it automagically quotes all of your data, so you don't even have to think about it. It's fast too! I can post some code samples if you are interested.
http://sourceforge.net/projects/javacsv/
CF9 has some spreadsheet exporting methods too, which you should probably check out if you haven't already.
http://cfquickdocs.com/cf9/#cfspreadsheet
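The common thread in both suggestions is to stream rows out to a file one at a time instead of building one 300,000-record JSON string in memory. Purely for illustration, here is that pattern sketched in Python (not CFML or javacsv); the database, table, and column names ("app.db", "orders", "id", "total") are made up:

    # Illustration only: stream query results to a CSV row by row instead of
    # serializing the whole result set in memory at once.
    import csv
    import sqlite3

    conn = sqlite3.connect("app.db")
    cursor = conn.execute("SELECT id, total FROM orders")

    with open("orders.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "total"])      # header row
        for row in cursor:                    # one row at a time, low memory use
            writer.writerow(row)
    conn.close()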

Related

SparkSQL: Read JSON or Execute Query on Files Directly?

I have many large JSON files that I'd like to run some analytics against. I'm just getting started with SparkSQL and am trying to make sure I understand the trade-offs between having SparkSQL read the JSON records from file into an RDD/DataFrame (with the schema inferred) and running a SparkSQL query on the files directly. If you have any experience using SparkSQL either way, I'd be interested to hear which method is preferred and why.
Thank you, in advance, for your time and help!
You can call explain() on a Dataset instead of an action like show() or count(); Spark will then show you the selected physical plan.
As far as I know, there should be no difference, but I prefer to use the read() method. When I use an IDE, I can see all the available methods. When you do it with SQL, there could be a typo like slect instead of select, and you won't see the error until you run your code.
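To make the comparison concrete, a minimal PySpark sketch that calls explain() on both variants; the file path and the status column are placeholders, not from the question:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("json-compare").getOrCreate()

    # Approach 1: read the JSON into a DataFrame (schema is inferred).
    df = spark.read.json("data/events.json")
    df.filter(df.status == "error").explain()   # prints the physical plan

    # Approach 2: query the files directly with SQL.
    spark.sql(
        "SELECT * FROM json.`data/events.json` WHERE status = 'error'"
    ).explain()

Comparing the two plans is the quickest way to see whether the query engine treats them differently for your data.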

Is HDF5 an Appropriate Technology to Store JSON Data?

I've inherited some code which makes calls to a web API and gets a deeply nested (up to eight levels) response.
I've written some code to flatten the structure so that it can be written to .csv files, and a SQL database, for people to consume more easily.
What I'd really like to do though is keep a version of the original response, so that there's a reference of the original structure if I ever want/need it.
I understand that HDF5 is primarily meant to store numerical data. Is there any reason not to use it to dump JSON blobs? It seems a lot easier than setting up a NoSQL database.
It should be fine. It sounds like you'd be storing each JSON response as an HDF5 variable-length string, which is fine; it's just a string to the library.
Do you plan to store each response as a separate dataset? That may be inefficient if you are talking about thousands of responses or more.
Alternatively, you can create a 1-d extensible dataset, and just append to it with each response.
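For what it's worth, a minimal h5py sketch of that single extensible dataset idea; the file name, dataset name, and sample response are placeholders:

    import json
    import h5py

    str_dtype = h5py.string_dtype(encoding="utf-8")  # variable-length UTF-8 strings

    with h5py.File("responses.h5", "a") as f:
        if "raw_json" not in f:
            # 1-d dataset that starts empty and can grow without bound
            ds = f.create_dataset("raw_json", shape=(0,),
                                  maxshape=(None,), dtype=str_dtype)
        else:
            ds = f["raw_json"]

        def append_response(response_dict):
            """Append one JSON response (stored as a string) to the dataset."""
            ds.resize(ds.shape[0] + 1, axis=0)
            ds[-1] = json.dumps(response_dict)

        append_response({"status": "ok", "payload": {"nested": [1, 2, 3]}})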
Decided it was easier to set up a Mongo database.

Dynamic JSON file vs API

I am designing a system with 30,000 objects or so and can't decide between the two options: either have a JSON file pre-computed for each one and get the data by pointing to the URL of the file (I think Twitter does something similar), or have a PHP/Perl/whatever else script that produces the JSON object on the fly when requested, say from a database, and sends it back. Is one approach more suitable than the other? I guess if it takes a long time to generate the JSON data, it is better to have the JSON files built ahead of time. What if generating is as quick as accessing the database? Although I suppose one would have a dedicated table in the database specifically for that. The data doesn't change very often, so updating is not a constant thing. In that respect the data is static for all intents and purposes.
Anyways, any thoughts would be much appreciated!
Alex
You might want to try MongoDB, which retrieves the objects as JSON and is highly scalable and easy to set up.
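If the pre-computed route from the question wins out, a hedged sketch of what that generation step could look like, assuming a hypothetical SQLite table objects(id, name, data); swap in whatever database and schema you actually use:

    import json
    import sqlite3
    from pathlib import Path

    OUT_DIR = Path("public/json")     # files served directly by the web server
    OUT_DIR.mkdir(parents=True, exist_ok=True)

    conn = sqlite3.connect("app.db")
    for obj_id, name, data in conn.execute("SELECT id, name, data FROM objects"):
        payload = {"id": obj_id, "name": name, "data": data}
        # one static file per object, fetched later by URL
        (OUT_DIR / f"{obj_id}.json").write_text(json.dumps(payload))
    conn.close()

Since the data rarely changes, rerunning a script like this on each update keeps the static files current without any per-request generation cost.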

Bulk loading MongoDB from JSON file with a number of objects

I want to do a bulk load into MongoDB. I have about 200 GB of files containing JSON objects which I want to load. The problem is I cannot use the mongoimport tool, as the objects contain objects (i.e. I'd need to use the --jsonArray param), which is limited to 4 MB.
There is the Bulk Load API in CouchDB where I can just write a script and use cURL to send a POST request to insert the documents, no size limits...
Is there anything like this in MongoDB? I know there is Sleepy, but I am wondering if it can cope with a nested JSON array insert...?
Thanks!
OK, it basically appears there is no really good answer unless I write my own tool in something like Java or Ruby to pass the objects in (meh effort)... But that's a real pain, so instead I decided to simply split the files down into 4 MB chunks. I just wrote a simple shell script using split (note that I had to split the files multiple times because of the limitations). I used split with -l (line count) so each file had x number of lines in it. In my case each JSON object was about 4 KB, so I just guessed line counts.
For anyone wanting to do this, remember that split can only make 676 files (26*26), so you need to make sure each file has enough lines in it to avoid missing half your data. Anyway, I put all of this in a good old bash script, used mongoimport, and let it run overnight. Easiest solution IMO, and no need to cut and mash files and parse JSON in Ruby/Java or whatever else.
The scripts are a bit custom, but if anyone wants them, just leave a comment and I'll post them.
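For reference, a rough Python equivalent of that chunking step (the author actually used the shell split utility; the file names and chunk size here are placeholders, and the input is assumed to be one JSON object per line):

    CHUNK_LINES = 1000          # ~4 KB per object * 1000 lines stays under 4 MB

    with open("objects.json") as src:
        chunk, part = [], 0
        for line in src:
            chunk.append(line)
            if len(chunk) == CHUNK_LINES:
                with open(f"objects_part_{part:05d}.json", "w") as out:
                    out.writelines(chunk)
                chunk, part = [], part + 1
        if chunk:               # write any leftover lines as a final chunk
            with open(f"objects_part_{part:05d}.json", "w") as out:
                out.writelines(chunk)

Unlike split, this isn't limited to 676 output files, since the part number in the file name can grow as large as needed.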
Without knowing anything about the structure of your data I would say that if you can't use mongoimport you're out of luck. There is no other standard utility that can be tweaked to interpret arbitrary JSON data.
When your data isn't a 1:1 fit to what the import utilities expect, it's almost always easiest to write a one-off import script in a language like Ruby or Python to do it. Batch inserts will speed up the import considerably, but don't make the batches too large or you will get errors (the max size of an insert in 1.8+ is 16 MB). In the Ruby driver a batch insert can be done by simply passing an array of hashes to the insert method instead of a single hash.
If you add an example of your data to the question I might be able to help you further.
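To make the batch-insert suggestion concrete, a minimal sketch with pymongo; the answer describes the Ruby driver, so treat this as an assumed Python equivalent, with a placeholder connection string, database/collection names, and one JSON object per input line:

    import json
    from pymongo import MongoClient

    BATCH_SIZE = 1000           # keep each insert well under the server's size limit
    client = MongoClient("mongodb://localhost:27017")
    coll = client["mydb"]["objects"]

    batch = []
    with open("objects.json") as src:
        for line in src:
            batch.append(json.loads(line))
            if len(batch) == BATCH_SIZE:
                coll.insert_many(batch)   # one round trip per batch
                batch = []
        if batch:                         # flush whatever is left
            coll.insert_many(batch)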

Multiple JSON files, parse, and load into tables

I'm a real beginner when it comes time for this, so I apologize in advance.
The long and short of what I am looking for is a fairly simple concept: I want to pull JSON data off a server, parse it, and load it into Excel, Access, or some other type of table. Basically, I want to be able to store the data so I can filter, sort, and query it.
To make matters a little more complicated, the server only returns truncated results with each JSON response, so it will be necessary to make multiple requests to the server.
Are there tools out there or code available which will help me do what I am looking for? I am completely lost, and I have no idea where to start.
(please be gentle)
I'm glad to see this question, because I'm doing very similar things! Based on what I've gone through, a lot of it comes down to how those tables are designed, or even linked together, in the first place, and then to the mapping between these tables and the different JSON objects at different depths or positions in the original JSON file. Once the mapping rules are clear, the code can be done by merely hard-coding the mapping (I mean something like: if you get a JSON object under a certain parent, then you save its data into certain table(s)), provided you're using some high-level JSON parsing library.
OK, as I have to dash home from the office now:
Assuming that you are going to use Excel to parse the data, you are going to need:
1. A JSON parser, e.g. JSON Parser for VBA
2. Some code to download the JSON
3. A loop of VBA code that goes through each file and parses it into a sheet.
Is this OK for a starter? If you are struggling, let me know and I will try and knock something up a little better over the weekend.
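If the Excel/VBA route turns out to be a struggle, the same workflow can be sketched in Python instead; the endpoint URL, the page parameter, and the field names below are pure placeholders for whatever your server actually returns:

    import csv
    import requests

    rows, page = [], 1
    while True:
        # ask for one page at a time, since each response is truncated
        resp = requests.get("https://example.com/api/items", params={"page": page})
        resp.raise_for_status()
        records = resp.json()
        if not records:                  # assume an empty page means we're done
            break
        for rec in records:
            rows.append({"id": rec.get("id"), "name": rec.get("name"),
                         "created": rec.get("created")})
        page += 1

    # write a flat CSV that Excel or Access can open and query
    with open("items.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "name", "created"])
        writer.writeheader()
        writer.writerows(rows)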