I have a JSON file with a lot of unneeded data and I wish to get rid of most of it.
It's a huge file, so I need an operation that can handle it.
I tried regex, but most of the apps I tried seem to get stuck in the middle of the process.
What I need is simply to find objects by their key and delete them from the file.
Any ideas?
If the file is too large to be read into memory, you might want to use something like yajl, which provides an event-driven, SAX-like interface. This allows you to make changes to the JSON as you read it (and, I suppose, write it to another file).
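If the file happens to be one big array of objects, for example, a rough sketch in Python with the ijson package (which can use yajl as its backend) might look like the following; the key names and file paths are just placeholders:

```python
# Stream the array one object at a time, drop the unwanted keys, and write
# the result straight to a second file, so the whole document never sits in
# memory at once. Key names and file paths are placeholders.
import ijson
import json

UNWANTED_KEYS = {"debugInfo", "rawPayload"}  # hypothetical keys to delete

with open("big.json", "rb") as src, open("trimmed.json", "w") as dst:
    dst.write("[")
    first = True
    for obj in ijson.items(src, "item"):  # "item" = each top-level array element
        for key in UNWANTED_KEYS:
            obj.pop(key, None)
        if not first:
            dst.write(",")
        # ijson returns non-integer numbers as Decimal, hence default=float
        json.dump(obj, dst, default=float)
        first = False
    dst.write("]")
```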
Hello, my question is simple, but I don't know whether it's possible.
I've got JSON files structured like the one in the picture, and I'd like to use these files (actually, the information contained in them) to create a database. I'd like to understand whether this can be done without transcribing all the information again, simply by moving the files into the database so that it stores them automatically.
I have a JSON file with a large array of JSON objects. I am using JsonTextReader on a StreamReader to read data from the file, but some attributes need to be updated as well.
Is it possible to use JsonTextWriter to find and update a particular JSON object?
Generally, modifying a file means reading the whole file into memory, making the change, then writing the whole thing back out to the file. (There are certain file formats that don't require this, by virtue of having a static-size layout or other mechanisms designed to work around reading in the whole file, but JSON isn't one of those.)
JSON.net is capable of reading and writing JSON streams as a series of tokens, so it should be possible to minimize the memory footprint by using this. However, you will still be reading through the entire file and writing it all back out. Because of the simultaneous read/write, you'd need to write to a temp file instead and then, once you're done, move/rename that temp file into the correct place.
Depending on how you've structured the JSON, you may also need to keep track of where you are in that structure. This can be done by tracking the tokens as they're received and using them to maintain a kind of "path" into the structure. That path can be used to determine when you're at a place that needs updating.
The general strategy is to read in tokens, alter them if required, then write them out again.
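To make that concrete, here is the same read/alter/write idea sketched in Python with ijson rather than JSON.NET (the JsonTextReader/JsonTextWriter code follows the same pattern); the "item.status" path, the replacement value, and the file names are all invented for illustration:

```python
# Copy the token stream to a temp file, updating scalar values whose path
# matches, then swap the temp file into place once the rewrite succeeded.
import ijson
import json
import os
from decimal import Decimal

def dump_scalar(value):
    # ijson hands back non-integer numbers as Decimal; str() keeps them as
    # bare JSON numbers instead of tripping up json.dumps
    return str(value) if isinstance(value, Decimal) else json.dumps(value)

def rewrite(src, dst, update):
    """Stream tokens from src to dst, letting update(path, value) replace
    scalar values; path is the dotted "path" into the structure."""
    need_comma = []      # one "comma pending" flag per open container
    after_key = False    # True right after an object key has been written

    def maybe_comma():
        nonlocal after_key
        if after_key:
            after_key = False                 # value completes a key/value pair
        elif need_comma and need_comma[-1]:
            dst.write(",")

    for path, event, value in ijson.parse(src):
        if event in ("start_map", "start_array"):
            maybe_comma()
            dst.write("{" if event == "start_map" else "[")
            need_comma.append(False)
        elif event in ("end_map", "end_array"):
            need_comma.pop()
            dst.write("}" if event == "end_map" else "]")
            if need_comma:
                need_comma[-1] = True
        elif event == "map_key":
            if need_comma[-1]:
                dst.write(",")
            dst.write(json.dumps(value) + ":")
            after_key = True
        else:                                  # string / number / boolean / null
            maybe_comma()
            dst.write(dump_scalar(update(path, value)))
            if need_comma:
                need_comma[-1] = True

# Example: assuming a top-level array of objects, rewrite every "status"
# attribute, then replace the original only after the rewrite finished.
with open("data.json", "rb") as src, open("data.json.tmp", "w") as dst:
    rewrite(src, dst, lambda path, v: "processed" if path == "item.status" else v)
os.replace("data.json.tmp", "data.json")
```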
I want to store a directory's file data (i.e., file name, file size, etc.) in a file so that I can reduce search time. My problem now is to find an efficient way to do it. I have considered JSON and XML but can't decide between the two. Also, if there is a better way, let me know.
I'd say it depends on what kind of data you prefer to work with and what structure your data has (very simple, like a list of words; less simple, like a list of words plus the number of times each word was searched; ...).
For a list of words you can use a simple text file with one word per line, or comma-separated values (CSV); for a less simple structure, JSON or XML will work fine.
I like to work with JSON as it's lighter and less verbose than XML. If you don't plan to share this data and/or it isn't complex, you don't need the validation (XSD, ...) offered by XML.
And even if you plan to share this data, you can still work with JSON.
You'll need some server-side code to write the data to a file, such as PHP, Java, Python, Ruby, ...
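For instance, a minimal Python sketch (one of the languages mentioned above) that records each file's name and size as JSON could look like this; the directory and output file names are placeholders:

```python
# Walk one directory and dump each file's name and size into a JSON index.
# The directory (".") and output file name ("index.json") are placeholders.
import json
import os

def index_directory(path):
    entries = []
    for entry in os.scandir(path):
        if entry.is_file():
            entries.append({"name": entry.name, "size": entry.stat().st_size})
    return entries

with open("index.json", "w") as out:
    json.dump(index_directory("."), out, indent=2)
```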
I would recommend a JSON file if you use it almost like a properties file.
If you plan to store the data from the file in a database, then you can go for XML, where you have the option to use JAXB/JPA in a Java environment.
I want to do a bulk load into MongoDB. I have about 200GB of files containing JSON objects that I want to load. The problem is that I cannot use the mongoimport tool, as the objects contain nested objects (i.e. I'd need to use the --jsonArray argument), which is limited to 4MB.
There is the Bulk Load API in CouchDB, where I can just write a script and use cURL to send a POST request to insert the documents, with no size limits...
Is there anything like this in MongoDB? I know there is Sleepy, but I am wondering whether it can cope with a nested JSON array insert...?
Thanks!
OK, basically it appears there is no really good answer unless I write my own tool in something like Java or Ruby to pass the objects in (too much effort)... But that's a real pain, so instead I decided to simply split the files down into 4MB chunks. I just wrote a simple shell script using split (note that I had to split the files multiple times because of the limitations). I used the split command with -l (line count) so each file had a set number of lines in it. In my case each JSON object was about 4KB, so I just guessed line counts.
For anyone wanting to do this, remember that split can only make 676 files (26*26), so you need to make sure each file has enough lines in it to avoid missing half of them. Anyway, I put all this in a good old bash script, used mongoimport, and let it run overnight. The easiest solution IMO, with no need to cut and mash files or parse JSON in Ruby/Java or whatever else.
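For illustration, a rough Python stand-in for that split -l step might look like the sketch below; the input path, output prefix, and lines-per-chunk value are made up, and you'd tune the line count so each chunk stays under the size limit:

```python
# Split a file into numbered chunks of N lines each, the same idea as
# `split -l` but without the 676-file suffix limit. Paths and the chunk
# size are placeholders.
LINES_PER_CHUNK = 1000

def split_by_lines(path, prefix, lines_per_chunk=LINES_PER_CHUNK):
    out, chunk = None, 0
    with open(path) as src:
        for i, line in enumerate(src):
            if i % lines_per_chunk == 0:
                if out:
                    out.close()
                out = open(f"{prefix}{chunk:04d}.json", "w")
                chunk += 1
            out.write(line)
    if out:
        out.close()

split_by_lines("objects.json", "chunk_")
```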
The scripts are a bit custom, but if anyone wants them just leave a comment and I'll post them.
Without knowing anything about the structure of your data I would say that if you can't use mongoimport you're out of luck. There is no other standard utility that can be tweaked to interpret arbitrary JSON data.
When your data isn't a 1:1 fit for what the import utilities expect, it's almost always easiest to write a one-off import script in a language like Ruby or Python to do it. Batch inserts will speed up the import considerably, but don't make the batches too large or you will get errors (the max size of an insert in 1.8+ is 16MB). In the Ruby driver a batch insert can be done by simply passing an array of hashes to the insert method instead of a single hash.
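As a sketch of that approach in Python with pymongo (the Ruby driver works the same way), assuming one JSON object per line in the input file; the file, database, and collection names and the batch size are placeholders:

```python
# One-off import: read JSON objects line by line, buffer them, and insert
# them in batches to keep each insert comfortably under the size limit.
# File name, database/collection names, and BATCH_SIZE are placeholders.
import json
from pymongo import MongoClient

BATCH_SIZE = 1000
collection = MongoClient("mongodb://localhost:27017")["mydb"]["imported"]

batch = []
with open("objects.json") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        batch.append(json.loads(line))
        if len(batch) >= BATCH_SIZE:
            collection.insert_many(batch)
            batch = []
if batch:
    collection.insert_many(batch)
```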
If you add an example of your data to the question I might be able to help you further.
I'm a real beginner when it comes to this, so I apologize in advance.
The long and short of what I am looking for is a fairly simple concept: I want to pull JSON data off a server, parse it, and load it into Excel, Access, or some other kind of table. Basically, I want to be able to store the data so I can filter, sort, and query it.
To make matters a little more complicated, the server will only return truncated results with each JSON response, so it will be necessary to make multiple requests to the server.
Are there tools out there or code available which will help me do what I am looking for? I am completely lost, and I have no idea where to start.
(please be gentle)
I'm glad to see this question because I'm doing very similar things! Based on what I've gone through, it has a lot to do with how those tables are designed or even linked together in the first place, and then with the mapping between those tables and the different JSON objects at different depths or positions in the original JSON file. Once the mapping rules are clear, the code can be done by merely hard-coding the mapping (I mean something like: if you get a JSON object under a certain parent, you save its data into certain table(s)), provided you're using some high-level JSON parsing library.
OK, as I have to dash home from the office now:
Assuming that you are going to use Excel to parse the data, you are going to need:
1. Some JSON parser, e.g. JSON Parser for VBA
2. Some code to download the JSON
3. A VBA loop that goes through each file and parses it into a sheet
Is this OK for a starter? If you are struggling, let me know and I will try to knock something up a little better over the weekend.
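If VBA turns out to be a struggle, the same three steps can also be sketched in Python, fetching pages until the server stops returning data and flattening them into a CSV that Excel can open; the URL, the paging parameter, and the field names below are invented placeholders:

```python
# Download the JSON page by page, parse it, and flatten it into a CSV that
# Excel can open. The URL, the "page" parameter, the stop condition, and the
# field names are all invented placeholders for whatever the real API uses.
import csv
import json
import urllib.request

BASE_URL = "https://example.com/api/records?page={page}"
FIELDS = ["id", "name", "created_at"]

rows = []
page = 1
while True:
    with urllib.request.urlopen(BASE_URL.format(page=page)) as resp:
        batch = json.load(resp)   # assumes each page is a JSON array of objects
    if not batch:                 # assumes an empty list marks the last page
        break
    rows.extend(batch)
    page += 1

with open("records.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
```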