Massive data file conversion - json

I have a .data file that is 19+ MB in size. If I open it in TextEdit on my Mac I get the spinning circle of death. I've tried moving it into a JSON file in Atom, but that breaks too. It's all on one line, so trying to beautify it to break the file down into manageable chunks breaks things as well. I cut out about 150 lines' worth, beautified it, and it came out great, but that's only kilobytes of data, and I've got a 19+ MB file.
I've tried multiple ways of cutting and pasting to see if I can make smaller files, but that breaks everything too. Can I get this into SQL?
I'm working on a personal project and this data file is critical to it. Thanks!
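For a file this size, a short script is usually easier than any editor. Below is a minimal sketch, assuming the .data file is really a single JSON array of objects; the file names, table layout, and "id" field are placeholders, not anything taken from your data:

    import json
    import sqlite3

    # Parse the whole single-line file at once; 19 MB fits comfortably in memory.
    with open("records.data", encoding="utf-8") as f:
        records = json.load(f)            # assumed: a JSON array of objects

    conn = sqlite3.connect("records.db")
    conn.execute("CREATE TABLE IF NOT EXISTS records (id TEXT, payload TEXT)")

    # Store each record's raw JSON for now; real columns can be extracted
    # later once the structure of the data is known.
    conn.executemany(
        "INSERT INTO records (id, payload) VALUES (?, ?)",
        [(str(r.get("id")), json.dumps(r)) for r in records],
    )
    conn.commit()
    conn.close()

Since 19 MB easily fits in memory, there is no need to split the file first; keeping the raw JSON per row lets you pull proper columns out later once the structure is clear.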

Related

Why do .pdn files for paint.net contain a bunch of gibberish?

I was using paint.net (an image editing program) and decided to open a .pdn file as raw text because I was curious. What I saw was a bunch of gibberish! Why is the data stored like this?
It is most likely stored as binary. This won't make sense when viewed as text by a human, but it makes the file quick and easy for the program to read, and it usually reduces the amount of space the file takes up. Most programs store data like this.
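As a quick illustration, you can peek at the raw bytes yourself instead of opening the file as text. This is only a small sketch (the file name is a placeholder) showing what a text editor is actually being asked to display:

    # Read the first few bytes of the file directly, rather than as text.
    with open("example.pdn", "rb") as f:      # "example.pdn" is a placeholder path
        head = f.read(16)

    print(head)           # the raw bytes Python read
    print(head.hex(" "))  # the same bytes shown as hexadecimal pairs

Any byte that doesn't map to a printable character shows up as "gibberish" in a text editor, which is exactly what you saw.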

Split a monster JSON file into smaller JSON files

As the title says, I have a beast of a JSON file that I would like to split into smaller files. The JSON file is 8.7 GB; its format is described at this link: Detailed book graph.
The file is big enough to saturate the RAM of my PC (32 GB).
I tried looking for tools online and on GitHub, but nothing worked. Does anyone have an idea how I can do this?
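One approach that never needs the whole file in RAM is a streaming parser. Below is a rough sketch using the third-party ijson library, assuming the file is one large top-level JSON array; the file names and chunk size are placeholders, and if the objects sit under a key or the file is newline-delimited JSON, the reading loop would need adjusting:

    import json
    import ijson  # third-party streaming JSON parser: pip install ijson

    CHUNK_SIZE = 100_000  # objects per output file; tune to taste

    with open("book_graph.json", "rb") as src:
        buffer, part = [], 0
        # "item" is the prefix for elements of a top-level array;
        # use_float=True (recent ijson versions) avoids Decimal values
        # that json.dump cannot serialise.
        for obj in ijson.items(src, "item", use_float=True):
            buffer.append(obj)
            if len(buffer) >= CHUNK_SIZE:
                with open(f"books_part{part:04d}.json", "w") as out:
                    json.dump(buffer, out)
                buffer, part = [], part + 1
        if buffer:  # write whatever is left over
            with open(f"books_part{part:04d}.json", "w") as out:
                json.dump(buffer, out)

Because only CHUNK_SIZE objects are held in memory at a time, the 8.7 GB input never has to fit in RAM.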

R Markdown HTML output doesn't match the RStudio output

I am sorry in advance if this question sounds stupid: I am at ease with R but relatively new to R Markdown. I realize that an R Markdown (.Rmd) script is meant to be reproducible, so whatever is in it has to come from the script itself and not from the Global Environment or another script. I have done the tedious work of copying my very long initial .R script into an .Rmd, with explanations, like a report. My problem is the following: after running the code in the .Rmd script I get the outputs below each chunk. I then knit it, and the outputs in the HTML document are not the same. The essentials match, but the model summaries do not. I simply cannot understand why.
I have of course tried restarting RStudio, cleaning up the Global Environment, and starting again from a blank script. The tooth-grinding problem is that my script is long and some chunks are heavy (like imputation of missing data using MICE), so every time something goes wrong and I have to re-compute everything, it's a very long coffee break.
While I cannot include the code for this reason, I still hope very much that someone has encountered this problem before and can share their experience. I particularly want to know what happens if you leave some chunks as {r eval=FALSE} and run them manually the first time only. Could this be a source of the problem? If so, how do you knit long, computation-heavy scripts?
Thanks very much in advance.
P.S. After throwing this bottle into the sea, I'll go and try splitting my script into a few smaller scripts to pinpoint the problem (and to be able to include the part that causes it).
So, apparently the bug above has the following explanation:
The outputs shown below the code chunks in RStudio (in the .Rmd) are based on the data held in the Global Environment.
The knitted HTML, on the contrary, is rendered by running the .Rmd script from scratch.
Normally this shouldn't pose a problem. But if some code chunks are set to eval=FALSE to skip repeated lengthy execution (in my case, data imputation using MICE), then the imputed data exists only in the Global Environment while the non-imputed data is what gets knitted. So the models in the knitted HTML are run on an incomplete set of data and are all off.
Before receiving the suggestion to use cache=TRUE, I found another workaround: do all the required transformations and imputations once, save the data with a new code chunk, then set eval=FALSE for that chunk and for the chunks above it that no longer have to be run (even though some of them still have to be shown).
Then I import the treated data in a hidden chunk (eval=TRUE, include=FALSE) and run the rest of the training, etc. While technically it's not the best in terms of reproducibility, it saved my neck and a lot of computation time.

.json to .csv "big" file

I recently downloaded my location history from Google. From 2014 to present.
The resulting .json file was 997,000 lines, plus a few.
All of the online converters would freeze and lock up unless I fed them really small slices, which isn't an option (time constraints).
I've gotten a manual process down between Sublime Text and LibreOffice to get my information transferred, but I know there's an easier way somewhere.
I even tried the fastFedora plug-in, which I couldn't get to work.
Even though I'm halfway done, and will likely finish up using my process, is there an easier way?
I can play with Java, though I'm no pro. Are there any other languages that play well with JSON?
I need a solution that supports nesting without flattening the file: location data is nested and needs to remain nested (or at least grouped) to make sense.
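For what it's worth, a few lines of Python can do the conversion locally without any online converter. This is only a sketch, assuming the usual Takeout layout where the records sit under a top-level "locations" key; the file names are placeholders. pandas' json_normalize turns nested objects into dotted column names (for example "activity.type"), so related fields stay grouped rather than scattered:

    import json
    import pandas as pd

    with open("Location History.json", encoding="utf-8") as f:
        records = json.load(f)["locations"]   # assumed top-level key

    # json_normalize flattens nested objects into dotted column names,
    # keeping related fields grouped together in the CSV.
    df = pd.json_normalize(records)
    df.to_csv("location_history.csv", index=False)

Roughly a million small records still fits in memory, so no slicing is needed.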

Can LilyPond produce variable paper height output?

I have a large number of small scores intended for Sunday morning service leaflets. Preparing each image for insertion into the MS Word document involves removing all the vertical blank space, which is different for each piece.
I currently create custom paper sizes via #(set! paper-alist (cons ...)), but there is still quite a lot of cropping of the output images to do.
Is there a better way?
I wrote a Bash shell script to do this: https://github.com/andrewacashner/lilypond/lilycrop.sh
In the terminal of a Unix-based system (or perhaps under Cygwin on Windows, though I haven't tested that), this script will automatically crop the PDF output by LilyPond to a minimum size. It produces a separate cropped PDF file for each page of the original.
LilyPond has no internal way to produce automatically cropped output. On Linux, you can use the pdfcrop tool, which is part of the texlive-extra-utils package; it does the job nicely.