I need to format a large JSON file for readability, but every resource I've found (mostly online) fails on data above about 1-2 MB. I need to format about 30 MB. Is there any way to do this, or any way to code something that does it?
With Python >= 2.6 you can do the following:
For Mac/Linux users:
cat ugly.json | python -mjson.tool > pretty.json
For Windows users (thanks to the comment from dnk.nitro):
type ugly.json | python -mjson.tool > pretty.json
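The one-liners above use the standard-library json module. The same idea can be sketched as a small Python 3 script. This is a minimal sketch, not a replacement for json.tool: the function name pretty_print_file is made up for illustration, and the whole document is parsed into memory, so it needs RAM well beyond the file size.

```python
import json


def pretty_print_file(src_path, dst_path, indent=4):
    """Load src_path as JSON and write an indented copy to dst_path."""
    with open(src_path, "r", encoding="utf-8") as src:
        data = json.load(src)  # parses the whole document into memory
    with open(dst_path, "w", encoding="utf-8") as dst:
        # ensure_ascii=False keeps non-ASCII characters readable in the output
        json.dump(data, dst, indent=indent, ensure_ascii=False)
        dst.write("\n")
```

Usage would be pretty_print_file("ugly.json", "pretty.json"), with the file names taken from the commands above.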
jq can format or beautify a ~100MB JSON file in a few seconds:
jq '.' myLargeUnformattedFile.json > myLargeBeautifiedFile.json
The command above beautifies a single-line ~120MB file in ~10 seconds. jq also gives you a lot of JSON manipulation capabilities beyond simple formatting; see their tutorials.
jsonpps is the only one that worked for me (https://github.com/bazaarvoice/jsonpps).
Unlike jq, jsonpp and the others that I tried, it doesn't load everything into RAM.
Some useful tips regarding installation and usage:
Download url: https://repo1.maven.org/maven2/com/bazaarvoice/jsonpps/jsonpps/1.1/jsonpps-1.1.jar
Shortcut (for Windows):
Create a file jsonpps.cmd in the same directory as the jar with the following content:
@echo off
java -Xms64m -Xmx64m -jar %~dp0\jsonpps-1.1.jar %*
Shortcut usage examples:
Format stdin to stdout:
echo { "x": 1 } | jsonpps
Format stdin to file:
echo { "x": 1 } | jsonpps -o output.json
Format file to file:
jsonpps input.json -o output.json
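To illustrate what formatting without loading everything into RAM can look like, here is a rough Python sketch of a character-level re-indenter. It is not how jsonpps is implemented. The name pretty_stream and its parameters are invented for this example, and it assumes the input is already valid JSON (it does no validation).

```python
import io


def pretty_stream(src, dst, indent=2, chunk_size=65536):
    """Re-indent JSON read from text stream src into dst, one chunk at a time."""
    depth = 0
    in_string = False      # inside a "..." literal: copy characters verbatim
    escaped = False        # previous character in the string was a backslash
    pending_open = False   # last token was { or [; newline deferred so {} and [] stay compact
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        out = []
        for ch in chunk:
            if in_string:
                out.append(ch)
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
                continue
            if ch in " \t\r\n":
                continue  # drop the original whitespace between tokens
            if pending_open:
                pending_open = False
                if ch in "}]":
                    depth -= 1
                    out.append(ch)  # empty container, keep it on one line
                    continue
                out.append("\n" + " " * (indent * depth))
            if ch == '"':
                in_string = True
                out.append(ch)
            elif ch in "{[":
                out.append(ch)
                depth += 1
                pending_open = True
            elif ch in "}]":
                depth -= 1
                out.append("\n" + " " * (indent * depth) + ch)
            elif ch == ",":
                out.append(",\n" + " " * (indent * depth))
            elif ch == ":":
                out.append(": ")
            else:
                out.append(ch)  # digits, minus signs, true/false/null letters
        dst.write("".join(out))
    dst.write("\n")


# Small in-memory demonstration; real use would pass open file objects.
demo_out = io.StringIO()
pretty_stream(io.StringIO('{"x":[1,{"y":"a,b"}],"z":{}}'), demo_out)
print(demo_out.getvalue())
```

Memory use is bounded by chunk_size rather than by the size of the document, which is the property the answer above attributes to jsonpps.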
Background: I was trying to format a huge JSON file (~89 MB) in VS Code using the format command (Alt+Shift+F), but as usual it crashed. I used jq to format the file and store the result in another file.
A Windows 11 use case is shown below.
Step 1: download jq for your OS from the official site: https://stedolan.github.io/jq/
Step 2: create a folder named jq on the C drive and move the downloaded executable into it. Rename the file to jq (Error 1: the file is already an .exe file, so do not save it as 'jq.exe'; save it only as 'jq').
Step 3: add the folder that contains the executable to your PATH environment variable.
Step 4: open cmd in the directory where the JSON file is stored and run: jq . currentfilename.json > targetfilename.json
Replace currentfilename with the name of the file you want to format.
Replace targetfilename with the name of the file that should receive the formatted data.
Within seconds the formatted target file should appear in the same directory, and it can now be opened in VS Code or any other editor. If jq is not recognized as a command, the cause is most likely Error 1.
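If the "not recognized" error shows up, one way to check whether the PATH setup from steps 2 and 3 took effect is to ask Python what the name jq resolves to. This is only a diagnostic sketch; locate_command is a name made up here, and it relies on the standard-library shutil.which.

```python
import shutil


def locate_command(name):
    """Return the full path that would be run for name, or None if it is not on PATH."""
    return shutil.which(name)


path = locate_command("jq")
if path is None:
    print("jq was not found on PATH; check the folder in PATH and the file name")
else:
    print("jq resolves to", path)
```

Run it from a newly opened terminal, since a terminal that was already open may still hold the old PATH value.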
You can use Notepad++ (https://notepad-plus-plus.org/downloads/) for formatting large JSON files (tested in Windows).
Install Notepad++
Go to Plugins -> Plugins Admin -> Install the 'Json Viewer' plugin. The plugin source code is present in https://github.com/kapilratnani/JSON-Viewer
After plugin installation, go to Plugins -> JSON Viewer -> Format JSON.
This will format your JSON file
I have an ORC file on my local machine and I need any reasonable format from it (e.g. CSV, JSON, YAML, ...).
How can I convert ORC to CSV?
Download
Extract the files, go to the java folder and execute maven: mvn install
Use ORC-Tools
This is how I use them - you will likely need to adjust the paths:
java -jar ~/.m2/repository/org/apache/orc/orc-tools/1.5.4/orc-tools-1.5.4-uber.jar data ~/your_file.orc > output.json
The output is JSON Lines which is easy to convert to CSV. First I needed to remove the last two lines from the output. Then:
import pandas as pd
df = pd.read_json('output.json', lines=True)
df.to_csv('output.csv')
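If pandas is not available, or you would rather not delete the trailing lines by hand, a standard-library sketch along these lines may work. It assumes the extra lines are not valid JSON objects and simply skips them; jsonl_to_csv is a hypothetical helper name, and nested values are written using their Python string form.

```python
import csv
import json


def jsonl_to_csv(lines, out_file):
    """Write JSON Lines records to out_file as CSV, skipping lines that are not JSON objects."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except ValueError:
            continue  # not JSON, e.g. extra lines printed by the tool
        if isinstance(record, dict):
            records.append(record)
    # Union of keys in first-seen order, so rows with missing fields still fit.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return len(records)
```

A call could look like jsonl_to_csv(open("output.json"), open("output.csv", "w", newline="")). Note that all records are held in memory before writing, so this suits moderately sized files.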
Another option is bigdata-file-viewer, a cross-platform application. You can open an ORC file with it and save the file in CSV format.
The detailed usage is as follows:
Download the runnable jar from the release page, or follow the Build section to build it from source.
Run it with java -jar BigdataFileViewer-1.2-SNAPSHOT-jar-with-dependencies.jar
Open a binary-format file via "File" -> "Open". Currently it can open files with the parquet, orc and avro suffixes. If no suffix is specified, the tool tries to read the file as Parquet.
Set the maximum number of rows per page via "View" -> enter the maximum row number -> "Go"
Set the visible properties via "View" -> "Add/Remove Properties"
Convert to a CSV file via "File" -> "Save as" -> "CSV"
Check the schema information by unfolding the "Schema Information" panel
Is there a parser I can use to parse the JSON in YARN job logs (.jhist files), which are stored in HDFS, so that I can extract information from it?
The second line in the .jhist file is the Avro schema for the other JSON records in the file, which means you can create Avro data out of the .jhist file.
For this you could use avro-tools-1.7.7.jar
# schema is the second line
sed -n '2p;3q' file.jhist > schema.avsc
# removing the first two lines
sed '1,2d' file.jhist > pfile.jhist
# finally converting to avro data
java -jar avro-tools-1.7.7.jar fromjson pfile.jhist --schema-file schema.avsc > file.avro
You now have Avro data, which you can, for example, import into a Hive table and query.
You can check out Rumen, a parsing tool from the Apache ecosystem.
Alternatively, in the web UI, go to the job history and look for the job whose .jhist file you want to read. Click the Counters link on the left; you will then see an API that gives you all the parameters and their values, such as CPU time in milliseconds, which are read from the .jhist file itself.
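If you only want to read the events and do not need real Avro files, the same layout (schema on the second line, JSON records after it) can be read directly in Python. This is a sketch under that assumption; split_jhist is a made-up name, and blank lines between records are skipped.

```python
import json


def split_jhist(lines):
    """Return (schema, events): line 2 is the schema, the remaining non-blank lines are events."""
    lines = list(lines)
    schema = json.loads(lines[1])  # the second line holds the Avro schema as JSON
    events = [json.loads(line) for line in lines[2:] if line.strip()]
    return schema, events
```

For example, with open("file.jhist") as f: schema, events = split_jhist(f) gives a list of dicts that can be filtered with ordinary Python code.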
I am using https://github.com/Keyang/node-csvtojson to convert a CSV file into a JSON object, but I'm not sure how to create a JS file from the output. Using the command
$ csvtojson ./mycsv.csv
I want to save the output into new file, how can I do that using command line?
$ csvtojson ./mycsv.csv > converted.json
See the docs.
From the help section, it doesn't seem that the csvtojson CLI supports this directly. But you can always use > to redirect the command's output to a file.
Example: ls > dir_structure.txt.
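If the command is launched from a script rather than typed into a shell, the same redirection can be sketched in Python by passing an open file as the child's standard output. The helper name run_to_file is invented for this example; the demonstration runs the Python interpreter itself so that it does not depend on csvtojson being installed.

```python
import os
import subprocess
import sys
import tempfile


def run_to_file(command, out_path):
    """Run command and write its standard output to out_path, like `command > out_path`."""
    with open(out_path, "wb") as out:
        subprocess.run(command, stdout=out, check=True)


# Demonstration with a command that exists everywhere this script can run.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
run_to_file([sys.executable, "-c", "print('hello')"], demo_path)
```

With csvtojson on the PATH, the call would be run_to_file(["csvtojson", "./mycsv.csv"], "converted.json").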
We run load tests with JMeter and would like to export result data (throughput, latency, requests per second etc.) to JSON, either a file or STDOUT. How can we do that?
JMeter can save the results in a CSV format with header.
(Do not forget to select Save Field Names - it is OFF by default)
Then you can use this tool to convert the CSV to JSON.
http://www.convertcsv.com/csv-to-json.htm
EDIT
JMeter stores results in XML or CSV format. XML is the default (with the .jtl extension), but saving results in CSV format is always recommended.
If you want to convert XML to JSON
http://www.utilities-online.info/xmltojson/#.U9O2ifldVBk
If you are planning to use CSV, you can have the results saved in CSV format automatically.
When you run your test from the command line, to save the results in CSV for a specific test:
"%JMETER_HOME%\bin\jmeter.bat" -n -t %TESTNAME% -p %PROPERTY_FILE_PATH% -l %RESULT_FILE_PATH% -j %LOG_FILE_PATH% -Djmeter.save.saveservice.output_format=csv
Or
You can update jmeter.properties in the bin folder to enable the property below (for every test you run):
jmeter.save.saveservice.output_format=csv
Hope, it is clear!
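Instead of an online converter, the CSV results can also be summarized into JSON with a short script. The sketch below assumes the file has a header row with the columns timeStamp (epoch milliseconds), elapsed, Latency and success; check the first line of your own result file, since the saved fields are configurable. The function name summarize_jtl is made up, and the requests-per-second figure is a rough estimate based on sample start times, not JMeter's own throughput calculation.

```python
import csv
import json


def summarize_jtl(csv_file):
    """Aggregate a JMeter CSV result file (with header row) into a summary dict."""
    rows = list(csv.DictReader(csv_file))
    if not rows:
        return {"samples": 0}
    stamps = [int(r["timeStamp"]) for r in rows]   # assumed epoch milliseconds
    elapsed = [int(r["elapsed"]) for r in rows]    # response time in ms
    latency = [int(r["Latency"]) for r in rows]    # time to first byte in ms
    errors = sum(1 for r in rows if r["success"].lower() != "true")
    # Guard against a zero-length window when all samples share one timestamp.
    duration_s = max((max(stamps) - min(stamps)) / 1000.0, 0.001)
    return {
        "samples": len(rows),
        "errors": errors,
        "avg_elapsed_ms": sum(elapsed) / len(rows),
        "avg_latency_ms": sum(latency) / len(rows),
        "requests_per_second": len(rows) / duration_s,
    }
```

To print the summary to STDOUT: with open(result_path, newline="") as f: print(json.dumps(summarize_jtl(f), indent=2)), where result_path is the file passed to -l.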
There is no out-of-the-box solution for this, but you could take inspiration from this patch:
https://issues.apache.org/bugzilla/show_bug.cgi?id=53668
I would like to generate an XML file from an existing CSV file using XSLT.
Can anybody tell me which command to use?
I don't know the command to convert the file.
Suppose my CSV file is named source.csv
output template: temp.xsl
command:
msxsl source.csv temp.xsl -o result.xml
Is this the right command or not?
Here is an XSL file to convert CSV to XML: http://andrewjwelch.com/code/xslt/csv/csv-to-xml_v2.html
To run it from the command line, the instructions say to download Saxon and use:
java -cp saxon.jar net.sf.saxon.Transform -o output.xml -it main csv-to-xml.xslt pathToCSV=file:/C:/dev/test.csv
Here are the parts of that command line explained:
java: the Java executable (Java is the programming language in which Saxon is written)
-cp saxon.jar: saxon.jar contains the Saxon XSLT processor; -cp stands for "classpath" and tells java how to find it
-o output.xml: where the output should go. That is result.xml in your example.
-it main csv-to-xml.xslt: specifies the XSLT file (csv-to-xml.xslt) and the entry point within it (main)
pathToCSV=file:/C:/dev/test.csv: your input CSV file (source.csv in your example, but formatted as a URL)
I do not have sufficient reputation to comment on Stephen's answer.
The transform as described depends heavily on the XSL stylesheet, which defines a parameter pathToCSV and a template named "main".
The command will not work as Stephen has written it with version 9 of the Home Edition; when I attempt to run the command as written I get the response "Command line option -o requires a value". However, this form of the command works as of the day of this posting:
java -cp saxon9he.jar net.sf.saxon.Transform -o:csvfile.xml -it:main "csv2xml.xsl" pathToCSV="csvfile.csv"
The linked XSL appears to be buggy (it probably hasn't been maintained) and will not correctly transform all CSV files (e.g. the CSV example from Michael Kay's book). However, it is a good example from which to learn.
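If XSLT is not a hard requirement, a plain script is another way to get from CSV to XML. The following is a Python sketch, not an XSLT solution; the element names rows, row and field are arbitrary choices for illustration, and it assumes the first CSV line is a header and that no row has more columns than the header.

```python
import csv
import xml.etree.ElementTree as ET


def csv_to_xml(csv_file, root_tag="rows", row_tag="row"):
    """Convert CSV text with a header row into an XML string."""
    root = ET.Element(root_tag)
    for record in csv.DictReader(csv_file):
        row = ET.SubElement(root, row_tag)
        for name, value in record.items():
            # Column names go into an attribute, so headers that are not
            # valid XML element names (spaces, digits first) still work.
            field = ET.SubElement(row, "field", name=name)
            field.text = value
    return ET.tostring(root, encoding="unicode")
```

For the file names in the question this could be used as: with open("source.csv", newline="") as f: xml_text = csv_to_xml(f), followed by writing xml_text to result.xml.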