I have a large JSON file that contains bigints with their full values--not rounded like JavaScript loves to do by default.
We have a workaround to deal with the bigints in Node.js, but I'm trying to use jq (the command-line tool) to clean up our data.
However, when I ran jq on our JSON file, it rounded all of our bigints.
Is there a way to use jq so that it doesn't round the bigints or is there perhaps another command-line tool that works on a Mac that I may use instead?
As of right now, the best jq has to offer with respect to JSON numbers is the "master" version, which preserves the external numerical value very well. The updates were made on or about 22 Oct 2019, and the "master" version of jq seems to be as safe to use as the most recent release (jq 1.6).
Examples using a recent "master" version:
jqMaster -n -M '
[0000,
10000000000000000000000000000000000000012,
1.0000000000000000000000000000000000000012,
1000000000000000000000000000000000000001210000000000000000000000000000000000000012,
0.1e123456]'
Output
[
0,
10000000000000000000000000000000000000012,
1.0000000000000000000000000000000000000012,
1000000000000000000000000000000000000001210000000000000000000000000000000000000012,
1E+123455
]
Another option would be to use “gojq”, the Go implementation of jq that uses unbounded-precision representation of integer literals.
In fact, except for one bug that has only been fixed in the “master” version of gojq as of this writing, gojq supports unbounded-precision integer arithmetic. The bug fix: https://github.com/itchyny/gojq/commit/7a1840289029c9c038d61274ceac9b8d307c0358
Related
I am trying to use the jq command line JSON processor https://shapeshed.com/jq-json/ (which works great) to process a JSON file that seems to have been made using some poor choices.
Normally your id and value in the JSON file would not contain any periods such as:
{"id":"d9s7g9df7sd9","name":"Tacos"}
To get Tacos from the file you would do the following in bash:
echo "$json" | jq -r '.name'
This will give you Tacos (There may be some extra code missing from that example but you get the point.)
I have a JSON file that looks like this:
{"stat.blah":123,"stat.taco":495,"stat.yum... etc.
Notice how they decided to use a period in the identifying field associated with the value? This makes using jq very difficult because it associates the period as a separator to dig down into child values in the JSON. Sure, I could first load my file, replace all "." with "_" and that would fix the problem, but this seems like a really dumb and hackish solution. I have no way to change how the initial JSON file is generated. I just have to deal with it. Is there a way in bash I can do some special escape to make it ignore the period?
Thanks
Use generic object index syntax, e.g:
.["stat.taco"]
If you use the generic object syntax, e.g. .["stat.taco"], then chaining is done either using pipes as usual, or without the dot, e.g.
.["stat.taco"]["inner.key"]
If your jq is sufficiently recent, then you can use the chained-dot notation by quoting the keys with special characters, e.g.
."stat.taco"."inner.key"
You can also mix-and-match except that expressions such as: .["stat.taco"].["inner.key"] are not (as of jq 1.6) supported.
I have run into the bigint problem, and the haziness around it, in jq 1.5/1.6 (in a Windows environment).
I read the issue reports and thought that if I converted the number to a string, I could handle it. But I tested that with the following command, and the result is the same:
[. | {
  last_update: .starbase_detailed_scan.last_update_time,
  user_name: .starbase_detailed_scan.owner_name,
  alliance_id: .starbase_detailed_scan.owner_alliance_id | tostring,
  drydocks: .starbase_detailed_scan.num_drydocks,
  tier: .starbase_detailed_scan.owner_level,
  defence_plattform: .starbase_detailed_scan.num_defence_platforms,
  shield_triggered: .starbase_detailed_scan.player_shield.triggered_on,
  shield_end: .starbase_detailed_scan.player_shield.expiry_time,
  parsteel: .starbase_detailed_scan.resources["2325683920"],
  tritanium: .starbase_detailed_scan.resources["743985951"],
  dilithium: .starbase_detailed_scan.resources["2614028847"],
  user_id: .starbase_detailed_scan.owner_user_id,
  defence_rating: .starbase_detailed_scan.defense_rating
}]
result:
[{"last_update":"2018-12-23T19:26:24","user_name":"Hamita40","alliance_id":"774615702811599900","drydocks":3,"tier":19,"defence_plattform":3,"shield_triggered":"0001-01-01T00:00:00","shield_end":"0001-01-01T00:00:00","parsteel":183649,"tritanium":22459,"dilithium":7074,"user_id":"a2588903decc455283c88508f6a7fedf","defence_rating":25200}]
The Alliance_id is not correct. The correct id is:
774615702811599864
Is there any Workaround?
BR
Timo
Using tostring is not going to help, because tostring only gets to see the number after the jq parser has read the input.
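To illustrate why the damage is already done before tostring runs, the sketch below pushes the alliance id from the question through awk, which (like jq 1.5/1.6) holds numbers as IEEE 754 doubles. Using awk as a stand-in for jq's number parser is an assumption made purely for illustration; the digits jq prints may differ, but the original value is lost in the same way, because doubles of this magnitude are spaced 128 apart.

```shell
# The id from the question; it is not a multiple of 128, so no double can hold it exactly.
id=774615702811599864

# n + 0 forces awk to treat the value as a number (a double);
# %.0f prints that double back without an exponent.
as_double=$(awk -v n="$id" 'BEGIN { printf "%.0f\n", n + 0 }')

echo "original:  $id"
echo "as double: $as_double"
```

Any string conversion applied after this point can only see the rounded value.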
The jq maintainers are well-aware of this issue and indeed there is a "pull request" which addresses it:
https://github.com/stedolan/jq/pull/1752
If you wish to use an officially released version of jq, then the only available "workaround" will be to change the number in the JSON source to a string. You might wish to use the "bigint" library for handling "bigint" strings - https://github.com/joelpurra/jq-bigint
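One way to "change the number in the JSON source to a string" is to quote it with sed before jq ever parses the file. The following is only a rough sketch: the function name quote_alliance_id is made up for this example, the key owner_alliance_id is taken from the question, and regex-based editing of JSON is brittle (it would also rewrite a matching fragment inside a string value), so treat it as a stopgap under those assumptions.

```shell
# Hypothetical helper: wrap the integer value of "owner_alliance_id" in double quotes.
# Reads the named files, or stdin when no file is given.
quote_alliance_id() {
  sed -E 's/("owner_alliance_id"[[:space:]]*:[[:space:]]*)(-?[0-9]+)/\1"\2"/g' "$@"
}

printf '%s\n' '{"owner_alliance_id":774615702811599864,"num_drydocks":3}' |
  quote_alliance_id
```

The output could then be piped into jq, where .owner_alliance_id arrives as a string and tostring is no longer needed.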
UPDATES
As of Oct 22, 2019, the version of jq at "master" preserves the precision of numbers that are read, and tostring can be used on such numbers without loss of precision, e.g.
$ jq tostring
123456789123456789123456789123456789123456789123456789
"123456789123456789123456789123456789123456789123456789"
You might alternatively wish to use gojq, the Go implementation of jq.
jq is a lightweight and flexible command-line JSON processor.
https://stedolan.github.io/jq/
Is there a jq command line tool or wrapper which lets you pipe output into it and interactively explore jq, with the JSON input in one pane and your interactively updating result in another pane, similar to jmespath.terminal ?
I'm looking for something similar to the JMESPath Terminal jpterm
"JMESPath exploration tool in the terminal"
https://github.com/jmespath/jmespath.terminal
I found this project jqsh but it's not maintained and it appears to produce a lot of errors when I use it.
https://github.com/bmatsuo/jqsh
I've used https://jqplay.org/ and it's a great web based jq learning tool. However, I want to be able to, in the shell, pipe the json output of a command into an interactive jq which allows me to explore and experiment with jq commands.
Thanks in advance!
I've been using jiq and I'm pretty happy with it.
https://github.com/fiatjaf/jiq
It's jid with jq.
You can drill down interactively by using jq filtering queries.
jiq uses jq internally, and it requires you to have jq in your PATH.
Using the aws cli
aws ec2 describe-regions --region-names us-east-1 us-west-1 | jiq
jiq output
[Filter]> .Regions
{
"Regions": [
{
"Endpoint": "ec2.us-east-1.amazonaws.com",
"RegionName": "us-east-1"
},
{
"Endpoint": "ec2.us-west-1.amazonaws.com",
"RegionName": "us-west-1"
}
]
}
https://github.com/simeji/jid
n.b. I'm not clear how strictly it follows jq syntax and feature set
You may have to roll-your-own.
Of course, jq itself is interactive in the sense that if you invoke it without specifying any JSON input, it will process STDIN interactively.
If you want to feed the same data to multiple programs, you could easily write your own wrapper. Over at github, there's a bash script named jqplay that has a few bells and whistles. For example, if the input command begins with | then the most recent result is used as input.
Example 1
./jqplay -c spark.json
Enter a jq filter (possibly beginning with "|"), or blank line to terminate:
.[0]
{"name":"Paddington","lovesPandas":null,"knows":{"friends":["holden","Sparky"]}}
.[1]
{"name":"Holden"}
| .name
"Holden"
| .[0:1]
"H"
| length
1
.[1].name
"Holden"
Bye.
Example 2
./jqplay -n
Enter a jq filter (possibly beginning and/or ending with "|"), or blank line to terminate:
?
An initial | signifies the filter should be applied to the previous jq
output.
A terminating | causes the next line that does not trigger a special
action to be appended to the current line.
Special action triggers:
:exit # exit this script, also triggered by a blank line
:help # print this help
:input PATHNAME ...
:options OPTIONS
:save PN # save the most recent output in the named file provided
it does not exist
:save! PN # save the most recent output in the named file
:save # save to the file most recently specified by a :save command
:show # print the OPTIONS and PATHNAMEs currently in effect
:! PN # equivalent to the sequence of commands
:save! PN
:input PN
? # print this help
# # ignore this line
1+2
3
:exit
Bye.
If you're using Emacs (or willing to) then JQ-mode allows you to run JQ filters interactively on the current JSON document buffer:
https://github.com/ljos/jq-mode
There is a new one: https://github.com/PaulJuliusMartinez/jless
JLess is a command-line JSON viewer designed for reading, exploring, and searching through JSON data.
JLess will pretty print your JSON and apply syntax highlighting.
Expand and collapse Objects and Arrays to grasp the high- and low-level structure of a JSON document. JLess has a large suite of vim-inspired commands that make exploring data a breeze.
JLess supports full text regular-expression based search. Quickly find the data you're looking for in long String values, or jump between values for the same Object key.
I'm creating a Bash script to parse the air pollution levels from the webpage:
http://aqicn.org/city/beijing/m/
There is a lot of stuff in the file, but this is the relevant bit:
"iaqi":[{"p":"pm25","v":[59,21,112],"i":"Beijing pm25 (fine
particulate matter) measured by U.S Embassy Beijing Air Quality
Monitor
(\u7f8e\u56fd\u9a7b\u5317\u4eac\u5927\u4f7f\u9986\u7a7a\u6c14\u8d28\u91cf\u76d1\u6d4b).
Values are converted from \u00b5g/m3 to AQI levels using the EPA
standard."},{"p":"pm10","v":[15,5,69],"i":"Beijing pm10
(respirable particulate matter) measured by Beijing Environmental
Protection Monitoring Center
I want the script to parse and display two numbers: the current PM2.5 and PM10 levels, i.e. the first element of each "v" array in the excerpt above (59 and 15; they were shown in bold in the original post).
CITY="beijing"
AQIDATA=$(wget -q -O - "http://aqicn.org/city/$CITY/m/")
PM25=$(awk -v FS="(\"p\":\"pm25\",\"v\":\\\[|,[0-9]+)" '{print $2}' <<< $AQIDATA)
PM100=$(awk -v FS="(\"p\":\"pm10\",\"v\":\\\[|,[0-9]+)" '{print $2}' <<< $AQIDATA)
echo $PM25 $PM100
Even though I can get PM2.5 levels to display correctly, I cannot get PM10 levels to display. I cannot understand why, because the strings are similar.
Anyone here able to explain?
The following approach is based on two steps:
(1) Extracting the relevant JSON;
(2) Extracting the relevant information from the JSON using a JSON-aware tool -- here jq.
(1) Ideally, the web service would provide a JSON API that would allow one to obtain the JSON directly, but as the URL you have is intended for viewing with a browser, some form of screen-scraping is needed. There is a certain amount of brittleness to such an approach, so here I'll just provide something that currently works:
wget -O - http://aqicn.org/city/beijing/m |
gawk 'BEGIN{RS="function"}
$1 ~/getAqiModel/ {
sub(/.*var model=/,"");
sub(/;return model;}/,"");
print}'
(gawk or an awk that supports multi-character RS can be used; if you have another awk, then first split on "function", using e.g.:
sed $'s/function/\\\n/g' # three backslashes )
The output of the above can be piped to the following jq command, which performs the filtering envisioned in (2) above.
(2)
jq -c '.iaqi | .[]
| select(.p? =="pm25" or .p? =="pm10") | [.p, .v[0]]'
The result:
["pm25",59]
["pm10",15]
I think your problem is that you have a single-line HTML file that contains a script that contains a variable that contains the data you are looking for.
Your field delimiters are either "p":"pm10","v":[ (or the pm25 equivalent) or a comma followed by some digits.
For pm25 this works, because it is the first entry, and there are no occurrences of ,21 or anything similar before it.
However, for pm10, the comma-and-digits delimiters that belong to pm25 come first. So the second field is the empty string between ,21 and ,112.
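The effect can be reproduced on a cut-down sample. The snippet below is a sketch: the sample string is a shortened, made-up version of the page data, and the separator is written with [[] instead of an escaped bracket to sidestep quoting issues, but it is otherwise the same regular expression as in the question.

```shell
# Shortened stand-in for the single-line page content.
sample='"iaqi":[{"p":"pm25","v":[59,21,112],"i":"x"},{"p":"pm10","v":[15,5,69],"i":"y"}]'

# Same separator as in the question's PM10 command.
pm10_fs='("p":"pm10","v":[[]|,[0-9]+)'

# List every field awk produces, so the empty $2 is visible.
printf '%s\n' "$sample" |
  awk -F "$pm10_fs" '{ for (i = 1; i <= NF; i++) printf "$%d=<%s>\n", i, $i }'

field2=$(printf '%s\n' "$sample" | awk -F "$pm10_fs" '{ print $2 }')
field4=$(printf '%s\n' "$sample" | awk -F "$pm10_fs" '{ print $4 }')
```

In this sample the PM10 value ends up in a later field rather than in $2, and its position depends on how many comma-and-digits matches precede it, which is why counting fields is fragile here.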
@karakfa has a hack that seems to work, but he doesn't explain very well why it works.
What he does is take awk's record separator (which is usually a newline) and set it to any one of a colon, a comma, or an opening square bracket. So in your case, one of the records would be "pm25", because it is preceded by a colon, which is a separator, and followed by a comma, also a separator.
Once it hits the matching content ("pm25"), it sets a counter to 4. Then, for this and each following record, it decrements the counter: on "pm25" itself, on "v", on the empty string between : and [, and finally on the record with the number you want to output, where the counter reaches zero. The test c && !--c evaluates as 4 && !3 (false), 3 && !2 (false), 2 && !1 (false), and finally 1 && !0 (true). Since the pattern has no action block, awk simply prints this record, which is the value you want.
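The countdown can be watched directly with a small trace. This is a sketch on a made-up, shortened sample; to stay independent of multi-character RS support it first turns the three separators into newlines with tr, which is a substitution for the RS setting, not the original command.

```shell
sample='{"p":"pm25","v":[59,21,112]},{"p":"pm10","v":[15,5,69]}'

# Split on : , [ by turning each into a newline, then run the same counter logic,
# printing the counter for every record seen while it is active.
printf '%s\n' "$sample" | tr ':,[' '\n\n\n' |
  awk '$0 == "\"pm10\"" { c = 4 }
       c                { printf "c=%d record=<%s>\n", c, $0 }
       c && !--c        { print "value: " $0 }'

# The same logic without the trace, capturing only the value.
pm10=$(printf '%s\n' "$sample" | tr ':,[' '\n\n\n' |
  awk '$0 == "\"pm10\"" { c = 4 } c && !--c')
```

The trace lists four records ("pm10", "v", an empty record, and the number) before the value line is printed.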
A more robust approach would probably be to use XPath to find the script, then a JSON parser or similar to get the value.
chw21's helpful answer explains why your approach didn't work.
peak's helpful answer is the most robust, because it employs proper JSON parsing.
If you don't want to or can't use third-party utility jq for JSON parsing, I suggest using sed rather than awk, because awk is not a good fit for field-based parsing of this data.
$ sed -E 's/^.*"pm25"[^[]+\[([0-9]+).+"pm10"[^[]+\[([0-9]+).*$/\1 \2/' <<< "$AQIDATA"
59 15
The above should work with both GNU and BSD/OSX sed.
To read the result into variables:
read pm25 pm10 < \
<(sed -E 's/^.*"pm25"[^[]+\[([0-9]+).+"pm10"[^[]+\[([0-9]+).*$/\1 \2/' <<< "$AQIDATA")
Note how I've chosen lowercase variable names, because it's best to avoid all upper-case variables in shell programming, so as to avoid conflicts with special shell and environment variables.
If you can't rely on the order of the values in the source string, use two separate sed commands:
pm25=$(sed -E 's/^.*"pm25"[^[]+\[([0-9]+).*$/\1/' <<< "$AQIDATA")
pm10=$(sed -E 's/^.*"pm10"[^[]+\[([0-9]+).*$/\1/' <<< "$AQIDATA")
awk to the rescue!
If you have to, you can use this hacky way using smart counters with hand-crafted delimiters. Setting RS instead of FS transfers looping through fields to awk itself. Multi-char RS is not available for all awks (gawk supports it).
$ awk -v RS='[:,[]' '$0=="\"pm25\""{c=4} c&&!--c' file
59
$ awk -v RS='[:,[]' '$0=="\"pm10\""{c=4} c&&!--c' file
15
I have been looking for a way to reformat a pipe-separated CSV file with some conditional rules. I'm pretty sure this can be done in PHP (strpos and if statements) or using XSLT, but I wanted to know whether that is the best/easiest way to do it before I go and learn my way around a new language. Here is a small example of the kind of thing I'm trying to achieve (the real file is about 25000 lines, if this changes the answer):
99407350|Math Book #13 (Random Information)|AB Collings|http:www.abc.com/ABC
497790366|English Book|Harold Herbert|http:www.abc.com/HH
Transform to this:
99407350|Math Book|#13|AB Collings|http:www.abc.com/ABC
497790366|English Book||Harold Herbert|http:www.abc.com/HH
Any advice about which direction I need to look in would be great.
PHP provides str_getcsv() (PHP 5.3 and later) and fgetcsv() (PHP 4 and 5) for this, so if you are working in a PHP environment, use those, passing | as the delimiter. See e.g. http://www.php.net/manual/en/function.fgetcsv.php
If you do something yourself, remember to cope with "...|..." and/or \| to have | inside a field. Or test to make sure it can't happen - e.g. check the code that exports the database to CSV if that's what's happening.
Note also - on Unix / Solaris / Linux / OS X systems,
awk -F '|' '(NF != 9)' yourfile.csv | wc -l
will count the number of lines with other than 9 fields (use the field count you expect; the sample above has 4). If you are certain | never occurs except as a field delimiter, awk is a perfectly fine language for this too, e.g. with
awk -F '|' -v OFS='|' '{ gsub(/ [(].*[)]/, "", $2); print }' yourfile.csv
which strips the parenthesized text from the second field. Setting OFS keeps | as the separator when awk rebuilds the modified line. Here, [(] matches ( in a way that works across different versions of awk, and same for [)].
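For the exact transformation shown in the question, a slightly longer awk program seems to be enough. The sketch below assumes that | never appears inside a field, that the title is always the second field, and that the item number looks like # followed by digits; the function name reformat_books is invented for this example.

```shell
# Hypothetical helper: split "Title #NN (comment)" in field 2 into "Title|#NN".
# Reads the named files, or stdin when no file is given.
reformat_books() {
  awk -F '|' -v OFS='|' '{
    title = $2
    sub(/ *[(][^)]*[)] *$/, "", title)      # drop a trailing "(...)" comment
    num = ""
    if (match(title, /#[0-9]+/)) {          # pull out "#13" if present
      num = substr(title, RSTART, RLENGTH)
      title = substr(title, 1, RSTART - 1) substr(title, RSTART + RLENGTH)
    }
    gsub(/^ +| +$/, "", title)              # trim leftover spaces
    $2 = title OFS num                      # an empty num yields an empty column
    print
  }' "$@"
}

printf '%s\n' \
  '99407350|Math Book #13 (Random Information)|AB Collings|http:www.abc.com/ABC' \
  '497790366|English Book|Harold Herbert|http:www.abc.com/HH' |
  reformat_books
```

A single pass like this should cope with a 25000-line file without trouble; it could be run as reformat_books yourfile.csv, with the output redirected to a new file.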