I have a 2 MB JSON file that is all on one line, and now I get an error using jq:
$ jq .<nodes.json
parse error: Invalid literal at line 1, column 377140
How do I debug this on the console? To look at the mentioned column, I tried this:
head -c 377139 nodes.json|tail -c 1000
But I cannot find any wrong literal there (such as a stray t from a misspelled true), so it seems this is not the correct way to reach the position in the file.
How can I debug such a one-liner?
Cut the file into multiple lines with
cat nodes.json|cut -f 1- -d} --output-delimiter=$'}\n'>/tmp/a.json
and analyse /tmp/a.json with jq again; then you get an error with a line number:
parse error: Invalid literal at line 5995, column 47
Use less -N /tmp/a.json to find that line.
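The byte offset from the original error can also be inspected directly with dd, without splitting the file. A minimal sketch using a small sample file with a bad literal at a known offset (for the real file, set skip to a value just below column 377140):

```shell
# build a sample one-line JSON file with a bad literal ("tru") in it
printf '{"a": 1, "b": tru}' > nodes.json

# jq reports the column of the bad literal
jq . nodes.json || true

# print a window of bytes around the reported position (here: offset 10, 8 bytes)
dd if=nodes.json bs=1 skip=10 count=8 2>/dev/null
# → b": tru}
```

dd with bs=1 seeks byte-by-byte, so skip/count translate directly to the column numbers jq reports for a one-line file.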
I see you are at a shell prompt, so you could try Perl; your operating system presumably has it pre-installed.
cat nodes.json | json_xs -f json -t json-pretty
This tells the json_xs command line program to parse the file and prettify it.
If you don't have json_xs installed, you could try json_pp (pp is for pure-perl).
If you have neither, you can install the JSON::XS Perl module with this command:
sudo cpanm JSON::XS
[sudo] password for knb:
--> Working on JSON::XS
Fetching http://www.cpan.org/authors/id/M/ML/MLEHMANN/JSON-XS-3.01.tar.gz ... OK
Configuring JSON-XS-3.01 ... OK
Building and testing JSON-XS-3.01 ... OK
Successfully installed JSON-XS-3.01 (upgraded from 2.34)
1 distribution installed
This installs JSON::XS and a few helper scripts, among them json_xs and json_pp.
Then you can run this simple one-liner:
cat dat.json | json_xs -f json -t json-pretty
After misplacing a parenthesis to force a nesting error somewhere in the otherwise valid JSON file dat.json, I got this:
cat dat.json | json_xs -f json -t json-pretty
'"' expected, at character offset 1331 (before "{"A_DESC":"density i...") at /usr/local/bin/json_xs line 181, <STDIN> line 1.
Maybe this is more informative than the jq output.
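If neither Perl tool is available, Python's json.tool module (part of the standard library) also reports the position of the first bad token. A quick sketch with a deliberately broken literal:

```shell
# "tru" is not a valid JSON literal; json.tool reports where parsing failed
printf '{"a": tru}' | python3 -m json.tool
# → Expecting value: line 1 column 7 (char 6)
```

The "char" value is a zero-based byte offset, which pairs nicely with the dd/head trick for locating the error in a large one-line file.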
To my eyes the following JSON looks valid.
{
"DescribeDBLogFiles": [
{
"LogFileName": "error/postgresql.log.2022-09-14-00",
"LastWritten": 1663199972348,
"Size": 3032193
}
]
}
A) But jq, json_pp, and Python's json.tool module all deem it invalid:
# jq 1.6
> echo "$logfiles" | jq
parse error: Invalid numeric literal at line 1, column 2
# json_pp 4.02
> echo "$logfiles" | json_pp
malformed JSON string, neither array, object, number, string or atom,
at character offset 0 (before "\x{1b}[?1h\x{1b}=\r{...") at /usr/bin/json_pp line 51
> python3 -m json.tool <<< "$logfiles"
Expecting value: line 1 column 1 (char 0)
B) On the other hand, if the above JSON is copy-pasted into an online validator, both 1 and 2 deem it valid.
As hinted by json_pp's error above, hexdump <<< "$logfiles" indeed shows additional surrounding characters. Here is the prefix: 5b1b 313f 1b68 0d3d 1b7b ..., where 7b is {.
The JSON is output to a logfiles variable by this command:
logfiles=$(aws rds describe-db-log-files \
--db-instance-identifier somedb \
--filename-contains 2022-09-14)
# where `aws` is
alias aws='docker run --rm -it -v ~/.aws:/root/.aws amazon/aws-cli:2.7.31'
> bash --version
GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
I have perused this GitHub issue, yet can't figure out the cause. I suspect that double quotes get mangled somehow when using echo; some reported that printf "worked" for them.
The docker run --rm -it -v command used to produce the JSON added unprintable characters to the start of the JSON data, which makes the contents of $logfiles invalid.
The -t option allocates a TTY and -i creates an interactive shell. In this case, -t allows the shell to read startup scripts (e.g. .bashrc), and something in your startup scripts is outputting ANSI escape codes. Often these clear the screen, set up other things for the interactive shell, or colorize portions of the output to make it more visually appealing.
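The simplest fix is to drop -t from the alias, so no pseudo-TTY is allocated and no terminal control sequences end up in the captured output. If the data is already mangled, the escape sequences can also be stripped after the fact; a rough sketch, where the simulated prefix matches the hexdump above:

```shell
# simulate the mangled output: ESC[?1h ESC= CR in front of the JSON
printf '\033[?1h\033=\r{"ok": true}' \
  | sed -e $'s/\033\[[0-9;?]*[a-zA-Z]//g' -e $'s/\033=//g' \
  | tr -d '\r' \
  | jq .
```

The first sed expression removes CSI sequences (ESC [ ... letter), the second removes the bare ESC= sequence, and tr drops the carriage returns; jq then parses the cleaned stream without complaint.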
When I request JSON output from the OpenShift oc command, it also prints warnings to stdout.
How can I output only the JSON so it can be parsed correctly?
For example, the corresponding GitHub issue has this example: oc new-app $(git remote get-url origin) --dry-run --context-dir my-dir --name mw -o json > my.json, which results in line 1 of the JSON file containing: warning: Cannot check if git requires authentication. This is not valid JSON.
My current workaround is to print all lines from the opening JSON brace onward:
oc ... -o json | sed -n '/{/,$p'
This will trim the error and warning lines printed at the start, if any.
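A quick way to see the workaround in action, simulating oc's mixed output (the warning line is taken from the question):

```shell
# a warning line on stdout, followed by the actual JSON
printf 'warning: Cannot check if git requires authentication.\n{\n  "kind": "List"\n}\n' \
  | sed -n '/{/,$p'
```

Note that this only works as long as none of the warning lines themselves contain a { character.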
Here is the original source code (the relevant 30 lines of bash are highlighted).
Here it is simplified (s3 is a binary which streams to object storage); the dots (...) stand for options not posted here.
PULSE=$(mktemp -t shield-pipe.XXXXX)
trap "rm -f ${PULSE}" QUIT TERM INT
set -o pipefail
mysqldump ... | tee >(tail -c1 >$PULSE) | bzip2 | s3 stream ...
How does that work exactly? Can you explain how these redirections and pipes work? How can I debug the error mysqldump: Got errno 32 on write? When invoked manually, mysqldump never fails with an error.
The tricky part is that:
tee writes to standard output as well as a file
>( cmd ) creates a writeable process substitution (a command that mimics the behaviour of a writeable file)
This is used to effectively pipe the output of mysqldump into two other commands: tail -c1 to print the last byte to a file and bzip2 to compress the stream.
As Inian pointed out in the comments, error 32 (EPIPE) comes from a broken pipe. My guess is that s3 stream terminates early (maybe a timeout?), which in turn causes the preceding commands in the pipeline to fail with a write error.
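The tee/process-substitution pattern can be seen in isolation with a toy pipeline (a sketch; wc -c stands in for bzip2 | s3 stream):

```shell
#!/bin/bash
PULSE=$(mktemp)
trap 'rm -f "$PULSE"' EXIT

# tee writes the stream both to stdout (into wc) and to the process
# substitution, where tail -c1 keeps only the last byte
printf 'hello' | tee >(tail -c1 > "$PULSE") | wc -c   # → 5
sleep 1   # the process substitution runs asynchronously; give it time to finish
cat "$PULSE"                                          # → o
```

If wc were replaced by a command that exits early, tee's next write would hit a closed pipe and everything upstream, including mysqldump in the original pipeline, would see EPIPE (errno 32).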
I want to count how many times tag1 occurs, given this 123.xml file (streamed from the internet):
<startend>
<tag1 name=myname>
<date>10-10-10</date>
</tag1 >
<tag1 name=yourname>
<date>11-10-10</date>
</tag1 >
</startend>
Using: xmlstarlet sel -t -v "count(//tag1)" 123.xml
Output:
AttValue: " or ' expected
attributes construct error
How can I make it ignore that the attribute values have no quotes?
Your input XML/HTML structure has invalid tags/attributes and should be recovered beforehand.
xmlstarlet solution:
xmlstarlet fo -o -R -H -D 123.xml 2>/dev/null | xmlstarlet sel -t -v "count(//tag1)" -n
The output:
2
Details:
fo (or format) - Format XML document(s)
-o or --omit-decl - omit xml declaration
-R or --recover - try to recover what is parsable
-D or --dropdtd - remove the DOCTYPE of the input docs
-H or --html - input is HTML
2>/dev/null - suppress errors/warnings
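If xmlstarlet's recovery mode is not an option, a rough sed pre-pass can add the missing quotes before counting. This is a sketch that assumes simple name=value attributes, as in the sample file:

```shell
# recreate the sample file from the question
cat > 123.xml <<'EOF'
<startend>
<tag1 name=myname>
<date>10-10-10</date>
</tag1 >
<tag1 name=yourname>
<date>11-10-10</date>
</tag1 >
</startend>
EOF

# wrap unquoted attribute values in double quotes, then count the opening tags
sed -E 's/=([A-Za-z0-9_-]+)/="\1"/g' 123.xml | grep -c '<tag1 '
# → 2
```

After the sed pass the document is well-formed, so the quoted output could equally be piped into the xmlstarlet count(//tag1) query from the question.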
XML always requires quotes around attribute values. If you want to keep using XML, you first must produce valid XML from your input. You could use an SGML processor such as OpenSP (in particular, the osx program) to format your input into well-formed XML. It's as simple as invoking osx <your input file> on it.
If you're on Ubuntu/Debian Linux, you can install osx by invoking sudo apt-get install opensp on the command line (and similarly on other Unix systems).
Is there command-line documentation for jq? I am currently running this command:
%jq% -f JSON.txt -r ".sm_api_content"
It is supposed to read from JSON.txt and to output the value of sm_api_content (which is a string).
But I am getting this error:
jq: error: Could not open file .sm_api_content: No such file or directory
Can anyone help me out here?
-f specifies a file to read your "filter" from; the filter in this case is .sm_api_content.
It sounds as if you just want to run jq without -f, e.g.
jq -r .sm_api_content JSON.txt
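A quick check with a made-up JSON.txt matching the question's field name:

```shell
printf '{"sm_api_content": "hello world"}' > JSON.txt

# filter as an argument, input file last
jq -r '.sm_api_content' JSON.txt
# → hello world
```

-r prints the raw string without surrounding quotes, which is usually what you want when capturing the value in a shell variable.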