How to ignore attributes without quotes in XML/HTML

I want to count how many times tag1 occurs, given this 123.xml file (streamed from the internet):
<startend>
<tag1 name=myname>
<date>10-10-10</date>
</tag1 >
<tag1 name=yourname>
<date>11-10-10</date>
</tag1 >
</startend>
Using: xmlstarlet sel -t -v "count(//tag1)" 123.xml
the output is:
AttValue: " or ' expected
attributes construct error
How can I make it ignore that the attribute values have no quotes?

Your input XML/HTML structure has invalid tags/attributes and should be recovered beforehand.
xmlstarlet solution:
xmlstarlet fo -o -R -H -D 123.xml 2>/dev/null | xmlstarlet sel -t -v "count(//tag1)" -n
The output:
2
Details:
fo (or format) - Format XML document(s)
-o or --omit-decl - omit xml declaration
-R or --recover - try to recover what is parsable
-D or --dropdtd - remove the DOCTYPE of the input docs
-H or --html - input is HTML
2>/dev/null - suppress errors/warnings
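Since the file is streamed from the internet, the same recovery pipeline can also read from standard input. A minimal sketch, assuming a hypothetical URL and that your xmlstarlet build accepts - for stdin:
curl -s http://example.com/123.xml \
| xmlstarlet fo -o -R -H -D - 2>/dev/null \
| xmlstarlet sel -t -v "count(//tag1)" -n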

XML always requires quotes around attribute values. If you want to keep using XML, you first must produce valid XML from your input. You could use an SGML processor such as OpenSP (in particular, its osx program) to format your input into well-formed XML. It's as simple as invoking osx <your input file> on it.
If you're on Ubuntu/Debian Linux, you can install osx by invoking sudo apt-get install opensp on the command line (and similarly on other Unix systems).
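For example, a minimal sketch of the whole pipeline (assuming osx emits the recovered, well-formed XML on stdout, which xmlstarlet then reads from stdin):
osx 123.xml | xmlstarlet sel -t -v "count(//tag1)" -n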

Related

Why is JSON from aws rds run in Docker "malformed" according to other tools?

To my eyes the following JSON looks valid.
{
  "DescribeDBLogFiles": [
    {
      "LogFileName": "error/postgresql.log.2022-09-14-00",
      "LastWritten": 1663199972348,
      "Size": 3032193
    }
  ]
}
A) But jq, json_pp, and Python's json.tool module deem it invalid:
# jq 1.6
> echo "$logfiles" | jq
parse error: Invalid numeric literal at line 1, column 2
# json_pp 4.02
> echo "$logfiles" | json_pp
malformed JSON string, neither array, object, number, string or atom,
at character offset 0 (before "\x{1b}[?1h\x{1b}=\r{...") at /usr/bin/json_pp line 51
> python3 -m json.tool <<< "$logfiles"
Expecting value: line 1 column 1 (char 0)
B) On the other hand, if the above JSON is copied and pasted into an online validator, both validators (1 and 2) deem it valid.
As hinted by json_pp's error above, hexdump <<< "$logfiles" indeed shows additional, surrounding characters. Here's the prefix: 5b1b 313f 1b68 0d3d 1b7b ...., where 7b is {.
The JSON is output to a logfiles variable by this command:
logfiles=$(aws rds describe-db-log-files \
--db-instance-identifier somedb \
--filename-contains 2022-09-14)
# where `aws` is
alias aws='docker run --rm -it -v ~/.aws:/root/.aws amazon/aws-cli:2.7.31'
> bash --version
GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
I have perused this GitHub issue, yet can't figure out the cause. I suspect that double quotes get mangled somehow when using echo; some reported that printf "worked" for them.
The use of the docker run --rm -it -v command to produce the JSON added some additional unprintable characters to the start of the JSON data. That makes the resulting $logfiles value invalid.
The -t option allocates a tty and the -i creates an interactive shell. In this case the -t is allowing the shell to read its startup scripts (e.g. .bashrc). Something in your startup scripts is outputting ANSI escape codes. Often these clear the screen, set up other things for the interactive shell, or make the output more visually appealing by colorizing portions of the data.
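Since the escape codes come from the allocated tty, one sketch of a fix (based on the alias above, untested) is to drop -t, and -i too since no interactive input is needed, whenever the output is being captured:
# no tty is allocated, so no ANSI escape sequences land in the captured JSON
alias aws='docker run --rm -v ~/.aws:/root/.aws amazon/aws-cli:2.7.31'
logfiles=$(aws rds describe-db-log-files \
--db-instance-identifier somedb \
--filename-contains 2022-09-14)
After that, echo "$logfiles" | jq should parse the output cleanly.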

How to pass a bash variable value into a JSON file

Add a bash variable's value to a JSON file.
I am trying to get the latest zip file from Nexus using the curl command below.
The output of curl looks like this: 1.0.0.0-20190205.195251-396
In the JSON field I need this value (1.0.0.0-20190205.195251-396) to end up like this: developer-service-1.0.0.0-20190205.195251-396.zip
ERROR: [2019-02-06T16:19:17-08:00] WARN: remote_file[/var/chef/cache/developer-service-.zip] cannot be downloaded from https://nexus.gnc.net/nexus/content/repositories/CO-Snapshots/com/GNC/platform/developer/developer-service/1.0.9.9-SNAPSHOT/developer-service-.zip: 404 "Not Found"
#!/bin/bash
latest=`curl -s http://nexus.gnc.net/nexus/content/repositories/CO-Snapshots/com/gnc/platform/developer/developer-service/1.0.9.9-SNAPSHOT/maven-metadata.xml | grep -i value | head -1 | cut -d ">" -f 2 | cut -d "<" -f 1`
echo $latest
sudo bash -c 'cat << EOF > /etc/chef/deploy_service.json
{
"portal" : {
"nexus_snapshot_version":"developer-service-${latest}.zip"
}
}
EOF'
The problem is that when you use "${latest}", it's inside single quotes, and hence not treated as a variable reference, just as literal text. It's passed to a subshell (the bash -c), which will parse it as a variable reference and replace it with the value of the variable latest; but that variable is only defined in the parent shell, not in the subshell. You could export the variable (so it gets inherited by subprocesses) and use sudo -E to prevent sudo from cleaning the environment (and thereby removing the variable)... But this whole thing is an overcomplicated mess; there's a much simpler way, using the standard sudo tee trick:
sudo tee ./deploy_service.json >/dev/null <<EOF
{
"portal" : {
"nexus_snapshot_version":"developer-service-${latest}.zip"
}
}
EOF
This way there's no single-quoted string, no subshell, etc. The variable reference is now just in a plain here-document (interpreted by the shell that knows $latest), and gets expanded normally.
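For completeness, a sketch of the export/sudo -E alternative mentioned above (untested; it also requires a sudoers policy that allows preserving the environment):
export latest
sudo -E bash -c 'cat << EOF > /etc/chef/deploy_service.json
{
"portal" : {
"nexus_snapshot_version":"developer-service-${latest}.zip"
}
}
EOF'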

Debug a large JSON file that is all on one line

I have a 2 MB JSON file that is all on one line, and now I get an error using jq:
$ jq .<nodes.json
parse error: Invalid literal at line 1, column 377140
How do I debug this on the console? To look at the mentioned column, I tried this:
head -c 377139 nodes.json|tail -c 1000
But I cannot find any error (such as a wrong t) there, so it seems this is not the correct way to reach the position in the file.
How can I debug such a one-liner?
Cut the file into multiple lines with
cut -f 1- -d'}' --output-delimiter=$'}\n' nodes.json >/tmp/a.json
and analyse /tmp/a.json with jq again; then you get an error with a line number:
parse error: Invalid literal at line 5995, column 47
Use less -N /tmp/a.json to find that line.
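As an aside, the head/tail attempt in the question stops one byte before the reported column, so the offending character never makes it into the window. A small sketch that centers a 1000-byte window on column 377140 of line 1 instead:
head -c $((377140 + 500)) nodes.json | tail -c 1000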
I see you are at a shell prompt, so you could try Perl; your operating system presumably has it pre-installed.
cat nodes.json | json_xs -f json -t json-pretty
This tells the json_xs command line program to parse the file and prettify it.
If you don't have json_xs installed, you could try json_pp (pp is for pure-perl).
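For instance, a quick sketch with json_pp, which pretty-prints JSON from stdin by default:
json_pp < nodes.json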
If you have neither, you can install the JSON::XS Perl module with this command:
sudo cpanm JSON::XS
[sudo] password for knb:
--> Working on JSON::XS
Fetching http://www.cpan.org/authors/id/M/ML/MLEHMANN/JSON-XS-3.01.tar.gz ... OK
Configuring JSON-XS-3.01 ... OK
Building and testing JSON-XS-3.01 ... OK
Successfully installed JSON-XS-3.01 (upgraded from 2.34)
1 distribution installed
This installs JSON::XS and a few helper scripts, among them json_xs and json_pp.
Then you can run this simple one-liner:
cat dat.json | json_xs -f json -t json-pretty
After misplacing a parenthesis to force a nesting error somewhere in the valid JSON file dat.json, I got this:
cat dat.json | json_xs -f json -t json-pretty
'"' expected, at character offset 1331 (before "{"A_DESC":"density i...") at /usr/local/bin/json_xs line 181, <STDIN> line 1.
Maybe this is more informative than the jq output.

Simple way to verify valid BPF filter

What is the simplest way to verify a BPF filter as a normal user?
The easiest way I have found is to run tcpdump with a small pcap file as input to the -r option.
$ tcpdump -r one_packet.pcap -F invalid_bpf.conf 2> /dev/null ; echo $?
1
$ tcpdump -r one_packet.pcap -F valid_bpf.conf 2> /dev/null ; echo $?
0
This returns standard exit codes for invalid and valid BPF filters, but it requires that I have a pcap file to provide as input.
Is there a way to do this simple test without a PCAP file or special privileges?
If you have a shell with a built-in echo command that supports escape sequences, one somewhat-perverse way of doing this would be to do
echo -en "\0324\0303\0262\0241\02\0\04\0\0\0\0\0\0\0\0\0\0377\0377\0\0\01\0\0\0"|\
./tcpdump -r - -F bpf.conf 2>/dev/null; echo $?
This worked for me on OS X 10.8, which has bash 3.2.48(1)-release (x86_64-apple-darwin12).
That "echo" command writes out a short pcap file with no packets in it, and with a link-layer header type of DLT_EN10MB. That will test whether the filter is valid for Ethernet; there are filters that are valid for some link-layer header types but not valid for others, such as "not broadcast", which is valid for Ethernet but not for PPP, so you'll need to choose some link-layer header type to use when testing.

Export a collection in MongoDB using a shell command

I want to export a collection in MongoDB using a shell command.
I tried the following command, but fields with ":" in the name (number:phone, number:fax) are not exported:
mongoexport --csv -d schemaName -c collectionName -q "{typeName:'user'}" -f "name, surname, e-mail, number:phone, number:fax" -o export.csv
I think you have found a legitimate bug. The mongoexport tool is rarely used, and the colon means something very specific when parsing JSON, so the tool is probably confused.
You can file the bug here: http://jira.mongodb.org/