jq can't parse the \u2022 character in JSON

I'm trying to perform a bulk upload to Elasticsearch (around 1 million documents). In order to do that, I'm using jq to reformat the JSON file extracted from a MySQL database, and curl to post the data to Elasticsearch:
cat dataset.json | jq -r -c '.[] | { "index" : { } }, .' | curl -u login:password -H "Content-Type: application/json" -XPOST "https://.../skills/default/_bulk?pretty" --data-binary @-
I get an error:
parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 276249, column 317
I found that the character that jq can't parse is \u2022. I tried adding the "-r" option to the jq command, but the error still occurs. How can I handle this for all occurrences of \u2022?

Here's verification that \u2022 is properly handled by various versions of jq in a Mac environment:
$ echo '"\u2022"' | jq-1.4 .
"•"
$ echo '"•"' | jq-1.6 .
"•"
$ echo '"•"' | jq-1.5 .
"•"
$ echo '"•"' | jq-1.4 .
"•"
$
Perhaps the problem is related to a bug that was fixed since the release of jq 1.5 (see e.g. https://github.com/stedolan/jq/issues/1311).
If you are having difficulties with jq version 1.6 (the current version), please provide a minimal complete verifiable example
with further details about the computing environment.
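Note also that the error complains about an unescaped control character (U+0000 through U+001F), so the byte jq is rejecting may not be \u2022 at all. One way to inspect the reported position (a quick diagnostic sketch, using the line and column numbers from the error message above) is:

$ sed -n '276249p' dataset.json | cut -c300-340 | od -c

This prints the bytes around column 317 of line 276249, which should reveal the actual offending character.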

Filtering array using curl (qBittorrent Web API)

Hoping someone can help me. I'm trying to understand the qBittorrent Web API. At the moment I'm listing all the paused torrents with:
curl -i http://localhost:8080/api/v2/torrents/info?category=test
The problem is that this lists the whole JSON array; my question is, can I display just the "name" or "hash" fields? This is all using curl through cmd, but I've also tried it in Git Bash and PowerShell:
[{"eta":8640000,"f_l_piece_prio":false,"force_start":false,"hash":"8419d48d86a14335c83fdf4930843438a2f75a6b","last_activity":1664863523,"magnet_uri":"","max_seeding_time":0,"**name**":"TestTorrentName","num_complete":12,"num_incomplete":1,"num_leechs":0,"num_seeds":0,"priority":0,"progress":1,"ratio":0,"ratio_limit":-2,"save_path":"F:\\Completed\\test\\","seeding_time":0,"seeding_time_limit":-2,"seen_complete":1664863523,"seq_dl":false,"size":217388295,"state":"pausedUP","super_seeding":false,"tags":"","time_active":569,"total_size":217388295,"tracker":"udp://open.stealth.si:80/announce","trackers_count":10,"up_limit":-1,"uploaded":0,"uploaded_session":0,"upspeed":0}]
I've tried the following, which should work according to https://jqplay.org/ (see screenshot):
curl -i http://localhost:8080/api/v2/torrents/info?category=test | jq --raw-output '.[] | .name'
But unfortunately I'm getting the following error:
curl -i http://localhost:8080/api/v2/torrents/info?category=test | jq --raw-output '.[] | .name'
% Total % Received % Xferd Average Speed Time
'.name'' is not recognized as an internal or external command,
operable program or batch file.
curl -i http://localhost:8080/api/v2/torrents/info?category=test | jq --raw-output '.[] | .name'
The -i option makes curl include the response headers in its output, which is then passed to jq, but jq can only parse JSON and therefore fails.
Remove the -i and optionally replace it with -s to suppress the progress statistics:
curl -s http://localhost:8080/api/v2/torrents/info?category=test | jq --raw-output '.[] | .name'
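If you want more than one field per torrent, you can interpolate them into one line (a sketch, using the field names from the sample output above):

curl -s "http://localhost:8080/api/v2/torrents/info?category=test" | jq --raw-output '.[] | "\(.name)\t\(.hash)"'

This prints the name and hash of each torrent, separated by a tab.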

Parsing json output using jq -jr

I am running a Puppet Bolt command to query certain information from a set of servers in JSON format, and I am piping the output to jq. Below is what I get:
$ bolt command run "cat /blah/blah" -n @hname.txt -u uid --no-host-key-check --format json | jq -jr '.items[]|[.node],[.result.stdout]'
[
"node-name"
][
"stdout data\n"
]
What do I need to do to make it appear like the below?
["nodename":"stdout data"]
If you really want output that is not valid JSON, you will have to construct the output string, which can easily be done using string interpolation, e.g.:
jq -r '.items[] | "[\"\(.node)\",\"\(.result.stdout)\"]"'
@peak thank you, that helped. Below is how it looks:
$ bolt command run "cat /blah/blah" -n @hname.txt -u UID --no-host-key-check --format json | jq -r '.items[] | "[\"\(.node)\",\"\(.result.stdout)\"]"'
["node name","stdout data
"]
I used a workaround to get the data I needed by using the @csv filter in the jq command itself. Sharing below what worked.
$ bolt command run "cat /blah/blah" -n @hname.txt -u uid --no-host-key-check --format json | jq -jr '.items[]|[.node],[.result.stdout]|@csv'
""node-name""stdout.data
"

How to pass bash variable to JSON

I'm trying to write a sample script where I'm generating names like 'student-101...student-160'. I need to post JSON data and when I do, I get a JSON parse error.
Here's my script:
name="student-10"
for i in {1..1}
do
r_name=$name$i
echo $r_name
curl -i -H 'Authorization: token <token>' -d '{"name": $r_name, "private": true}' "<URL>" >> create_repos_1.txt
echo created $r_name
done
I always get a "Problems parsing JSON" error. I've tried various combination of quotes, etc but nothing seems to work!
What am I doing wrong?
First, your name property is a string, so you need to add double quotes around it in your JSON.
Second, using single quotes, bash won't do variable expansion: it won't replace $r_name with the variable content (see Expansion of variable inside single quotes in a command in bash shell script for more information).
In summary, use:
-d '{"name": "'"$r_name"'", "private": true}'
Another option is to use printf to create the data string:
printf -v data '{"name": "%s", "private": true}' "$r_name"
curl -i -H 'Authorization: token <token>' -d "$data" "$url" >> create_repos_1.txt
Don't; use jq (or something similar) to build correctly quoted JSON using variable inputs.
name="student-10"
for i in {1..1}
do
r_name=$name$i
jq -n --arg r_name "$r_name" '{name: $r_name, private: true}' |
curl -i -H 'Authorization: token <token>' -d @- "<URL>" >> create_repos_1.txt
echo created $r_name
done
The @- argument tells curl to read the data for -d from standard input (via the pipe from jq).
Something like "{\"name\": \"$r_name\", \"private\": true}" may work, but it is ugly and will also fail if r_name contains any character which needs to be quoted in the resulting JSON, such as double quotes or ASCII control characters.
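For example (a quick illustration with a hypothetical name containing a double quote), jq escapes the value correctly, whereas manual interpolation would produce invalid JSON:

$ r_name='stu"dent-10'
$ jq -cn --arg r_name "$r_name" '{name: $r_name, private: true}'
{"name":"stu\"dent-10","private":true}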

Using awk to extract a token from a larger JSON string

I have a string assigned to a variable:
#!/bin/bash
fullToken='{"type":"APP","token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs"}'
I need to extract only l0ng_Str1ng.of.d1fF3erent_charAct3rs without quotes and assign that to another variable.
I understand I can use awk, sed, or cut but I am having trouble getting around the special characters in the original string.
Thanks in advance!
EDIT: I was not awake; I should have specified that this is JSON. Thanks for the replies so far.
EDIT2: I am using BSD (macOS)
It looks like you have a JSON string there. Keep in mind that JSON objects are unordered, so most sed, awk, or cut solutions will fail if your string comes next time with its keys in a different order.
It is most robust to use a JSON parser.
You could use ruby with its JSON parser library:
$ echo "$fullToken" | ruby -r json -e 'p JSON.parse($<.read)["token"];'
"l0ng_Str1ng.of.d1fF3erent_charAct3rs"
Or, if you don't want the quoted string (which is useful for Bash):
$ echo "$fullToken" | ruby -r json -e 'puts JSON.parse($<.read)["token"];'
l0ng_Str1ng.of.d1fF3erent_charAct3rs
Or with jq:
$ echo "$fullToken" | jq '.token'
"l0ng_Str1ng.of.d1fF3erent_charAct3rs"
All these solutions will work even if the JSON string is in a different order:
$ echo '{"type":"APP","token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs"}' | jq '.token'
"l0ng_Str1ng.of.d1fF3erent_charAct3rs"
$ echo '{"token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs", "type":"APP"}' | jq '.token'
"l0ng_Str1ng.of.d1fF3erent_charAct3rs"
But KNOWING that you SHOULD use a JSON parser, you can also use a PCRE with a look-behind in GNU grep:
$ echo "$fullToken" | grep -oP '(?<="token":)"([^"]*)'
Or in Perl:
$ echo "$fullToken" | perl -lane 'print $1 if /(?<="token":)"([^"]*)/'
Both of those also work if the string is in a different order.
Or, with POSIX awk:
$ echo "$fullToken" | awk -F"[,:}]" '{for(i=1;i<=NF;i++){if($i~/"token"/){print $(i+1)}}}'
Or, with POSIX sed, you can do:
$ echo "$fullToken" | sed -E 's/.*"token":"([^"]*).*/\1/'
These solutions are presented from strongest (use a JSON parser) to most fragile (sed). But the sed solution here is better than the other sed approaches because it still works if the keys and values in the JSON string appear in a different order.
PS: If you want to remove the quotes from a line, that is a great job for sed:
$ echo '"quoted string"'
"quoted string"
$ echo '"quoted string"' | sed -E 's/^"(.*)"$/UN\1/'
UNquoted string
In awk:
$ awk -v f="$fullToken" '
  BEGIN{
    while(match(f,/[^:{},]+:[^:{},]+/)) {  # search key:value pairs
      p=substr(f,RSTART,RLENGTH)           # set pair to p
      f=substr(f,RSTART+RLENGTH)           # remove p from f
      split(p,a,":")                       # split to get key and value
      for(i in a)                          # remove leading and trailing "
        gsub(/^"|"$/,"",a[i])
      if(a[1]=="token") {                  # if key is token
        print a[2]                         # output value
        exit                               # no need to process further
      }
    }
  }'
l0ng_Str1ng.of.d1fF3erent_charAct3rs
Note: the value itself can't contain the characters : { } , (they are used as delimiters in the match).
GNU sed:
fullToken='{"type":"APP","token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs"}'
echo "$fullToken"|sed -r 's/.*"(.*)".*/\1/'
A grep method would be:
$ grep -oP '[^"]+(?="[^"]+$)' <<< "$fullToken"
l0ng_Str1ng.of.d1fF3erent_charAct3rs
Brief explanation:
[^"]+ : grep extracts a run of non-" characters
(?="[^"]+$) : a look-ahead requiring the match to be followed by a " and then only non-" characters up to the end of the line
You may also use a sed method to do that:
$ sed -E 's/.*"([^"]+)"[^"]+$/\1/' <<< "$fullToken"
l0ng_Str1ng.of.d1fF3erent_charAct3rs
If the source of your string is JSON, then you should use JSON-specific tools. If not, then consider:
Using awk
$ fullToken='{"type":"APP","token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs"}'
$ echo "$fullToken" | awk -F'"' '{print $8}'
l0ng_Str1ng.of.d1fF3erent_charAct3rs
Using cut
$ echo "$fullToken" | cut -d'"' -f8
l0ng_Str1ng.of.d1fF3erent_charAct3rs
Using sed
$ echo "$fullToken" | sed -E 's/.*"([^"]*)"[^"]*$/\1/'
l0ng_Str1ng.of.d1fF3erent_charAct3rs
Using bash and one of the above
The above all work with POSIX shells. If the shell is bash, then we can use a here-string and eliminate the pipeline. Taking cut as the example:
$ cut -d'"' -f8 <<<"$fullToken"
l0ng_Str1ng.of.d1fF3erent_charAct3rs
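To assign the extracted value to another variable, wrap any of these in a command substitution, for example:

$ token=$(cut -d'"' -f8 <<<"$fullToken")
$ echo "$token"
l0ng_Str1ng.of.d1fF3erent_charAct3rs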

Bash script output JSON variable to file

I am using twurl in Ubuntu's command line to connect to the Twitter Streaming API, and parse the resulting JSON with this processor. I have the following command, which returns the text of tweets sent from London:
twurl -t -d locations=-5.67,50.06,1.76,58.62 language=en -H stream.twitter.com /1.1/statuses/filter.json | jq '.text'
This works great, but I'm struggling to output the result to a file called london.txt. I have tried the following, but still no luck:
twurl -t -d locations=-5.67,50.06,1.76,58.62 language=en -H stream.twitter.com /1.1/statuses/filter.json | jq '.text' > london.txt
As I'm fairly new to Bash scripting, I'm sure I've misunderstood the proper use of '>' and '>>', so if anyone could point me in the right direction that'd be awesome!
twurl -t -d locations=-5.67,50.06,1.76,58.62 language=en -H stream.twitter.com /1.1/statuses/filter.json | jq '.text' > london.txt
With a single >, the file is truncated and rewritten on each write. But if you use >>, each new write is appended to the end of the file. So try the following rather than the above example; I'm certain it will work.
twurl -t -d locations=-5.67,50.06,1.76,58.62 language=en -H stream.twitter.com /1.1/statuses/filter.json | jq '.text' >> london.txt
You can also use the tee command to see what is being printed alongside the redirection:
twurl -t -d locations=-5.67,50.06,1.76,58.62 language=en -H stream.twitter.com /1.1/statuses/filter.json | jq '.text'| tee london.txt
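If you only want the tweet text without the surrounding JSON quotes, jq's -r (--raw-output) option strips them, and tee -a appends to the file instead of overwriting it. A combined sketch:

twurl -t -d locations=-5.67,50.06,1.76,58.62 language=en -H stream.twitter.com /1.1/statuses/filter.json | jq -r '.text' | tee -a london.txt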