I am using twurl on Ubuntu's command line to connect to the Twitter Streaming API and parse the resulting JSON with jq. I have the following command, which returns the text of tweets sent from London:
twurl -t -d locations=-5.67,50.06,1.76,58.62 language=en -H stream.twitter.com /1.1/statuses/filter.json | jq '.text'
This works great, but I'm struggling to output the result to a file called london.txt. I have tried the following, but still no luck:
twurl -t -d locations=-5.67,50.06,1.76,58.62 language=en -H stream.twitter.com /1.1/statuses/filter.json | jq '.text' > london.txt
As I'm fairly new to Bash scripting, I'm sure I've misunderstood the proper use of '>' and '>>', so if anyone could point me in the right direction that'd be awesome!
twurl -t -d locations=-5.67,50.06,1.76,58.62 language=en -H stream.twitter.com /1.1/statuses/filter.json | jq '.text' > london.txt
With '>' the file is truncated when the command starts, so each new run overwrites whatever was written before. With '>>' every write is appended to the end of the file instead. So try the following instead of the example above:
twurl -t -d locations=-5.67,50.06,1.76,58.62 language=en -H stream.twitter.com /1.1/statuses/filter.json | jq '.text' >> london.txt
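A minimal demonstration of the difference, using a throwaway file:
echo one > demo.txt      # '>' truncates demo.txt before writing, leaving just "one"
echo two > demo.txt      # truncated again, so only "two" remains
echo three >> demo.txt   # '>>' appends instead; demo.txt now holds "two" then "three"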
You can also use the tee command to watch the output in the terminal while it is written to the file:
twurl -t -d locations=-5.67,50.06,1.76,58.62 language=en -H stream.twitter.com /1.1/statuses/filter.json | jq '.text'| tee london.txt
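One caveat: when its output goes to a file or pipe instead of the terminal, jq buffers it, so with a slow stream london.txt can stay empty for a while. jq's --unbuffered option flushes after each JSON value:
twurl -t -d locations=-5.67,50.06,1.76,58.62 language=en -H stream.twitter.com /1.1/statuses/filter.json | jq --unbuffered '.text' >> london.txt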
Hoping someone can help me. I'm trying to understand the qBittorrent Web API. At the moment I'm listing all the paused torrents with:
curl -i http://localhost:8080/api/v2/torrents/info?category=test
The problem is that this lists the whole JSON array; my question is, can I display just the "name" or "hash" fields? I'm running curl through cmd, but I've also tried this in Git Bash and PowerShell:
[{"eta":8640000,"f_l_piece_prio":false,"force_start":false,"hash":"8419d48d86a14335c83fdf4930843438a2f75a6b","last_activity":1664863523,"magnet_uri":"","max_seeding_time":0,"**name**":"TestTorrentName","num_complete":12,"num_incomplete":1,"num_leechs":0,"num_seeds":0,"priority":0,"progress":1,"ratio":0,"ratio_limit":-2,"save_path":"F:\\Completed\\test\\","seeding_time":0,"seeding_time_limit":-2,"seen_complete":1664863523,"seq_dl":false,"size":217388295,"state":"pausedUP","super_seeding":false,"tags":"","time_active":569,"total_size":217388295,"tracker":"udp://open.stealth.si:80/announce","trackers_count":10,"up_limit":-1,"uploaded":0,"uploaded_session":0,"upspeed":0}]
I've tried the following, which works as expected on https://jqplay.org/:
curl -i http://localhost:8080/api/v2/torrents/info?category=test | jq --raw-output '.[] | .name'
But unfortunately I'm getting the following error:
curl -i http://localhost:8080/api/v2/torrents/info?category=test | jq --raw-output '.[] | .name'
% Total % Received % Xferd Average Speed Time '.name'' is not recognized as an internal or external command,
operable program or batch file.
curl -i http://localhost:8080/api/v2/torrents/info?category=test | jq --raw-output '.[] | .name'
The -i option makes curl include the response headers in its output. Those headers are piped to jq along with the body, but jq can only parse JSON, and therefore fails.
Remove the -i, and optionally replace it with -s to suppress the progress stats:
curl -s http://localhost:8080/api/v2/torrents/info?category=test | jq --raw-output '.[] | .name'
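If you want more than one field at a time (say name and hash from the sample above), jq's string interpolation can combine them, e.g.:
curl -s http://localhost:8080/api/v2/torrents/info?category=test | jq --raw-output '.[] | "\(.name) \(.hash)"'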
I am running a Puppet Bolt command to query certain information from a set of servers in JSON format, and piping the output to jq. Below is what I get:
$ bolt command run "cat /blah/blah" -n @hname.txt -u uid --no-host-key-check --format json | jq -jr '.items[]|[.node],[.result.stdout]'
[
"node-name"
][
"stdout data\n"
]
What do I need to do to make it appear like the following?
["nodename":"stdout data"]
If you really want output that is not valid JSON, you will have to construct the output string, which can easily be done using string interpolation, e.g.:
jq -r '.items[] | "[\"\(.node)\",\"\(.result.stdout)\"]"'
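If a valid JSON object keyed by the node name would do instead, jq can build one directly; a sketch on the same input shape:
jq -c '.items[] | {(.node): .result.stdout}'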
@peak thank you, that helped. Below is how it looks:
$ bolt command run "cat /blah/blah" -n @hname.txt -u UID --no-host-key-check --format json | jq -r '.items[] | "[\"\(.node)\",\"\(.result.stdout)\"]"'
["node name","stdout data
"]
I used a workaround to get the data I needed by applying jq's @csv filter in the command itself. Sharing below what worked.
$ bolt command run "cat /blah/blah" -n @hname.txt -u uid --no-host-key-check --format json | jq -jr '.items[]|[.node],[.result.stdout]|@csv'
""node-name""stdout.data
"
I'm trying to perform a bulk upload to Elasticsearch (around 1 million documents). To do that, I'm using jq to reformat the JSON file extracted from a MySQL database and curl to post the data to Elasticsearch:
cat dataset.json | jq -r -c '.[] | { "index" : { } }, .' | curl -u login:password -H "Content-Type: application/json" -XPOST "https://.../skills/default/_bulk?pretty" --data-binary @-
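For context, that jq filter interleaves an index action line before each document, which is the format the _bulk endpoint expects:
$ echo '[{"a":1},{"b":2}]' | jq -r -c '.[] | { "index" : { } }, .'
{"index":{}}
{"a":1}
{"index":{}}
{"b":2}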
I get an error:
parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 276249, column 317
I found that the character that jq can't parse is \u2022. I tried adding the "-r" option to the jq command, but the error still occurs. How can I handle this for all occurrences of \u2022?
Here's verification that \u2022 is properly handled by various versions of jq in a Mac environment:
$ echo '"\u2022"' | jq-1.4 .
"•"
$ echo '"•"' | jq-1.6 .
"•"
$ echo '"•"' | jq-1.5 .
"•"
$ echo '"•"' | jq-1.4 .
"•"
$
Perhaps the problem is related to a bug that was fixed since the release of jq 1.5 (see e.g. https://github.com/stedolan/jq/issues/1311).
If you are having difficulties with jq version 1.6 (the current version), please provide a minimal complete verifiable example, along with further details about the computing environment.
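As an aside, if the input file really does contain raw control characters (the error message names U+0000 through U+001F, which is a separate issue from \u2022 itself), a common workaround is to strip them before jq sees the data, e.g. with tr. Note this sketch also deletes newlines, which a single JSON document does not need:
tr -d '\000-\037' < dataset.json | jq -r -c '.[] | { "index" : { } }, .'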
I'm trying to extract the embedded JSON data from a Lazada product page (the URL is in the curl command below).
The output that I want is like this: https://pastebin.com/BVzUrk6s. Sorry, I can't paste it here because of the Stack Overflow character limit.
Here is what I have tried:
curl 'https://www.lazada.co.id/-i160040703-s181911730.html?spm=a2o4j.order_details.details_title.1.52ec6664luQAQs&urlFlag=true&mp=1' | grep -Poz '(?<=app.run\()(.*\n)*.*(?=\);)'
But that command still doesn't extract the JSON data. How do I solve this? I want to use a pure Bash script without installing any extra programs, if possible.
It's a Bad Idea (TM) to attempt JSON parsing this way.
It seems like a Good Idea (TM) to find out what is possible regardless.
#!/bin/bash
function parseUrl() {
    local url=$1
    echo '"childCategories": ['
    # fetch the page, then carve the JSON fragment out of the inline script tag
    curl --silent "${url}" |
        # keep only the <script type="text" class=J_data> ... </script> block
        awk '/<script type="text" class=J_data/ { show=1 } show; /<\/script>/ { show=0 }' |
        egrep -v "script" |
        # strip brackets, drop empty child entries, add trailing commas
        sed -e 's/]//g' -e 's/\[//g' -e 's/{"childCategoryName":"","childCategoryUrl":""},//g' -e 's/}$/},/g' |
        # one object per line, trimmed and lightly indented
        sed -e 's/,{/,\'$'\n{/g' -e 's/^[ ]*//g' -e 's/{/ {/g' |
        # rename the keys to name/url
        sed -e 's/childCategoryName/name/g' -e 's/childCategoryUrl/url/g'
    echo ' ]'
}
parseUrl 'https://www.lazada.co.id/-i160040703-s181911730.html?spm=a2o4j.order_details.details_title.1.52ec6664luQAQs&urlFlag=true&mp=1' \
| tee /tmp/extracted.json
So there you go: curl, awk, egrep, sed. Use at your own risk.
Code like this isn't extensible, meaning you can't extract nested JSON easily.
It is quite brittle, meaning if someone changes the layout or even CSS, it's bye-bye data extraction.
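For readers unfamiliar with the awk stage in the function above, here is the same start/end-marker trick in isolation (the marker names are illustrative):
$ printf 'a\nSTART\nb\nc\nEND\nd\n' | awk '/START/ { show=1 } show; /END/ { show=0 }'
START
b
c
END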
I am trying to get the href of the most recent production release from the ExifTool history page.
curl -s 'http://www.sno.phy.queensu.ca/~phil/exiftool/history.html' | grep -o -E "href=[\"'](.*)[\"'].*Version"
Actual output
href="Image-ExifTool-10.36.tar.gz">Version
I want this as the output:
Image-ExifTool-10.36.tar.gz
Using grep -P you can use a lookahead and \K to reset the start of the match:
curl -s 'http://www.sno.phy.queensu.ca/~phil/exiftool/history.html' |
grep -o -P "href=[\"']\K[^'\"]+(?=[\"']>Version)"
Image-ExifTool-10.36.tar.gz
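If grep -P is unavailable (PCRE support is missing from some greps, such as the BSD grep shipped with macOS), a sed substitution can do the same extraction; a sketch, assuming the newest release is listed first as in the output above:
curl -s 'http://www.sno.phy.queensu.ca/~phil/exiftool/history.html' |
sed -n 's/.*href="\(Image-ExifTool-[^"]*\)">Version.*/\1/p' | head -n 1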