Formatting a json file to add another field - json

I have a JSON file with the format given below. I want to modify the file to add another key-value pair to it. The key should be url and the value should be www.mywebsite.co.nz, extracted from the message given below. What is the easiest way to do this?
{"Timestamp":"Mon Mar 16 21:37:22 EDT 2015","Event":"Reporting Time","Message":"load for http://xxx.xx.xx.xx:1xxxx/operations&proxy=www.mywebsite.co.nz&send=https://xxx.xx.xx.xx:xxxx/operations?event took 9426 ms (X Time: 306 ms, Y Time: 1923 ms)
StatusCode: Unknown<br>Cookies: nzh_weatherlocation=12; dax_ppv=11|NZH:home|NZH:home|NZH:home|9|undefined; _ga=GA1.4.1415798036.1426208630; _gat=1<br>Links: 225<br>Images: 24<br>Forms: 10<br>Browser: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/41.0.2272.76 Chrome/41.0.2272.76 Safari/537.36<br>CPUs: 2<br>Language: en-GB","UserInfo":"Reporting Time"}

As a combination of jq and sed:
jq ".url = \"$(jq '.Message' input.json | sed 's/.*proxy=\([^&]*\).*/\1/')\"" input.json > output.json
This consists of three steps:
jq '.Message' input.json
extracts the message part from the input JSON,
sed 's/.*proxy=\([^&]*\).*/\1/'
extracts the domain from the message, and
jq ".url = \"domainname\"" input.json > output.json
sets the .url attribute of the input json to the extracted domain name, writing the result to output.json.
I feel compelled to point out, by the way, that a domain name by itself is not technically a URL, so you may want to rethink that attribute name.
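For comparison, the same extract-and-set idea can be sketched in Python using only the standard library. This is a minimal sketch, not part of the original answer: the function name add_url_field and the shortened sample record are made up for illustration, and it assumes the domain always follows proxy= and ends at the next &.

```python
import json
import re

def add_url_field(record):
    """Return a copy of record with "url" set from the proxy= parameter of Message."""
    result = dict(record)
    match = re.search(r"proxy=([^&]*)", record.get("Message", ""))
    if match:
        result["url"] = match.group(1)
    return result

# Shortened, made-up sample in the same shape as the question's data.
sample = '{"Event": "Reporting Time", "Message": "load for http://host/operations&proxy=www.mywebsite.co.nz&send=https://host/operations?event took 9426 ms"}'

# strict=False lets json.loads accept raw newlines inside string values.
record = json.loads(sample, strict=False)
updated = add_url_field(record)
print(json.dumps(updated))
```

Records without a proxy= parameter are returned unchanged, which mirrors the guard in the Perl version that follows.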

For perl users, using ojo:
perl -Mojo -E '$j=j(b("input.file")->slurp);if($j->{Message}=~m/proxy=(.*?)&/){$j->{url}=$1;say j($j)}'
decomposed:
b()->slurp - reads the input.file
j() - converts the json to perl data
if the Message contains "proxy=site&" - get the site
add to the data the url => site
j() convert to json string
and print it.

Related

Transform file content into json string with jq doesn't work in command substitution

This is working as expected:
$ cat /etc/tfe-config/sources/fluent-bit.conf.tpl | jq -R -s
"[OUTPUT]\n Name cloudwatch_logs\n Match *\n region eu-central-1\n log_group_name TFE-LogForwarding\n log_stream_name TFE-AllLogs"
However, assignment to a variable does not work:
$ MY_VARIABLE=$(cat /etc/tfe-config/sources/fluent-bit.conf.tpl | jq -R -s)
$ echo $MY_VARIABLE
jq - commandline JSON processor [version 1.5]
Usage: jq [options] <jq filter> [file...]
jq is a tool for processing JSON inputs, applying the
given filter to its JSON text inputs and producing the
filter's results as JSON on standard output.
The simplest filter is ., which is the identity filter,
copying jq's input to its output unmodified (except for
formatting).
For more advanced filters see the jq(1) manpage ("man jq")
and/or https://stedolan.github.io/jq
Some of the options include:
-c compact instead of pretty-printed output;
.... trimmed
I am on AWS EC2 machine with the latest Amazon Linux 2 image.
What is going on here?
The file looks like this:
[OUTPUT]
Name cloudwatch_logs
Match *
region eu-central-1
log_group_name TFE-LogForwarding
log_stream_name TFE-AllLogs
Two things:
You have to specify a filter for jq – just . to get the entire input
Because the value contains whitespace, you must quote the variable when expanding it; otherwise the shell word-splits and glob-expands it, and it shows up differently when you print it
var=$(cat /etc/tfe-config/sources/fluent-bit.conf.tpl | jq -R -s '.')
echo "$var"
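What jq -R -s '.' does here is read the whole input as one raw string and emit it as a single JSON string literal. A rough Python equivalent, as a sketch only: the function name to_json_string and the shortened config text are made up for illustration.

```python
import json

def to_json_string(text):
    """Encode arbitrary text as a single JSON string literal."""
    return json.dumps(text)

# Shortened, made-up stand-in for the contents of the config template.
config_text = "[OUTPUT]\n Name cloudwatch_logs\n Match *\n"
encoded = to_json_string(config_text)
print(encoded)
```

Decoding the result with json.loads gives back the original text, newlines included.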
Relevant Q&A:
How to use `jq` in a shell pipeline? (and also this GitHub issue)
I just assigned a variable, but echo $variable shows something else

Bash & cURL: Retrieve JSON from web API and search it for specific key:value pairs

I'm cURLing a web API for an application/json response. That response is a set of key-value pairs, like this:
{"id":89,"name":"User saved 2018-07-03 12:01:47.337483","create_time":1530644507337,"auto":false,"recovered":false}
{"id":49,"name":"User saved 2018-05-24 12:33:53.927798","create_time":1527190433927,"auto":false,"recovered":false}
{"id":199,"name":"Daily backup 2018-10-22 02:37:37.332271","create_time":1540201057332,"auto":true,"recovered":false}
etc, etc...
I'd like to iterate through this response and find the highest value integer for the "id" key then save that as a variable. If the above was my whole JSON I'd want to end up with variable=199.
Doing something like this:
MY_VARIABLE=$(curl -k -X GET --header "Accept: application/json" \
--header "Authorization: MyAPITarget apikey=${MY_APIKEY}" \
"https://targetserver/api/methodImCalling" |
The output of that is the JSON above. Can I pipe that output into an array and iterate through it but only look for the value of "id" then do something similar to a:
for (i = 0; i < id.length; i++)
I've only been working with code a short while, and most of my background at this point is JS, so I'm trying to make the connection to bash here. I'm trying to avoid using any "installed" tools whatsoever, which is why I'm using bash; I'd like this script to run "out of the box" on any Linux / Unix platform. Any tips? Thanks!
It's probably a separate installation, but the tool you want is jq:
max_id=$(curl ... | jq -s 'map(.id) | max')
The standard tools that one can expect to be pre-installed simply aren't suitable for working with JSON.
While not standard, any machine that has curl installed is likely to have Python installed, and you can use its standard json module to process the JSON properly. Here's a somewhat ungainly one-liner:
curl ... |
python -c 'import json,sys; x="[%s]"%(",".join(sys.stdin),); print(max(y["id"] for y in json.loads(x)))'
Other non-standard but common languages (Perl, Ruby, etc) probably also have built-in ways to consume JSON.
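The Python one-liner above is easier to follow when written out. The sketch below assumes the response is one JSON object per line, as in the question; the function name max_id and the abbreviated sample records are made up for illustration.

```python
import json

def max_id(lines):
    """Return the largest "id" among newline-delimited JSON objects."""
    ids = [json.loads(line)["id"] for line in lines if line.strip()]
    return max(ids)

# Abbreviated stand-in for the API response shown in the question.
response = """{"id":89,"auto":false}
{"id":49,"auto":false}
{"id":199,"auto":true}
"""
highest = max_id(response.splitlines())
print(highest)
```

Parsing each line separately avoids gluing the lines into one array first, and blank lines are skipped rather than treated as errors.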

How to parse JSON from stdin at Native Messaging host?

Utilizing the code at How do I use a shell-script as Chrome Native Messaging host application as a template and given the file file.json which contains
{"text":"abc"}
following the code at Iterate over json with jq and the jq documentation
$ cat file.json | jq --raw-output '.text'
outputs
abc
I am not certain how to incorporate the pattern at this Answer
while read -r id name date; do
echo "Do whatever with ${id} ${name} ${date}"
done< <(api-producing-json | jq --raw-output '.newList[] | "\(.id) \(.name) \(.create.date)"')
into the template at the former Answer, for the purpose of capturing the single property "text" (abc) from the JSON within the loop using jq, so that the text can be passed to another system call and the message then sent to the client with printf.
What we are trying to achieve is
json=$(<bash program> <captured JSON property>)
message='{"message": "'$json'"}'
where the {"text":"abc"} is sent to the Native Messaging host from client (Chromium app).
How to use jq within the code at the former Answer to get the JSON property as a variable?
Assuming that file.json contains the JSON as indicated, I believe all you will need is:
json=$(jq '{message: .}' file.json)
If you then echo "$json", the result will be:
{
  "message": {
    "text": "abc"
  }
}
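If the goal is instead to pull out only the "text" property and wrap that value, as the question's message='{"message": ...}' line suggests, the idea can be sketched in Python. This is an illustration under assumptions, not the answer's method: the function name build_message is made up, and the incoming JSON is assumed to always contain a "text" key.

```python
import json

def build_message(incoming):
    """Extract "text" from the incoming JSON and wrap it as {"message": ...}."""
    text = json.loads(incoming)["text"]
    # json.dumps handles quoting and escaping, unlike manual string concatenation
    return json.dumps({"message": text})

reply = build_message('{"text":"abc"}')
print(reply)
```

Building the reply with a JSON encoder rather than by concatenating strings keeps the output valid even when the text contains quotes.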

Parsing Filter Needed For Newline Delimited JSON format

I'm running into a problem as I attempt to automate an API process into BigQuery.
The issue is that I need the data to be in a newline delimited JSON format to go into my BigQuery database but the data I'm pulling does not do that, so I need to parse it out.
Here is a link to pastebin so you can get an idea of what the data looks like; it is also reproduced here:
{"type":"user.list","users":[{"type":"user","id":"581c13632f25960e6e3dc89a","user_id":"ieo2e6dtsqhiyhtr","anonymous":false,"email":"test#gmail.com","name":"Joe Martinez","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"Houston","continent_code":"NA","country_name":"United States","latitude":29.7633,"longitude":-95.3633,"postal_code":"77002","region_name":"Texas","timezone":"America/Chicago","country_code":"USA"},"last_request_at":1478235114,"last_seen_ip":"66.87.120.30","created_at":1478234979,"remote_created_at":1478234944,"signed_up_at":1478234944,"updated_at":1478235145,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Linux; Android 6.0.1; SM-G920P Build/MMB29K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.68 Mobile Safari/537.36","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"1","memberType":"claimant"}},{"type":"user","id":"581c22a19a1dc02c460541df","user_id":"1o3helrdv58cxm7jf","anonymous":false,"email":"test#mail.com","name":"Joe Coleman","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"San Jose","continent_code":"NA","country_name":"United 
States","latitude":37.3394,"longitude":-121.895,"postal_code":"95141","region_name":"California","timezone":"America/Los_Angeles","country_code":"USA"},"last_request_at":1478239113,"last_seen_ip":"216.151.183.47","created_at":1478238881,"remote_created_at":1478238744,"signed_up_at":1478238744,"updated_at":1478239113,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"2","memberType":"claimant"}}],"scroll_param":"24ba0fac-b8f9-46b2-944a-9bb523dcd1b1"}
The two problems are the first line:
{"type":"user.list","users":
And the final piece at the bottom:
,"scroll_param":"24bd0rac-b2f9-46b2-944a-9zz543dcd1b1"}
If you eliminate those two, you are simply left with the necessary data needed, and I know what filter is needed to parse it out to put it in newline delimited format.
You can see for yourself by playing around with this tool: if you copy and paste only the text from that first open bracket to the close bracket on the final line, set it to "Compact Output", and apply the filter:
.[]
The result will be in a nice and neat newline-delimited format, reproduced here:
{"type":"user","id":"581c13632f25960e6e3dc89a","user_id":"ieo2e6dtsqhiyhtr","anonymous":false,"email":"test#gmail.com","name":"Joe Martinez","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"Houston","continent_code":"NA","country_name":"United States","latitude":29.7633,"longitude":-95.3633,"postal_code":"77002","region_name":"Texas","timezone":"America/Chicago","country_code":"USA"},"last_request_at":1478235114,"last_seen_ip":"66.87.120.30","created_at":1478234979,"remote_created_at":1478234944,"signed_up_at":1478234944,"updated_at":1478235145,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Linux; Android 6.0.1; SM-G920P Build/MMB29K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.68 Mobile Safari/537.36","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"1","memberType":"claimant"}}
{"type":"user","id":"581c22a19a1dc02c460541df","user_id":"1o3helrdv58cxm7jf","anonymous":false,"email":"test#mail.com","name":"Joe Coleman","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"San Jose","continent_code":"NA","country_name":"United States","latitude":37.3394,"longitude":-121.895,"postal_code":"95141","region_name":"California","timezone":"America/Los_Angeles","country_code":"USA"},"last_request_at":1478239113,"last_seen_ip":"216.151.183.47","created_at":1478238881,"remote_created_at":1478238744,"signed_up_at":1478238744,"updated_at":1478239113,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"2","memberType":"claimant"}}
So what I need is a filter I can apply in the same manner I used .[] that strips all the text before the first open bracket (as I highlighted above) as well as all the text after the closing bracket at the end.
But here's where the final problem comes in. While I need that final piece of text out of the equation, I still need that string of letters and numbers known as the scroll parameter. This is because, in order to fully capture all the data I need from the API, I need to keep using the new scroll parameter generated by each command-line call until all the data is in.
The initial call looks as such:
$ curl -s https://api.program.io/users/scroll -u 'dG9rOmU5NGFjYTkwXzliNDFfNGIyMF9iYzA0XzU0NDg3MjE5ZWJkZDoxOjA=': -H 'Accept:application/json'
But in order to get all the info in, I need that scroll parameter for a separate call that looks like:
curl -s https://api.intercom.io/users/scroll?scroll_param=foo -u 'dG9rOmU5NGFjYTkwXzliNDFfNGIyMF9iYzA0XzU0NDg3MjE5ZWJkZDoxOjA=': -H 'Accept:application/json' >scroll.json
So while I need to get rid of the text in the blob that contains the parameter in order to put it in newline-delimited format, I still need to extract whatever that parameter is to loop back into another script that will continue to run until it is empty.
Would love to hear any advice in working around this!
Like others who have posted comments, I won't pretend to understand the details of the specific question, but if the general question is how to use jq to emit newline-delimited JSON (that is, ensure that each JSON text is followed by a newline, and that no other (raw) newlines are added), the answer is simple: use jq with the -c option, and without the -r option.
From a cursory examination of your data, the filter
.users[]
will give you just the user data to load and the filter
.scroll_param
will return just the scroll parameter. If you put your data in a file you could invoke jq once for each filter but if you have to stream the data you could simply use the , operator to return one value after another. e.g.
.scroll_param
, .users[]
If you use that filter along with the -c option jq will generate output like
"24ba0fac-b8f9-46b2-944a-9bb523dcd1b1"
{"type":"user","id":"581c13632f25960e6e3dc89a","user_id":"ieo2e6dtsqhiyhtr",...
{"type":"user","id":"581c22a19a1dc02c460541df","user_id":"1o3helrdv58cxm7jf",...
Presumably the script that reads the output from jq could capture the first line for use in the next curl invocation and put the rest of the data into the file you load.
Hope this helps.
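The same split, the scroll parameter on one side and one compact JSON object per user on the other, can be sketched in Python. The function name split_scroll_response and the tiny sample body are made up for illustration; the sketch assumes the response has the "users" and "scroll_param" keys shown in the question.

```python
import json

def split_scroll_response(body):
    """Return (scroll_param, list of compact one-line JSON strings, one per user)."""
    data = json.loads(body)
    # separators=(",", ":") gives compact output comparable to jq -c
    lines = [json.dumps(user, separators=(",", ":")) for user in data["users"]]
    return data.get("scroll_param"), lines

# Made-up, heavily shortened response body in the same shape as the question's data.
body = '{"type":"user.list","users":[{"type":"user","id":"a1"},{"type":"user","id":"b2"}],"scroll_param":"example-param"}'
scroll_param, ndjson_lines = split_scroll_response(body)
print(scroll_param)
print("\n".join(ndjson_lines))
```

A calling script could append the lines to the load file and feed scroll_param into the next request, stopping once the users list comes back empty.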

Read a json file with bash

I would like to read the JSON file from http://freifunk.in-kiel.de/alfred.json in bash and separate it into files named by the hostname of each element in that JSON string.
How do I read json with bash?
You can use jq for that. First, extract the list of hostnames and save it to a bash array. Then loop over that array, run a second query for each hostname to extract the matching element, and redirect the output to a file named after that hostname.
The easiest way to do this is with two instances of jq -- one listing hostnames, and another (inside the loop) extracting individual entries.
This is, alas, a bit inefficient (since it means rereading the file from the top for each record to extract).
while read -r hostname; do
  [[ $hostname = */* ]] && continue # paranoia; see comments
  jq --arg hostname "$hostname" \
     '.[] | select(.hostname == $hostname)' <alfred.json >"out-${hostname}.json"
done < <(jq -r '.[] | .hostname' <alfred.json)
(The out- prefix prevents alfred.json from being overwritten if it includes an entry for a host named alfred).
You can use a Python one-liner in a similar way (I haven't checked):
curl -s http://freifunk.in-kiel.de/alfred.json | python -c '
import json, sys
tbl = json.load(sys.stdin)
for t in tbl:
    # one output file per entry, named after its hostname
    with open(tbl[t]["hostname"], "w") as fp:
        json.dump(tbl[t], fp)
'
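A slightly more defensive variant of the same single-pass idea, borrowing the out- prefix and the slash check from the jq loop, might look like the sketch below. The function name split_by_hostname and the sample data are made up for illustration, and the layout of alfred.json is assumed to be an object whose values each carry a "hostname" key.

```python
import json
import os
import tempfile

def split_by_hostname(entries, out_dir):
    """Write each entry to out_dir/out-<hostname>.json; return the paths written."""
    written = []
    for entry in entries.values():
        hostname = entry.get("hostname", "")
        # Skip missing names and names containing path separators
        if not hostname or "/" in hostname:
            continue
        path = os.path.join(out_dir, "out-%s.json" % hostname)
        with open(path, "w") as fp:
            json.dump(entry, fp)
        written.append(path)
    return written

# Made-up data; the real alfred.json layout is an assumption here.
nodes = {"n1": {"hostname": "alpha"}, "n2": {"hostname": "../evil"}, "n3": {"hostname": "beta"}}
out_dir = tempfile.mkdtemp()
paths = split_by_hostname(nodes, out_dir)
print(sorted(os.path.basename(p) for p in paths))
```

Because the data is parsed once and held in memory, this avoids rereading the file for every hostname.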