Parsing curl output [duplicate] - json

I'm trying to run a curl request, parse the output, and store the result in a file called res.txt.
Here is my bash command line:
curl --request POST --url 'https://www.virustotal.com/vtapi/v2/url/scan' --data 'apikey=XXXXXXXXXXXXXXX' --data 'url=abcde.xyz' >> grep -Po '"scan_id":.*?[^\\]",' res.txt
The output is something like this:
{"permalink": "https://www.virustotal.com/gui/url/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/detection/u-17f485d68047604e61b4067310ab716ae6fddc774bb46ffab06d081613b28e49-1595992331", "resource": "http://abcde.xyz/", "url": "http://abcde.xyz/", "response_code": 1, "scan_date": "2020-07-29 03:12:11", "scan_id": "000000000000000000000000000000000000000", "verbose_msg": "Scan request successfully queued, come back later for the report"}
I want to store the scan_id value in res.txt, but it is not working, and there are no errors. I also do not know whether my regex is correct.
Can you help me?

The core of the question is about extracting values from JSON data (produced by curl, in this specific case).
While it is possible to parse specific JSON data using regular expressions (assuming a specific layout of white space and line breaks), it is very hard (impossible?) to write a regular expression that covers all possible formatting. This is similar to parsing XML data: some formats can be parsed with a regex, but a generic parser is extremely hard to write.
Instead of a regex, consider using a JSON-specific tool, e.g., jq.
Also, the pipe from curl to the next command should use '|' and not '>>', and '>' should be used to name the result file. See below:
curl --request POST --url 'https://www.virustotal.com/vtapi/v2/url/scan' --data 'apikey=XXXXXXXXXXXXXXX' --data 'url=abcde.xyz' |
jq .scan_id > res.txt
To remove the quotes from the value in res.txt, use jq's raw-output option (jq -r .scan_id).
If it is not possible to use jq for any reason, consider the following modification. It uses sed (instead of grep) to extract the scan_id value (0000... in this case). It assumes that the "scan_id" key and its value are on the same line.
curl --request POST --url 'https://www.virustotal.com/vtapi/v2/url/scan' --data 'apikey=XXXXXXXXXXXXXXX' --data 'url=abcde.xyz' |
sed -n -e 's/.*"scan_id": *"\([^"]*\)".*/\1/p' > res.txt
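A minimal sketch of the sed variant wrapped in a function, so that a missing scan_id can be detected instead of silently producing an empty file. The function name extract_scan_id and the shortened sample response are illustrative assumptions; the sed expression is the one shown above and still assumes the key and value are on one line.

```shell
#!/bin/sh
# extract_scan_id (hypothetical helper): read JSON on stdin, print the scan_id value.
# Uses the same sed expression as above; prints nothing if the key is absent.
extract_scan_id() {
    sed -n -e 's/.*"scan_id": *"\([^"]*\)".*/\1/p'
}

# Made-up sample response, shortened from the one in the question.
sample='{"response_code": 1, "scan_id": "0000-1595992331", "verbose_msg": "queued"}'

scan_id=$(printf '%s\n' "$sample" | extract_scan_id)
if [ -n "$scan_id" ]; then
    printf '%s\n' "$scan_id"      # in the real pipeline, redirect this to res.txt
else
    echo "scan_id not found in response" >&2
fi
```

In the real command, the curl call would take the place of the printf that feeds the function.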

Try
curl --request POST --url 'https://www.virustotal.com/vtapi/v2/url/scan' --data 'apikey=XXXXXXXXXXXXXXX' --data 'url=abcde.xyz'| tr ',' '\n' | grep scan_id
Demo:
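$ echo '{"permalink": "https://www.virustotal.com/gui/url/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/detection/u-17f485d68047604e61b4067310ab716ae6fddc774bb46ffab06d081613b28e49-1595992331", "resource": "http://abcde.xyz/", "url": "http://abcde.xyz/", "response_code": 1, "scan_date": "2020-07-29 03:12:11", "scan_id": "000000000000000000000000000000000000000", "verbose_msg": "Scan request successfully queued, come back later for the report"}'
{"permalink": "https://www.virustotal.com/gui/url/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/detection/u-17f485d68047604e61b4067310ab716ae6fddc774bb46ffab06d081613b28e49-1595992331", "resource": "http://abcde.xyz/", "url": "http://abcde.xyz/", "response_code": 1, "scan_date": "2020-07-29 03:12:11", "scan_id": "000000000000000000000000000000000000000", "verbose_msg": "Scan request successfully queued, come back later for the report"}
$ echo '{"permalink": "https://www.virustotal.com/gui/url/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/detection/u-17f485d68047604e61b4067310ab716ae6fddc774bb46ffab06d081613b28e49-1595992331", "resource": "http://abcde.xyz/", "url": "http://abcde.xyz/", "response_code": 1, "scan_date": "2020-07-29 03:12:11", "scan_id": "000000000000000000000000000000000000000", "verbose_msg": "Scan request successfully queued, come back later for the report"}' | tr ',' '\n' | grep scan_id
"scan_id": "000000000000000000000000000000000000000"
$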

Related

Extract data from unix log file, construct JSON and perform post request using curl

My overall task is to continuously collect data from a UNIX system log file, filter it, prepare a JSON payload from the filtered data, and send it to another server with a POST API call.
I wonder whether this can be done with, say, a shell script that monitors the log file with tail and filters with grep to dump the matching lines into another file, plus a cron job that runs another script to construct the JSON and send it to the external server with curl.
Some details:
In the log file - connector.log I am interested in lines like:
2020-09-16T15:14:37,337 INFO (tomcat-http--131) [tenant-test;-;138.188.247.4;] com.vmware.horizon.adapters.passwordAdapter.PasswordIdpAdapter - Login: user123 - SUCCESS
These lines, I can collect by the below command:
tailf connector.log | grep 'PasswordIdpAdapter - Login\|FAILURE\|SUCCESS'
and probably dump them into a file:
tailf connector.log | grep 'PasswordIdpAdapter - Login\|FAILURE\|SUCCESS' > log_data.txt
I wonder at this point whether it is possible to extract only specific fields from a line of connector.log (not the whole line), so that one line in log_data.txt looks like this (fields 1, 4, 6, 7, 8):
1 2020-09-29T07:15:13,881 [tenant1;usrname#tenant1;10.93.231.5;] - username - SUCCESS
From that point, I need a script (maybe run by a cron job every minute) or a command that constructs the JSON below and sends the request. One line, one request.
This is the example of the json:
{
"timestamp": "2020-09-16T15:24:35,377",
"tenant_name": "tenant-test",
"log_type": "SERVICE",
"log_entry": "Login: user123 - SUCCESS"
}
The field values to be filled in already exist in the log line: timestamp (the 1st field, e.g. 2020-09-16T15:14:37,337), tenant_name (the first part of the 4th field, tenant-test) and log_entry (the last four fields, e.g. Login: user123 - SUCCESS).
When the json is constructed, I'll send it by:
curl --header "Content-Type: application/json" --request POST --data \
"$payload" http://myservert:8080/api/requests
What is not clear to me is how this script should read the data line by line from log_data.txt, populate the fields to create the JSON, and send it to the server.
Thanks for your answers in advance,
Petko
Thanks #shellter for the awk idea. So, bash, awk, grep, cat, cut and curl did the job.
I've created a cron job to execute the bash script at a 5-minute interval.
The script gets the last 5 minutes of log data, dumps it to another file, reads the filtered data, prepares the payload, and then executes the API call. Maybe it is stupid, but it works.
#!/bin/bash
MONITORED_LOG="/var/logs/test.log"
FILTERED_DATA="/tmp/login/login_data.txt"
REST_HOST="https://rest-host/topics/logs-"
# dump the last 5 mins of log data(date format: 2020-09-28T10:52:28,334)
# to a file, filter for keywords FAILURE\|SUCCESS and NOT having 'lookup|SA'
# an example of data record taken: 1 2020-09-29T07:15:13,881 [tenant1;usrname#tenant1;10.93.231.5;] - username - SUCCESS
awk -v d1="$(date --date="-5 min" "+%Y-%m-%dT%H:%M:%S")" -v d2="$(date "+%Y-%m-%dT%H:%M:%S")" '$0 > d1 && $0 < d2' $MONITORED_LOG | grep 'FAILURE\|SUCCESS' | grep -v 'lookup\|SA-' | awk '{ print $2, $3, $5, $7}' | uniq -c > $FILTERED_DATA
## loop through all the filtered records and send an API call
cat $FILTERED_DATA | while read LINE; do
## preparing the variables
timestamp=$(echo $LINE | cut -f2 -d' ')
username=$(echo $LINE | cut -f5 -d' ')
log_entry=$(echo $LINE | cut -f7 -d' ')
# get the tenant name: take the 3rd field ([tenant;user;ip;]), split by ; and remove the first char [
tenant_name=$(echo $LINE | cut -f3 -d' ' | cut -f1 -d';')
tenant_name="${tenant_name:1}"
# preparing the payload
payload=$'{"records":[{"value":{"timestamp":"'
payload+=$timestamp
payload+=$'","tenant_name":"'
payload+=$tenant_name
payload+=$'","log_entry":"'
payload+=$log_entry
payload+=$'"}}]}'
echo 'payload: ' $payload
# send the api call to the server with dynamic construction of tenant name
curl -i -k -u 'api_user:3494ssdfs3' --request POST --header "Content-type:application/json" --data "$payload" "$REST_HOST$tenant_name"
done
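As a rough sketch, the per-record parsing and payload construction from the loop above could be collected into one function, which makes it easier to try on a single line before wiring it to curl. The function name build_payload is hypothetical, and the record layout is assumed to be the one in the script's comment (count, timestamp, [tenant;user;ip;], -, username, -, status).

```shell
#!/bin/sh
# build_payload (hypothetical helper): turn one filtered record into the JSON
# payload used by the script above.
# Assumed record layout, taken from the comment in that script:
#   count timestamp [tenant;user;ip;] - username - STATUS
build_payload() {
    timestamp=$(printf '%s\n' "$1" | cut -d' ' -f2)
    tenant_name=$(printf '%s\n' "$1" | cut -d' ' -f3 | cut -d';' -f1)
    tenant_name=${tenant_name#\[}          # drop the leading '['
    log_entry=$(printf '%s\n' "$1" | cut -d' ' -f7)
    # Values are inserted verbatim; this assumes they contain no '"' or '\'.
    printf '{"records":[{"value":{"timestamp":"%s","tenant_name":"%s","log_entry":"%s"}}]}\n' \
        "$timestamp" "$tenant_name" "$log_entry"
}

build_payload '1 2020-09-29T07:15:13,881 [tenant1;usrname#tenant1;10.93.231.5;] - username - SUCCESS'
```

Inside the while loop, the call would be payload=$(build_payload "$LINE"), followed by the same curl command.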

Access Next Offset from GET http api request

I'm unable to understand the pagination of Chargebee (https://apidocs.chargebee.com/docs/api). I need to create a request to which I can add the next offset to get further data (without setting a limit other than the default, which is 10). But I'm unable to understand how the HTTP request should be formed with the given next_offset, as in the attached image.
Screenshot of request and response
I have had success submitting the identical request with the offset tacked onto the end. Here is an example:
First Request
curl -s https://xxyyzz.chargebee.com/api/v2/events -G -K /home/xxyyzz/.cb_curl_key.cfg --data-urlencode limit=2 --data-urlencode occurred_at[between]="[1554076800,1554077099]"
JSON results:
{
"list": [
{"event": {
"id": "ev_xxyyzz1",
"occurred_at": 1554077022,
...
"next_offset": "[\"1556428868000\",\"364450353\"]"
}
The tool I used to display the json is trying to be helpful with the backslashes.
The real next_offset value is
["1556428868000","364450353"]
Second Request
Identical to the first with
offset='["1554077017000","345017569"]'
tacked on to the end (single quotes keep the inner double quotes intact in the shell):
curl -s https://xxyyzz.chargebee.com/api/v2/events -G -K /home/xxyyzz/.cb_curl_key.cfg --data-urlencode limit=2 --data-urlencode occurred_at[between]="[1554076800,1554077099]" --data-urlencode offset='["1554077017000","345017569"]'
JSON results:
{
"list": [
{"event": {
"id": "ev_xxyyzz3",
"occurred_at": 1556429028,
}
Keep repeating the process until the "next_offset" key does not appear in the JSON result.
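The repeat-until-no-next_offset process can be sketched as a small loop. This is only an illustration under assumptions: fetch_page and fetch_all are hypothetical names, CB_URL and CB_KEY_FILE are placeholder variables for the endpoint and the curl config file, and jq is assumed to be installed to read next_offset from each page.

```shell
#!/bin/sh
# fetch_page OFFSET (hypothetical helper): print one page of JSON.
# CB_URL and CB_KEY_FILE are placeholder variables, not Chargebee names.
fetch_page() {
    if [ -n "$1" ]; then
        curl -s "$CB_URL" -G -K "$CB_KEY_FILE" --data-urlencode "offset=$1"
    else
        curl -s "$CB_URL" -G -K "$CB_KEY_FILE"
    fi
}

# fetch_all: print every page, following next_offset until it is absent.
fetch_all() {
    offset=""
    while :; do
        page=$(fetch_page "$offset") || return 1
        printf '%s\n' "$page"
        # jq -r unescapes the string, giving e.g. ["1556428868000","364450353"]
        offset=$(printf '%s\n' "$page" | jq -r '.next_offset // empty')
        [ -n "$offset" ] || break
    done
}
```

Because the offset is passed through --data-urlencode as a single quoted shell variable, the brackets and inner double quotes need no manual escaping.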

How to iterate Json object in shell script [duplicate]

I am writing a shell script to call some APIs. The response comes back fine, but I need to grep a specific parameter from the response and save it in a file.
My script looks like
#!/bin/sh
response=$(curl 'https://example.com' -H 'Content-Type: application/json' )
echo "$response"
The response is something like
{
status:"success",
response:{
"target":"",
"content":"test content"
}
}
The response is fine and I am able to write the whole response to a file, but my requirement is to save only "content" inside the "response" object using the script, because I need it for another API call.
Note: I cannot change the API responses, as I am working with third-party APIs.
Thank you
If the output is proper JSON:
$ cat proper.json
{
"status": "success",
"response": {
"target": "",
"content": "test content"
}
}
$ response=$(cat proper.json)
You could use jq:
$ echo $response | jq -r '.response.content'
test content
You can grep for the content line and then use awk to split on : and take only the value, not the key:
echo "$response" | grep "\"content\":" | awk -F":" '{ print $2}'
This will print "test content" (including the quotes).
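A small variation on the grep and awk idea, sketched under assumptions: splitting on the double quote instead of the colon returns the value without its quotes and keeps working when the value itself contains a colon. The function name get_content is hypothetical, and the sketch assumes the "content" key sits on its own line with no escaped quotes in the value.

```shell
#!/bin/sh
# get_content (hypothetical helper): print the value of the "content" key.
# Splits the matching line on double quotes, so field 4 is the value itself.
# Assumes "content": "..." sits on its own line and the value has no escaped quotes.
get_content() {
    grep '"content":' | awk -F'"' '{ print $4 }'
}

response='{
  "status": "success",
  "response": {
    "target": "",
    "content": "test content"
  }
}'
printf '%s\n' "$response" | get_content
```

For anything beyond a fixed, known layout, the jq command shown earlier remains the more robust choice.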
You can do this to get the value of contents into a variable ($content).
content=$(echo "$response" | cut -d'"' -f 7)
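Explanation: split $response using " (double quote) as the delimiter and take the 7th field of the output (i.e. the value (test content) of content in the JSON response). Note that cut works line by line, so this assumes the whole response is on a single line, and the field number depends on the exact layout of the response.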
Here is an excerpt from the description and usage of the cut command
if you like to extract a whole field, you can combine option -f and -d. The option -f specifies which field you want to extract, and the option -d specifies what is the field delimiter that is used in the input file.

Interpolate command output into GitHub REST request

I am trying to create a pull request comment automatically whenever CI is run. The output of a given command is written to a file (could also just be stored inside an environment variable though). The problem is, I usually get the following response:
curl -XPOST -d "{'body':'$RESULT'}" https://api.github.com/repo/name/issues/number/comment
{
"message": "Problems parsing JSON",
"documentation_url": "https://developer.github.com/v3/issues/comments/#create-a-comment"
}
This is usually due to unescaped characters, like \n, \t, " etc.
Is there any easy way to achieve this on the command line or in bash, sh, with jq or Python? Using the Octokit.rb library it works straight away, but I don't want to install Ruby in the build environment.
You can use jq to create your JSON object. Supposing you have your comment content in the RESULT variable, the full request would be:
DATA=$(echo '{}' | jq --arg val "$RESULT" '.| {"body": $val}')
curl -s -H 'Content-Type: application/json' \
-H 'Authorization: token YOUR_TOKEN' \
-d "$DATA" \
"https://api.github.com/repos/:owner/:repo/issues/:number/comments"
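The post "Using curl POST with variables defined in bash script functions" proposes multiple techniques for passing a variable like $RESULT in a curl POST parameter. Note that the here-document below inserts $RESULT verbatim, so it only produces valid JSON when the value contains no characters that need escaping.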
generate_post_data()
{
cat <<EOF
{
"body": "$RESULT"
}
EOF
}
Then, following "A curl tutorial using GitHub's API ":
curl -X POST \
-H "authToken: <yourToken>" \
-H "Content-Type: application/json" \
--data "$(generate_post_data)" https://api.github.com/repo/name/issues/number/comment
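Since the original problem was unescaped characters in the command output, here is a minimal sketch of the jq approach as a reusable function. The name make_comment_json is hypothetical; jq -n builds the object without reading input, and --arg passes the text as a string that jq escapes itself.

```shell
#!/bin/sh
# make_comment_json (hypothetical helper): build {"body": ...} with jq doing
# all the escaping of quotes, backslashes, newlines and tabs.
make_comment_json() {
    jq -n -c --arg body "$1" '{body: $body}'
}
```

It would be used as DATA=$(make_comment_json "$RESULT"), with "$DATA" then passed to curl via -d.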

payload is not valid JSON

I'm using curl to send JSON to an API endpoint. However, somewhere in the bash chain it is getting messed up.
Is there something special to know about encoding with curl?
If I construct the payload like this:
PAYLOAD='payload={"channel": "github", "username": "webhookbot", "icon_emoji": ":ghost:", "text": "'
PAYLOAD+=$1
PAYLOAD+=' " }'
echo $PAYLOAD
curl -X POST --data-urlencode "$PAYLOAD" $SLACKPOSTURL
echo "sent"
I'll get back an error
Payload was not valid JSONsent
However, if I just hard-code the assignment with that same output
PAYLOAD='payload={"channel": "github", "username": "webhookbot", "icon_emoji": ":ghost:", "text": "LAST_COMMIT Merge pull request #558 from dcsan/boteditor Boteditor " }'
then it will go through fine.
Is there something that a simple assignment is doing differently vs. concatenating strings? In the console the output looks identical.
FWIW some messages go through but content like this:
LAST_COMMIT Merge pull request #558 from dcsan/boteditor Boteditor
will only go through if hard-coded, so it's not the other end as far as I can see; it's something to do with the way the messages are built.
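I guess you want to concatenate values into your variable. Note that += is a bash extension: it appends to a string in bash, but it is not available in plain POSIX sh, so the appends fail if the script is run with sh.
A portable way to concatenate strings in a variable is: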
PAYLOAD="$PAYLOAD $1"
All together it would be something like the following. Note the need to use " so that the variable $PAYLOAD is expanded and the usage of \" to store a literal double quote:
PAYLOAD='payload={"channel": "github", "username": "webhookbot", "icon_emoji": ":ghost:", "text": "'
PAYLOAD="$PAYLOAD $1 \" }"
echo "$PAYLOAD"
curl -X POST --data-urlencode "$PAYLOAD" $SLACKPOSTURL
echo "sent"
This is what worked for me from a bash script:
curl -X POST --data-urlencode "payload={\"text\": \"$2\"}" https://hooks.slack.com/services/$KEY
Notice the inner quotes are escaped, but the outer quotes are not.
FYI, adding:
set -x
at the beginning of a bash script will show you the actual commands being executed, and save a lot of guesswork.
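The escaped-quote approach still breaks when the message text itself contains a double quote or a backslash. A minimal sketch of one way to guard against that, assuming only single-line messages: the hypothetical function slack_payload escapes those two characters with sed before building the payload string.

```shell
#!/bin/sh
# slack_payload (hypothetical helper): build the payload=... string for a message,
# escaping backslashes and double quotes so the JSON stays valid.
# Limitation: newlines and other control characters in the text are not handled.
slack_payload() {
    escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf 'payload={"text": "%s"}\n' "$escaped"
}

slack_payload 'Merge pull request #558 from "dcsan/boteditor"'
```

The result can be passed on as curl -X POST --data-urlencode "$(slack_payload "$1")" "$SLACKPOSTURL".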