I use GitHub Actions for my build, and it generates multiple artifacts (each with a different name).
Is there a way to predict the URL of the artifacts of the last successful build, without knowing the SHA-1, only the name of the artifact and the repo?
I have developed a service that exposes predictable URLs to either the latest or a particular artifact of a repository's branch+workflow.
https://nightly.link/
https://github.com/oprypin/nightly.link
This is implemented as a GitHub App, and communication with GitHub is authenticated, but users that only download don't need to even log in to GitHub.
The implementation fetches this through the API in 3 steps:
https://api.github.com/repos/:owner/:repo/actions/workflows/someworkflow.yml/runs?per_page=1&branch=master&event=push&status=success
https://api.github.com/repos/:owner/:repo/actions/runs/123456789/artifacts?per_page=100
https://api.github.com/repos/:owner/:repo/actions/artifacts/87654321/zip
(the last one redirects you to an ephemeral direct download URL)
Note that authentication is required. For OAuth that's public_repo (or repo if appropriate). For GitHub Apps that's "Actions"/"Read-only".
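For illustration, here is a rough sketch of those three calls with curl and jq (the token, owner/repo, workflow file and artifact name are placeholders, not the service's actual code):
TOKEN=...   # a token with public_repo / repo scope, or a GitHub App installation token
# 1. id of the latest successful push-triggered run of the workflow on master
run_id=$(curl -s -H "Authorization: Bearer $TOKEN" \
  "https://api.github.com/repos/OWNER/REPO/actions/workflows/someworkflow.yml/runs?per_page=1&branch=master&event=push&status=success" \
  | jq -r '.workflow_runs[0].id')
# 2. id of the artifact with the wanted name in that run
artifact_id=$(curl -s -H "Authorization: Bearer $TOKEN" \
  "https://api.github.com/repos/OWNER/REPO/actions/runs/${run_id}/artifacts?per_page=100" \
  | jq -r '.artifacts[] | select(.name == "my-artifact") | .id')
# 3. follow the redirect to the ephemeral download URL
curl -sL -H "Authorization: Bearer $TOKEN" -o my-artifact.zip \
  "https://api.github.com/repos/OWNER/REPO/actions/artifacts/${artifact_id}/zip"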
There is indeed no more direct way to do this.
Some relevant issues are
https://github.com/actions/upload-artifact/issues/51
https://github.com/actions/upload-artifact/issues/27
At the moment, no, according to comments from staff, although this may change with future versions of the upload-artifact action.
After poking around myself, I found it is possible to get this using the GitHub Actions API:
https://developer.github.com/v3/actions/artifacts/
GET /repos/:owner/:repo/actions/runs/:run_id/artifacts
So you can receive a JSON reply and iterate through the "artifacts" array to get the corresponding "archive_download_url". A workflow can fill in the URL like so:
/repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/artifacts
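For instance, a hedged sketch of that lookup with curl and jq (OWNER/REPO, RUN_ID, the token and the artifact name are placeholders; inside a workflow the first two would come from ${{ github.repository }} and ${{ github.run_id }} as shown above):
curl -s -H "Authorization: Bearer $GITHUB_TOKEN" \
  "https://api.github.com/repos/OWNER/REPO/actions/runs/RUN_ID/artifacts" |
  jq -r '.artifacts[] | select(.name == "my-artifact") | .archive_download_url'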
You can use the jq command-line JSON processor along with curl to extract the URL as follows.
curl -s https://api.github.com/repos/<OWNER>/<REPO_NAME>/actions/artifacts\?per_page\=<NUMBER_OF_ARTIFACTS_PER_BUILD> | jq '[.artifacts[] | {name : .name, archive_download_url : .archive_download_url}]' | jq -r '.[] | select (.name == "<NAME_OF_THE_ARTIFACT>") | .archive_download_url'
For example:
curl -s https://api.github.com/repos/ballerina-platform/ballerina-distribution/actions/artifacts\?per_page\=9 | jq '[.artifacts[] | {name : .name, archive_download_url : .archive_download_url}]' | jq -r '.[] | select (.name == "Linux Installer deb") | .archive_download_url'
Here, curl -s https://api.github.com/repos/<OWNER>/<REPO_NAME>/actions/artifacts\?per_page\=<NUMBER_OF_ARTIFACTS_PER_BUILD> returns the array of artifacts related to the latest build.
jq '[.artifacts[] | {name : .name, archive_download_url : .archive_download_url}]' extracts the artifacts array and filters required data.
jq -r '.[] | select (.name == "<NAME_OF_THE_ARTIFACT>") | .archive_download_url' selects the url for the given artifact name.
I am not a GitHub or jq guru, so there are probably more optimal solutions out there.
jq playground link: https://jqplay.org/s/Gm0kRcv63C - to test my solution and other possible ideas. I dropped some irrelevant fields to shrink the sample JSON size (for example: node_id, size_in_bytes, created_at...).
Further details on the methods are in the code samples below.
####### You can get the max date of your artifacts.
####### Then you need to choose the artifact entry by this date.
#######
####### NOTE: I just pre-formatted the first command "line".
####### 2nd "line" has a very similar, but simplified structure.
####### (at least easy to copy-paste into jq playground)
####### NOTE: ASSUMPTION:
####### First "page" of json response contains the most recent entries
####### AND includes artifact(s) with that specific name.
#######
####### (if other artifacts flood your API response, you can increase
####### its page size or iterate over pages as a workaround;
####### see the sketch at the end of this answer)
bash$ cat artifact_response.json | \
jq '
  (
    [
      .artifacts[]
      | select(.name == "my-artifact" and .expired == false)
      | .updated_at
    ]
    | max
  ) as $max_date
  | { $max_date }'
####### output
{ "max_date": "2021-04-29T11:22:20Z" }
Another way:
####### Latest ID of non-expired artifacts with a specific name.
####### Probably this is a better solution than the one above, since you
####### can use the "id" directly when constructing your download URL:
#######
####### "https://api.github.com/repos/blabla.../actions/artifacts/92394368/zip"
#######
####### ASSUMPTION: higher "id" means higher "update date" in your workflow
####### (there is no post-update of artifacts aka creation and
####### update dates are identical for an artifact)
cat artifact_response.json | \
jq '[ .artifacts[] | select(.name == "my-artifact" and .expired == false) | .id ] | max'
####### output
92394368
A more compact filter, assuming the API response is already in reverse order by date or id:
####### no full command line, just jq filter string
#######
####### no "max" function, only pick the first element by index
#######
'[ .artifacts[] | select(.name == "my-artifact" and .expired == false) | .id ][0]'
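As for the pagination workaround mentioned in the first comment block, a rough sketch could look like this (the page range, owner/repo and artifact name are placeholders, and jq -s is used to merge the paged responses):
for page in 1 2 3; do
  curl -s "https://api.github.com/repos/<OWNER>/<REPO_NAME>/actions/artifacts?per_page=100&page=${page}"
done |
jq -s '[ .[].artifacts[] | select(.name == "my-artifact" and .expired == false) | .id ] | max'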
Wrap-up
I recently faced a similar use case, with the aim of making the build artifacts of a given GitHub Actions workflow more visible, with a single click from the commit statuses (albeit requiring users to be logged in to github.com).
As pointed out by @geoff-hutchison in his answer:
The API https://developer.github.com/v3/actions/artifacts/ is helpful.
However:
It is impossible to query the list of artifacts generated by a workflow from a job of that same workflow; the request has to be deferred to a subsequent workflow.
The archive_download_url URLs contained in the corresponding JSON response do not seem practical enough (they are GitHub API URLs redirecting to ephemeral URLs).
Fortunately:
Browsing the workflow build artifact URLs already proposed on the GitHub Actions page clearly shows that they have the form https://github.com/${{ github.repository }}/suites/${check_suite_id}/artifacts/${artifact_id}, and these URLs are static (albeit only available to logged-in users, and for 90 days maximum, given the expiration of artifacts).
Implementation
Hence the following, generic code I developed (under MIT license) to pin all artifacts of a given workflow in commit statuses (just replace the TODO-strings):
.github/workflows/pin-artifacts.yml:
name: Pin artifacts
on:
  workflow_run:
    workflows:
      - "TODO-Name Of Existing Workflow"
    types: ["completed"]
jobs:
  # Make artifacts links more visible for the upstream project
  pin-artifacts:
    permissions:
      statuses: write
    name: Add artifacts links to commit statuses
    if: ${{ github.event.workflow_run.conclusion == 'success' && github.repository == 'TODO-orga/TODO-repo' }}
    runs-on: ubuntu-latest
    steps:
      - name: Add artifacts links to commit status
        run: |
          set -x
          workflow_id=${{ github.event.workflow_run.workflow_id }}
          run_id=${{ github.event.workflow_run.id }} # instead of ${{ github.run_id }}
          run_number=${{ github.event.workflow_run.run_number }}
          head_branch=${{ github.event.workflow_run.head_branch }}
          head_sha=${{ github.event.workflow_run.head_sha }} # instead of ${{ github.event.pull_request.head.sha }} (or ${{ github.sha }})
          check_suite_id=${{ github.event.workflow_run.check_suite_id }}
          set +x
          curl \
            -H "Accept: application/vnd.github+json" \
            "https://api.github.com/repos/${{ github.repository }}/actions/runs/${run_id}/artifacts" \
            | jq '[.artifacts | .[] | {"id": .id, "name": .name, "created_at": .created_at, "expires_at": .expires_at, "archive_download_url": .archive_download_url}] | sort_by(.name)' \
            > artifacts.json
          cat artifacts.json
          < artifacts.json jq -r ".[] | \
            .name + \"§\" + \
            ( .id | tostring | \"https://github.com/${{ github.repository }}/suites/${check_suite_id}/artifacts/\" + . ) + \"§\" + \
            ( \"Link to \" + .name + \".zip [\" + ( .created_at | sub(\"T.*\"; \"→\") ) + ( .expires_at | sub(\"T.*\"; \"] (you must be logged)\" ) ) )" \
            | while IFS="§" read name url descr; do
              curl \
                -X POST \
                -H "Accept: application/vnd.github+json" \
                -H "Authorization: Bearer ${{ github.token }}" \
                "https://api.github.com/repos/${{ github.repository }}/statuses/${head_sha}" \
                -d "$( printf '{"state":"%s","target_url":"%s","description":"%s","context":"%s"}' "${{ github.event.workflow_run.conclusion }}" "$url" "$descr" "$name (artifact)" )"
            done
(If need be, see also my PR ocaml-sf/learn-ocaml#501 for an implementation example and screenshots.)
Related
I have some log files which contain a mix of JSON and non-JSON logs. I'd like to separate them into two files, one containing only the JSON logs and the other containing the non-JSON logs. I got some ideas from this to extract the JSON logs with jq. Here is what I have tried, using tee to split the log into two files (usage from here & here) and jq to extract the logs:
cat $logfile | tee >(jq -R -c 'fromjson? | select(type == "object") | not' > $plain_log_file) >(jq -R -c 'fromjson? | select(type == "object")' > $json_log_file)
This extracts JSON logs correctly but returns false for each non-JSON log instead of the log content itself.
cat $logfile | tee >(jq -R -c 'try fromjson catch .' > $plain_log_file) >(jq -R -c 'fromjson? | select(type == "object")' > $json_log_file)
This gets a jq syntax error at "catch .".
I do this so I can view the logs in lnav (an excellent log view/navigation tool).
Any suggestion on how to achieve this? Appreciate your help!
sample input:
{ "name": "joe"}
text line, this can be multi-line too
{ "xyz": 123 }
Assuming each JSON log item occurs on a separate line:
For the JSON logs:
jq -nR -c 'inputs|fromjson?'
For the others, you could use:
jq -nRr 'inputs | . as $in | try (fromjson|empty) catch $in'
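If you want to wire those two filters into the two files the question asks for, one way (a sketch using the question's own variable names) is:
cat "$logfile" |
  tee >(jq -nRr 'inputs | . as $in | try (fromjson|empty) catch $in' > "$plain_log_file") |
  jq -nR -c 'inputs | fromjson?' > "$json_log_file"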
If you only want to linewise separate the input into different files, go with @peak's solution. But if you want to further process the lines based on conditions, you could turn them into an array using -Rn and [inputs], and go from there. For instance, if you need the corresponding line numbers (e.g. to feed them into another tool such as sed; see the sketch after the output below), use to_entries, which for arrays provides them in the .key field:
jq -Rn 'reduce ([inputs] | to_entries[]) as $in ({};
  .[($in.value | fromjson? | "json") // "plain"] += [$in.key]
)'
{
  "json": [
    0,
    2
  ],
  "plain": [
    1
  ]
}
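For instance, the line numbers could be handed to sed like this (a minimal sketch assuming a log file named app.log; sed line numbers are 1-based, hence the + 1):
script=$(jq -Rn -r 'reduce ([inputs] | to_entries[]) as $in ({};
  .[($in.value | fromjson? | "json") // "plain"] += [$in.key]
) | (.plain // []) | map(. + 1 | tostring + "p") | join(";")' app.log)
sed -n "$script" app.log   # prints only the non-JSON lines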
If each JSON log entry can be spread over multiple lines, then some assumptions about the non-JSON log entries must be made. Here is an example based on reasonable assumptions about the non-JSON entries. A bash or bash-like environment is also assumed for the sake of convenience.
function log {
cat<<EOF
{ "name":
"joe"}
text line, this can be
multi-line too
{
"xyz": 123 }
EOF
}
log | sed '/^[^"{[ ]/ { s/"/\\"/g ; s/^/"/; s/$/"/;}' |
tee >(jq -rc 'select(type == "string")' > strings.log) |
jq -rc 'select(type != "string")' > json.log
I have a set of JSON files that all contain JSON in the following format:
File 1:
{ "host" : "127.0.0.1", "port" : "80", "data": {}}
File 2:
{ "host" : "127.0.0.2", "port" : "502", "data": {}}
File 3:
{ "host" : "127.0.0.1", "port" : "443", "data": {}}
These files can be rather large, up to several gigabytes.
I want to use jq or some other command-line JSON processing tool that can merge these JSON files into one file with a grouped format like so:
[{ "host" : "127.0.0.1", "data": {"80": {}, "443" : {}}},
{ "host" : "127.0.0.2", "data": {"502": {}}}]
Is this possible with jq, and if so, how could I do it? I have looked at the group_by function in jq, but it seems like I would need to combine all the files first and then group on that big file. However, since the files can be very large, it might make sense to stream the data and group them on the fly.
With really big files, I'd look into a primarily disk based approach instead of trying to load everything into memory. The following script leverages sqlite's JSON1 extension to load the JSON files into a database and generate the grouped results:
#!/usr/bin/env bash
DB=json.db
# Delete existing database if any.
rm -f "$DB"
# Create table. Assuming each host,port pair is unique.
sqlite3 -batch "$DB" <<'EOF'
CREATE TABLE data(host TEXT, port INTEGER, data TEXT,
PRIMARY KEY (host, port)) WITHOUT ROWID;
EOF
# Insert the objects from the files into the database.
for file in file*.json; do
sqlite3 -batch "$DB" <<EOF
INSERT INTO data(host, port, data)
SELECT json_extract(j, '\$.host'), json_extract(j, '\$.port'), json_extract(j, '\$.data')
FROM (SELECT json(readfile('$file')) AS j) as json;
EOF
done
# And display the results of joining the objects. Could use
# json_group_array() instead of this sed hackery, but we're trying to
# avoid building a giant string with the entire results. It might still
# run into sqlite maximum string length limits...
sqlite3 -batch -noheader -list "$DB" <<'EOF' | sed '1s/^/[/; $s/,$/]/'
SELECT json_object('host', host,
'data', json_group_object(port, json(data))) || ','
FROM data
GROUP BY host
ORDER BY host;
EOF
Running this on your sample data prints out:
[{"host":"127.0.0.1","data":{"80":{},"443":{}}},
{"host":"127.0.0.2","data":{"502":{}}}]
If the goal is really to produce a single ginormous JSON entity, then presumably that entity is still small enough to have a chance of fitting into the memory of some computer, say C. So there is a good chance of jq being up to the job on C. At any rate, to utilize memory efficiently, you would:
use inputs while performing the grouping operation;
avoid the built-in group_by (since it requires an in-memory sort).
Here then is a two-step candidate using jq, which assumes grouping.jq contains the following program:
# emit a stream of arrays assuming that f is always string-valued
def GROUPS_BY(stream; f):
reduce stream as $x ({}; ($x|f) as $s | .[$s] += [$x]) | .[];
GROUPS_BY(inputs | .data=.port | del(.port); .host)
| {host: .[0].host, data: map({(.data): {}}) | add}
If the JSON files can be captured by *.json, you could then consider:
jq -n -f grouping.jq *.json | jq -s .
One advantage of this approach is that if it fails, you could try using a temporary file to hold the output of the first step, and then processing it later, either by "slurping" it, or perhaps more sensibly distributing it amongst several files, one per .host.
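A hedged sketch of that temporary-file variant (the file names and the -c flag are my additions):
jq -nc -f grouping.jq *.json > groups.ndjson   # step 1: one group object per line
jq -s . groups.ndjson > merged.json            # step 2: slurp into a single array later, or split by .host first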
Removing extraneous data
Obviously, if the input files contain extraneous data, you might want to remove it first, e.g. by running
for f in *.json ; do
jq '{host,port}' "$f" | sponge $f
done
or by performing the projection in grouping.jq, e.g. using:
GROUPS_BY(inputs | {host, data: .port}; .host)
| {host: .[0].host, data: map( {(.data):{}} ) | add}
Here's a script which uses jq to solve the problem without requiring more memory than is needed for the largest group. For simplicity:
it reads *.json and directs output to $OUT as defined at the top of the script.
it uses sponge
#!/usr/bin/env bash
# Requires: sponge
OUT=big.json
/bin/rm -i "$OUT"
if [ -s "$OUT" ] ; then
echo $OUT already exists
exit 1
fi
### Step 0: setup
TDIR=$(mktemp -d /tmp/grouping.XXXX)
function cleanup {
if [ -d "$TDIR" ] ; then
/bin/rm -r "$TDIR"
fi
}
trap cleanup EXIT
### Step 1: find the groups
for f in *.json ; do
host=$(jq -r '.host' "$f")
echo "$f" >> "$TDIR/$host"
done
for f in $TDIR/* ; do
echo $f ...
jq -n 'reduce (inputs | {host, data: {(.port): {} }}) as $in (null;
.host=$in.host | .data += $in.data)' $(cat $f) | sponge "$f"
done
### Step 2: assembly
i=0
echo "[" > $OUT
find $TDIR -type f | while read f ; do
i=$((i + 1))
if [ $i -gt 1 ] ; then echo , >> $OUT ; fi
cat "$f" >> $OUT
done
echo "]" >> $OUT
Discussion
Besides requiring enough memory to handle the largest group, the main deficiencies of the above implementation are:
it assumes that the .host string is suitable as a file name.
the resultant file is not strictly speaking pretty-printed.
These two issues could however be addressed quite easily with minor modifications to the script without requiring additional memory.
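For the first point, one possible tweak (a sketch; using md5sum is my assumption, not part of the script above) is to key the per-group files in step 1 on a hash of .host rather than the raw string; the host itself is still recovered from the JSON in the second loop:
host=$(jq -r '.host' "$f")
key=$(printf '%s' "$host" | md5sum | awk '{print $1}')   # fixed-length, file-system-safe name
echo "$f" >> "$TDIR/$key"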
EDIT 1:
I'd like to extract video urls and titles from "https://ok.ru/video/c1404844" results using the CLI.
Here's what I've done so far:
The regex pattern for each video relative URL is /video/\d+ and the video absolute URL looks like this: https://ok.ru$videoRelativeURL
I can use this command to extract the video URLs (I use uniq because many video IDs appear 3 times):
$ curl -s https://ok.ru/video/c1404844 | grep -oP "/video/\d+" | uniq | sed "s|^|https://ok.ru|" | head -5
https://ok.ru/video/1896971373228
https://ok.ru/video/1896971438764
https://ok.ru/video/1896971569836
https://ok.ru/video/1896971635372
https://ok.ru/video/1898415590060
Then I tried extracting the video relativeURLs + title with pup.
EDIT 3: I replaced the class name video-card_n ellip with video-card_n.ellip. However, pup only outputs the attribute of the second selector (video-card_n.ellip), which is strange:
$ curl -s https://ok.ru/video/c1404844 | pup '.video-card_lk attr{href}, .video-card_n.ellip attr{title}' | head -5
Death.in.Paradise.S02E05.WEBRip.x264-ION10
Death.in.Paradise.S02E02.WEBRip.x264-ION10
Death.in.Paradise.S02E04.WEBRip.x264-ION10
Death.in.Paradise.S02E03.WEBRip.x264-ION10
Death.in.Paradise.S02E06.WEBRip.x264-ION10
It didn't work, so I converted the expanded HTML to JSON with this command:
$ curl -s https://ok.ru/video/c1404844 | pup 'json{}' > c1404844.json
Now I want to try to extract the title from video-card_n ellip and the href from video-card_lk from the resulting JSON file with the jq tool, but I don't know jq well enough.
I'd like jq (or pup) to output a flat file: the URL as the first column and the title as the second column.
EDIT 2: A big thank you to @peak for his help with jq!
DONE :
$ curl -s https://ok.ru/video/c1404844 | pup 'json{}' | jq -r 'recurse | arrays[] | select(.class == "video-card_lk").href,select(.class == "video-card_n ellip").title' | awk '{videoRelativeURL = $0;url="https://ok.ru"gensub("?.*$","",videoRelativeURL); getline title; print url" # "title}' | head
https://ok.ru/video/1898417425068 # Death.in.Paradise.S02E05.WEBRip.x264-ION10
https://ok.ru/video/1898417359532 # Death.in.Paradise.S02E02.WEBRip.x264-ION10
https://ok.ru/video/1898417293996 # Death.in.Paradise.S02E04.WEBRip.x264-ION10
https://ok.ru/video/1898417228460 # Death.in.Paradise.S02E03.WEBRip.x264-ION10
https://ok.ru/video/1898417162924 # Death.in.Paradise.S02E06.WEBRip.x264-ION10
https://ok.ru/video/1898417097388 # Death.in.Paradise.S02E07.WEBRip.x264-ION10
https://ok.ru/video/1898417031852 # Death.in.Paradise.S02E08.WEBRip.x264-ION10
https://ok.ru/video/1898416966316 # Death.in.Paradise.S02E01.WEBRip.x264-ION10
https://ok.ru/video/1898416769708 # Death.in.Paradise.S07E02.The.Stakes.Are.High.WEBRip.x264-ION10
https://ok.ru/video/1898416704172 # Death.in.Paradise.S07E03.Written.in.Murder.WEBRip.x264-ION10
...
If you want to scrape specific information from an HTML source, then there's no need for 5 different tools! Please have a look at xidel. It can do it all.
$ xidel -s https://ok.ru/video/c1404844 -e '
//div[@data-id]/join(
(
div[#class="video-card_img-w"]/a/resolve-uri(substring-before(#href,"?")),
div[#class="video-card_n-w"]/a
),
" # "
)
'
https://ok.ru/video/1898417425068 # Death.in.Paradise.S02E05.WEBRip.x264-ION10
https://ok.ru/video/1898417359532 # Death.in.Paradise.S02E02.WEBRip.x264-ION10
https://ok.ru/video/1898417293996 # Death.in.Paradise.S02E04.WEBRip.x264-ION10
https://ok.ru/video/1898417228460 # Death.in.Paradise.S02E03.WEBRip.x264-ION10
https://ok.ru/video/1898417162924 # Death.in.Paradise.S02E06.WEBRip.x264-ION10
https://ok.ru/video/1898417097388 # Death.in.Paradise.S02E07.WEBRip.x264-ION10
https://ok.ru/video/1898417031852 # Death.in.Paradise.S02E08.WEBRip.x264-ION10
https://ok.ru/video/1898416966316 # Death.in.Paradise.S02E01.WEBRip.x264-ION10
https://ok.ru/video/1898416769708 # Death.in.Paradise.S07E02.The.Stakes.Are.High.WEBRip.x264-ION10
https://ok.ru/video/1898416704172 # Death.in.Paradise.S07E03.Written.in.Murder.WEBRip.x264-ION10
[...]
After using pup to convert the HTML of the top-level page to JSON, the following jq filter produces 24 pairs, the first two of which are shown under "Output" below:
[ [ .. | arrays[] | select(.class == "video-card_n ellip").title],
[ .. | arrays[] | select(.class == "video-card_lk").href]]
| transpose
Output
[
[
"Замечательная пара, красивая песня и чудесное исполнение! Золотые голоса!",
"/video/2406311403450?st._aid=VideoState_open_top"
],
[
"#СидимДома",
"/video/1675421949619?st._aid=VideoState_open_top"
],
...
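As a hedged follow-up (my addition, not part of the answer above): the same filter can be extended to emit the flat "url # title" lines the question asked for, stripping the query string and prefixing the site root:
curl -s https://ok.ru/video/c1404844 | pup 'json{}' |
jq -r '[ [ .. | arrays[] | select(.class == "video-card_n ellip").title],
         [ .. | arrays[] | select(.class == "video-card_lk").href]]
       | transpose[]
       | "https://ok.ru" + (.[1] | sub("\\?.*$"; "")) + " # " + .[0]'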
I've got a JSON file (see below) called department_groups.json.
Essentially if I gave an argument of commercial I'd like it to return:
commercial-team@domain.com
commercial-updates@domain.com
Can anyone guide/help me with doing this?
{
"legal": {
"google_groups":[
["Legal", "legal#domain.com"],
["Legal Team", "legal-team#domain.com"],
["Compliance Checks", "compliance#domain.com"]
],
"samba_groups": ""
},
"commercial":{
"google_groups":[
["Commercial Team", "commercial-team#domain.com"],
["Commercial Updates", "commercial-updates#domain.com"]
],
"samba_groups": ""
},
"technology":{
"google_groups":[
["Technology", "technology#domain.com"],
["Incidents", "incidents#domain.com"]
],
"samba_groups": ""
}
}
This returns the second element in each array in the google_groups property of the commercial property:
jq --arg key commercial '.[$key].google_groups | .[] | .[1]' file
Use jq -r to output in "raw" format (lose the double quotes).
$ key=commercial
$ jq -r --arg key "$key" '.[$key].google_groups | .[] | .[1]' file
commercial-team@domain.com
commercial-updates@domain.com
I used --arg in these examples to show how it is used, optionally with a shell variable. If, on the other hand, commercial was just a fixed string, then you could simplify:
jq -r '.commercial.google_groups | .[] | .[1]' file
To process each line of the output, you can just use a shell while read loop:
key=commercial
while read -r email; do
echo "$email"
# process each email individually here
done < <(jq -r --arg key "$key" '.[$key].google_groups | .[] | .[1]' file)
Here I am using a process substitution <(), which acts like a file that can be processed by the shell. One advantage of doing this, over using a pipe, is that no subshell is created. Among other things, this means that the variables used within the loop remain in scope after the while block, so you can use them later.
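A small illustration of that scope point (the count variable is my addition):
key=commercial
count=0
while read -r email; do
  count=$((count + 1))
  echo "$email"
done < <(jq -r --arg key "$key" '.[$key].google_groups | .[] | .[1]' file)
echo "Processed $count addresses"   # count is still visible here because no subshell was used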
If you prefer to use a pipe, just remove the part after done and move the command up to the first line:
jq ... | while read -r email; do # etc.
As @TomFenech noted, the requirements are somewhat unclear, but if it's the email addresses you want, the following variant of his answer may be of interest:
key=commercial
$ jq -r --arg key "$key" '.[$key].google_groups[][] | select(test("#"))' department_groups.json
commercial-team#domain.com
commercial-updates#domain.com
I receive some json that I process until it becomes just text lines. In the first line there's a value that I would like to keep in a variable and all the rest after the first line should be displayed with less or other utils.
Can I do this without using a temporary file?
The context is this:
aws logs get-log-events --log-group-name "$logGroup" --log-stream-name "$logStreamName" --limit "$logSize" |
jq '{message:.nextForwardToken}, .events[] | .message' |
sed 's/^"//g' | sed 's/"$//g'
In the first line there's the nextForwardToken that I want to put in the variable and all the rest is log messages.
The json looks like this:
{
"events": [
{
"timestamp": 1518081460955,
"ingestionTime": 1518081462998,
"message": "08.02.2018 09:17:40.955 [SimpleAsyncTaskExecutor-138] INFO o.s.b.c.l.support.SimpleJobLauncher - Job: [SimpleJob: [name=price-update]] launched with the following parameters: [{time=1518081460875, sku=N-W7ZLH9U737B|N-XIBH22XQE87|N-3EXIRFNYNW0|N-U19C031D640|N-6TQ1847FQE6|N-NF0XCNG0029|N-UJ3H0OZROCQ|N-W2JKJD4S6YP|N-VEMA4QVV3X1|N-F40J6P2VM01|N-VIT7YEAVYL2|N-PKLKX1PAUXC|N-VPAK74C75DP|N-C5BLYC5HQRI|N-GEIGFIBG6X2|N-R0V88ZYS10W|N-GQAF3DK7Y5Z|N-9EZ4FDDSQLC|N-U15C031D668|N-B8ELYSSFAVH}]"
},
{
"timestamp": 1518081461095,
"ingestionTime": 1518081462998,
"message": "08.02.2018 09:17:41.095 [SimpleAsyncTaskExecutor-138] INFO o.s.batch.core.job.SimpleStepHandler - Executing step: [index salesprices]"
},
{
"timestamp": 1518082421586,
"ingestionTime": 1518082423001,
"message": "08.02.2018 09:33:41.586 [upriceUpdateTaskExecutor-3] DEBUG e.u.d.a.j.d.b.StoredMasterDataReader - Reading page 1621"
}
],
"nextBackwardToken": "b/33854347851370569899844322814554152895248902123886870536",
"nextForwardToken": "f/33854369274157730709515363051725446974398055862891970561"
}
I need to put in a variable this:
f/33854369274157730709515363051725446974398055862891970561
and display (or put in an other variable) the messages:
08.02.2018 09:17:40.955 [SimpleAsyncTaskExecutor-138] INFO o.s.b.c.l.support.SimpleJobLauncher - Job: [SimpleJob: [name=price-update]] launched with the following parameters: [{time=1518081460875, sku=N-W7ZLH9U737B|N-XIBH22XQE87|N-3EXIRFNYNW0|N-U19C031D640|N-6TQ1847FQE6|N-NF0XCNG0029|N-UJ3H0OZROCQ|N-W2JKJD4S6YP|N-VEMA4QVV3X1|N-F40J6P2VM01|N-VIT7YEAVYL2|N-PKLKX1PAUXC|N-VPAK74C75DP|N-C5BLYC5HQRI|N-GEIGFIBG6X2|N-R0V88ZYS10W|N-GQAF3DK7Y5Z|N-9EZ4FDDSQLC|N-U15C031D668|N-B8ELYSSFAVH}]
08.02.2018 09:17:41.095 [SimpleAsyncTaskExecutor-138] INFO o.s.batch.core.job.SimpleStepHandler - Executing step: [index salesprices]
08.02.2018 09:33:41.586 [upriceUpdateTaskExecutor-3] DEBUG e.u.d.a.j.d.b.StoredMasterDataReader - Reading page 1621
Thanks in advance for your help.
You might consider it a bit of a trick, but you can use tee to pipe all the output to stderr and fetch the one line you want for your variable with head:
var="$(command | tee /dev/stderr | head -n 1)"
Or you can solve this with a bit of scripting:
first=true
while read -r line; do
if $first; then
first=false
var="$line"
fi
echo "$line"
done < <(command)
If you are interested in storing the contents in variables, use mapfile (or read on older bash versions).
Just use read to get the first line. I've added the -r flag to jq to print the output without quotes:
read -r token < <(aws logs get-log-events --log-group-name "$logGroup" --log-stream-name "$logStreamName" --limit "$logSize" | jq -r '{message:.nextForwardToken}, .events[] | .message')
printf '%s\n' "$token"
Or using mapfile
mapfile -t output < <(aws logs get-log-events --log-group-name "$logGroup" --log-stream-name "$logStreamName" --limit "$logSize" | jq -r '{message:.nextForwardToken}, .events[] | .message')
and loop through the array. The first element will always contain the token-id you want.
printf '%s\n' "${output[0]}"
The rest of the elements can be iterated over:
for ((i=1; i<${#output[@]}; i++)); do
    printf '%s\n' "${output[i]}"
done
Straightforwardly:
aws logs get-log-events --log-group-name "$logGroup" \
--log-stream-name "$logStreamName" --limit "$logSize" > /tmp/log_data
-- set nextForwardToken variable:
nextForwardToken=$(jq -r '.nextForwardToken' /tmp/log_data)
echo $nextForwardToken
f/33854369274157730709515363051725446974398055862891970561
-- print all message items:
jq -r '.events[].message' /tmp/log_data
08.02.2018 09:17:40.955 [SimpleAsyncTaskExecutor-138] INFO o.s.b.c.l.support.SimpleJobLauncher - Job: [SimpleJob: [name=price-update]] launched with the following parameters: [{time=1518081460875, sku=N-W7ZLH9U737B|N-XIBH22XQE87|N-3EXIRFNYNW0|N-U19C031D640|N-6TQ1847FQE6|N-NF0XCNG0029|N-UJ3H0OZROCQ|N-W2JKJD4S6YP|N-VEMA4QVV3X1|N-F40J6P2VM01|N-VIT7YEAVYL2|N-PKLKX1PAUXC|N-VPAK74C75DP|N-C5BLYC5HQRI|N-GEIGFIBG6X2|N-R0V88ZYS10W|N-GQAF3DK7Y5Z|N-9EZ4FDDSQLC|N-U15C031D668|N-B8ELYSSFAVH}]
08.02.2018 09:17:41.095 [SimpleAsyncTaskExecutor-138] INFO o.s.batch.core.job.SimpleStepHandler - Executing step: [index salesprices]
08.02.2018 09:33:41.586 [upriceUpdateTaskExecutor-3] DEBUG e.u.d.a.j.d.b.StoredMasterDataReader - Reading page 1621
I believe the following meets the stated requirements, assuming a bash-like environment:
x=$(aws ... |
tee >(jq -r '.events[] | .message' >&2) |
jq .nextForwardToken) 2>&1
This makes the item of interest available as the shell variable $x.
Notice that the string manipulation using sed can be avoided by using the -r command-line option of jq.
Calling jq just once
x=$(aws ... |
jq -r '.nextForwardToken, (.events[] | .message)' |
tee >(tail -n +2 >&2) |
head -n 1) 2>&1
echo "x=$x"