retrieving the last stored value in variable using shell scripting - mysql

i am trying to retrieve only the last updated value of variable date2,but all the times it
gives 0 in the output,i think it is because i am retrieving it out side the block but
i only want the last value store in variable date2,how can i fix this problem.Thanks for
your help
here is my code
count=0
date1=0
date2=0
mysql -uroot -proot -Dproject_ivr_db -rN --execute "SELECT FeeSubmissionDate FROM
meritlist_date wHERE Discipline='phd' AND AnnounceDate<=now() " | while read value
do
if [[ "$count" == 0 ]]
then
let "date2=$value"
let "count++"
else
let "date1=$value"
let "result=$date1-$date2"
if [[ "$result" -gt 0 ]]
then
let "date2=$date1"
fi
fi
done
echo"V,date2=$date2"

AWK can easily solve this issue. Here's an example of a simple script which lists all files in the current directory and returns the number of those files that have substring "test" as part of the filename.
#!/bin/bash
ls -1 |
awk 'BEGIN { count=0 } { if (index($0,"test") != 0) { count++ }} END { printf("Count=%s\n", count); }'
Now if you replace ls -1 with your mysql command and get rid of the while loop by simply using AWK in an analogue way as shown above, you should have no problems counting the lines of interest.

Related

How can I write a batch file using jq to find json files with certain attribute and copy the to new location

I have 100,000's of lined json files that I need to split out based on whether or not, they contain a certain value for an attribute and then I need to convert them into valid json that can be read in by another platform.
I'm using a batch file to do this and I've managed to convert them into valid json using the following:
for /r %%f in (*.json*) do jq -s -c "." "%%f" >> "C:\Users\me\my-folder\%%~nxf.json"
I just can't figure out how to only copy the files that contain a certain value. So logic should be:
Look at all the files in the folders and sub solders
If the file contains an attribute "event" with a value of "abcd123"
then: convert the file into valid json and persist it with the same filename over to location "C:\Users\me\my-folder\"
else: ignore it
Example of files it should select:
{"name":"bob","event":"abcd123"}
and
{"name":"ann","event":"abcd123"},{"name":"bob","event":"8745LLL"}
Example of files it should NOT select:
{"name":"ann","event":"778PPP"}
and
{"name":"ann","event":"778PPP"},{"name":"bob","event":"8745LLL"}
Would love help to figure out the filtering part.
Since there are probably more file names than will fit on the command line, this response will assume a shell loop through the file names will be necessary, as the question itself envisions. Since I'm currently working with a bash shell, I'll present a bash solution, which hopefully can readily be translated to other shells.
The complication in the question is that the input file might contain one or more valid JSON values, or one or more comma-separated JSON values.
The key to a simple solution using jq is jq's -e command-line option, since this sets the return code to 0 if and only if
(a) the program ran normally; and (b) the last result was a truthy value.
For clarity, let's encapsulate the relevant selection criterion in two bash functions:
# If the input is a valid stream of JSON objects
function try {
jq -e -n 'any( inputs | objects; select( .event == "abcd123") | true)' 2> /dev/null > /dev/null
}
# If the input is a valid JSON array whose elements are to be checked
function try_array {
jq -e 'any( .[] | objects; select( .event == "abcd123") | true)' 2> /dev/null > /dev/null
}
Now a comprehensive solution can be constructed along the following lines:
find . -type f -maxdepth 1 -name '*.json' | while read -r f
do
< "$f" try
if [ $? = 0 ] ; then
echo copy $f
elif [ $? = 5 ] ; then
(echo '['; cat "$f"; echo ']') | try_array
if [ $? = 0 ] ; then
echo copy $f
fi
fi
done
Have you considered using findstr?
%SystemRoot%\System32\findstr.exe /SRM "\"event\":\"abcd123\"" "C:\Users\me\my-folder\*.json"
Please open a Command Prompt window, type findstr /?, press the ENTER key, and read its usage information. (You may want to consider the /I option too, for instance).
You could then use that within another for loop to propagate those files into a variable for your copy command.
batch-file example:
#For /F "EOL=? Delims=" %%G In (
'%SystemRoot%\System32\findstr.exe /SRM "\"event\":\"abcd123\"" "C:\Users\me\my-folder\*.json"'
) Do #Copy /Y "%%G" "S:\omewhere Else"
cmd example:
For /F "EOL=? Delims=" %G In ('%SystemRoot%\System32\findstr.exe /SRM "\"event\":\"abcd123\"" "C:\Users\me\my-folder\*.json"') Do #Copy /Y "%G" "S:\omewhere Else"

Bash loop to merge files in batches for mongoimport

I have a directory with 2.5 million small JSON files in it. It's 104gb on disk. They're multi-line files.
I would like to create a set of JSON arrays from the files so that I can import them using mongoimport in a reasonable amount of time. The files can be no bigger than 16mb, but I'd be happy even if I managed to get them in sets of ten.
So far, I can use this to do them one at a time at about 1000/minute:
for i in *.json; do mongoimport --writeConcern 0 --db mydb --collection all --quiet --file $i; done
I think I can use "jq" to do this, but I have no idea how to make the bash loop pass 10 files at a time to jq.
Note that using bash find results in an error as there are too many files.
With jq you can use --slurp to create arrays, and -c to make multiline json single line. However, I can't see how to combine the two into a single command.
Please help with both parts of the problem if possible.
Here's one approach. To illustrate, I've used awk as it can read the list of files in small batches and because it has the ability to execute jq and mongoimport. You will probably need to make some adjustments to make the whole thing more robust, to test for errors, and so on.
The idea is either to generate a script that can be reviewed and then executed, or to use awk's system() command to execute the commands directly. First, let's generate the script:
ls *.json | awk -v group=10 -v tmpfile=json.tmp '
function out() {
print "jq -s . " files " > " tmpfile;
print "mongoimport --writeConcern 0 --db mydb --collection all --quiet --file " tmpfile;
print "rm " tmpfile;
files="";
}
BEGIN {n=1; files="";
print "test -r " tmpfile " && rm " tmpfile;
}
n % group == 0 {
out();
}
{ files = files " \""$0 "\"";
n++;
}
END { if (files) {out();}}
'
Once you've verified this works, you can either execute the generated script, or change the "print ..." lines to use "system(....)"
Using jq to generate the script
Here's a jq-only approach for generating the script.
Since the number of files is very large, the following uses features that were only introduced in jq 1.5, so its memory usage is similar to the awk script above:
def read(n):
# state: [answer, hold]
foreach (inputs, null) as $i
([null, null];
if $i == null then .[0] = .[1]
elif .[1]|length == n then [.[1],[$i]]
else [null, .[1] + [$i]]
end;
.[0] | select(.) );
"test -r json.tmp && rm json.tmp",
(read($group|tonumber)
| map("\"\(.)\"")
| join(" ")
| ("jq -s . \(.) > json.tmp", mongo("json.tmp"), "rm json.tmp") )
Invocation:
ls *.json | jq -nRr --arg group 10 -f generate.jq
Here is what I came up with. It seems to work and is importing at roughly 80 a second into an external hard drive.
#!/bin/bash
files=(*.json)
for((I=0;I<${#files[*]};I+=500)); do jq -c '.' ${files[#]:I:500} | mongoimport --writeConcern 0 --numInsertionWorkers 16 --db mydb --collection all --quiet;echo $I; done
However, some are failing. I've imported 105k files but only 98547 appeared in the mongo collection. I think it's because some documents are > 16mb.

Newbie: unix bash, nested if statement, results from a loop results from sql

Newbie here, please pardon any confusing wording that I use.
A common task I have is to take a list of names and do a MySQL query to look the names up in a table and see if they are "live" on our site.
Doing this one at a time, my SQL query works fine. I then wanted to do the query using a loop from a file listing multiple names. This works fine, too.
I added this query loop to my bash profile so that I can quickly do the task by typing this:
$ ValidOnSite fileName
This works fine, and I even added an usage statement for my process to remind myself of the syntax. Below is what I have that works fine:
validOnSite() {
if [[ "$1" == "" ]] || [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]; then
echo "Usage:"
echo " $ validOnSite [filename]"
echo " Where validOnSite uses specified file as variables in sql query:"
echo " SELECT name, active FROM dbDb WHERE name=lines in file"
else
cat $1 | while read line ; do hgsql -h genome-centdb hgcentral -Ne "select name, active from dbDb where name='$line'" ; done
fi
Using a file "list.txt" which contains:
nameA
nameB
I would then type:
validOnSite list.txt
and both entries in list.txt meet my query criteria and are found in sql. My results will be:
nameA 1
nameB 1
Note the "1" after each result. I assume this is some sort of "yes" status.
Now, I add a third name to my list.txt, one that I know is not a match in sql. Now list.txt contains:
nameA
nameB
foo
When I again run this command for my list with 3 rows:
validOnSite list.txt
My results are the same as when I used the 1st version of file.txt, and I cannot see which lines failed, I still only see which lines were a success:
nameA 1
nameB 1
I have been trying all kinds of things to add a nested if statement, something that says, "If $line is a match, echo "pass", else echo "fail."
I do not want to see a "1" in my results. Using file.txt with 2 matches and 1 non-match, I would like my results to be:
nameA pass
nameB pass
foo fail
Or even better, color code a pass with green and a fail with red.
As I said, newbie here... :)
Any pointers in the right direction would help. Here is my latest sad attempt, but I realize I may be going in a wrong direction entirely:
validOnSite() {
if [[ "$1" == "" ]] || [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]; then
echo "Usage:"
echo " $ validOnSite [filename]"
echo " Where validOnSite uses specified file as variables in sql query:"
echo " SELECT name, active FROM dbDb WHERE name=lines in file"
else
cat $1 | while read line ; do hgsql -h genome-centdb hgcentral -Ne "select name, active from dbDb where name='$line'" > /dev/null ; done
if ( "status") then
echo $line "failed"
echo $line "failed" >> outfile
else
echo $line "ok"
echo $line "ok" >>outfile
clear
cat outfile
fi
fi
If something looks crazy in my last attempt, it's because it is - I am just googling around and trying as many things as I can while trying to learn. Any help appreciated, I feel stuck after working on this for a long time, but I am excited to move forward and find a solution! I think there is something I'm missing about understanding stdout, and also confusion about nested if's.
Note: I do not need an outfile, but it's ok if one is needed to accomplish the goal. stdout result alone would suffice, and is preferred.
Note: hgssql is just the name of our MySQL server. The MySQL part works fine, I am looking for a better way to deal with my bash output, and I think there is something about stderr that I'm missing. I'm looking for a fairly simple answer as I'm a newbie!
I guess, by hgsql you mean some Mercurial extension that allows to perform MySQL queries. I don't know how hgsql works, but I know that MySQL returns only the matching rows. But in terms of shell scripting, the result is a string that may contain extra information even if the number of matched rows is zero. For example, some MySQL client may return the header or a string like "No rows found", although it is unlikely.
I'll show how it is done with the official mysql client. I'm sure you will manage to adapt hgsql with the help of its documentation to the following example.
if [ -t 1 ]; then
red_color=$(tput setaf 1)
green_color=$(tput setaf 2)
reset_color=$(tput sgr0)
else
red_color=
green_color=
reset_color=
fi
colorize_flag() {
local color
if [ "$1" = 'fail' ]; then
color="$red_color"
else
color="$green_color"
fi
printf '%s' "${color}${1}${reset_color}"
}
sql_fmt='SELECT IF(active, "pass", "fail") AS flag FROM dbDb WHERE name = "%s"'
while IFS= read -r line; do
sql=$(printf "$sql_fmt" "$line")
flag=$(mysql --skip-column-names dbname -e "$sql")
[ -z "$flag" ] && flag='fail'
printf '%-20s%s\n' "$line" "$(colorize_flag "$flag")"
done < file
The first block detects if the script is running in interactive mode by checking if the file descriptor 1 (standard output) is opened on a terminal (see help test). If it is opened in a terminal, the script considers that the script is running interactively, i.e. the standard output is connected to the user's terminal directly, but not via pipe, for example. For interactive mode, it assigns variables to the terminal color codes with the help of tput command.
colorize_flag function accepts a string ($1) and outputs the string with the color codes applied according to its value.
The last block reads file line by line. For each line builds an SQL query string (sql) and invokes mysql command with the column names stripped off the output. The output of the mysql command is assigned to flag by means of command substitution. If "$flag" is empty, it is assigned to 'fail'. The $line and the colorized flag are printed to standard output.
You can test the non-interactive mode by chaining the output via pipe, e.g.:
./script | tee -a
I must warn you that it is generally bad idea to pass the shell variables into SQL queries unless the values are properly escaped. And the popular shells do not provide any tools to escape MySQL strings. So consider running the queries in Perl, PHP, or any programming language that is capable of building and running the queries safely.
Also note that in terms of performance it is better to run a single query and then parse the result set in a loop instead of running multiple queries in a loop, with the exception of prepared statements.
I found a way to get to my solution by piecing together the few basic things that I know. Not elegant, but it works well enough for now. I created a file "[filename]Results" with the output:
nameA 1
nameB 1
I then cut out the "1"s and made a new file. I then did a comparison with "[fileName]results" to list.txt in order to see what lines exist in file.txt but do not exist in results.
Note: I have the following in my .zshrc file.
validOnSite() {
if [[ "$1" == "" ]] || [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]; then
echo "Usage:"
echo " $ validOnSite [filename]"
echo " Where validOnSite uses specified file as variables in sql query:"
echo " SELECT name, active FROM dbDb WHERE name=lines in file"
else
cat $1 | while read line ; do hgsql -h genome-centdb hgcentral -Ne "select name from dbDb where name='$line' and active='1'" >> $1"Pass"; done
autoload -U colors
colors
echo $fg_bold[magenta]Assemblies active on site${reset_color}
echo
cat $1"Pass"
echo
echo $fg_bold[red]Not active or not found on site${reset_color}
comm -23 $1 $1"Pass" 2> /dev/null
echo
echo
mv $1"Pass" ~cath/myFiles/validOnSiteResults
echo "Results file containing only active assemblies resides in ~cath/myFiles/validOnSiteResults"
fi
}
list.txt:
nameA
nameB
foo
My input:
validOnSite list.txt
My output:
Assemblies active on site (<--this font is magenta)
nameA
nameB
Not active or not found on site (<--this font is red)
foo
Results file containing only active assemblies resides in ~me/myFiles/validOnRRresults

How can I tell if a jq filter successfully pulls data from a JSON data structure?

I want to know if a given filter succeeds in pulling data from a JSON data structure. For example:
###### For the user steve...
% Name=steve
% jq -j --arg Name "$Name" '.[]|select(.user == $Name)|.value' <<<'
[
{"user":"steve", "value":false},
{"user":"tom", "value":true},
{"user":"pat", "value":null},
{"user":"jane", "value":""}
]'
false
% echo $?
0
Note: successful results can include boolean values, null, and even the empty string.
###### Now for user not in the JSON data...
% Name=mary
% jq -j --arg Name "$Name" '.[]|select(.user == $Name)|.value' <<<'
[
{"user":"steve", "value":false},
{"user":"tom", "value":true},
{"user":"pat", "value":null},
{"user":"jane", "value":""}
]'
% echo $?
0
If the filter does not pull data from the JSON data structure, I need to know this. I would prefer the filter to return a non-zero return code.
How would I go about determining if a selector successfully pulls data from a JSON data structure vs. fails to pull data?
Important: The above filter is just an example, the solution needs to work for any jq filter.
Note: the evaluation environment is Bash 4.2+.
You can use the -e / --exit-status flag from the jq Manual, which says
Sets the exit status of jq to 0 if the last output values was neither false nor null, 1 if the last output value was either false or null, or 4 if no valid result was ever produced. Normally jq exits with 2 if there was any usage problem or system error, 3 if there was a jq program compile error, or 0 if the jq program ran.
I can demonstrate the usage with a basic filter as below, as your given example is not working for me.
For a successful query,
dudeOnMac:~$ jq -e '.foo?' <<< '{"foo": 42, "bar": "less interesting data"}'
42
dudeOnMac:~$ echo $?
0
For an invalid query, done with a non-existent entity zoo,
dudeOnMac:~$ jq -e '.zoo?' <<< '{"foo": 42, "bar": "less interesting data"}'
null
dudeOnMac:~$ echo $?
1
For an error scenario, returning code 2 which I created by double-quoting the jq input stream.
dudeOnMac:~$ jq -e '.zoo?' <<< "{"foo": 42, "bar": "less interesting data"}"
jq: error: Could not open file interesting: No such file or directory
jq: error: Could not open file data}: No such file or directory
dudeOnMac:~$ echo $?
2
I've found a solution that meets all of my requirements! Please let me know what you think!
The idea is use jq -e "$Filter" as a first-pass check. Then for the return code of 1, do a jq "path($Filter)" check. The latter will only succeed if, in fact, there is a path into the JSON data.
Select.sh
#!/bin/bash
Select()
{
local Name="$1"
local Filter="$2"
local Input="$3"
local Result Status
Result="$(jq -e --arg Name "$Name" "$Filter" <<<"$Input")"
Status=$?
case $Status in
1) jq --arg Name "$Name" "path($Filter)" <<<"$Input" >/dev/null 2>&1
Status=$?
;;
*) ;;
esac
[[ $Status -eq 0 ]] || Result="***ERROR***"
echo "$Status $Result"
}
Filter='.[]|select(.user == $Name)|.value'
Input='[
{"user":"steve", "value":false},
{"user":"tom", "value":true},
{"user":"pat", "value":null},
{"user":"jane", "value":""}
]'
Select steve "$Filter" "$Input"
Select tom "$Filter" "$Input"
Select pat "$Filter" "$Input"
Select jane "$Filter" "$Input"
Select mary "$Filter" "$Input"
And the execution of the above:
% ./Select.sh
0 false
0 true
0 null
0 ""
4 ***ERROR***
I've added an updated solution below
The fundamental problem here is that when try to retrieve a value from an object using the .key or .[key] syntax, jq — by definition — can't distinguish a missing key from a key with a value of null.
You can instead define your own lookup function:
def lookup(k):if has(k) then .[k] else error("invalid key") end;
Then use it like so:
$ jq 'lookup("a")' <<<'{}' ; echo $?
jq: error (at <stdin>:1): invalid key
5
$ jq 'lookup("a")' <<<'{"a":null}' ; echo $?
null
0
If you then use lookup consistently instead of the builtin method, I think that will give you the behaviour you want.
Here's another way to go about it, with less bash and more jq.
#!/bin/bash
lib='def value(f):((f|tojson)//error("no such value"))|fromjson;'
users=( steve tom pat jane mary )
Select () {
local name=$1 filter=$2 input=$3
local -i status=0
result=$( jq --arg name "$name" "${lib}value(${filter})" <<<$input 2>/dev/null )
status=$?
(( status )) && result="***ERROR***"
printf '%s\t%d %s\n' "$name" $status "$result"
}
filter='.[]|select(.user == $name)|.value'
input='[{"user":"steve","value":false},
{"user":"tom","value":true},
{"user":"pat","value":null},
{"user":"jane","value":""}]'
for name in "${users[#]}"
do
Select "$name" "$filter" "$input"
done
This produces the output:
steve 0 false
tom 0 true
pat 0 null
jane 0 ""
mary 5 ***ERROR***
This takes advantage of the fact the absence of input to a filter acts like empty, and empty will trigger the alternative of //, but a string — like "null" or "false" — will not.
It should be noted that value/1 will not work for filters that are simple key/index
lookups on objects/arrays, but neither will your solution. I'm reasonably sure that to
cover all the cases, you'd need something like this (or yours) and something
like get or lookup.
Given that jq is the way it is, and in particular that it is stream-oriented, I'm inclined to think that a better approach would be to define and use one or more filters that make the distinctions you want. Thus rather than writing .a to access the value of a field, you'd write get("a") assuming that get/1 is defined as follows:
def get(f): if has(f) then .[f] else error("\(type) is not defined at \(f)") end;
Now you can easily tell whether or not an object has a key, and you're all set to go. This definition of get can also be used with arrays.

Periodically reading output from async background scripts

Context: I'm making my own i3-Bar script to read output from other (asynchronous) scripts running in background, concatenate them and then echo them to i3-Bar itself.
The way I'm passing outputs is in plain files, and I guess (logically) the problem is that the files are sometimes read and written at the same time. The best way to reproduce this behavior is by suspending the computer and then waking it back up - I don't know the exact cause of this, I can only go on what I see from my debug log files.
Main Code: Added comments for clarity
#!/usr/bin/env bash
cd "${0%/*}";
trap "kill -- -$$" EXIT; #The bg. scripts are on a while [ 1 ] loop, have to kill them.
rm -r ../input/*;
mkdir ../input/; #Just in case.
for tFile in ./*; do
#Run all of the available scripts in the current directory in the background.
if [ $(basename $tFile) != "main.sh" ]; then ("$tFile" &); fi;
done;
echo -e '{ "version": 1 }\n['; #I3-Bar can use infinite array of JSON input.
while [ 1 ]; do
input=../input/*; #All of the scripts put their output in this folder as separate text files
input=$(sort -nr <(printf "%s\n" $input));
output="";
for tFile in $input; do
#Read and add all of the files to one output string.
if [ $tFile == "../input/*" ]; then break; fi;
output+="$(cat $tFile),";
done;
if [ "$output" == "" ]; then
echo -e "[{\"full_text\":\"ERR: No input files found\",\"color\":\"#ff0000\"}],\n";
else
echo -e "[${output::-1}],\n";
fi;
sleep 0.2s;
done;
Example Input Script:
#!/usr/bin/env bash
cd "${0%/*}";
while [ 1 ]; do
echo -e "{" \
"\"name\":\"clock\"," \
"\"separator_block_width\":12," \
"\"full_text\":\"$(date +"%H:%M:%S")\"}" > ../input/0_clock;
sleep 1;
done;
The Problem
The problem isn't the script itself, but the fact, that i3-Bar receives a malformed JSON input (-> parse error), and terminates - I'll show such log later.
Another problem is, that the background scripts should run asynchronously, because some need to update every 1 second nad some only every 1 minute, etc. So the use of a FIFO isn't really an option, unless I create some ugly inefficient hacky stuff.
I know there is a need for IPC here, but I have no idea how to effieciently do this.
Script output from randomly crashing - waking up error looks the same
[{ "separator_block_width":12, "color":"#BAF2F8", "full_text":"192.168.1.104 "},{ "separator_block_width":12, "color":"#BAF2F8", "full_text":"100%"}],
[{ "separator_block_width":12, "color":"#BAF2F8", "full_text":"192.168.1.104 "},,],
(Error is created by the second line)
As you see, the main script tries to read the file, doesn't get any output, but the comma is still there -> malformed JSON.
The immediate error is easy to fix: don't append an entry to output if the corresponding file is empty:
for tFile in $input; do
[[ $tFile != "../input/*" ]] &&
[[ -s $tFile ]] &&
output+="$(<$tFile),"
done
There is a potential race condition here, though. Just because a particular input file exists doesn't mean that the data is fully written to it yet. I would change your input scripts to look something like
#!/usr/bin/env bash
cd "${0%/*}";
while true; do
o=$(mktemp)
printf '{"name": "clock", "separator_block_width": 12, "full_text": %(%H:%M:%S)T}\n' > "$o"
mv "$o" ../input/0_clock
sleep 1
done
Also, ${output%,} is a safer way to trim a trailing comma when necessary.