In my assessment I'm asked to write a shell script using only bash commands and another shell script using only SQL queries. These scripts should do the following:
1. Clean data in the .csv file (not important at the moment)
2. Sum up earnings based upon gender
3. Produce a simple HTML table
I have made the SQL query produce the correct numbers and HTML file, but with some help from other bash commands.
For the file that should only contain bash commands, I'm able to get the table, but one of the numbers is wrong.
I'm very new to bash scripting and SQL queries, so the code isn't very optimised.
The following is a shortened version of the sample input:
CSV input
title,site,country,year_release,box_office,director,number_of_subjects,subject,type_of_subject,race_known,subject_race,person_of_color,subject_sex,lead_actor_actress
10 Rillington Place,http://www.imdb.com/title/tt0066730/,UK,1971,-,Richard Fleischer,1,John Christie,Criminal,Unknown,,0,Male,Richard Attenborough
12 Years a Slave,http://www.imdb.com/title/tt2024544/,US/UK,2013,56700000,Steve McQueen,1, Solomon Northup,Other,Known,African American,1,Male,Chiwetel Ejiofor
127 Hours,http://www.imdb.com/title/tt1542344/,US/UK,2010,18300000,Danny Boyle,1,Aron Ralston,Athlete,Unknown,,0,Male,James Franco
1987,http://www.imdb.com/title/tt2833074/,Canada,2014,-,Ricardo Trogi,1,Ricardo Trogi,Other,Known,White,0,Male,Jean-Carl Boucher
20 Dates,http://www.imdb.com/title/tt0138987/,US,1998,537000,Myles Berkowitz,1,Myles Berkowitz,Other,Unknown,,0,Male,Myles Berkowitz
21,http://www.imdb.com/title/tt0478087/,US,2008,81200000,Robert Luketic,1,Jeff Ma,Other,Known,Asian American,1,Male,Jim Sturgess
24 Hour Party People,http://www.imdb.com/title/tt0274309/,UK,2002,1130000,Michael Winterbottom,1,Tony Wilson,Musician,Known,White,0,Male,Steve Coogan
42,http://www.imdb.com/title/tt0453562/,US,2013,95000000,Brian Helgeland,1,Jackie Robinson,Athlete,Known,African American,1,Male,Chadwick Boseman
8 Seconds,http://www.imdb.com/title/tt0109021/,US,1994,19600000,John G. Avildsen,1,Lane Frost,Athlete,Unknown,,0,Male,Luke Perry
84 Charing Cross Road,http://www.imdb.com/title/tt0090570/,US/UK,1987,1080000,David Hugh Jones,2,Frank Doel,Author,Unknown,,0,Male,Anthony Hopkins
84 Charing Cross Road,http://www.imdb.com/title/tt0090570/,US/UK,1987,1080000,David Hugh Jones,2,Helene Hanff,Author,Unknown,,0,Female,Anne Bancroft
A Beautiful Mind,http://www.imdb.com/title/tt0268978/,US,2001,171000000,Ron Howard,1,John Nash,Academic,Unknown,,0,Male,Russell Crowe
A Dangerous Method,http://www.imdb.com/title/tt1571222/,Canada/UK,2011,5700000,David Cronenberg,3,Carl Gustav Jung,Academic,Known,White,0,Male,Michael Fassbender
A Dangerous Method,http://www.imdb.com/title/tt1571222/,Canada/UK,2011,5700000,David Cronenberg,3,Sigmund Freud,Academic,Known,White,0,Male,Viggo Mortensen
A Dangerous Method,http://www.imdb.com/title/tt1571222/,Canada/UK,2011,5700000,David Cronenberg,3,Sabina Spielrein,Academic,Known,White,0,Female,Keira Knightley
A Home of Our Own,http://www.imdb.com/title/tt0107130/,US,1993,1700000,Tony Bill,1,Frances Lacey,Other,Unknown,,0,Female,Kathy Bates
A Man Called Peter,http://www.imdb.com/title/tt0048337/,US,1955,-,Henry Koster,1,Peter Marshall,Other,Known,White,0,Male,Richard Todd
A Man for All Seasons,http://www.imdb.com/title/tt0060665/,UK,1966,-,Fred Zinnemann,1,Thomas More,Historical,Known,White,0,Male,Paul Scofield
A Matador's Mistress,http://www.imdb.com/title/tt0491046/,US/UK,2008,-,Menno Meyjes,2,Lupe Sino,Actress ,Known,Hispanic (White),0,Female,Penélope Cruz
For the SQL-only file, this is my code so far (it produces the right numbers and a correct table):
python3 csv2sqlite.py --table-name test_table --input table.csv --output table.sqlite
echo -e '<TABLE BORDER = "1">
<TR><TH>Gender</TH>
<TH>Total Amount [$]</TH>
</TR>' >> tmp1.txt
sqlite3 biopics.sqlite 'SELECT subject_sex,SUM(earnings) FROM table \
GROUP BY subject_sex;' -html > tmp2.txt
cat tmp2.txt >> tmp1.txt
echo '</TABLE>' >> tmp1.txt
cp tmp1.txt $1
cat $1
rm tmp1.txt tmp2.txt
For the bash-only file, this is my code so far:
echo -e '<TABLE BORDER = "1">
<TR><TH>Gender</TH>
<TH>Total Amount [$]</TH>
</TR>' >> tmp1.txt
awk -F ',' '{for (i=1;i<=NF;i++)
if ($1)
a[$13] += $5} END{for (i in a) printf("<TR><TD> %s </TD><TD> %i </TD></TR>\n", i, a[i])}' table.csv | sort | head -2 > tmp2.txt
cat tmp2.txt >> tmp1.txt
echo -e "</TABLE>" >> tmp1.txt
cp tmp1.txt $1
cat $1
rm tmp1.txt tmp2.txt
The expected output should look like this:
<TABLE BORDER = "1">
<TR><TH>Gender</TH>
<TH>Total Amount [$]</TH>
</TR>
<TR><TD>Female</TD>
<TD>8480000.0</TD>
</TR>
<TR><TD>Male</TD>
<TD>455947000.0</TD>
</TR>
</TABLE>
Thank you in advance!
#! /bin/bash
awk -F, '{
    # Skip the header row, then add box_office ($5) to the total for subject_sex ($13).
    if (NR != 1)
    {
        if (sum[$13] == "")
        {
            sum[$13]=0
        }
        sum[$13]+=$5
    }
}
END {
    # Emit the table once all rows have been read.
    print "<TABLE BORDER = \"1\">"
    print "<TR><TH>Gender</TH><TH>Total Amount [$]</TH></TR>"
    for ( gender in sum )
    {
        print "<TR><TD>"gender"</TD>", "<TD>"sum[gender]"</TD></TR>"
    }
    print "</TABLE>"
}' table.csv
Try this and see if it works for you.
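To check the summation logic in isolation, a minimal sketch like the following runs the same awk idea on a few inline rows. The rows are copied from the sample input in the question; the function name sum_by_gender is made up for this example and is not part of the answer's script.

```shell
#!/bin/bash
# Hypothetical helper: sum column 5 (box_office) per value of column 13
# (subject_sex), skipping the header row. Reads CSV on stdin.
sum_by_gender() {
  awk -F, 'NR > 1 { sum[$13] += $5 }
           END { for (g in sum) print g "," sum[g] }' | sort
}

# Header plus three rows from the sample input; a "-" counts as 0 in awk.
sum_by_gender <<'EOF'
title,site,country,year_release,box_office,director,number_of_subjects,subject,type_of_subject,race_known,subject_race,person_of_color,subject_sex,lead_actor_actress
10 Rillington Place,http://www.imdb.com/title/tt0066730/,UK,1971,-,Richard Fleischer,1,John Christie,Criminal,Unknown,,0,Male,Richard Attenborough
127 Hours,http://www.imdb.com/title/tt1542344/,US/UK,2010,18300000,Danny Boyle,1,Aron Ralston,Athlete,Unknown,,0,Male,James Franco
A Home of Our Own,http://www.imdb.com/title/tt0107130/,US,1993,1700000,Tony Bill,1,Frances Lacey,Other,Unknown,,0,Female,Kathy Bates
EOF
```

Because the header is skipped with NR > 1 and each row is added exactly once, the totals can be compared by hand against the box_office column.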
UPDATE:
From your comment, I understand that you want the rows sorted by the sum.
#! /bin/bash
# Stage 1 prints "gender,sum" lines, sort orders them by the sum,
# and stage 2 wraps the sorted rows in HTML tags.
awk -F, -v OFS=, '{
    if (NR != 1)
    {
        if (sum[$13] == "")
        {
            sum[$13]=0
        }
        sum[$13]+=$5
    }
}
END {
    for ( gender in sum )
    {
        print gender, sum[gender]
    }
}' table.csv | sort -t, -nk 2,2 |
awk -v firstline="$(sed -n '1p' table.csv)" '{
    printrow($0)
}
BEGIN {
    split(firstline, headers, ",")
    print "<html>"
    print "<TABLE BORDER = \"1\">"
    # Header order must match the data rows: gender first, then the sum.
    printrow(headers[13]","headers[5], 1)
}
END {
    print "</TABLE>"
    print "</html>"
}
function printrow(row, flag)
{
    # if flag == 0 or null "<TD>" else "<TH>"
    len = split(row, cells, ",")
    print "<TR>"
    for (i = 1 ; i <= len ; ++i)
    {
        if (!flag)
            print "<TD>"cells[i]"</TD>"
        else
            print "<TH>"cells[i]"</TH>"
    }
    print "</TR>"
}'
Above, I have divided what you need into two modules.
Manipulating the data in the table:
1) The first awk script organises the table (it sums the box office per gender).
2) sort orders the rows by the 2nd column. I could have done this in the first awk script itself, but it was a little shorter this way.
Converting it into an HTML table:
The second awk script receives the output of the first one.
It adds the headings and tags.
I feel it's more modular this way, which makes modifications easier: the first script handles the data manipulation and the second places the headers and tags.
Personally, I would give the second awk script its own executable file, so that the first script does the data manipulation and then passes its output to the other script, which sets the HTML tags and headers.
There might be better alternatives; this is the best I know.
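The two-module idea can be sketched as two shell functions that could later be moved into separate executable files. This is only an illustration: the names sum_by_gender and to_html_table are hypothetical, and the demo rows reuse the totals from the expected output in the question.

```shell
#!/bin/bash
# Stage 1 (hypothetical name): per-gender totals as "gender,total" lines,
# sorted numerically by the total. The first argument is the CSV file.
sum_by_gender() {
  awk -F, 'NR > 1 { sum[$13] += $5 }
           END { for (g in sum) print g "," sum[g] }' "$1" |
    sort -t, -k2,2n
}

# Stage 2 (hypothetical name): wrap comma-separated rows from stdin in an
# HTML table; the first argument is a comma-separated header row.
to_html_table() {
  awk -v header="$1" '
    function row(line, tag,    n, cells, i, out) {
      n = split(line, cells, ",")
      out = "<TR>"
      for (i = 1; i <= n; i++) out = out "<" tag ">" cells[i] "</" tag ">"
      return out "</TR>"
    }
    BEGIN { print "<TABLE BORDER = \"1\">"; print row(header, "TH") }
          { print row($0, "TD") }
    END   { print "</TABLE>" }'
}

# Stage 2 on its own, fed with the totals from the expected output:
printf 'Female,8480000\nMale,455947000\n' |
  to_html_table 'Gender,Total Amount [$]'

# Full pipeline, assuming table.csv is the cleaned input from the question:
# sum_by_gender table.csv | to_html_table 'Gender,Total Amount [$]' > out.html
```

Keeping the HTML stage free of any knowledge about the CSV columns means it can be reused for other two-column summaries without changes.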
I'm reading JSON in a shell script using jq. I can't get the variables $HOME, $HOST and $PEMFILE that appear in the JSON values expanded on the fly in my shell script.
JSON File:
{
"script": {
"install": "${HOME}/lib/install.sh $HOST $PEMFILE",
"Setup": "${HOME}/lib/setup.sh $HOST $PEMFILE $VAR1 $VAR2"
}
}
Shell Script:
#!/bin/bash
examplefile="../lib/example.json"
HOST=ec2-..-...-...-...us-west-2.compute.amazonaws.com
PEMFILE=${HOME}/test.pem
installScript=($(jq '.script.install' $examplefile))
bash "$installScript"
Is there a way I can interpret these variables on the fly without modifying the JSON?
P.S. I don't want to use eval.
This is easy with the GNU utility envsubst (the variables must be exported for envsubst to see them):
installScript=$(jq -r '.script.install' "$examplefile" | envsubst)
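envsubst reads only the environment, and by default it replaces every variable reference it finds. The following is a hedged sketch, assuming GNU gettext's envsubst is installed; the printf stands in for the jq -r output so the example runs without the JSON file, and the host and key path are placeholder values.

```shell
#!/bin/bash
# Placeholder values for illustration only.
HOST=example-host
PEMFILE=/tmp/test.pem
export HOST PEMFILE   # envsubst reads the environment, not shell variables

# Stand-in for: jq -r '.script.install' "$examplefile"
template='${HOME}/lib/install.sh $HOST $PEMFILE'

# The optional SHELL-FORMAT argument restricts substitution to the listed
# variables; anything else (here ${HOME}) is left untouched.
printf '%s\n' "$template" | envsubst '$HOST $PEMFILE'
```

Dropping the SHELL-FORMAT argument makes envsubst expand ${HOME} as well, which is what the one-liner above does.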
Here is a solution using jq's env and gsub to perform the replacement.
Note that env only sees environment variables, so plain shell variables have to be exported first.
#!/bin/bash
examplefile="../lib/example.json"
HOST=ec2-..-...-...-...us-west-2.compute.amazonaws.com
PEMFILE=${HOME}/test.pem
export HOST
export PEMFILE
installScript=$(jq -Mr '
.script.install | gsub("(?<x>[$][{]?\\w+[}]?)"; env[.x|gsub("[${}]+";"")] )
' "$examplefile")
echo "$installScript"
Sample Output
/home/runner/lib/install.sh ec2-..-...-...-...us-west-2.compute.amazonaws.com /home/runner/test.pem
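The difference between shell variables and environment variables is the reason for the two export lines: a child process such as jq only inherits exported names. A small sketch shows this with a child bash instead of jq; the DEMO_ variable names are made up for the example.

```shell
#!/bin/bash
DEMO_PLAIN=shell-only        # plain shell variable, not inherited
export DEMO_EXPORTED=in-env  # exported, so child processes inherit it

# The single quotes make the child shell, not this one, do the expansion.
bash -c 'echo "plain=${DEMO_PLAIN-unset} exported=${DEMO_EXPORTED-unset}"'

# A one-off assignment prefix also puts a variable in the child's environment.
DEMO_PLAIN="$DEMO_PLAIN" bash -c 'echo "plain=${DEMO_PLAIN-unset}"'
```

The assignment-prefix form is handy when a value should be visible to a single jq call without exporting it for the rest of the script.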
Specific solution
Here's a jq solution to the stated problem, though it will only work for "global" environment variables.
def substitute:
gsub("\\${HOME}"; env.HOME)
| gsub("\\$HOST"; env.HOST)
| gsub("\\$PEMFILE"; env.PEMFILE)
| gsub("\\$VAR1"; env.VAR1)
| gsub("\\$VAR2"; env.VAR2)
;
walk( if type=="string" then substitute else . end )
If your jq does not already have walk/1, then please either upgrade your jq or snarf the def from https://github.com/stedolan/jq/blob/master/src/builtin.jq
The solution above is a bit brittle but it could easily be robustified or generalized, as shown in the next section.
General solution
walk(if type == "string"
     then gsub("\\$(?<x>[A-Za-z_][A-Za-z0-9_]*)"; "\(env[.x])")
        | gsub("\\${(?<x>[A-Za-z_][A-Za-z0-9_]*)}"; "\(env[.x])")
     else . end)
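#!/bin/sh
# Wrap the JSON in a generated here-document and source it, so that the
# shell expands the variables. Note that this also runs any $(...) or
# backticks found in the file, so only use it on trusted input.
TMP=$(mktemp /tmp/$$.XXX)
cat<<E_O_F > "$TMP"
cat <<EOF
$(cat so-dollar-variables.json)
EOF
E_O_F
. "$TMP"
/bin/rm "$TMP"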
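Because the generated here-document is sourced by the shell, it expands everything the shell would expand, including command substitutions, which makes it close to eval in effect. A minimal sketch of that behaviour, using a made-up template string and variable name instead of the JSON file:

```shell
#!/bin/sh
# Hypothetical template; the $(...) part stands in for untrusted content.
template='user=$USER_NAME note=$(echo executed)'
USER_NAME=alice

TMP=$(mktemp) || exit 1
# Build a tiny script whose here-document body is the template text.
printf 'cat <<EOF\n%s\nEOF\n' "$template" > "$TMP"
result=$(. "$TMP")   # sourcing expands variables AND runs $(echo executed)
rm -f "$TMP"
echo "$result"
```

The word "executed" in the output comes from the command substitution having run, which is why this approach only suits input you control.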
I've been hitting this on and off for years. I think I've finally got a decent pure-bash solution: it uses regex matching and indirect parameter expansion.
# read the file
json=$(< file.json)
echo step 0
echo "$json"
# set the relevant vars, just plain shell variables
HOST=_host_
PEMFILE=_pemfile_
VAR1=_var1_
VAR2=_var2_
# replace '$var' forms
while [[ $json =~ ("$"([[:alnum:]_]+)) ]]; do
json=${json//${BASH_REMATCH[1]}/${!BASH_REMATCH[2]}}
done;
echo
echo step 1
echo "$json"
# replace '${var}' forms
while [[ $json =~ ("$""{"([[:alnum:]_]+)"}") ]]; do
json=${json//${BASH_REMATCH[1]}/${!BASH_REMATCH[2]}}
done
echo
echo step 2
echo "$json"
Output
step 0
{
"script": {
"install": "${HOME}/lib/install.sh $HOST $PEMFILE",
"Setup": "${HOME}/lib/setup.sh $HOST $PEMFILE $VAR1 $VAR2"
}
}
step 1
{
"script": {
"install": "${HOME}/lib/install.sh _host_ _pemfile_",
"Setup": "${HOME}/lib/setup.sh _host_ _pemfile_ _var1_ _var2_"
}
}
step 2
{
"script": {
"install": "/home/jackman/lib/install.sh _host_ _pemfile_",
"Setup": "/home/jackman/lib/setup.sh _host_ _pemfile_ _var1_ _var2_"
}
}
The magic is:
the regular expression, where I capture both $VAR and VAR, and
[[ $json =~ ("$"([[:alnum:]_]+)) ]]
# ..........1   2             21
the parameter substitution, where I search for the string "$VAR" and replace it with the indirect variable expansion ${!VAR}
${json//${BASH_REMATCH[1]}/${!BASH_REMATCH[2]}}
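The two loops can be folded into one reusable function. This is a sketch under assumptions: expand_refs is a made-up name, unset variables expand to an empty string, and a value that itself contains a $name reference would be expanded again on the next pass. It replaces only the first occurrence per pass, so that a short name such as $HOST cannot clobber the start of a longer one.

```shell
#!/bin/bash
# Hypothetical helper: expand ${NAME} and $NAME in the first argument using
# regex matching and indirect expansion, without eval.
expand_refs() {
  local text=$1 name
  # Braced form first: ${NAME}
  while [[ $text =~ '${'([[:alpha:]_][[:alnum:]_]*)'}' ]]; do
    name=${BASH_REMATCH[1]}
    text=${text/"${BASH_REMATCH[0]}"/"${!name}"}
  done
  # Then the bare form: $NAME (the regex is greedy, so the full name matches)
  while [[ $text =~ '$'([[:alpha:]_][[:alnum:]_]*) ]]; do
    name=${BASH_REMATCH[1]}
    text=${text/"${BASH_REMATCH[0]}"/"${!name}"}
  done
  printf '%s\n' "$text"
}

HOST=_host_
PEMFILE=_pemfile_
expand_refs '${HOME}/lib/install.sh $HOST $PEMFILE'
```

Since the function takes a plain string, it can be applied to the output of jq -r for a single key rather than to the whole JSON document.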