bash - extract data from MySQL table (GROUP BY) - how to process

I have a MySQL table:
+----+---------------------+-------+
| id | timestamp | value |
+----+---------------------+-------+
| 1 | 2016-03-29 18:53:28 | 1 |
| 2 | 2016-03-29 20:26:06 | 1 |
| 3 | 2016-03-29 20:26:22 | 1 |
+----+---------------------+-------+
3 rows in set (0.00 sec)
The table holds water consumption data (each 1 in value is one liter of water).
I wrote a bash script to extract the sum of liters of water per month:
watersum=`echo " SELECT MONTHNAME(timestamp), SUM(value) FROM woda GROUP BY YEAR(timestamp), MONTH(timestamp);" | mysql -s -u$SQUSER -p$SQPASS -h$SQHOST $SQLDB`
echo $watersum
gives me:
March 693 April 9768 May 11277 June 11987 July 10047 August 8570
I would like to save this data in a JSON file. How do I convert the string in $watersum to a JSON string?

Make watersum an array:
watersum=( $(echo " SELECT MONTHNAME(timestamp), SUM(value) FROM woda GROUP BY YEAR(timestamp), MONTH(timestamp);" | mysql -s -u$SQUSER -p$SQPASS -h$SQHOST $SQLDB) )
# watersum alternates month names and totals, so walk the array two items at a time
echo "{" && for((i=0;i<"${#watersum[@]}";i+=2))
do
echo -n "\"${watersum[$i]}\":\"${watersum[((i+1))]}\"";
(( (i+2) == "${#watersum[@]}" )) || echo ","
done && echo;echo "}"
Output
{
"March":"693",
"April":"9768",
"May":"11277",
"June":"11987",
"July":"10047",
"August":"8570"
}
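The array approach relies on word splitting, so it only works while no value contains whitespace, and it emits the totals as JSON strings. A minimal sketch of an alternative, assuming mysql prints one tab-separated row per month (as it does when its output goes to a pipe) and that month names need no JSON escaping; rows_to_json is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# rows_to_json (hypothetical name): turn "month<TAB>total" lines on stdin into
# one JSON object. Totals are written unquoted so they stay JSON numbers.
# Assumption: month names contain no quotes or backslashes (no escaping done).
rows_to_json() {
  local month total sep=""
  printf '{'
  while IFS=$'\t' read -r month total; do
    printf '%s"%s":%s' "$sep" "$month" "$total"
    sep=","                      # comma before every pair except the first
  done
  printf '}\n'
}

# Example using two of the months from the question:
printf 'March\t693\nApril\t9768\n' | rows_to_json   # prints {"March":693,"April":9768}
```

In the original script the mysql command would pipe straight into it, e.g. `... | mysql -s -u$SQUSER -p$SQPASS -h$SQHOST $SQLDB | rows_to_json > water.json` (water.json is an illustrative file name).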

Related

shell: treatment of the multi-line format according to its column patterns

I am dealing with a multi-line CSV file and am looking for a Bash workflow to process it. Here is the format of the file, which contains data in multiple columns:
/scratch_p/johnycash/results_test_docking/7000/7000_01_lig_cne_1000.dlg: 6 | -4.86 | 2 | -4.79 | 4 |####
/scratch_p/johnycash/results_test_docking/7000/7000_01_lig_cne_1001.dlg: 2 | -5.25 | 10 | -5.22 | 8 |########
/scratch_p/johnycash/results_test_docking/7000/7000_01_lig_cne_1002.dlg: 5 | -5.76 | 6 | -5.48 | 3 |###
/scratch_p/johnycash/results_test_docking/7000/7000_01_lig_cne_1003.dlg: 4 | -3.88 | 17 | -3.50 | 3 |###
/scratch_p/johnycash/results_test_docking/7000/7000_01_lig_cne_1009.dlg: 5 | -4.51 | 5 | -4.39 | 4 |####
/scratch_p/johnycash/results_test_docking/7000/7000_01_lig_cne_100.dlg: 3 | -4.40 | 11 | -4.38 | 9 |#########
/scratch_p/johnycash/results_test_docking/7000/7000_01_lig_cne_1010.dlg: 1 | -5.07 | 15 | -4.51 | 5 |#####
/scratch_p/johnycash/results_test_docking/7000/7000_01_lig_cne_150.dlg: 4 | -5.01 | 5 | -4.82 | 3 |###
/scratch_p/johnycash/results_test_docking/7000/7000_01_lig_cne_156.dlg: 2 | -5.38 | 11 | -4.70 | 3 |###
/scratch_p/johnycash/results_test_docking/7000/7000_01_lig_cne_157.dlg: 1 | -4.22 | 10 | -4.16 | 7 |#######
/scratch_p/johnycash/results_test_docking/7000/7000_01_lig_cne_167.dlg: 2 | -3.85 | 3 | -3.69 | 9 |#########
/scratch_p/johnycash/results_test_docking/7000/7000_01_lig_cne_168.dlg: 2 | -4.42 | 12 | -4.01 | 6 |######
/scratch_p/johnycash/results_test_docking/7000/7000_01_lig_cne_169.dlg: 2 | -4.94 | 17 | -4.80 | 5 |#####
/scratch_p/johnycash/results_test_docking/7000/7000_01_lig_cne_16.dlg: 1 | -6.23 | 4 | -5.77 | 4 |###
In this format, all columns with useful information are separated by |, except the first column (the line name), which is separated from the rest by :. The script should do the following post-processing:
1. Sort all lines by the value in the third column (from the most negative to positive values);
2. Filter on the last column (by the number of # characters), discarding all lines containing #, ## or ###. Alternatively, this filter can be applied to the penultimate column, which gives the number of # characters as a number.
I can do the first task using sort:
sort -t '|' -k 3 filename.csv
and the second using awk:
awk '(NR>1) && ($8 > 2) ' filename.csv > filename_processed.txt
How can I combine both commands efficiently, given the format of my file?
Could you please try the following, written and tested with the shown samples in GNU awk.
awk '
BEGIN{
FS=OFS="|"
}
# gsub returns the number of # characters in field 6; keep lines with 4 or more
gsub(/#/,"&",$6)>3
' Input_file | sort -t'|' -nk 3 > output_file
EDIT: As per OP's comment, to get the first 10% of lines from the start of the file, save the above command's output in output_file and run the following:
awk -v lines="$(wc -l < output_file)" '
BEGIN{
tenPer=int(lines/10)
}
FNR>(tenPer){exit}
1
' output_file
To get the last 10% of lines of output_file, try:
tac output_file |
awk -v lines="$(wc -l < output_file)" 'BEGIN{tenPer=int(lines/10)} FNR>tenPer{exit} 1' |
tac
OR
awk -v lines="$(wc -l < output_file)" 'BEGIN{tenPer=int(lines/10)} FNR>(lines-tenPer)' output_file
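If the awk/tac versions feel heavy, the same integer 10% (int(lines/10)) can be handed to head and tail. A sketch; tenth is a hypothetical helper name, and output_file is assumed to end with a newline so wc -l counts every line:

```shell
#!/usr/bin/env bash
# tenth (hypothetical name): print the first or last 10% of FILE's lines,
# using the same integer division as int(lines/10) in the awk versions.
# Usage: tenth first FILE   or   tenth last FILE
tenth() {
  local which=$1 file=$2 n
  n=$(( $(wc -l < "$file") / 10 ))
  [ "$n" -gt 0 ] || return 0     # fewer than 10 lines: nothing to print
  if [ "$which" = first ]; then
    head -n "$n" "$file"
  else
    tail -n "$n" "$file"
  fi
}
```

For example, `tenth last output_file` prints the last 10% of the sorted, filtered output.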
You can try:
sort -nr -k 4 scratch.scv | grep -v -E "[^#]#{1,3}$"
This sorts on the column value and drops the lines that end in 1 to 3 # characters.
It is better to sort at the end, when there are fewer lines:
grep -E "#{4}$" file | sort -t"|" -nk3
To filter for a different number of #, change the number in the grep expression. For reversed sorting, add the r option to sort. To sort on a different column, change the -k argument.
If your commands are really all you need, you can simply pipe one into the other:
awk '(NR>1) && ($8 > 2) ' filename.csv |
sort -t '|' -k 3 > filename_processed.txt
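Both steps can also run as one pipeline that uses the alternative filter from the question, the penultimate column holding the number of # characters. A sketch, assuming every line has the six |-separated fields shown above; filter_and_sort is a hypothetical name. Note that -k3,3n limits the sort key to the third |-separated field, whereas -k 3 alone extends the key to the end of the line:

```shell
#!/usr/bin/env bash
# filter_and_sort (hypothetical name): keep lines whose penultimate |-field
# (the count of # characters) is 4 or more, then sort numerically on the
# third |-separated field only. Reads the named files, or stdin if none.
filter_and_sort() {
  awk -F'|' '$(NF-1) + 0 >= 4' "$@" |   # " 4 " + 0 converts to the number 4
    sort -t'|' -k3,3n
}
```

Usage: `filter_and_sort filename.csv > filename_processed.txt`; use -k3,3nr to reverse the order.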

Print column names only once in shell script

Here is my shell script. It works without any errors, but I want the output in a different form.
Script:
#!/bin/bash
DB_USER='root'
DB_PASSWD='123456'
DB_NAME='job'
Table_Name='status_table'
#sql=select job_name, date,status from $DB_NAME.$Table_Name where job_name='$f1' and date=CURDATE()
file="/root/jobs.txt"
while read -r f1
do
mysql -N -u$DB_USER -p$DB_PASSWD <<EOF
select job_name, date,status from $DB_NAME.$Table_Name where job_name='$f1' and date=CURDATE()
EOF
done <"$file"
Source Table:
mysql> select * from job.status_table
+---------+----------+------------+-----------+
| Job_id | Job Name | date | status |
+---------+----------+------------+-----------+
| 111 | AA | 2016-12-01 | completed |
| 112 | BB | 2016-12-01 | completed |
| 113 | CC | 2016-12-02 | completed |
| 112 | BB | 2016-12-01 | completed |
| 114 | DD | 2016-12-02 | completed |
| 201 | X | 2016-12-03 | completed |
| 202 | y | 2016-12-04 | completed |
| 203 | z | 2016-12-03 | completed |
| 111 | A | 2016-12-04 | completed |
+---------+----------+------------+-----------+
Input text file
[rteja@server0 ~]# more jobs.txt
AA
BB
CC
DD
X
Y
Z
A
ABC
XYZ
Output with suppressed column names:
(mysql -N -u$DB_USER -p$DB_PASSWD <<EOF)
[rteja@server0 ~]# ./script.sh
AA 2016-12-01 completed
BB 2016-12-01 completed
Output without suppressed column names; the column names are printed on every loop iteration:
(mysql -u$DB_USER -p$DB_PASSWD <<EOF)
[rteja@server0 ~]# ./script.sh
job_name date status
AA 2016-12-01 completed
job_name date status
BB 2016-12-01 completed
Challenges:
1. I want to print the column names only once and store the result in a CSV file.
2. I don't want to expose the username and password in the code to everyone. Is there a way to hide them? I heard we can define environment variables and use them in the script, and set permissions on the file containing those variables so that only we can access it.
Rather than executing a SELECT query multiple times, you can run a single query with:
job_name in ('AA','BB','CC'...)
To do that, first read the complete file into an array using mapfile:
mapfile -t arr < jobs.txt
Then format the array values into a list suited for the IN operator:
printf -v cols "'%s'," "${arr[@]}"
cols="(${cols%,})"
Display your values:
echo "$cols"
('AA','BB','CC','DD','X','Y','Z','A','ABC','XYZ')
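The printf trick breaks if a job name itself contains a single quote. A sketch that doubles embedded quotes (the SQL escape for ') while building the same list; build_in_list is a hypothetical name, bash 4 or later is assumed for mapfile, and backslashes in names are not handled:

```shell
#!/usr/bin/env bash
# build_in_list (hypothetical name): turn the lines of a file into a SQL
# IN-list such as ('AA','BB'). Single quotes inside a name are doubled
# ('' is the SQL escape for '); blank lines are skipped.
# Assumptions: bash 4+ for mapfile; names contain no backslashes.
build_in_list() {
  local -a names
  local name list="" q="'"
  mapfile -t names < "$1"
  for name in "${names[@]}"; do
    [ -n "$name" ] || continue
    name=${name//$q/$q$q}            # O'Brien -> O''Brien
    list+="'$name',"
  done
  printf '(%s)\n' "${list%,}"        # drop the trailing comma
}
```

It replaces the mapfile/printf steps: `cols=$(build_in_list jobs.txt)`, after which $cols goes into the IN clause as before.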
Finally, run your SQL query. Because it now runs only once, dropping -N prints the column names exactly once:
mysql -N -u$DB_USER -p$DB_PASSWD <<EOF
select job_name, date,status from $DB_NAME.$Table_Name
where job_name IN $cols and date=CURDATE();
EOF
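For the CSV part of the question: when mysql writes to a pipe it uses tab-separated batch output, and without -N the first line holds the column names, printed once for the single query. A sketch of a converter, assuming no field contains a tab or newline (batch mode escapes those); tsv_to_csv is a hypothetical name:

```shell
#!/usr/bin/env bash
# tsv_to_csv (hypothetical name): convert tab-separated lines on stdin to CSV,
# quoting every field and doubling embedded double quotes.
# Assumption: no field contains a literal tab or newline.
tsv_to_csv() {
  awk -F'\t' -v OFS=',' '{
    for (i = 1; i <= NF; i++) {
      gsub(/"/, "\"\"", $i)          # " inside a field becomes ""
      $i = "\"" $i "\""              # wrap the field in quotes
    }
    print                            # $0 is rebuilt with OFS=","
  }'
}
```

With the single query above, the heredoc form becomes `mysql -u$DB_USER -p$DB_PASSWD <<EOF | tsv_to_csv > jobs.csv` followed by the same query lines (jobs.csv is an illustrative file name).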
To connect to MySQL securely, use login paths (.mylogin.cnf).
As per MySQL manual:
The best way to specify server connection information is with your .mylogin.cnf file. Not only is this file encrypted, but any logging of the utility execution does not expose the connection information.
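Setting up a login path is a one-time step. A sketch, assuming a client that ships mysql_config_editor (MySQL 5.6 and later; other clients may not); the login-path name jobdb is made up for illustration:

```shell
# One-time setup: store the connection details in the encrypted ~/.mylogin.cnf.
# "jobdb" is a hypothetical login-path name; --password makes the tool prompt
# for the password instead of taking it on the command line.
mysql_config_editor set --login-path=jobdb --host=localhost --user=root --password

# Check what was stored (the password is shown masked):
mysql_config_editor print --login-path=jobdb

# In the script, name the login path first instead of passing -u/-p:
mysql --login-path=jobdb -N <<EOF
select job_name, date, status from job.status_table where date=CURDATE();
EOF
```

The script then contains neither the username nor the password.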

Query to split a comma-separated column value into multiple rows

I have a MySQL table called mem_exam_dates:
+---------+-------------+----------------------------------+
| rec_id | mem_name | exam_dates |
+---------+-------------+----------------------------------+
| 1 | Raju | 2015-01-05,2015-05-09,2018-05-09 |
| 2 | Rajes | 2015-10-05,2015-12-09,2018-09-09 |
+---------+-------------+----------------------------------+
Now I want to display the result as below:
+-------+---------------+
| Raju | Exam Dates |
+-------+---------------+
| 2015-01-05 |
| 2015-05-09 |
| 2018-05-09 |
+-----------------------+
I am using this query:
select * from mem_exam_dates where rec_id=1
With this query I get all the exam dates as a single string,
but I want the exam dates as below:
+----------------+
| 2015-01-05 |
| 2015-05-09 |
| 2018-05-09 |
+----------------+
What is the query for that?
If anybody knows, please let me know.
Thanks in advance
kalyan
In MySQL you can extract part of a string with SUBSTRING_INDEX.
So you can try the following SQL (it returns the dates as three columns and is written for rows holding three dates):
SELECT SUBSTRING_INDEX(exam_dates, ',', 1) as first
, SUBSTRING_INDEX(SUBSTRING_INDEX(exam_dates, ',', 2), ',', -1) as second
, SUBSTRING_INDEX(exam_dates, ',', -1) as third
FROM mem_exam_dates WHERE rec_id=1
In .NET you can use String.Split():
https://msdn.microsoft.com/de-de/library/b873y76a%28v=VS.110%29.aspx
If you have the column in a variable named exam_dates you can use:
string [] dates = exam_dates.Split(new Char [] {','});
This gives you an array of all the dates.
You need to process the data in your server-side language. If you are using PHP, use explode.
For example:
$dates= explode(",", $longdatestring);
echo $dates[0]; // date1
echo $dates[1]; // date2
To find the number of dates returned, simply count the elements in your array.
$numberofdates = count($dates);
Or loop through the array until you run out of elements:
foreach ($dates as $date) {
echo $date;
}
For further details, see the PHP manual:
https://secure.php.net/manual/en/function.explode.php
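The same application-level split works from a shell script too. A sketch, assuming the rows come from mysql as tab-separated mem_name and exam_dates columns (for example from select mem_name, exam_dates from mem_exam_dates); split_dates is a hypothetical name:

```shell
#!/usr/bin/env bash
# split_dates (hypothetical name): read "mem_name<TAB>date1,date2,..." lines and
# print one "mem_name<TAB>date" line per date.
split_dates() {
  awk -F'\t' -v OFS='\t' '{
    n = split($2, d, ",")            # n = number of comma-separated dates
    for (i = 1; i <= n; i++)
      print $1, d[i]
  }'
}

printf 'Raju\t2015-01-05,2015-05-09,2018-05-09\n' | split_dates
```

This prints three lines for Raju, one per exam date, which matches the layout asked for in the question.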

Count and select all dates for a specific field in MySQL

I have data in a format like this:
+----+--------+---------------------+
| ID | utente | data |
+----+--------+---------------------+
| 1 | Man1 | 2014-02-10 12:12:00 |
+----+--------+---------------------+
| 2 | Women1 | 2015-02-10 12:12:00 |
+----+--------+---------------------+
| 3 | Man2 | 2016-02-10 12:12:00 |
+----+--------+---------------------+
| 4 | Women1 | 2014-03-10 12:12:00 |
+----+--------+---------------------+
| 5 | Man1 | 2014-04-10 12:12:00 |
+----+--------+---------------------+
| 6 | Women1 | 2014-02-10 12:12:00 |
+----+--------+---------------------+
I want to make a report that organizes the output like this:
+---------+--------+-------+---------------------+---------------------+---------------------+
| IDs | utente | count | data1 | data2 | data3 |
+---------+--------+-------+---------------------+---------------------+---------------------+
| 1, 5 | Man1 | 2 | 2014-02-10 12:12:00 | 2014-04-10 12:12:00 | |
+---------+--------+-------+---------------------+---------------------+---------------------+
| 2, 4, 6 | Women1 | 3 | 2015-02-10 12:12:00 | 2014-03-10 12:12:00 | 2014-05-10 12:12:00 |
+---------+--------+-------+---------------------+---------------------+---------------------+
All rows that include the same user (utente) more than once should be combined into one row with all the dates and the count of records.
Thanks
While it is certainly possible to write a query that returns the data in exactly the format you want, I would suggest using a GROUP BY query with two GROUP_CONCAT aggregate functions:
SELECT
GROUP_CONCAT(ID) as IDs,
utente,
COUNT(*) as cnt,
GROUP_CONCAT(data ORDER BY data) AS Dates
FROM
tablename
GROUP BY
utente
Then, at the application level, you can split the Dates field into multiple columns.
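Splitting the Dates field into columns at the application level can be as small as one awk call. A sketch, assuming the GROUP_CONCAT query above is run through mysql with tab-separated output and column names suppressed, so each line is IDs, utente, cnt and Dates; expand_dates is a hypothetical name. Since IDs is itself comma-separated, the fields must be split on tabs, not commas:

```shell
#!/usr/bin/env bash
# expand_dates (hypothetical name): read "IDs<TAB>utente<TAB>cnt<TAB>Dates" lines
# and print the Dates list as separate tab-separated columns.
# Fields are split on tabs because IDs and Dates both contain commas.
expand_dates() {
  awk -F'\t' -v OFS='\t' '{
    n = split($4, d, ",")                 # one array element per date
    line = $1 OFS $2 OFS $3
    for (i = 1; i <= n; i++)
      line = line OFS d[i]
    print line
  }'
}
```

Each output line then has the IDs, utente and count followed by one column per date, like the data1, data2, data3 columns in the question.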
This looks like a fairly standard "control break" report, complicated only by the fact that your dates extend horizontally instead of downward.
SELECT * FROM t ORDER BY utente, data
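$NumberOfDateCells = 3; // widest date list in the report (example value)
$lastutente = $lastdata = '';
$datelst = array();
$cnt = 0;

// Print the count and date cells that close one user's row.
function endRow($cnt, $datelst, $NumberOfDateCells) {
    echo "<td>$cnt</td>\n";
    foreach ($datelst as $d)
        echo "<td>$d</td>\n";
    for ($i = count($datelst); $i < $NumberOfDateCells; $i++)
        echo "<td> </td>\n"; // pad short rows with empty cells
    echo "</tr>\n";
}

echo "<table>\n";
while ($row = fetch()) { // fetch() = your driver's fetch, e.g. mysqli_fetch_assoc($result)
    if ($lastutente != $row['utente']) {
        if ($lastutente != '')
            endRow($cnt, $datelst, $NumberOfDateCells); // close the previous user's row
        echo "<tr><td>$row[utente]</td>\n"; // start a new row - you probably want to print other stuff too
        $lastutente = $row['utente'];
        $lastdata = '';
        $datelst = array();
        $cnt = 0;
    }
    if ($lastdata != $row['data']) {
        $datelst[] = $row['data'];
        $lastdata = $row['data'];
    }
    $cnt++; // or $cnt += $row['cnt'] if the query returns a cnt column
}
if ($lastutente != '')
    endRow($cnt, $datelst, $NumberOfDateCells); // close the last row
echo "</table>\n";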
You could add GROUP BY utente, data to the query above to put a little more load on MySQL and a little less on your code; the query then needs to return a cnt column, for example COUNT(*) AS cnt (or SUM(cnt) AS cnt if the table already stores counts).

How to find duplicate sets of MySQL ids from a bash script

+---------+-------------+--------------+
|table_uid| group_uid | product_uid |
+---------+-------------+--------------+
| 8901 | 5206 | 184 |
| 8902 | 5206 | 523 |
| 9194 | 5485 | 184 |
| 9195 | 5485 | 523 |
| 7438 | 1885 | 184 |
| 7439 | 1885 | 184 |
+---------+-------------+--------------+
My goal is to show any group_uids that contain the exact same set of product_uids. So group_uids 5206 and 5485 would be displayed, while 1885 would not, since it does not have the same set of product_uids. I have to accomplish this through a bash script, as I cannot do this in MySQL 3.23.58 (yes, it's horribly old and I hate it, but it's not my choice). I'm trying to store the table_uid, group_uid and product_uid in an array and then compare each group_uid to see whether they contain the same product_uids. Help is appreciated!
How are you displaying the table above?
You might have some success with GROUP_CONCAT (see "Can I concatenate multiple MySQL rows into one field?"), although GROUP_CONCAT was only added in MySQL 4.1, so it is not available in 3.23:
SELECT group_uid, GROUP_CONCAT(product_uid SEPARATOR ','), count(*)
FROM <tab>
GROUP BY group_uid
HAVING count(*) > 1
I'm not sure how it orders the values (GROUP_CONCAT also accepts ORDER BY product_uid inside the parentheses), as I don't have MySQL at hand at the moment.
Here's a bit of awk to collect the groups that belong to the same set of products:
awk '
NR > 3 && NF == 7 {
prod[$4] = prod[$4] $6 " "
}
END {
for (group in prod)
groups[prod[group]] = groups[prod[group]] group " "
for (key in groups)
print key ":" groups[key]
}
' mysql.out
184 523 :5206 5485
184 184 :1885
If you know which set of product ids you're interested in, you can pass that in:
awk -v prod_ids="184 523" '
NR > 3 && NF == 7 {
prod[$4] = prod[$4] $6 " "
}
END {
for (group in prod)
groups[prod[group]] = groups[prod[group]] group " "
key = prod_ids " "
if (key in groups)
print groups[key]
}
' mysql.out
5206 5485
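The awk answer builds each group's product list in row order, so two groups holding the same products in a different row order would not match. A sketch that sorts first, assuming the input is two whitespace-separated columns, group_uid then product_uid (for example from select group_uid, product_uid with the column headers dropped); same_product_sets is a hypothetical name, and groups are printed in no particular order:

```shell
#!/usr/bin/env bash
# same_product_sets (hypothetical name): read "group_uid product_uid" pairs and
# print every product list shared by two or more groups, as "products:groups".
# Sorting first makes each group's product list independent of row order.
same_product_sets() {
  sort -k1,1n -k2,2n |
  awk '
    { key[$1] = key[$1] $2 " " }                 # sorted product list per group
    END {
      for (g in key)
        members[key[g]] = members[key[g]] g " "  # groups sharing that list
      for (k in members)
        if (split(members[k], tmp, " ") > 1)     # keep lists shared by 2+ groups
          print k ":" members[k]
    }'
}
```

With the six rows from the question, this prints a single line listing 184 523 with groups 5206 and 5485; 1885 is left out because no other group has the list 184 184.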