MySQL difference between max() and min()?

I have a problem with a MySQL query and the max() function.
If I run:
Select * from Data group by experiment having min(timestamp)
the query returns what I want, with the correct values.
I get this:
+----------+---------+----------+---------------------+----------------+------------+
| id | mote_id | label_id | timestamp | value | experiment |
+----------+---------+----------+---------------------+----------------+------------+
| 3768806 | 10 | 30 | 2014-04-22 14:37:07 | 0 | 13 |
| 10989209 | 12 | 22 | 2014-04-25 10:44:03 | 2.532958984375 | 15 |
| 11943537 | 6 | 19 | 2014-05-05 17:20:15 | 1228 | 16 |
| 12042549 | 16 | 26 | 2014-05-06 10:48:59 | 22.86 | 17 |
| 12176642 | 15 | 23 | 2014-05-07 15:19:35 | 0 | 18 |
| 12195344 | 10 | 6 | 2014-05-07 15:27:23 | 3460 | 19 |
| 12222470 | 15 | 8 | 2014-05-07 15:38:38 | 1 | 21 |
| 12343934 | 10 | 19 | 2014-05-12 10:35:42 | 742 | 23 |
+----------+---------+----------+---------------------+----------------+------------+
But if I run:
Select * from Data group by experiment having max(timestamp)
the query returns the wrong values, like this:
+----------+---------+----------+---------------------+----------------+------------+
| id | mote_id | label_id | timestamp | value | experiment |
+----------+---------+----------+---------------------+----------------+------------+
| 3768806 | 10 | 30 | 2014-04-22 14:37:07 | 0 | 13 |
| 10989209 | 12 | 22 | 2014-04-25 10:44:03 | 2.532958984375 | 15 |
| 11943537 | 6 | 19 | 2014-05-05 17:20:15 | 1228 | 16 |
| 12042549 | 16 | 26 | 2014-05-06 10:48:59 | 22.86 | 17 |
| 12176642 | 15 | 23 | 2014-05-07 15:19:35 | 0 | 18 |
| 12195344 | 10 | 6 | 2014-05-07 15:27:23 | 3460 | 19 |
| 12222470 | 15 | 8 | 2014-05-07 15:38:38 | 1 | 21 |
| 12343934 | 10 | 19 | 2014-05-12 10:35:42 | 742 | 23 |
+----------+---------+----------+---------------------+----------------+------------+
In the first query, if I replace min(timestamp) with timestamp=min(timestamp), it still works, but in the second, "timestamp=max(timestamp)" returns nothing.
Finally, Select experiment,max(timestamp) returns the correct values.
mysql> select *,max(timestamp) from Data group by experiment;
+----------+---------+----------+---------------------+----------------+------------+---------------------+
| id | mote_id | label_id | timestamp | value | experiment | max(timestamp) |
+----------+---------+----------+---------------------+----------------+------------+---------------------+
| 3768806 | 10 | 30 | 2014-04-22 14:37:07 | 0 | 13 | 2014-04-24 16:03:29 |
| 10989209 | 12 | 22 | 2014-04-25 10:44:03 | 2.532958984375 | 15 | 2014-05-05 10:34:35 |
| 11943537 | 6 | 19 | 2014-05-05 17:20:15 | 1228 | 16 | 2014-05-06 10:35:15 |
| 12042549 | 16 | 26 | 2014-05-06 10:48:59 | 22.86 | 17 | 2014-05-07 15:19:33 |
| 12176642 | 15 | 23 | 2014-05-07 15:19:35 | 0 | 18 | 2014-05-07 15:27:23 |
| 12195344 | 10 | 6 | 2014-05-07 15:27:23 | 3460 | 19 | 2014-05-07 15:38:01 |
| 12222470 | 15 | 8 | 2014-05-07 15:38:38 | 1 | 21 | 2014-05-07 16:30:38 |
| 12343934 | 10 | 19 | 2014-05-12 10:35:42 | 742 | 23 | 2014-05-14 09:25:44 |
+----------+---------+----------+---------------------+----------------+------------+---------------------+
I know I can use a subquery to solve my problem, but the table contains thousands of rows, and that solution takes too long...
P.S.: I can't use Select *, max(timestamp), even though it works, because the query is run by an EJB in JEE.

You are selecting indeterminate values grouped by the field experiment. Nothing guarantees that the non-aggregated fields will come from the row that holds the MIN or MAX value of an aggregated field.
You HAVE TO use a sub-query or a self-join to get the right records.
See more here: http://dev.mysql.com/doc/refman/5.6/en/example-maximum-column-group-row.html
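As a rough sketch of the sub-query/join approach, the following runs the pattern against an in-memory SQLite database from Python. The four sample rows are made up for illustration, and SQLite stands in for MySQL here; the SELECT itself uses only standard SQL, so it should carry over, but that is an assumption worth checking on your server.

```python
import sqlite3

# Hypothetical miniature of the Data table; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Data (id INTEGER PRIMARY KEY, experiment INTEGER,"
    " timestamp TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO Data VALUES (?, ?, ?, ?)",
    [
        (1, 13, "2014-04-22 14:37:07", 0.0),
        (2, 13, "2014-04-24 16:03:29", 5.0),
        (3, 15, "2014-04-25 10:44:03", 2.5),
        (4, 15, "2014-05-05 10:34:35", 7.5),
    ],
)

# Compute MAX(timestamp) per experiment first, then join back to the table
# to pick up the complete row that owns that timestamp.
latest_rows = conn.execute(
    """
    SELECT d.id, d.experiment, d.timestamp, d.value
    FROM Data AS d
    JOIN (SELECT experiment, MAX(timestamp) AS max_ts
          FROM Data
          GROUP BY experiment) AS m
      ON m.experiment = d.experiment AND m.max_ts = d.timestamp
    ORDER BY d.experiment
    """
).fetchall()

for row in latest_rows:
    print(row)
```

If this is slow on a large table, a composite index on (experiment, timestamp) is usually the first thing to try, since both the grouped sub-query and the join back can use it.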

The HAVING clause expects a boolean expression. In other DBMSs your code sample would trigger an error. In MySQL, you'll get the expression cast to boolean:
Zero → false
Non-zero → true
And since your expression evaluates to non-zero for every group, it won't filter out any rows.
As for this:
HAVING timestamp = max(timestamp)
The HAVING clause evaluates after WHERE and GROUP BY. At that point, using individual row values of the timestamp column doesn't make any sense. As usual, MySQL allows that but you must take into account that:
In standard SQL, a query that includes a GROUP BY clause cannot refer
to nonaggregated columns in the HAVING clause that are not named in
the GROUP BY clause. A MySQL extension permits references to such
columns to simplify calculations. This extension assumes that the
nongrouped columns will have the same group-wise values. Otherwise,
the result is indeterminate.
In other words, your results are arbitrary (not even random).
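The boolean cast described in this answer can be checked with a small experiment. The sketch below uses SQLite from Python rather than MySQL (SQLite also treats a non-zero HAVING expression as true), with made-up integer timestamps in a hypothetical column ts, so take it as an illustration of the idea rather than proof of MySQL's exact behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (experiment INTEGER, ts INTEGER)")
conn.executemany(
    "INSERT INTO Data VALUES (?, ?)",
    [(13, 100), (13, 200), (15, 300), (15, 400)],
)

base = "SELECT experiment FROM Data GROUP BY experiment"

# No HAVING at all: every group is returned.
no_having = conn.execute(base + " ORDER BY experiment").fetchall()

# HAVING with a bare aggregate: non-zero for every group, so nothing is filtered.
having_max = conn.execute(base + " HAVING MAX(ts) ORDER BY experiment").fetchall()
having_min = conn.execute(base + " HAVING MIN(ts) ORDER BY experiment").fetchall()

# An expression that is zero for every group filters everything out.
having_zero = conn.execute(base + " HAVING MAX(ts) - MAX(ts)").fetchall()

print(no_having, having_max, having_min, having_zero)
```

So having min(timestamp) and having max(timestamp) both behave like no HAVING clause at all, which is why the two queries in the question return identical rows.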

Related

Sort Columns by Column_Totals through Alter Table or Select Query in MySQL at Runtime

I want to alter or generate Select Query of the Source_Table below at runtime by getting the column total (sum) first then sort according to its result:
Source_Table:
+----+------------+-----------+-----------+-----------+-----------+-----------+
| ID | Name | Field_1 | Field_2 | Field_3 | Field_4 | Field_5 |
+----+------------+-----------+-----------+-----------+-----------+-----------+
| 1 | abc | 10 | 18 | 5 | 21 | 6 |
+----+------------+-----------+-----------+-----------+-----------+-----------+
| 2 | ghq | 22 | 14 | 12 | 11 | 23 |
+----+------------+-----------+-----------+-----------+-----------+-----------+
| 3 | xyz | 35 | 8 | 16 | 7 | 4 |
+----+------------+-----------+-----------+-----------+-----------+-----------+
The Result_Table I am looking at is:
|--------------- sorted fields based on total --------------|
+------------+-----------+-----------+-----------+-----------+-----------+
| Name | Field_5 | Field_3 | Field_4 | Field_2 | Field_1 |
+------------+-----------+-----------+-----------+-----------+-----------+
| abc | 4 | 5 | 21 | 18 | 10 |
+------------+-----------+-----------+-----------+-----------+-----------+
| ghq | 23 | 12 | 11 | 14 | 22 |
+------------+-----------+-----------+-----------+-----------+-----------+
| xyz | 4 | 16 | 7 | 8 | 35 |
+------------+-----------+-----------+-----------+-----------+-----------+
| Total | 31 | 33 | 39 | 40 | 67 | --> get column sum and sort from lowest to highest
+------------+-----------+-----------+-----------+-----------+-----------+
I am not sure whether this is possible with MySQL, as I have not been able to find a good reference on the internet for this case. But I will try..
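One possible approach, sketched here with SQLite from Python rather than MySQL, is to do it in two steps in application code: first query the column totals, then build the SELECT with the columns in sorted order. The list name fields and the two-step design are assumptions for illustration. Note that with the Source_Table values as listed, Field_3 and Field_5 both total 33, so their relative order is a tie (the stable sort below keeps the original column order for ties).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Source_Table (ID INTEGER, Name TEXT, Field_1 INTEGER,"
    " Field_2 INTEGER, Field_3 INTEGER, Field_4 INTEGER, Field_5 INTEGER)"
)
conn.executemany(
    "INSERT INTO Source_Table VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "abc", 10, 18, 5, 21, 6),
        (2, "ghq", 22, 14, 12, 11, 23),
        (3, "xyz", 35, 8, 16, 7, 4),
    ],
)

# Fixed whitelist of column names; never build SQL from untrusted input.
fields = ["Field_1", "Field_2", "Field_3", "Field_4", "Field_5"]

# Step 1: one query that returns the total of every field.
sums_sql = ", ".join(f"SUM({f})" for f in fields)
totals = dict(zip(fields, conn.execute(f"SELECT {sums_sql} FROM Source_Table").fetchone()))

# Step 2: sort the column names by total (lowest first) and build the SELECT.
ordered = sorted(fields, key=lambda f: totals[f])
select_sql = f"SELECT Name, {', '.join(ordered)} FROM Source_Table ORDER BY ID"
result = conn.execute(select_sql).fetchall()

print(ordered)
for row in result:
    print(row)
print(("Total",) + tuple(totals[f] for f in ordered))
```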

MySql - Join self to determine time in state

I have a table (let's call it WorkflowLog) whose records are log records about a workflow processor.
The workflow processor moves records in various tables (Table1, Table2, etc) thru defined flowchart states. Each state has a number.
The WorkflowLog records are as follows:
| ID | TimeStamp | TableN | RecordId | Action | OldState | NewState |
-------------------------------------------------------------------------------------------
| 1 | 2016-09-16 15:50:00 | Table1 | 21 | State Change | 0 | 10 |
| 2 | 2016-09-16 15:50:00 | Table1 | 21 | Other Info | 0 | 10 |
| 3 | 2016-09-16 15:55:00 | Table2 | 21 | State Change | 0 | 10 |
| 4 | 2016-09-16 15:57:00 | Table1 | 21 | State Change | 10 | 20 |
| 5 | 2016-09-16 15:58:00 | Table1 | 21 | State Change | 20 | 30 |
| 6 | 2016-09-16 15:59:00 | Table1 | 21 | State Change | 30 | 20 |
| 7 | 2016-09-16 16:00:00 | Table1 | 21 | State Change | 20 | 30 |
| 8 | 2016-09-16 16:01:00 | Table1 | 52 | State Change | 0 | 10 |
| 9 | 2016-09-16 16:02:00 | Table1 | 21 | State Change | 30 | 999 |
| 10 | 2016-09-16 16:03:00 | Table3 | 25 | State Change | 0 | 10 |
I would like to determine the amount of time spent in each table state. Please NOTE that the workflow can loop so the record can be in state 20 (or any state) multiple times.
My first try was:
select
DFT1.ID as DFT1_Id,
DFT2.ID as DFT2_Id,
DFT1.TableN as TableName,
DFT1.RecordId as RecordId,
DFT1.OldState as State,
DFT2.Timestamp as EntryTime,
DFT1.TimeStamp as ExitTime,
TIMEDIFF(DFT1.TimeStamp, DFT2.Timestamp) as TimeInState
from WorkflowLog DFT1
inner join WorkflowLog DFT2
ON DFT1.TableN=DFT2.TableN AND
DFT1.RecordId=DFT2.RecordId AND
DFT1.`Action`='State Change' AND
DFT2.`Action`='State Change' AND
DFT1.OldState=DFT2.NewState
Order BY DFT1_Id
But this does not handle when a record loops in the workflow (as record 6 captures). It seems that I have to do something similar to the above BUT only match the previous most recent.
I am at a loss on how to do that.
Ultimately what I want is an output like:
| ID1 | ID2 | TableN | RecId | State | EntryTime | ExitTime | TimeInState |
------------------------------------------------------------------------------------------------
| 4 | 1 | Table1 | 21 | 10 | 2016-09-16 15:50:00 | 2016-09-16 15:57:00 | 00:07:00 |
| 5 | 4 | Table1 | 21 | 20 | 2016-09-16 15:57:00 | 2016-09-16 15:58:00 | 00:01:00 |
| 6 | 5 | Table1 | 21 | 30 | 2016-09-16 15:58:00 | 2016-09-16 15:59:00 | 00:01:00 |
| 7 | 6 | Table1 | 21 | 20 | 2016-09-16 15:59:00 | 2016-09-16 16:00:00 | 00:01:00 |
| 9 | 7 | Table1 | 21 | 30 | 2016-09-16 16:00:00 | 2016-09-16 16:02:00 | 00:02:00 |
ETC ...
EDIT:
As suggested I have added an SqlFiddle for the above with my first try SQL.
There are extra matches in the Result set due to Table1 record 21 going thru state 20 and 30 twice. Here are my comments about the SqlFiddle result set:
| ID | ID | My Comment |
| 4 | 1 | Good |
| 5 | 6 | Not Good - Matched with Future record |
| 5 | 4 | Good |
| 6 | 7 | Not Good - Matched with Future record |
| 6 | 5 | Good |
| 7 | 6 | Good |
| 7 | 4 | Not Good - Match to First time thru state 20 |
| 9 | 5 | Not Good - Match to First time thru state 30 |
| 9 | 7 | Good |
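One way to match only the most recent previous record is a correlated sub-query that picks the highest earlier ID with a matching NewState. The sketch below tries this on the sample rows using SQLite from Python; it assumes that ID increases with time, and it reports the duration in seconds via strftime because SQLite has no TIMEDIFF (in MySQL the original TIMEDIFF expression could stay).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE WorkflowLog (ID INTEGER PRIMARY KEY, TimeStamp TEXT,
       TableN TEXT, RecordId INTEGER, "Action" TEXT,
       OldState INTEGER, NewState INTEGER)"""
)
conn.executemany(
    "INSERT INTO WorkflowLog VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "2016-09-16 15:50:00", "Table1", 21, "State Change", 0, 10),
        (2, "2016-09-16 15:50:00", "Table1", 21, "Other Info", 0, 10),
        (3, "2016-09-16 15:55:00", "Table2", 21, "State Change", 0, 10),
        (4, "2016-09-16 15:57:00", "Table1", 21, "State Change", 10, 20),
        (5, "2016-09-16 15:58:00", "Table1", 21, "State Change", 20, 30),
        (6, "2016-09-16 15:59:00", "Table1", 21, "State Change", 30, 20),
        (7, "2016-09-16 16:00:00", "Table1", 21, "State Change", 20, 30),
        (8, "2016-09-16 16:01:00", "Table1", 52, "State Change", 0, 10),
        (9, "2016-09-16 16:02:00", "Table1", 21, "State Change", 30, 999),
        (10, "2016-09-16 16:03:00", "Table3", 25, "State Change", 0, 10),
    ],
)

# For each exit row (cur), join only the latest earlier row (prev) that
# entered the state being left. Assumes ID order follows time order.
pairs = conn.execute(
    """
    SELECT cur.ID, prev.ID, cur.TableN, cur.RecordId, cur.OldState,
           prev.TimeStamp, cur.TimeStamp,
           CAST(strftime('%s', cur.TimeStamp) AS INTEGER)
             - CAST(strftime('%s', prev.TimeStamp) AS INTEGER) AS seconds_in_state
    FROM WorkflowLog AS cur
    JOIN WorkflowLog AS prev
      ON prev.ID = (SELECT MAX(p.ID)
                    FROM WorkflowLog AS p
                    WHERE p.TableN = cur.TableN
                      AND p.RecordId = cur.RecordId
                      AND p."Action" = 'State Change'
                      AND p.NewState = cur.OldState
                      AND p.ID < cur.ID)
    WHERE cur."Action" = 'State Change'
    ORDER BY cur.ID
    """
).fetchall()

for row in pairs:
    print(row)
```

Rows whose state has no earlier entry record (the initial 0 → 10 transitions) drop out of the inner join, which matches the desired output.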

MySQL get multiple rows into columns

I have a table called visits where concat(s_id, c_id) is unique and id is the primary key. s_id is the ID number of a website and c_id is a campaign ID number. I want to show all the hits each campaign is getting, grouped by site, with each site on a single row.
+-----+------+------+------+
| id | s_id | c_id | hits |
+-----+------+------+------+
| 1 | 13 | 8 | 245 |
| 2 | 13 | 8 | 458 |
| 3 | 13 | 3 | 27 |
| 4 | 13 | 4 | 193 |
| 5 | 14 | 1 | 320 |
| 6 | 14 | 1 | 183 |
| 7 | 14 | 3 | 783 |
| 8 | 14 | 4 | 226 |
| 9 | 5 | 8 | 671 |
| 10 | 5 | 8 | 914 |
| 11 | 5 | 3 | 548 |
| 12 | 5 | 4 | 832 |
| 13 | 22 | 8 | 84 |
| 14 | 22 | 1 | 7 |
| 15 | 22 | 3 | 796 |
| 16 | 22 | 4 | 0 |
+-----+------+------+------+
I would like to have the following result set:
s_id | hits | hits | hits| hits
13 | 245 | 458 | 27 | 193
14 | 320 | 183 | 783 | 226
5 | 671 | 914 | 548 | 832
22 | 84 | 7 | 796 | 0
Here is what I have tried which does not pull all the hits columns back.
SELECT v.*, v2.* FROM visits v
INNER JOIN visits v2 on v.s_id = v2.s_id
GROUP BY s_id
How can I get multiple rows into columns?
If your data set is not crazy huge and you are just trying to get the multiple rows as a single row, one way to do this is:
SELECT
s_id,
GROUP_CONCAT(hits SEPARATOR ',') as hits_list
FROM
visits
GROUP BY s_id
Since it doesn't use any joins or subqueries, I find this approach quite fast.
You can later split/explode the data on the ',' separator in PHP or whatever language you are using.
$hits = explode(',', $hits_list); // separator first, then the string; returns an array
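For a quick end-to-end check of the concatenate-then-split idea, here is a sketch using SQLite from Python on the sample rows. SQLite spells the call group_concat(hits, ',') instead of MySQL's GROUP_CONCAT(hits SEPARATOR ','), and the dictionary name hits_by_site is just an illustrative choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (id INTEGER PRIMARY KEY, s_id INTEGER,"
    " c_id INTEGER, hits INTEGER)"
)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?, ?)",
    [
        (1, 13, 8, 245), (2, 13, 8, 458), (3, 13, 3, 27), (4, 13, 4, 193),
        (5, 14, 1, 320), (6, 14, 1, 183), (7, 14, 3, 783), (8, 14, 4, 226),
        (9, 5, 8, 671), (10, 5, 8, 914), (11, 5, 3, 548), (12, 5, 4, 832),
        (13, 22, 8, 84), (14, 22, 1, 7), (15, 22, 3, 796), (16, 22, 4, 0),
    ],
)

# One row per site, with all of its hits packed into a comma-separated string.
rows = conn.execute(
    "SELECT s_id, group_concat(hits, ',') AS hits_list FROM visits GROUP BY s_id"
).fetchall()

# Split the string back into a list of integers on the application side.
hits_by_site = {
    s_id: [int(h) for h in hits_list.split(",")] for s_id, hits_list in rows
}

for s_id, hits in hits_by_site.items():
    print(s_id, hits)
```

The order of values inside the concatenated string is not guaranteed unless you ask for one; in MySQL, GROUP_CONCAT accepts an ORDER BY inside the call for that purpose.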

MySQL - How to use GROUP BY / ORDER BY with "nested" dataset?

My (sub)query results in following dataset:
+---------+------------+-----------+
| item_id | version_id | relevance |
+---------+------------+-----------+
| 1 | 1 | 30 |
| 1 | 2 | 30 |
| 2 | 3 | 22 |
| 3 | 4 | 30 |
| 4 | 5 | 18 |
| 3 | 6 | 30 |
| 2 | 7 | 22 |
| 1 | 8 | 30 |
| 5 | 9 | 48 |
| 4 | 10 | 18 |
| 5 | 11 | 48 |
| 3 | 12 | 30 |
| 3 | 13 | 31 |
| 4 | 14 | 19 |
| 2 | 15 | 22 |
| 1 | 16 | 30 |
| 5 | 17 | 49 |
| 2 | 18 | 22 |
+---------+------------+-----------+
18 rows in set (0.00 sec)
Items and versions are stored in separate InnoDB-tables.
Both tables have auto-incrementing primary keys.
Versions have a foreign key to items (item_id).
My question: How do I get a subset based on relevance?
I would like to fetch the following subset containing the most relevant versions:
+---------+------------+-----------+
| item_id | version_id | relevance |
+---------+------------+-----------+
| 1 | 16 | 30 |
| 2 | 18 | 22 |
| 3 | 13 | 31 |
| 4 | 14 | 19 |
| 5 | 17 | 49 |
+---------+------------+-----------+
It would be even more ideal to fetch the MAX(version_id) in case of equal relevance.
I tried grouping, joining, ordering, etcetera in many ways but I'm not able to get the desired result.
Some of the things I tried is:
SELECT item_id, version_id, relevance
FROM (subquery) a
GROUP BY item_id
ORDER BY relevance DESC, version_id DESC
But of course the ordering happens after the fact, so that both relevance and MAX(version_id) information is lost.
Please advise.
This is how you can do this:
SELECT t1.item_id, max(t1.version_id) AS version_id, t1.relevance FROM t t1
LEFT JOIN t t2 ON t1.item_id = t2.item_id AND t1.relevance < t2.relevance
WHERE t2.relevance IS NULL
GROUP BY t1.item_id, t1.relevance
ORDER BY t1.item_id
Output:
| ITEM_ID | VERSION_ID | RELEVANCE |
|---------|------------|-----------|
| 1 | 16 | 30 |
| 2 | 18 | 22 |
| 3 | 13 | 31 |
| 4 | 14 | 19 |
| 5 | 17 | 49 |
Fiddle here.
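To see why the LEFT JOIN ... IS NULL form works, the sketch below replays it on the question's 18 rows using SQLite from Python (the table name t follows the answer). The join keeps only rows for which no row of the same item has a higher relevance; MAX(version_id) then breaks ties among equally relevant versions. Grouping by both item_id and relevance is a choice made here to keep the statement valid under strict GROUP BY rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (item_id INTEGER, version_id INTEGER, relevance INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 1, 30), (1, 2, 30), (2, 3, 22), (3, 4, 30), (4, 5, 18), (3, 6, 30),
        (2, 7, 22), (1, 8, 30), (5, 9, 48), (4, 10, 18), (5, 11, 48), (3, 12, 30),
        (3, 13, 31), (4, 14, 19), (2, 15, 22), (1, 16, 30), (5, 17, 49), (2, 18, 22),
    ],
)

# Anti-join: t2 would be a row of the same item with strictly higher relevance.
# If no such row exists (t2.relevance IS NULL), t1 carries the top relevance.
best = conn.execute(
    """
    SELECT t1.item_id, MAX(t1.version_id) AS version_id, t1.relevance
    FROM t AS t1
    LEFT JOIN t AS t2
      ON t1.item_id = t2.item_id AND t1.relevance < t2.relevance
    WHERE t2.relevance IS NULL
    GROUP BY t1.item_id, t1.relevance
    ORDER BY t1.item_id
    """
).fetchall()

for row in best:
    print(row)
```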

SQL/MySQL SELECT and average over certain values

I have to work with an analysis tool that measures the Web Service calls to a server per hour. These measurements are inserted into a database. The following is a snippet of such a measurement:
mysql> SELECT * FROM sample s LIMIT 4;
+---------+------+-------+
| service | hour | calls |
+---------+------+-------+
| WS04 | 04 | 24 |
| WS12 | 11 | 89 |
| WSI64 | 03 | 35 |
| WSX52 | 01 | 25 |
+---------+------+-------+
4 rows in set (0.00 sec)
As the end result I would like to know the sum of all web services completions per hour of day. Obviously, this can be easily done with SUM() and GROUP BY:
mysql> SELECT hour, SUM(calls) FROM sample s GROUP BY hour;
+------+------------+
| hour | SUM(calls) |
+------+------------+
| 00 | 634 |
| 01 | 642 |
| 02 | 633 |
| 03 | 624 |
| 04 | 420 |
| 05 | 479 |
| 06 | 428 |
| 07 | 424 |
| 08 | 473 |
| 09 | 434 |
| 10 | 485 |
| 11 | 567 |
| 12 | 526 |
| 13 | 513 |
| 14 | 555 |
| 15 | 679 |
| 16 | 624 |
| 17 | 796 |
| 18 | 752 |
| 19 | 843 |
| 20 | 827 |
| 21 | 774 |
| 22 | 647 |
| 23 | 533 |
+------+------------+
24 rows in set (0.00 sec)
My problem is that in old sets, the web service calls in the hours from [00-11] were already summed up. The simple statement as listed above would therefore lead to
mysql> SELECT hour, SUM(calls) FROM sample s GROUP BY hour;
+------+------------+
| hour | SUM(calls) |
+------+------------+
| 00 | 6243 | <------ sum of hours 00-11!
| 12 | 526 |
| 13 | 513 |
| 14 | 555 |
| 15 | 679 |
| 16 | 624 |
| 17 | 796 |
| 18 | 752 |
| 19 | 843 |
| 20 | 827 |
| 21 | 774 |
| 22 | 647 |
| 23 | 533 |
+------+------------+
13 rows in set (0.00 sec)
This is an undesirable result. To make the old sets [00,12,...,23] comparable to the new sets [00,01,...,23] I would like to have one statement that averages the value of [00] and distributes it over the missing hours, e.g.:
+------+------------+
| hour | SUM(calls) |
+------+------------+
| 00 | 6243/12 |
| 01 | 6243/12 |
[...]
| 12 | 526 |
[...]
| 23 | 533 |
+------+------------+
I can easily do this using temporary tables or views, but i don't know how to accomplish this without them.
Any ideas? Cause this is driving me crazy :P
You'll need a rowset with 12 rows in it to make a join.
The simplest solution is to combine 12 SELECT statements in a union:
SELECT COALESCE(morning.hour, sample.hour),
SUM(CASE WHEN morning.hour IS NULL THEN calls ELSE calls / 12 END) AS calls
FROM sample
LEFT JOIN
(
SELECT 0 AS hour
UNION ALL
SELECT 1
...
UNION ALL
SELECT 11
) AS morning
ON sample.hour = 0 AND sample.service IN ('old_service1', 'old_service2')
GROUP BY
1
You're probably best off doing this with temp tables / views (I'd recommend a view over a temp table); otherwise you will end up with a nasty case-specific statement that will be a nightmare to manage over time.
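For what it's worth, the 12-row join idea can be tried out quickly. The sketch below uses SQLite from Python with made-up data: a hypothetical service old_ws whose hour 0 already holds the summed morning calls (1200, so each hour receives 100) and a hypothetical new_ws that reports per hour. The hour column is stored as an integer here, which is an assumption; the question displays it as '00', '01', and so on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (service TEXT, hour INTEGER, calls INTEGER)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [
        ("old_ws", 0, 1200),  # pre-summed total for hours 00-11
        ("old_ws", 12, 526),
        ("new_ws", 0, 50),    # new-style rows already report per hour
        ("new_ws", 1, 60),
    ],
)

# Build the 12-row set 0..11 as a chain of SELECTs joined by UNION ALL.
morning_sql = " UNION ALL ".join(f"SELECT {h} AS hour" for h in range(12))

# Old hour-0 rows match all 12 morning rows, so each copy gets calls / 12.
# Every other row has no match (morning.hour IS NULL) and is kept as-is.
query = f"""
    SELECT COALESCE(morning.hour, sample.hour) AS hour,
           SUM(CASE WHEN morning.hour IS NULL THEN calls
                    ELSE calls / 12.0 END) AS calls
    FROM sample
    LEFT JOIN ({morning_sql}) AS morning
      ON sample.hour = 0 AND sample.service IN ('old_ws')
    GROUP BY COALESCE(morning.hour, sample.hour)
    ORDER BY 1
"""
calls_by_hour = dict(conn.execute(query).fetchall())

for hour, calls in calls_by_hour.items():
    print(hour, calls)
```

Dividing by 12.0 rather than 12 avoids integer truncation in SQLite; MySQL's / operator already returns a decimal result.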