MySQL - Self-join to determine time in state

I have a table (let's call it WorkflowLog) whose records are log entries written by a workflow processor.
The workflow processor moves records in various tables (Table1, Table2, etc.) through the states of a defined flowchart. Each state has a number.
The WorkflowLog records look like this:
| ID | TimeStamp | TableN | RecordId | Action | OldState | NewState |
-------------------------------------------------------------------------------------------
| 1 | 2016-09-16 15:50:00 | Table1 | 21 | State Change | 0 | 10 |
| 2 | 2016-09-16 15:50:00 | Table1 | 21 | Other Info | 0 | 10 |
| 3 | 2016-09-16 15:55:00 | Table2 | 21 | State Change | 0 | 10 |
| 4 | 2016-09-16 15:57:00 | Table1 | 21 | State Change | 10 | 20 |
| 5 | 2016-09-16 15:58:00 | Table1 | 21 | State Change | 20 | 30 |
| 6 | 2016-09-16 15:59:00 | Table1 | 21 | State Change | 30 | 20 |
| 7 | 2016-09-16 16:00:00 | Table1 | 21 | State Change | 20 | 30 |
| 8 | 2016-09-16 16:01:00 | Table1 | 52 | State Change | 0 | 10 |
| 9 | 2016-09-16 16:02:00 | Table1 | 21 | State Change | 30 | 999 |
| 10 | 2016-09-16 16:03:00 | Table3 | 25 | State Change | 0 | 10 |
I would like to determine how much time each record spent in each state. Please NOTE that the workflow can loop, so a record can be in state 20 (or any other state) multiple times.
My first try was:
select
DFT1.ID as DFT1_Id,
DFT2.ID as DFT2_Id,
DFT1.TableN as TableName,
DFT1.RecordId as RecordId,
DFT1.OldState as State,
DFT2.Timestamp as EntryTime,
DFT1.TimeStamp as ExitTime,
TIMEDIFF(DFT1.TimeStamp, DFT2.Timestamp) as TimeInState
from WorkflowLog DFT1
inner join WorkflowLog DFT2
ON DFT1.TableN=DFT2.TableN AND
DFT1.RecordId=DFT2.RecordId AND
DFT1.`Action`='State Change' AND
DFT2.`Action`='State Change' AND
DFT1.OldState=DFT2.NewState
Order BY DFT1_Id
But this does not handle a record that loops in the workflow (as the row with ID 6 shows, where the record goes from state 30 back to state 20). It seems that I need something similar to the above, BUT matching only the most recent previous state change.
I am at a loss on how to do that.
Ultimately what I want is an output like:
| ID1 | ID2 | TableN | RecId | State | EntryTime | ExitTime | TimeInState |
------------------------------------------------------------------------------------------------
| 4 | 1 | Table1 | 21 | 10 | 2016-09-16 15:50:00 | 2016-09-16 15:57:00 | 00:07:00 |
| 5 | 4 | Table1 | 21 | 20 | 2016-09-16 15:57:00 | 2016-09-16 15:58:00 | 00:01:00 |
| 6 | 5 | Table1 | 21 | 30 | 2016-09-16 15:58:00 | 2016-09-16 15:59:00 | 00:01:00 |
| 7 | 6 | Table1 | 21 | 20 | 2016-09-16 15:59:00 | 2016-09-16 16:00:00 | 00:01:00 |
| 9 | 7 | Table1 | 21 | 30 | 2016-09-16 16:00:00 | 2016-09-16 16:02:00 | 00:02:00 |
ETC ...
EDIT:
As suggested, I have added an SqlFiddle for the above with my first-try SQL.
There are extra matches in the result set because Table1 record 21 goes through states 20 and 30 twice. Here are my comments on the SqlFiddle result set:
| ID | ID | My Comment |
| 4 | 1 | Good |
| 5 | 6 | Not Good - Matched with Future record |
| 5 | 4 | Good |
| 6 | 7 | Not Good - Matched with Future record |
| 6 | 5 | Good |
| 7 | 6 | Good |
| 7 | 4 | Not Good - Match to First time thru state 20 |
| 9 | 5 | Not Good - Match to First time thru state 30 |
| 9 | 7 | Good |
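One possible way to "only match the most recent previous" state change is a correlated subquery that picks the highest earlier ID per exit row. This is a minimal sketch, not a tested answer: it assumes that ID increases in the same order as TimeStamp, and the aliases cur, prev and p are made up for illustration.

```sql
-- Sketch: pair each exit row (cur) with the latest earlier row (prev)
-- that moved the same record into the state cur is leaving.
-- Assumption: a higher ID always means a later log entry.
SELECT
    cur.ID         AS ID1,
    prev.ID        AS ID2,
    cur.TableN     AS TableN,
    cur.RecordId   AS RecId,
    cur.OldState   AS State,
    prev.TimeStamp AS EntryTime,
    cur.TimeStamp  AS ExitTime,
    TIMEDIFF(cur.TimeStamp, prev.TimeStamp) AS TimeInState
FROM WorkflowLog cur
INNER JOIN WorkflowLog prev
    ON  prev.TableN   = cur.TableN
    AND prev.RecordId = cur.RecordId
    AND prev.NewState = cur.OldState
WHERE cur.`Action` = 'State Change'
  AND prev.ID = (
        -- most recent earlier entry into the state being left
        SELECT MAX(p.ID)
        FROM WorkflowLog p
        WHERE p.TableN   = cur.TableN
          AND p.RecordId = cur.RecordId
          AND p.`Action` = 'State Change'
          AND p.NewState = cur.OldState
          AND p.ID < cur.ID
      )
ORDER BY cur.ID;
```

Against the sample rows this should keep only the pairs marked "Good" above: (4, 1), (5, 4), (6, 5), (7, 6) and (9, 7). Rows whose OldState was never entered in the log (the initial 0 to 10 changes) have no match and drop out of the inner join.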

Related

MySQL: Get everyday incremental data

I want to fetch data from a table by date, but as a running (cumulative) total.
Suppose I have data like this, grouped by date:
| DATE | Count |
| 2015-06-23 | 10 |
| 2015-06-24 | 8 |
| 2015-06-25 | 6 |
| 2015-06-26 | 3 |
| 2015-06-27 | 2 |
| 2015-06-29 | 2 |
| 2015-06-30 | 3 |
| 2015-07-01 | 1 |
| 2015-07-02 | 3 |
| 2015-07-03 | 4 |
So the result should look like this:
| DATE | Count| Sum|
| 2015-06-23 | 10 | 10 |
| 2015-06-24 | 8 | 18 |
| 2015-06-25 | 6 | 24 |
| 2015-06-26 | 3 | 27 |
| 2015-06-27 | 2 | 29 |
| 2015-06-29 | 2 | 31 |
| 2015-06-30 | 3 | 34 |
| 2015-07-01 | 1 | 35 |
| 2015-07-02 | 3 | 38 |
| 2015-07-03 | 4 | 42 |
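You would join each date to itself and to every earlier date, and then sum the counts of the joined rows.
If you give me your table structure, I can make it run.
For a table with the columns id, name, date_joined:
SELECT daily.date_joined, daily.theCount, SUM(prior.theCount) AS runningTotal
FROM
-- one row per date with that date's row count
(SELECT date_joined, COUNT(*) AS theCount
FROM yourTable
GROUP BY date_joined
) AS daily
JOIN
-- the same per-date counts, matched to the current and all earlier dates
(SELECT date_joined, COUNT(*) AS theCount
FROM yourTable
GROUP BY date_joined
) AS prior
ON prior.date_joined <= daily.date_joined
GROUP BY daily.date_joined, daily.theCount
ORDER BY daily.date_joined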
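On MySQL 8.0 or later (an assumption, since the question does not state a version), the running total could probably also be computed with a window function instead of a join. The sketch below reuses the yourTable and date_joined names from the answer; runningTotal is a hypothetical alias.

```sql
-- Sketch for MySQL 8.0+: per-date count plus cumulative sum of those counts.
SELECT
    date_joined,
    COUNT(*) AS theCount,
    -- the window function runs over the grouped rows, ordered by date
    SUM(COUNT(*)) OVER (ORDER BY date_joined) AS runningTotal
FROM yourTable
GROUP BY date_joined
ORDER BY date_joined;
```

This avoids comparing every date with every earlier date, which tends to matter once the table has many distinct dates.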

MySQL query to meet specific needs

My table is as follows:
-------------------------------------------
| rec_id | A_id | B_id |Date(YYYY-MM-DD)|
-------------------------------------------
| 1 | 1 | 6 | 2014-01-01 |
| 2 | 5 | 1 | 2014-01-02 |
| 3 | 2 | 6 | 2015-01-03 |
| 4 | 6 | 1 | 2014-01-04 |
| 5 | 7 | 1 | 2014-01-05 |
| 6 | 3 | 6 | 2014-01-06 |
| 7 | 8 | 1 | 2014-01-07 |
| 8 | 4 | 6 | 2014-01-08 |
| 9 | 9 | 1 | 2014-01-09 |
| 10 | 10 | 21 | 2014-01-10 |
| 11 | 12 | 21 | 2014-01-11 |
| 12 | 11 | 2 | 2014-01-12 |
| 13 | 1 | 1 | 2014-12-31 |
| 14 | 2 | 2 | 2014-12-31 |
| 15 | 1 | 1 | 2015-01-31 |
| 16 | 10 | 21 | 2015-01-31 |
| 17 | 1 | 21 | 2014-10-31 |
This table records which "B_id" possesses each "A_id", together with the date the possession started. The possession of an "A_id" can change at any later time, which means that only its latest possession counts.
I want to find all the "A_id" values that are currently (as of their latest date) in the possession of a specific "B_id". For example, for "B_id" = 6 the currently possessed "A_id" values are as follows:
---------------------------
| A_id | Date(YYYY-MM-DD) |
---------------------------
| 2 | 2015-01-03 |
| 3 | 2014-01-06 |
| 4 | 2014-01-08 |
Similarly, for "B_id" = 21 the possessed "A_id" at present are as follows:
---------------------------
| A_id | Date(YYYY-MM-DD) |
---------------------------
| 10 | 2015-01-31 |
| 12 | 2014-01-11 |
I would highly appreciate your kind help in this regard.
One way to accomplish this is to use a correlated NOT EXISTS predicate that makes sure that no later possession with a different B_ID exists for each A_ID (PDATE stands for the date column here).
SELECT A_ID, MAX(PDATE) AS DATE
FROM YOUR_TABLE T
WHERE B_ID = 6
AND NOT EXISTS (
SELECT 1
FROM YOUR_TABLE
WHERE A_ID = T.A_ID
AND PDATE > T.PDATE
AND B_ID <> T.B_ID
)
GROUP BY A_ID
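An alternative sketch, under the same naming assumptions as the query above (YOUR_TABLE, and PDATE standing in for the date column): first find the latest date per A_ID, then keep only the rows on that date that belong to the requested B_ID. The alias latest and the column LAST_DATE are hypothetical.

```sql
-- Sketch: an A_ID is currently possessed by B_ID 6 when the row on
-- its latest date carries B_ID = 6.
SELECT t.A_ID, t.PDATE
FROM YOUR_TABLE t
JOIN (
    -- latest possession date of every A_ID, regardless of owner
    SELECT A_ID, MAX(PDATE) AS LAST_DATE
    FROM YOUR_TABLE
    GROUP BY A_ID
) latest
    ON  latest.A_ID      = t.A_ID
    AND latest.LAST_DATE = t.PDATE
WHERE t.B_ID = 6
ORDER BY t.A_ID;
```

For the sample data this should return A_ID 2, 3 and 4, the same as the NOT EXISTS version. If an A_ID has two rows with different B_ID values on its latest date, both owners count as current here, so the tie would need an extra rule such as the highest rec_id.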

Picking out specific values from a group in MySQL

This seems like such a simple problem, but I can't find a good solution. I'm trying to select information from a slightly misformatted table. Basically, wherever sequence=0, the person_id should actually be a company_id. This company_id then applies to all the rows which have the same group_id.
Someone thought it was a good idea to format things this way instead of simply having a company_id column, but it makes trying to select by company very difficult. It would make my programming much easier to simply add this extra column, and fix the formatting.
I want to turn something like this:
+----------+------------+-----------+----------+
| group_id | date | person_id | sequence |
+----------+------------+-----------+----------+
| 1 | 2012-08-31 | 10 | 0 |
| 1 | 2012-08-31 | 11 | 1 |
| 1 | 2012-08-31 | 12 | 2 |
| 2 | 1999-04-16 | 10 | 0 |
| 2 | 1999-04-16 | 21 | 1 |
| 2 | 1999-04-16 | 22 | 2 |
| 2 | 1999-04-16 | 23 | 3 |
| 2 | 1999-04-16 | 24 | 4 |
| 3 | 2001-01-09 | 30 | 0 |
| 3 | 2001-01-09 | 31 | 1 |
| 3 | 2001-01-09 | 11 | 2 |
| 3 | 2001-01-09 | 12 | 3 |
+----------+------------+-----------+----------+
Into this:
+------------+----------+------------+-----------+----------+
| company_id | group_id | date | person_id | sequence |
+------------+----------+------------+-----------+----------+
| 10 | 1 | 2012-08-31 | 11 | 1 |
| 10 | 1 | 2012-08-31 | 12 | 2 |
| 10 | 2 | 1999-04-16 | 21 | 1 |
| 10 | 2 | 1999-04-16 | 22 | 2 |
| 10 | 2 | 1999-04-16 | 23 | 3 |
| 10 | 2 | 1999-04-16 | 24 | 4 |
| 30 | 3 | 2001-01-09 | 31 | 1 |
| 30 | 3 | 2001-01-09 | 11 | 2 |
| 30 | 3 | 2001-01-09 | 12 | 3 |
+------------+----------+------------+-----------+----------+
The only way I can think of how to achieve this is with nested SELECT statements, which are very inefficient considering I have about 100M rows. It's a one time fix though, so I don't mind letting it run overnight.
If you permanently want to change your table to include a company_id column then do this:
First alter the table and add the new column:
alter table your_table add company_id int;
Then update all rows, setting company_id to the person_id of the group's sequence = 0 row:
UPDATE your_table a
JOIN your_table b ON a.group_id = b.group_id
SET a.company_id = b.person_id
WHERE b.sequence = 0;
And finally remove the rows with sequence = 0:
DELETE FROM your_table WHERE sequence = 0;
Sample SQL Fiddle
The end result will be:
| group_id | date | person_id | sequence | company_id |
|----------|------------|-----------|----------|------------|
| 1 | 2012-08-31 | 11 | 1 | 10 |
| 1 | 2012-08-31 | 12 | 2 | 10 |
| 2 | 1999-04-16 | 21 | 1 | 10 |
| 2 | 1999-04-16 | 22 | 2 | 10 |
| 2 | 1999-04-16 | 23 | 3 | 10 |
| 2 | 1999-04-16 | 24 | 4 | 10 |
| 3 | 2001-01-09 | 31 | 1 | 30 |
| 3 | 2001-01-09 | 11 | 2 | 30 |
| 3 | 2001-01-09 | 12 | 3 | 30 |
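Because the UPDATE and DELETE are destructive and the table has about 100M rows, it may be worth previewing the result with a read-only query first. The sketch below is one way to do that with the same self-join idea; the aliases p and c are hypothetical.

```sql
-- Sketch: show every person row together with the company taken from
-- the sequence = 0 row of the same group, without modifying the table.
SELECT
    c.person_id AS company_id,
    p.group_id,
    p.`date`,
    p.person_id,
    p.sequence
FROM your_table p
JOIN your_table c
    ON  c.group_id = p.group_id
    AND c.sequence = 0          -- the row that really holds the company id
WHERE p.sequence <> 0           -- skip the company rows themselves
ORDER BY p.group_id, p.sequence;
```

An index on (group_id, sequence) would likely help both this query and the UPDATE, though whether it pays off for a one-time fix depends on how long building the index takes.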

Mysql difference between max() and min()?

I have a problem with a MySQL query and the max() function.
If I do:
Select * from Data group by experiment having min(timestamp)
This query returns what I want, with the correct values.
I get this:
+----------+---------+----------+---------------------+----------------+------------+
| id | mote_id | label_id | timestamp | value | experiment |
+----------+---------+----------+---------------------+----------------+------------+
| 3768806 | 10 | 30 | 2014-04-22 14:37:07 | 0 | 13 |
| 10989209 | 12 | 22 | 2014-04-25 10:44:03 | 2.532958984375 | 15 |
| 11943537 | 6 | 19 | 2014-05-05 17:20:15 | 1228 | 16 |
| 12042549 | 16 | 26 | 2014-05-06 10:48:59 | 22.86 | 17 |
| 12176642 | 15 | 23 | 2014-05-07 15:19:35 | 0 | 18 |
| 12195344 | 10 | 6 | 2014-05-07 15:27:23 | 3460 | 19 |
| 12222470 | 15 | 8 | 2014-05-07 15:38:38 | 1 | 21 |
| 12343934 | 10 | 19 | 2014-05-12 10:35:42 | 742 | 23 |
+----------+---------+----------+---------------------+----------------+------------+
But if I do:
Select * from Data group by experiment having max(timestamp)
This query returns wrong values, like this:
+----------+---------+----------+---------------------+----------------+------------+
| id | mote_id | label_id | timestamp | value | experiment |
+----------+---------+----------+---------------------+----------------+------------+
| 3768806 | 10 | 30 | 2014-04-22 14:37:07 | 0 | 13 |
| 10989209 | 12 | 22 | 2014-04-25 10:44:03 | 2.532958984375 | 15 |
| 11943537 | 6 | 19 | 2014-05-05 17:20:15 | 1228 | 16 |
| 12042549 | 16 | 26 | 2014-05-06 10:48:59 | 22.86 | 17 |
| 12176642 | 15 | 23 | 2014-05-07 15:19:35 | 0 | 18 |
| 12195344 | 10 | 6 | 2014-05-07 15:27:23 | 3460 | 19 |
| 12222470 | 15 | 8 | 2014-05-07 15:38:38 | 1 | 21 |
| 12343934 | 10 | 19 | 2014-05-12 10:35:42 | 742 | 23 |
+----------+---------+----------+---------------------+----------------+------------+
In the first query, if I replace min(timestamp) with timestamp=min(timestamp), it works, but in the second, "timestamp=max(timestamp)" returns nothing.
Finally, Select experiment,max(timestamp) returns correct values.
mysql> select *,max(timestamp) from Data group by experiment;
+----------+---------+----------+---------------------+----------------+------------+---------------------+
| id | mote_id | label_id | timestamp | value | experiment | max(timestamp) |
+----------+---------+----------+---------------------+----------------+------------+---------------------+
| 3768806 | 10 | 30 | 2014-04-22 14:37:07 | 0 | 13 | 2014-04-24 16:03:29 |
| 10989209 | 12 | 22 | 2014-04-25 10:44:03 | 2.532958984375 | 15 | 2014-05-05 10:34:35 |
| 11943537 | 6 | 19 | 2014-05-05 17:20:15 | 1228 | 16 | 2014-05-06 10:35:15 |
| 12042549 | 16 | 26 | 2014-05-06 10:48:59 | 22.86 | 17 | 2014-05-07 15:19:33 |
| 12176642 | 15 | 23 | 2014-05-07 15:19:35 | 0 | 18 | 2014-05-07 15:27:23 |
| 12195344 | 10 | 6 | 2014-05-07 15:27:23 | 3460 | 19 | 2014-05-07 15:38:01 |
| 12222470 | 15 | 8 | 2014-05-07 15:38:38 | 1 | 21 | 2014-05-07 16:30:38 |
| 12343934 | 10 | 19 | 2014-05-12 10:35:42 | 742 | 23 | 2014-05-14 09:25:44 |
+----------+---------+----------+---------------------+----------------+------------+---------------------+
I know I can use a subquery to solve my problem, but the tables contain thousands of rows, and this solution takes too long...
PS: I can't use Select *, max(timestamp) even if it works, because the query is run by EJB in JEE.
You are selecting indeterminate values grouped by the experiment field. Nothing guarantees that the non-aggregated fields will come from the row that holds the MIN or MAX value of some aggregated field.
You HAVE TO use a subquery or a self-join to get the right records.
See more here: http://dev.mysql.com/doc/refman/5.6/en/example-maximum-column-group-row.html
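A minimal sketch of the subquery approach for this table, using the Data, experiment and timestamp names from the question; the aliases d and m and the column max_ts are made up for illustration.

```sql
-- Sketch: fetch the full row holding the latest timestamp of each experiment.
SELECT d.*
FROM Data d
JOIN (
    -- one row per experiment with its latest timestamp
    SELECT experiment, MAX(`timestamp`) AS max_ts
    FROM Data
    GROUP BY experiment
) m
    ON  m.experiment = d.experiment
    AND m.max_ts     = d.`timestamp`
ORDER BY d.experiment;
```

Swapping MAX for MIN gives the earliest row per experiment. If several rows of one experiment share the same extreme timestamp, all of them are returned. A composite index on (experiment, timestamp) is the usual way to keep the derived table cheap, although that is a guess about this schema.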
The HAVING clause expects a boolean expression. In other DBMSs your code sample would trigger an error. In MySQL, the expression is cast to boolean:
Zero → false
Non-zero → true
And since your expression is constant for the whole group, it can't filter out individual rows.
As for this:
HAVING timestamp = max(timestamp)
The HAVING clause is evaluated after WHERE and GROUP BY. At that point, using individual row values of the timestamp column doesn't make any sense. As usual, MySQL allows that, but you must take into account that:
In standard SQL, a query that includes a GROUP BY clause cannot refer
to nonaggregated columns in the HAVING clause that are not named in
the GROUP BY clause. A MySQL extension permits references to such
columns to simplify calculations. This extension assumes that the
nongrouped columns will have the same group-wise values. Otherwise,
the result is indeterminate.
In other words, your results are arbitrary (not even random).

MySQL Query for averages

Good morning. I have this table:
mysql> select * from Data;
+---------------------------+--------+-------+
| affyId | exptId | level |
+---------------------------+--------+-------+
| 31315_at | 3 | 250 |
| 31324_at | 3 | 91 |
| 31325_at | 1 | 191 |
| 31325_at | 2 | 101 |
| 31325_at | 4 | 51 |
| 31325_at | 5 | 71 |
| 31325_at | 6 | 31 |
| 31356_at | 3 | 91 |
| 31362_at | 3 | 260 |
| 31510_s_at | 3 | 257 |
| 5321_at | 4 | 90 |
| 5322_at | 4 | 90 |
| 5323_at | 4 | 90 |
| 5324_at | 3 | 57 |
| 5324_at | 4 | 90 |
| 5325_at | 4 | 90 |
| AFFX-BioB-3_at | 3 | 97 |
| AFFX-BioB-5_at | 3 | 20 |
| AFFX-BioB-M_at | 3 | 20 |
| AFFX-BioB-M_at | 5 | 214 |
| AFFX-BioB-M_at | 7 | 20 |
| AFFX-BioB-M_at | 8 | 40 |
| AFFX-BioB-M_at | 9 | 20 |
| AFFX-HSAC07/X00351_M_at | 3 | 86 |
| AFFX-HUMBAPDH/M33197_3_st | 3 | 277 |
| AFFX-HUMTFFR/M11507_at | 3 | 90 |
| AFFX-M27830_3_at | 3 | 271 |
| AFFX-MurIL10_at | 3 | 8 |
| AFFX-MurIL10_at | 5 | 8 |
| AFFX-MurIL10_at | 6 | 4 |
| AFFX-MurIL2_at | 3 | 20 |
| AFFX-MurIL4_at | 5 | 78 |
| AFFX-MurIL4_at | 6 | 20 |
| U95-32123_at | 1 | 128 |
| U95-32123_at | 2 | 128 |
| U98-40474_at | 1 | 57 |
| U98-40474_at | 2 | 57 |
+---------------------------+--------+-------+
37 rows in set (0.00 sec)
If I want the average expression level (level) of each array probe (affyId) across all experiments, I run SELECT affyId, AVG(level) AS average FROM Data GROUP BY affyId;
However, I can't figure out how to get the average expression level of each array probe (affyId) for each experiment. It must be something similar to the last query, but I don't get good results. Any help?
PS: Someone told me I should give some reputation or click some green button if somebody solves my question. Is that right? How do I do it? I'm pretty new to this website.
This shows the average for every affyId:
SELECT affyId, AVG(level) AS average FROM Data GROUP BY affyId
This is the average for every exptId:
SELECT exptId, AVG(level) AS average FROM Data GROUP BY exptId
and this the average for every exptId in every affyId:
SELECT affyId, exptId, AVG(level) AS average FROM Data GROUP BY exptId, affyId
Just add exptId to the GROUP BY clause:
SELECT affyId, exptId, AVG(level) AS average
FROM Data
GROUP BY affyId, exptId;