SQL Query Help - Grouping By Sequences of Digits - mysql

I have a table, which includes the following columns and data:
id dtime instance data dtype
1 2012-10-22 10000 d 1
2 2012-10-22 10000 d 1
..
7 2012-10-22 10004 d 1
..
15 2012-10-22 10000 # 1
16 2012-10-22 10004 d 1
17 2012-10-22 10000 d 1
I want to group sequences of 'd's in the data column, with the '#' at the end of the sequence.
This could have been done by grouping on the instance column, which is an individual stream of data; however, there can be multiple sequences within one stream.
I also want to end a sequence if there are no further rows for the same instance within, say, 3 seconds of that instance's last row, and no '#' has been found within that interval.
I have managed to do exactly this using cursors and while loops, which worked reasonably well for tables with thousands of rows. However, this query will eventually be used on far more rows, and both methods take around a minute on a dataset of just 3,000-5,000 rows.
From reading this site and others, it seems that set-based logic may be the way to go; however, I can't think of a way to do what I need without some kind of loop over each row that compares it with every other row to build the 'sequences'.
If anyone could help, or point me in the direction of something that could, it would be greatly appreciated. :)
I would ideally like the data to be output in the following format:
datacount instance lastdata dtime
20 10000 # 2012-10-22
19 10000 d 2012-10-22
22 10004 # 2012-10-22
20 10022 # 2012-10-22
Where (datacount) is a count of the number of rows in a 'sequence' (which is the data leading up to a '#' or 3 second delay), (instance) is the instance ID from the original table, (lastdata) is the last data value in the sequence, (dtime) is the datetime value of the last data value.

Let me show you how to do this for the final '#'; the time difference follows a similar idea. The key idea is to find the next '#' for each row, which requires a correlated subquery. After that, you can do a GROUP BY:
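select instance, groupid, count(*) as NumInSeq, max(dtime) as LastDateTime
from (select t.*,
             -- id of the first '#' at or after this row in the same instance
             (select min(t2.id)
              from t t2
              where t2.instance = t.instance and t2.id >= t.id and t2.data = '#'
             ) as groupid
      from t
     ) t
group by instance, groupid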
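Handling the time gap is a bit more complicated. A row ends a sequence if it is a '#', or if the next row for the same instance is more than 3 seconds later (or there is no next row). It is something like this:
select instance, groupid, count(*) as NumInSeq, max(dtime) as LastDateTime,
       (case when sum(case when data = '#' then 1 else 0 end) > 0 then '#' else 'd' end) as FinalData
from (select t.*,
             -- id of the first sequence-ending row at or after this row
             (select min(t2.id)
              from t t2
              where t2.instance = t.instance and
                    t2.id >= t.id and
                    (t2.data = '#' or
                     not exists (select 1
                                 from t t3
                                 where t3.instance = t2.instance and
                                       t3.id > t2.id and
                                       t3.dtime <= t2.dtime + interval 3 second
                                )
                    )
             ) as groupid
      from t
     ) t
group by instance, groupid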
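A minimal sketch of the same correlated-subquery idea, run end to end with Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are made up, and dtime is assumed to be stored as integer epoch seconds so the 3-second gap becomes a plain subtraction (in MySQL you would compare DATETIME values with INTERVAL or UNIX_TIMESTAMP instead). A row ends a sequence when it is a '#' or when no later row of the same instance arrives within the gap; every row is then labelled with the first such ending row at or after it.

```python
import sqlite3

# Hypothetical sample rows: (id, dtime in epoch seconds, instance, data).
ROWS = [
    (1, 0, 10000, "d"),
    (2, 1, 10004, "d"),
    (3, 2, 10000, "d"),
    (4, 3, 10004, "#"),   # '#' closes the 10004 sequence
    (5, 4, 10000, "#"),   # '#' closes the first 10000 sequence
    (6, 5, 10000, "d"),
    (7, 6, 10000, "d"),   # next 10000 row is 14 s later, so the gap closes it
    (8, 20, 10000, "d"),  # last row of its instance, so it closes itself
]

QUERY = """
SELECT instance, groupid, COUNT(*) AS datacount, MAX(dtime) AS last_dtime,
       CASE WHEN SUM(data = '#') > 0 THEN '#' ELSE 'd' END AS lastdata
FROM (
    SELECT t.*,
           -- first row at or after this one that ends a sequence
           (SELECT MIN(t2.id)
              FROM t t2
             WHERE t2.instance = t.instance
               AND t2.id >= t.id
               AND (t2.data = '#'
                    OR NOT EXISTS (SELECT 1 FROM t t3
                                    WHERE t3.instance = t2.instance
                                      AND t3.id > t2.id
                                      AND t3.dtime - t2.dtime <= ?))) AS groupid
      FROM t
) grouped
GROUP BY instance, groupid
ORDER BY groupid
"""


def sequences(rows, gap_seconds=3):
    """Return (instance, groupid, datacount, last_dtime, lastdata) per sequence."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, dtime INTEGER, instance INTEGER, data TEXT)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY, (gap_seconds,)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in sequences(ROWS):
        print(row)
```

Here the first 10000 sequence ends at its '#', the second ends because the next 10000 row arrives 14 seconds later, and the final row of an instance always closes its own sequence.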

Related

Selecting rows until a column value isn't the same

SELECT product.productID
, product.Name
, product.date
, product.status
FROM product
INNER JOIN shelf ON product.sheldID=shelf.shelfID
WHERE product.weekID = $ID
AND (product.date < '$day'
     OR (product.date = '$day' AND shelf.expire <= '$time'))
ORDER BY concat(product.date,shelf.expire)
I am trying to make the query stop returning rows once it reaches a specific value, e.g. 'bad'.
I have tried using a max date, but I am finding it hard because I build the timestamp in the query (combining date and time).
This example table shows that 3 results should be returned; if the status "bad" were the first result, no results should be returned. (They are ordered by date and time.)
ProductID Date status
1 2017-03-27 Good
2 2017-03-27 Good
3 2017-03-26 Good
4 2017-03-25 Bad
5 2017-03-25 Good
I think I may have fixed it: I added this to my while loop.
The query returns the results ordered from present to past by date and time. The while loop checks whether the status of each row is 'bad': if it is not, it processes the row (I might be able to use an array to collect the data); if it is, the loop breaks.
I know it doesn't seem ideal, but it works lol
while ($row = mysqli_fetch_assoc($result)) {
    // stop at the first 'bad' row (case-insensitive, since the data says "Bad")
    if (strtolower($row['status']) === 'bad') {
        break;
    }
    $counter += 1;
}
I will answer using just your output, as if it were a single table; it should give you the main idea of how to solve your problem.
Basically, I created a column called ord that works as a row number (MySQL versions before 8.0 don't support ROW_NUMBER()). Then I take the minimum ord value for a 'Bad' status and return everything whose ord is less than that.
select y.*
from (select ProductID, dt, status, @rw:=@rw+1 ord
      from product, (select @rw:=0) a
      order by dt desc, ProductID) y
where y.ord < (select min(ord) ord
               from (select ProductID, status, @rin:=@rin+1 ord
                     from product, (select @rin:=0) a
                     order by dt desc, ProductID) x
               where status = 'Bad');
Result will be:
ProductID dt status ord
-------------------------------------
1 2017-03-27 Good 1
2 2017-03-27 Good 2
3 2017-03-26 Good 3
Also tested with the use case where the Bad status is the first result, no results will be returned.
See it working here: http://sqlfiddle.com/#!9/28dda/1
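If the user-variable numbering feels fragile, the same cut-off can be expressed with NOT EXISTS. This is a swapped-in formulation rather than the answer's query: a row is kept only if no 'Bad' row sorts at or before it under ORDER BY dt DESC, ProductID (ProductID is an assumed tie-breaker for equal dates). The sketch runs on SQLite through Python's sqlite3 module with the question's sample data.

```python
import sqlite3

# Sample rows from the question; ISO dates sort correctly as text.
PRODUCTS = [
    (1, "2017-03-27", "Good"),
    (2, "2017-03-27", "Good"),
    (3, "2017-03-26", "Good"),
    (4, "2017-03-25", "Bad"),
    (5, "2017-03-25", "Good"),
]

# Keep a row only if no 'Bad' row comes at or before it in the
# ordering (dt DESC, ProductID ASC).
QUERY = """
SELECT p.ProductID, p.dt, p.status
FROM product p
WHERE NOT EXISTS (
    SELECT 1 FROM product b
    WHERE b.status = 'Bad'
      AND (b.dt > p.dt OR (b.dt = p.dt AND b.ProductID <= p.ProductID))
)
ORDER BY p.dt DESC, p.ProductID
"""


def rows_before_first_bad(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE product (ProductID INTEGER, dt TEXT, status TEXT)")
    con.executemany("INSERT INTO product VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in rows_before_first_bad(PRODUCTS):
        print(row)
```

Unlike the user-variable version, this also returns every row when there is no 'Bad' row at all; there, min(ord) is NULL and the comparison y.ord < NULL filters everything out.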

If more than 10% of results are over X in mysql

I have a database table with lists of temperature readings from many locations in a number of buildings. I need a query that will give me true or false depending on whether more than 10% of the readings in a building, taken on a given date, are greater than X.
I am not looking for an average. If there are 100 measurements taken in a building on a date and 10 of them are over X (say, 80 degrees), then create a flag.
The table is laid out as
| Building # | Location # | Date       | Temperature |
| 123        | 555        | 2016-04-08 | 68.5        |
| 123        | 556        | 2016-04-08 | 70.2        |
| 123        | 557        | 2016-04-08 | 65.4        |
| 888        | 999        | 2013-03-22 | 80.4        |
Typically a building has over 100 readings, and there are many hundreds of building/date entries in the table.
Can this be done with a single mysql query and can you share that query with me?
I obviously haven't made my question clear.
The result I am looking for is a single True or False.
If more than 10% of the results for a building/date combination are over X (say, 80), then show true, or some flag equal to true.
The known fields will be building and date. The location is not relevant and can be ignored. So, given a building (123) and a date (2016-04-08), are more than 10% of the entries in the table with that building number and date greater than X (e.g. 80)? Only the data for that building and date should be tested, so the query would end in:
where building_id = '123' AND date = '2016-04-08'
I am NOT looking for an average or a median. I am NOT looking to see a list of the data for that 10%. I am just looking for true or false.
You can use conditional aggregation, something like this:
select building, date,
(case when avg(temperature > x) > 0.1 then 'Y' else 'N' end) as flag
from t
group by building, date;
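A small sketch of the conditional-aggregation idea, run on SQLite through Python's sqlite3 module. The readings are invented, and the column is called reading_date here purely as an illustrative name. Because temperature > X evaluates to 0 or 1, AVG() of it is the fraction of readings above X.

```python
import sqlite3

# Hypothetical readings: (building, location, reading_date, temperature).
# Building 123 has 1 reading over 80 out of 11 (about 9%); building 888 has 1 of 2.
READINGS = [(123, 550 + i, "2016-04-08", 70.0) for i in range(10)]
READINGS += [
    (123, 560, "2016-04-08", 81.0),
    (888, 999, "2013-03-22", 80.4),
    (888, 998, "2013-03-22", 75.0),
]

# temperature > ? is 0 or 1, so AVG() is the share of readings over X.
QUERY = """
SELECT building, reading_date,
       CASE WHEN AVG(temperature > ?) > 0.1 THEN 'Y' ELSE 'N' END AS flag
FROM readings
GROUP BY building, reading_date
ORDER BY building, reading_date
"""


def flags(rows, x=80):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE readings (building INTEGER, location INTEGER, "
        "reading_date TEXT, temperature REAL)"
    )
    con.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY, (x,)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(flags(READINGS))
```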
To return building and date, and "create a flag" for rows where more than 10% of the readings for that building on that date are over a given value X ...
SELECT r.building
, DATE(r.date)
, ( SUM(r.reading > X ) > SUM(.10) ) AS _flag
FROM myreadings r
GROUP BY r.building, DATE(r.date)
Without more detail about the result set you want returned, we're just guessing.
FOLLOWUP
Based on the update to the question... to return a row for a single building and a single date, add the WHERE clause as shown in the question. And remove expressions from the SELECT list.
SELECT ( SUM(r.reading > X ) > SUM(.10) ) AS _flag
FROM myreadings r
WHERE r.building = '123'
AND r.date >= '2016-04-08'
AND r.date < '2016-04-08' + INTERVAL 1 DAY
If there are no rows for the given building and date, the query still returns a single row (an aggregate query with no GROUP BY always does), but _flag will be NULL, because SUM over zero rows is NULL. If there is at least one row, and the number of rows with a reading greater than X is more than 10% of the total number of rows, _flag will be 1 (TRUE); otherwise, _flag will be 0 (FALSE).
If you want the query to return 0 (FALSE) rather than NULL when there are no matching rows, wrap the expression in COALESCE(..., 0).
If you want the query to return string values 'TRUE' or 'FALSE', that can be accomplished as well.
Again, without an example of the result set you expect (an actual specification we can compare a result set to), we're just guessing.
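For the single building/date case, a sketch that always returns exactly one 'TRUE' or 'FALSE' string, including when nothing matches. It relies on an aggregate query without GROUP BY returning one row even over zero input rows, and uses COALESCE to turn the resulting NULL into false. Table, column and parameter names are illustrative, and SQLite (via Python's sqlite3) stands in for MySQL.

```python
import sqlite3

# Hypothetical readings: (building, location, reading_date, temperature).
READINGS = [(123, 550 + i, "2016-04-08", 70.0) for i in range(10)]
READINGS += [
    (123, 560, "2016-04-08", 81.0),
    (888, 999, "2013-03-22", 80.4),
    (888, 998, "2013-03-22", 75.0),
]

# SUM(temperature > :x) counts readings over X; with no matching rows it is
# NULL, the comparison is NULL, and COALESCE turns that into 0 (FALSE).
QUERY = """
SELECT CASE WHEN COALESCE(SUM(temperature > :x) > 0.10 * COUNT(*), 0)
            THEN 'TRUE' ELSE 'FALSE' END AS flag
FROM readings
WHERE building = :building AND reading_date = :day
"""


def flag_for(rows, building, day, x=80):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE readings (building INTEGER, location INTEGER, "
        "reading_date TEXT, temperature REAL)"
    )
    con.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
    (flag,) = con.execute(QUERY, {"x": x, "building": building, "day": day}).fetchone()
    con.close()
    return flag


if __name__ == "__main__":
    print(flag_for(READINGS, 123, "2016-04-08"))
```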

Mysql single column result to multiple column result

I have a problem with a MySQL query. I have the following table:
id rep val dates
1 rep1 200 06/01/2014
2 rep2 300 06/01/2014
3 rep3 400 06/01/2014
4 rep4 500 06/01/2014
5 rep5 100 06/01/2014
6 rep1 200 02/06/2014
7 rep2 300 02/06/2014
8 rep3 900 02/06/2014
9 rep4 700 02/06/2014
10 rep5 600 02/06/2014
and I want a result like this:
rep 01/06/2014 02/06/2014
rep1 200 200
rep2 300 300
rep3 400 900
rep4 500 700
rep5 100 600
thank you very much!
You seem to want the most recent row for each rep. Here is an approach that often performs well:
select t.*
from sometable t
where not exists (select 1
                  from sometable t2
                  where t2.rep = t.rep and
                        t2.id > t.id
                 );
This transforms the problem to: "Get me the rows in the table where there is no other row with the same rep and a larger id." That is the same logic as getting the last one, just convoluted a bit to help the database know what to do.
For performance reasons, an index on sometable(rep, id) is helpful.
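A quick check of the NOT EXISTS "latest row per group" pattern, using the question's rows in SQLite via Python's sqlite3. The table name sometable and the ISO date strings are assumptions; the index mirrors the (rep, id) suggestion.

```python
import sqlite3

# The question's rows, with dates assumed to be 2014-06-01 and 2014-06-02.
ROWS = [
    (1, "rep1", 200, "2014-06-01"), (2, "rep2", 300, "2014-06-01"),
    (3, "rep3", 400, "2014-06-01"), (4, "rep4", 500, "2014-06-01"),
    (5, "rep5", 100, "2014-06-01"), (6, "rep1", 200, "2014-06-02"),
    (7, "rep2", 300, "2014-06-02"), (8, "rep3", 900, "2014-06-02"),
    (9, "rep4", 700, "2014-06-02"), (10, "rep5", 600, "2014-06-02"),
]

# A row survives only if no row with the same rep has a larger id.
QUERY = """
SELECT t.id, t.rep, t.val
FROM sometable t
WHERE NOT EXISTS (SELECT 1 FROM sometable t2
                  WHERE t2.rep = t.rep AND t2.id > t.id)
ORDER BY t.rep
"""


def latest_per_rep(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sometable (id INTEGER PRIMARY KEY, rep TEXT, val INTEGER, dates TEXT)")
    # Lets each NOT EXISTS probe seek on (rep, id) instead of scanning.
    con.execute("CREATE INDEX idx_rep_id ON sometable (rep, id)")
    con.executemany("INSERT INTO sometable VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(latest_per_rep(ROWS))
```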
You seem to want the val for each of the dates.
Assuming the dates you are interested in are fixed, you can do that as follows. For each output date column, check whether the row matches the date for that column: if so, use the value of val; if not, use 0. Then sum all the resulting values, grouping by rep. I have assumed a fixed date format.
SELECT rep, SUM(IF(dates='2014/06/01', val, 0)) AS '2014/06/01', SUM(IF(dates='2014/06/02', val, 0)) AS '2014/06/02'
FROM sometable
GROUP BY rep
Or, if you just want the highest val for each day:
SELECT rep, MAX(IF(dates='2014/06/01', val, 0)) AS '2014/06/01', MAX(IF(dates='2014/06/02', val, 0)) AS '2014/06/02'
FROM sometable
GROUP BY rep
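The fixed-date pivot can be tried out as follows. This sketch uses the portable CASE WHEN form instead of MySQL's IF(), so it also runs on SQLite through Python's sqlite3; dates are assumed to be stored as 'YYYY-MM-DD' strings, and sometable is a placeholder name.

```python
import sqlite3

# The question's rows, with dates assumed to be 2014-06-01 and 2014-06-02.
ROWS = [
    (1, "rep1", 200, "2014-06-01"), (2, "rep2", 300, "2014-06-01"),
    (3, "rep3", 400, "2014-06-01"), (4, "rep4", 500, "2014-06-01"),
    (5, "rep5", 100, "2014-06-01"), (6, "rep1", 200, "2014-06-02"),
    (7, "rep2", 300, "2014-06-02"), (8, "rep3", 900, "2014-06-02"),
    (9, "rep4", 700, "2014-06-02"), (10, "rep5", 600, "2014-06-02"),
]

# One output column per known date; rows for other dates contribute 0.
QUERY = """
SELECT rep,
       SUM(CASE WHEN dates = '2014-06-01' THEN val ELSE 0 END) AS d_2014_06_01,
       SUM(CASE WHEN dates = '2014-06-02' THEN val ELSE 0 END) AS d_2014_06_02
FROM sometable
GROUP BY rep
ORDER BY rep
"""


def fixed_pivot(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sometable (id INTEGER, rep TEXT, val INTEGER, dates TEXT)")
    con.executemany("INSERT INTO sometable VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in fixed_pivot(ROWS):
        print(row)
```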
If the number of dates is variable, there is no direct way to do it (as the number of resulting columns would vary). It would be easiest to do this mainly in your calling script, based on the following query, which gives you one row per rep/date combination with the sum of val for that combination:
SELECT rep, sub0.dates, SUM(IF(sometable.dates=sub0.dates, val, 0))
FROM sometable
CROSS JOIN
(
SELECT DISTINCT dates
FROM sometable
) sub0
GROUP BY rep, sub0.dates
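For a variable number of dates, a sketch of the calling-script side: the query produces one (rep, date, total) row per combination, and Python turns those rows into one column per date. SQLite via sqlite3 stands in for MySQL, CASE WHEN replaces IF(), and the extra 2014-06-03 row is invented to show how missing combinations come out as 0.

```python
import sqlite3

ROWS = [
    (1, "rep1", 200, "2014-06-01"), (2, "rep2", 300, "2014-06-01"),
    (3, "rep3", 400, "2014-06-01"), (4, "rep4", 500, "2014-06-01"),
    (5, "rep5", 100, "2014-06-01"), (6, "rep1", 200, "2014-06-02"),
    (7, "rep2", 300, "2014-06-02"), (8, "rep3", 900, "2014-06-02"),
    (9, "rep4", 700, "2014-06-02"), (10, "rep5", 600, "2014-06-02"),
    (11, "rep1", 50, "2014-06-03"),  # hypothetical extra date
]

# Every rep is paired with every distinct date, so each combination appears.
QUERY = """
SELECT s.rep, d.dates, SUM(CASE WHEN s.dates = d.dates THEN s.val ELSE 0 END)
FROM sometable s
CROSS JOIN (SELECT DISTINCT dates FROM sometable) d
GROUP BY s.rep, d.dates
ORDER BY s.rep, d.dates
"""


def pivot(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sometable (id INTEGER, rep TEXT, val INTEGER, dates TEXT)")
    con.executemany("INSERT INTO sometable VALUES (?, ?, ?, ?)", rows)
    triples = con.execute(QUERY).fetchall()
    con.close()
    # Calling-script side: turn (rep, date, total) rows into one column per date.
    date_cols = sorted({d for _, d, _ in triples})
    totals = {}
    for rep, d, total in triples:
        totals.setdefault(rep, {})[d] = total
    header = ["rep"] + date_cols
    body = [[rep] + [totals[rep].get(d, 0) for d in date_cols] for rep in sorted(totals)]
    return header, body


if __name__ == "__main__":
    header, body = pivot(ROWS)
    print(header)
    for line in body:
        print(line)
```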

Checking consecutive values at a MySQL query

I have a MySQL table like this:
ID - Time - Value
And I'm getting every pair of ID, Time (grouped by ID) where Value is greater than a certain threshold. So basically, I'm getting every ID that has at least one value greater than the threshold. The query looks like this:
SELECT ID, Time FROM mydb.MYTABLE
WHERE Value>%s AND Time>=%s AND Time<=%s
GROUP BY ID
EDIT: The Time checks restrict the query to a time range of my choosing within the data in the table; they have nothing else to do with what I am asking.
It works perfectly, but now I want to add some filtering: I want to ignore the times when the value is greater than the threshold (let's call them alarms) if the alarm did not also occur at the Time just before or just after. That is, if the alarm occurs at a single, isolated instant of time instead of at two consecutive instants, I'll consider it a false alarm and exclude it from the query results.
Of course, I could do this with one call per ID, but I'd like to do it in a single query to make it faster. I guess I could use conditionals, but I don't have that expertise in MySQL.
Any help?
EDIT2: Example for Threshold = 10
ID - Time - Value
1 - 2004 - 9
1 - 2005 - 11
1 - 2006 - 8
2 - 2107 - 12
2 - 2109 - 13
3 - 3402 - 11
3 - 3403 - 12
In this example, only ID 3 should be a valid alarm, since two consecutive time values for this ID have a value > threshold. ID 1 has a single, isolated alarm, so it should be filtered out. ID 2 has two alarms, but they are not consecutive, so it should also be filtered out.
Something like this:
10 is the threshold
0 is the start of the time period
100000 is the end of the time period
select ID, min(Time)
from
(
SELECT ID, Time,
(select max(time) from t
where Time<t1.Time
and Id=t1.Id
and Value>10) LAG_G,
(select max(time) from t
where Time<t1.Time
and Id=t1.Id
and Value<=10) LAG_L,
(select min(time) from t
where Time>t1.Time
and Id=t1.Id
and Value>10) LEAD_G,
(select min(time) from t
where Time>t1.Time
and Id=t1.Id
and Value<=10) LEAD_L
FROM t as t1
WHERE Value>10 AND Time>=0 AND Time<=100000
) t3
where ifnull(LAG_G,0)>ifnull(LAG_L,0)
OR
ifnull(LEAD_G,100000)<ifnull(LEAD_L,100000)
GROUP BY ID
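This query treats the nearest records for the same ID (the closest earlier and later rows) as consecutive, regardless of how far apart their Time values are.
If you need records that are consecutive by Time (+1, -1), as you mentioned in the comment, try this query: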
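The LAG_G/LAG_L/LEAD_G/LEAD_L comparison boils down to asking whether the nearest earlier or later row for the same ID is also above the threshold. A compact sketch of that idea follows; it swaps in ORDER BY ... LIMIT 1 scalar subqueries for the four max/min subqueries (a different formulation, not the answer's exact query) and runs on SQLite through Python's sqlite3 with the example data from the question.

```python
import sqlite3

# Example data from the question: (ID, Time, Value).
ROWS = [
    (1, 2004, 9), (1, 2005, 11), (1, 2006, 8),
    (2, 2107, 12), (2, 2109, 13),
    (3, 3402, 11), (3, 3403, 12),
]

# An alarm row counts if its nearest neighbour (by Time, same ID) is also an alarm.
QUERY = """
SELECT ID, MIN(Time)
FROM t AS t1
WHERE Value > :th AND Time >= :lo AND Time <= :hi
  AND (
        (SELECT p.Value FROM t AS p
          WHERE p.ID = t1.ID AND p.Time < t1.Time
          ORDER BY p.Time DESC LIMIT 1) > :th
     OR (SELECT n.Value FROM t AS n
          WHERE n.ID = t1.ID AND n.Time > t1.Time
          ORDER BY n.Time LIMIT 1) > :th
  )
GROUP BY ID
ORDER BY ID
"""


def adjacent_alarms(rows, threshold=10, lo=0, hi=100000):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (ID INTEGER, Time INTEGER, Value INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY, {"th": threshold, "lo": lo, "hi": hi}).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(adjacent_alarms(ROWS))
```

ID 2 is reported here because its two alarm rows are neighbours in the table even though Time 2108 is missing; that is exactly the difference from a Time +1/-1 check.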
select ID, min(Time) from t as t1
where Value>10
AND Time>=%s AND Time<=%s
and
(
Exists(select 1 from t where Value>10
and Id=t1.Id
and Time=t1.Time-1)
OR
Exists(select 1 from t where Value>10
and Id=t1.Id
and Time=t1.Time+1)
)
group by ID
Something like this, counting the alarms?
SELECT ID, Time, count(if(value>%threshold, 1, NULL)) alert_active
FROM mydb.MYTABLE
WHERE Value>%s3 AND Time>=%s2 AND Time<=%s1
GROUP BY ID;
I don't understand exactly:
In this example, only ID 3 should be a valid alarm, since 2 consecutive time values for this ID have their value > threshold. ID 1 has a single, isolated alarm, so it should be filtered. For ID 2 there are 2 alarms, but not consecutive, so it should also be filtered.
I guess you want to filter alerts:
SELECT ID, Time
FROM mydb.MYTABLE
WHERE Value>%s3 AND Time>=%s2 AND Time<=%s1
GROUP BY ID
having value<%threshold;

Simple MySQL Query - Change table format around

I'm fairly sure this has an easy answer, but it is completely slipping my mind.
I have a database table that is currently formatted like:
event_id | elem_id | value
1 1 Value 1
1 2 Value 2
2 1 Value 3
2 2 Value 4
Both event_id and elem_id are arbitrary numbers with no fixed set of possible values.
How would I query it, for example for event_id 1, to get the data formatted like this:
event_id | 1 | 2
1 Value 1 Value 2
elem_id can be any number >= n, so there could potentially be 50 elem_ids, and I still need the data in that format.
As I said, I can't for the life of me figure out the query to assemble it that way. Any help would be GREATLY appreciated.
Try the following:
SELECT
  t1.`event_id`,
  (SELECT t2.`value` FROM `table` t2 WHERE t2.`event_id` = t1.`event_id` AND t2.`elem_id` = 1) AS `1`,
  (SELECT t3.`value` FROM `table` t3 WHERE t3.`event_id` = t1.`event_id` AND t3.`elem_id` = 2) AS `2`
FROM `table` t1 GROUP BY t1.`event_id`;
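Since elem_id can take many values, the column list can be generated from the data instead of being written by hand. This sketch does that in Python against SQLite and swaps in conditional aggregation (MAX(CASE ...)) for the per-column subqueries; event_values is a hypothetical table name, chosen because table is a reserved word. Only integers are interpolated into the SQL text, and the event_id itself is passed as a bound parameter.

```python
import sqlite3

# Sample rows from the question: (event_id, elem_id, value).
ROWS = [(1, 1, "Value 1"), (1, 2, "Value 2"), (2, 1, "Value 3"), (2, 2, "Value 4")]


def pivot_event(rows, event_id):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE event_values (event_id INTEGER, elem_id INTEGER, value TEXT)")
    con.executemany("INSERT INTO event_values VALUES (?, ?, ?)", rows)
    # Discover which elem_id values exist; each one becomes an output column.
    elem_ids = [r[0] for r in con.execute("SELECT DISTINCT elem_id FROM event_values ORDER BY elem_id")]
    if not elem_ids:
        con.close()
        return ["event_id"], None
    # int() guarantees only numbers are interpolated into the SQL text.
    cols = ", ".join(
        f'MAX(CASE WHEN elem_id = {int(e)} THEN value END) AS "{int(e)}"' for e in elem_ids
    )
    cur = con.execute(
        f"SELECT event_id, {cols} FROM event_values WHERE event_id = ? GROUP BY event_id",
        (event_id,),
    )
    header = [d[0] for d in cur.description]
    row = cur.fetchone()
    con.close()
    return header, row


if __name__ == "__main__":
    print(pivot_event(ROWS, 1))
```

Elements missing for a given event come back as NULL (None in Python), since no row matches that CASE branch.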
Alternatively, you can get the elem_ids and values as comma-separated lists in two cells (the ORDER BY keeps both lists in the same order):
SELECT `event_id`, GROUP_CONCAT(`elem_id` ORDER BY `elem_id`), GROUP_CONCAT(`value` ORDER BY `elem_id`) FROM `table` GROUP BY `event_id`;
You can change the separator with the following syntax: GROUP_CONCAT(field SEPARATOR '::')