Grouping to find min,max for each group - mysql

This would be relatively easy if I only cared about a single min and max for each group, the problem is my requirement is to find the various boundaries. An example data set is as follows:
BoundaryColumn GroupIdentifier
1 A
3 A
4 A
7 A
8 B
9 B
11 B
13 A
14 A
15 A
16 A
What I need from the sql is a result set as follows:
min max groupid
1 7 A
8 11 B
13 16 A
Essentially finding the boundaries for each cluster of the groups.
The data would be stored in either oracle11g or mysql so syntax can be provided for either platform.

A disclaimer: It's a lot easier to query partial results and process something like this with a front-end language. That said...
The following query works for Oracle (which supports analytic functions) but not for MySQL versions before 8.0 (which do not).
WITH BoundX AS (
SELECT * FROM (
SELECT
BoundaryColumn,
GroupIdentifier,
LAG(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLag,
LEAD(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLead
FROM MyTable
ORDER BY BoundaryColumn
)
WHERE GIDLag IS NULL OR GroupIdentifier <> GIDLag
OR GIDLead IS NULL OR GroupIdentifier <> GIDLead
)
SELECT MIN, MAX, GROUPID
FROM (
SELECT
BoundaryColumn AS MIN,
LEAD(BoundaryColumn) OVER (ORDER BY BoundaryColumn) AS MAX,
GroupIdentifier AS GROUPID,
GIDLag,
GIDLead
FROM BoundX
)
WHERE GROUPID = GIDLead
Here's the logic, step by step. You may be able to improve on this, because I get the feeling there's one subquery too many here...
This query pulls the prior and following GroupIdentifier values into each row:
SELECT
BoundaryColumn,
GroupIdentifier,
LAG(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLag,
LEAD(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLead
FROM MyTable
ORDER BY BoundaryColumn
The result looks like this:
BoundaryColumn  GroupIdentifier  GIDLag  GIDLead
1               A                (null)  A
3               A                A       A
4               A                A       A
7               A                A       B
8               B                A       B
9               B                B       B
11              B                B       A
13              A                B       A
14              A                A       A
15              A                A       A
16              A                A       (null)
If you add logic to get rid of all the rows where GIDLag = GIDLead = GroupIdentifier, you'll end up with the boundaries:
WITH BoundX AS (
SELECT * FROM (
SELECT
BoundaryColumn,
GroupIdentifier,
LAG(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLag,
LEAD(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLead
FROM MyTable
ORDER BY BoundaryColumn
)
WHERE GIDLag IS NULL OR GroupIdentifier <> GIDLag
OR GIDLead IS NULL OR GroupIdentifier <> GIDLead
)
SELECT
BoundaryColumn AS MIN,
LEAD(BoundaryColumn) OVER (ORDER BY BoundaryColumn) AS MAX,
GroupIdentifier AS GROUPID,
GIDLag,
GIDLead
FROM BoundX
With this addition the results are:
MIN  MAX     GROUPID  GIDLAG  GIDLEAD
---  ---     -------  ------  -------
1    7       A        (null)  A
7    8       A        A       B
8    11      B        A       B
11   13      B        B       A
13   16      A        B       A
16   (null)  A        A       (null)
Finally, include only those rows where GroupID = GIDLead. That's the query at the top of this answer. The results are:
MIN MAX GROUPID
--- --- -------
1 7 A
8 11 B
13 16 A
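To illustrate the disclaimer at the top of this answer, here is a minimal front-end sketch in Python. It assumes the rows have already been fetched in BoundaryColumn order; the function name `find_runs` and the tuple layout are made up for this example and are not part of the original answer.

```python
from itertools import groupby


def find_runs(rows):
    """Collapse consecutive rows sharing a group id into (min, max, group_id).

    `rows` is assumed to be (boundary, group_id) pairs already sorted by boundary.
    """
    runs = []
    # groupby only merges *adjacent* equal keys, which matches the cluster rule.
    for group_id, members in groupby(rows, key=lambda row: row[1]):
        boundaries = [boundary for boundary, _ in members]
        runs.append((boundaries[0], boundaries[-1], group_id))
    return runs


rows = [(1, "A"), (3, "A"), (4, "A"), (7, "A"), (8, "B"), (9, "B"),
        (11, "B"), (13, "A"), (14, "A"), (15, "A"), (16, "A")]
print(find_runs(rows))
```

Because the input is sorted, the first and last member of each run are its min and max, so no extra comparison is needed.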

Take a look at this site regarding "runs" of data: http://www.sqlteam.com/article/detecting-runs-or-streaks-in-your-data
Armed with the knowledge provided in that link, you could write a query like this:
SELECT BoundaryColumn,
GroupIdentifier,
(
SELECT COUNT(*)
FROM MyTable T
WHERE T.GroupIdentifier <> TR.GroupIdentifier
AND T.BoundaryColumn <= TR.BoundaryColumn
) as RunGroup
FROM MyTable TR
Using this information, you could then group by "RunGroup", and select the GroupIdentifier and min/max BoundaryColumn.
EDIT: I've felt the peer pressure, here's an SQLFiddle with my version of the answer: http://www.sqlfiddle.com/#!8/9a24c/4/0
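The final grouping step described in this answer could look roughly like the sketch below. It runs the query against an in-memory SQLite database from Python purely so it can be tried out; the table name `MyTable` and the aliases `min_b` / `max_b` are assumptions for this example. Note that it groups by both GroupIdentifier and RunGroup, since the RunGroup counter alone may coincide for two different identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (BoundaryColumn INTEGER, GroupIdentifier TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "A"), (3, "A"), (4, "A"), (7, "A"), (8, "B"), (9, "B"),
     (11, "B"), (13, "A"), (14, "A"), (15, "A"), (16, "A")],
)

query = """
SELECT MIN(BoundaryColumn) AS min_b,
       MAX(BoundaryColumn) AS max_b,
       GroupIdentifier
FROM (
    -- RunGroup: how many rows of a *different* group precede this row
    SELECT BoundaryColumn,
           GroupIdentifier,
           (SELECT COUNT(*)
              FROM MyTable T
             WHERE T.GroupIdentifier <> TR.GroupIdentifier
               AND T.BoundaryColumn <= TR.BoundaryColumn) AS RunGroup
    FROM MyTable TR
) runs
GROUP BY GroupIdentifier, RunGroup
ORDER BY min_b
"""
result = conn.execute(query).fetchall()
print(result)
```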

Another approach (Oracle). Here we divide the result set returned by the query against table t1 (your table) into logical groups (grp). Each new group starts when the value of GroupIdentifier changes:
select min(q.BoundaryColumn) as MinB
, max(q.BoundaryColumn) as MaxB
, max(q.GroupIdentifier) as groupid
from ( select s.BoundaryColumn
, s.GroupIdentifier
, sum(grp) over(order by s.BoundaryColumn) as grp
from ( select BoundaryColumn
, GroupIdentifier
, case
when GroupIdentifier <> lag(GroupIdentifier)
over(order by BoundaryColumn)
then 1
end as grp
from t1) s
) q
group by q.grp
Result:
MINB MAXB GROUPID
---------- ---------- -------
1 7 A
8 11 B
13 16 A
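To make the intermediate grp column visible, here is a small sketch that runs only the inner part of the query. It uses SQLite from Python as a stand-in for Oracle (an assumption, and it needs SQLite 3.25 or later for window functions); the inner flag column is renamed `flag` here only to keep it distinct from the running sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (BoundaryColumn INTEGER, GroupIdentifier TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [(1, "A"), (3, "A"), (4, "A"), (7, "A"), (8, "B"), (9, "B"),
     (11, "B"), (13, "A"), (14, "A"), (15, "A"), (16, "A")],
)

inner_query = """
SELECT s.BoundaryColumn,
       s.GroupIdentifier,
       -- running sum of the change flags: constant within a run
       SUM(s.flag) OVER (ORDER BY s.BoundaryColumn) AS grp
FROM (
    SELECT BoundaryColumn,
           GroupIdentifier,
           -- 1 on the first row of every run except the very first one
           CASE WHEN GroupIdentifier <> LAG(GroupIdentifier)
                                        OVER (ORDER BY BoundaryColumn)
                THEN 1 END AS flag
    FROM t1
) s
ORDER BY s.BoundaryColumn
"""
labelled = conn.execute(inner_query).fetchall()
for row in labelled:
    print(row)
```

The first run gets a NULL label because no flag has been summed yet; GROUP BY still treats those rows as one group, which is why the outer query works without an ELSE branch.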

Related

Incrementing count ONLY for duplicates in MySQL

Here is my MySQL table. I updated the question by adding an 'id' column to it (as instructed in the comments by others).
id data_id
1 2355
2 2031
3 1232
4 9867
5 2355
6 4562
7 1232
8 2355
I want to add a new column called row_num to assign an incrementing number ONLY for duplicates, as shown below. Order of the results does not matter.
id data_id row_num
3 1232 1
7 1232 2
2 2031 null
1 2355 1
5 2355 2
8 2355 3
6 4562 null
4 9867 null
I followed this answer and came up with the code below. But the following code also assigns a count of 1 to non-duplicate values. How can I modify it to add a count only for duplicates?
select data_id,row_num
from (
select data_id,
@row:=if(@prev=data_id,@row,0) + 1 as row_num,
@prev:=data_id
from my_table
)t
If you are running MySQL 8.0, you can do this more efficiently with window functions only:
select
data_id,
case when count(*) over(partition by data_id) > 1
then row_number() over(partition by data_id order by data_id)
end as row_num
from my_table
When the window count returns more than 1, you know that the current data_id has duplicates, in which case you can use row_number() to assign the incrementing number.
Note that, in the absence of an ordering column that uniquely identifies each record within a group sharing the same data_id, it is undefined which record will actually get each number.
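As a sketch of how the numbering becomes deterministic, the variant below orders each partition by the id column from the question. It is run against SQLite from Python only so it can be tested (an assumption; SQLite 3.25 or later is needed for window functions), and the table name `my_table` follows the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, data_id INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, 2355), (2, 2031), (3, 1232), (4, 9867),
     (5, 2355), (6, 4562), (7, 1232), (8, 2355)],
)

query = """
SELECT id,
       data_id,
       CASE WHEN COUNT(*) OVER (PARTITION BY data_id) > 1
            -- id is unique, so the numbering within each data_id is fixed
            THEN ROW_NUMBER() OVER (PARTITION BY data_id ORDER BY id)
       END AS row_num
FROM my_table
ORDER BY data_id, id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```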
I am assuming that id is the column that defines the order on the rows.
In MySQL 8 you can use row_number() to get the number of each data_id and a CASE with EXISTS to exclude the rows which have no duplicate.
SELECT t1.data_id,
CASE
WHEN EXISTS (SELECT *
FROM my_table t2
WHERE t2.data_id = t1.data_id
AND t2.id <> t1.id) THEN
row_number() OVER (PARTITION BY t1.data_id
ORDER BY t1.id)
END row_num
FROM my_table t1;
In older versions you can use a subquery counting the rows with the same data_id but smaller id. With an EXISTS in a HAVING clause you can exclude the rows that have no duplicate.
SELECT t1.data_id,
(SELECT count(*)
FROM my_table t2
WHERE t2.data_id = t1.data_id
AND t2.id < t1.id
HAVING EXISTS (SELECT *
FROM my_table t2
WHERE t2.data_id = t1.data_id
AND t2.id <> t1.id)) + 1 row_num
FROM my_table t1;
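A variant of the older-version idea, sketched with a CASE expression instead of HAVING. This is a rewording for illustration rather than the query above, and it is run on SQLite from Python only so it can be checked. It uses no window functions: the row number is the count of rows with the same data_id and an id less than or equal to the current one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, data_id INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, 2355), (2, 2031), (3, 1232), (4, 9867),
     (5, 2355), (6, 4562), (7, 1232), (8, 2355)],
)

query = """
SELECT t1.id,
       t1.data_id,
       CASE WHEN EXISTS (SELECT 1
                           FROM my_table t2
                          WHERE t2.data_id = t1.data_id
                            AND t2.id <> t1.id)
            -- position of this row among its duplicates, ordered by id
            THEN (SELECT COUNT(*)
                    FROM my_table t3
                   WHERE t3.data_id = t1.data_id
                     AND t3.id <= t1.id)
       END AS row_num
FROM my_table t1
ORDER BY t1.data_id, t1.id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```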
Join with a query that returns the number of duplicates.
select t1.data_id, IF(t2.dups > 1, row_num, '') AS row_num
from (
select data_id,
@row:=if(@prev=data_id,@row,0) + 1 as row_num,
@prev:=data_id
from my_table
order by data_id
) AS t1
join (
select data_id, COUNT(*) AS dups
FROM my_table
GROUP BY data_id
) AS t2 ON t1.data_id = t2.data_id
If you want to have the old "order" of the old table, you need much more code
SELECT
data_id, IF (row_num = 1 AND cntid = 1, NULL,row_num)
FROM
(SELECT
@row:=IF(@prev = t1.data_id, @row, 0) + 1 AS row_num,
cntid,
@prev:=t1.data_id data_id
FROM
(SELECT
*
FROM
my_table
ORDER BY data_id) t1
INNER JOIN (SELECT Count(*) cntid,data_id FROM my_table GROUP BY data_id)t2
ON t1.data_id = t2.data_id) t2
data_id | IF (row_num = 1 AND cntid = 1, NULL,row_num)
------: | -------------------------------------------:
1232 | 1
1232 | 2
2031 | null
2355 | 1
2355 | 2
2355 | 3
4562 | null
9867 | null

SQL query to find new column

I need your help. I have a table named Test_Result with 2 columns as shown below.
ID Source_ID
10 1
20 2
30 2
40 3
50 3
60 3
70 4
I am trying to get the output below, but I can't work out the logic.
ID Parent_ID Source_ID
10 Null 1
20 Null 2
30 20 2
40 Null 3
50 40 3
60 50 3
70 Null 4
Kindly help me with this scenario.
Regards,
Abhi
These solutions (ROW_NUMBER/LAG) will work for MySQL 8.0+ or MariaDB 10.2
You could use ROW_NUMBER() and join to previous row:
CREATE TABLE tab(ID INT ,Source_ID INT);
INSERT INTO tab(id, Source_id)
SELECT 10, 1
UNION ALL SELECT 20 , 2
UNION ALL SELECT 30, 2
UNION ALL SELECT 40 , 3
UNION ALL SELECT 50 , 3
UNION ALL SELECT 60 , 3
UNION ALL SELECT 70 , 4;
WITH cte AS (
SELECT *, ROW_NUMBER() OVER(ORDER BY id) AS rn
FROM tab
)
SELECT c1.ID,
CASE WHEN c1.Source_ID = c2.Source_ID THEN c2.Id END AS Parent_Id,
c1.Source_ID
FROM cte c1
LEFT JOIN cte c2
ON c1.rn = c2.rn+1;
EDIT:
Using LAG() windowed function:
SELECT c1.ID,
CASE
WHEN c1.Source_ID = LAG(Source_ID) OVER w THEN LAG(ID) OVER w
END AS Parent_Id,
c1.Source_ID
FROM tab c1
WINDOW w AS (ORDER BY ID)
ORDER BY id;
EDIT2:
Simulating LAG using variables:
SET @lag_Source_id = '';
SET @lag_Id = '';
SELECT ID,
CASE WHEN Source_Id = lag_Source_ID THEN lag_ID END AS Parent_ID
,Source_ID
FROM (
SELECT ID
, Source_ID
, @lag_Source_id AS lag_Source_id
, @lag_Source_id := Source_ID AS curr_Source_ID
, @lag_Id AS lag_ID
, @lag_Id := ID AS curr_ID
FROM tab
ORDER BY id
) AS sub
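The variable-based simulation is essentially a single pass that remembers the previous row. A minimal Python sketch of that same idea follows; the function name `add_parent_ids` and the tuple layout are hypothetical and only serve the illustration.

```python
def add_parent_ids(rows):
    """Return (id, parent_id, source_id) tuples for rows sorted by id.

    parent_id is the previous row's id when that row has the same
    source_id, otherwise None.
    """
    result = []
    # These two play the role of the lag variables in the query above.
    prev_id, prev_source = None, None
    for row_id, source_id in rows:
        parent_id = prev_id if source_id == prev_source else None
        result.append((row_id, parent_id, source_id))
        prev_id, prev_source = row_id, source_id
    return result


rows = [(10, 1), (20, 2), (30, 2), (40, 3), (50, 3), (60, 3), (70, 4)]
for row in add_parent_ids(rows):
    print(row)
```

As with the SQL version, the result depends on the rows being processed in id order.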
if you are using mysql database simply do this,
SELECT ID, (ID + Source_ID) AS Parent_ID, Source_ID FROM tableName LIMIT 10;

MySQL select upto first occurrence of condition matching

Id | Price
----------------
1 | 10
2 | 20
3 | 40
4 | 10
I need to select the ids, working up from the bottom, until the running sum of price first becomes greater than or equal to 55. In this case,
ids 4, 3, 2 would be selected.
Well, this is kind of tricky for MySQL, since versions before 8.0 don't support window functions and because you want to include the first occurrence as well. You can try this:
(SELECT * FROM (
SELECT t.id,
(SELECT sum(s.price) FROM YourTable s
WHERE s.id >= t.id) as cuml_sum
FROM YourTable t) ss
WHERE ss.cuml_sum < 55)
-- Selects every record whose running sum, accumulated from the bottom, is still < 55
UNION ALL
(SELECT * FROM (
SELECT t.id,
(SELECT sum(s.price) FROM YourTable s
WHERE s.id >= t.id) as cuml_sum
FROM YourTable t) tt
WHERE tt.cuml_sum >= 55
ORDER BY tt.cuml_sum
LIMIT 1)
-- Selects the first record whose running sum is >= 55
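A possible single-query alternative, sketched here as an assumption rather than as the answer's method: a row qualifies when the prices of the rows below it (higher id) add up to less than 55. That condition also admits the row that crosses the threshold, so no UNION is needed. The sketch assumes non-negative prices and runs on SQLite from Python only so it can be checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER PRIMARY KEY, price INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 40), (4, 10)],
)

query = """
SELECT t.id
FROM YourTable t
-- sum of everything strictly below this row; NULL (no rows) counts as 0
WHERE COALESCE((SELECT SUM(s.price)
                  FROM YourTable s
                 WHERE s.id > t.id), 0) < 55
ORDER BY t.id DESC
"""
ids = [row[0] for row in conn.execute(query)]
print(ids)
```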

finding second position in mysql

I need to pull the names of the students who placed second in each grade, from grade 1 to grade 12. Each grade has a separate database with a similar table structure.
I have the following data:
Set 1
uid marks
1 10
2 20
3 17
4 17
5 20
6 20
Set 2
uid marks
1 10
2 20
3 17
4 17
5 20
6 17
7 20
I need a query which can say uid 3,4 are second in set 1 and 3,4,6 are second in set 2.
I need it in a single query because there are several sets of databases.
What could be a possible way?
I tried:
SELECT * FROM TBL WHERE marks != (SELECT MAX(marks) FROM TBL)
but it fetched all marks except the highest
Try this out:
SELECT uid, marks FROM (
SELECT uid, marks, @rank := @rank + (@prevMarks != marks) rnk, @prevMarks := marks
FROM t, (SELECT @rank := 0, @prevMarks := 0) init
ORDER BY marks DESC
) s
WHERE rnk = 2
Another alternative without User Defined Variables:
SELECT t.uid, t.marks FROM t
JOIN (
SELECT DISTINCT marks FROM t
ORDER BY marks DESC
LIMIT 1, 1
) s
ON t.marks = s.marks
Output:
| UID | MARKS |
|-----|-------|
| 3 | 17 |
| 4 | 17 |
Use LIMIT and ORDER BY
SELECT * FROM TBL ORDER BY marks DESC LIMIT 1,1
Here all students are ordered by marks from high to low. The limit then skips the first record (offset 0 is the first record) and returns only one record.
If you need all students with the second-highest mark, then use a subquery:
SELECT * FROM TBL WHERE marks = (
SELECT marks FROM TBL GROUP BY marks ORDER BY marks DESC LIMIT 1,1
)
SELECT *
FROM tbl
WHERE marks = (
SELECT MAX(marks)
FROM tbl
WHERE marks <
(
SELECT MAX(marks)
FROM tbl
)
)
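To check the nested-MAX query against both data sets from the question, here is a small sketch using SQLite from Python. The helper name `second_place` and the table name `tbl` are assumptions for this example; each call builds a fresh in-memory table, standing in for one grade's database.

```python
import sqlite3


def second_place(marks_by_uid):
    """Return the uids holding the second-highest mark, sorted by uid."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (uid INTEGER PRIMARY KEY, marks INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", marks_by_uid)
    query = """
    SELECT uid
    FROM tbl
    WHERE marks = (SELECT MAX(marks)
                     FROM tbl
                    -- highest mark that is strictly below the top mark
                    WHERE marks < (SELECT MAX(marks) FROM tbl))
    ORDER BY uid
    """
    uids = [row[0] for row in conn.execute(query)]
    conn.close()
    return uids


set1 = [(1, 10), (2, 20), (3, 17), (4, 17), (5, 20), (6, 20)]
set2 = [(1, 10), (2, 20), (3, 17), (4, 17), (5, 20), (6, 17), (7, 20)]
print(second_place(set1))
print(second_place(set2))
```

If every student has the same mark, the inner MAX is NULL and the query returns no rows.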
Try this
SELECT t.marks, t.uid, (
-- DISTINCT so that tied higher marks count as a single position
SELECT COUNT( DISTINCT t1.marks ) +1
FROM tbl t1
WHERE t.marks < t1.marks
) AS rnk
FROM tbl t
LIMIT 0 , 30
Now you can filter on the rnk column with a small modification:
SELECT * FROM (
SELECT t.marks, t.uid, (
SELECT COUNT( DISTINCT t1.marks ) +1
FROM tbl t1
WHERE t.marks < t1.marks
) AS rnk
FROM tbl t
) ranked WHERE rnk = 2

SQL count non sequential dates

Here's the data:
empID Date Type
----- -------- ----
1 1/1/2012 u
1 1/2/2012 u
1 1/3/2012 u
1 2/2/2012 u
4 1/1/2012 u
4 1/3/2012 u
4 1/4/2012 u
4 1/6/2012 u
Would return:
empID count
----- -----
1 2
4 3
When two dates are "together" they count as one occurrence, if the dates are separated out, they count as two occurrences. This is for tracking employee attendance... how would the SQL statement look to group by "together" dates and count them as 1... I'm really struggling with the logic.
SELECT
empID
, COUNT(*) AS cnt
FROM
tableX AS x
WHERE
NOT EXISTS
( SELECT *
FROM tableX AS y
WHERE y.empID = x.empID
AND DATEADD(day, -1, x.[Date]) = y.[Date]
)
GROUP BY
empID ;
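The query counts only the rows that start a run, that is, rows with no record for the same employee on the previous day. Here is a sketch of the same idea on SQLite from Python, assuming ISO-formatted date strings and a column named `d` (both are assumptions; SQLite's `date(..., '-1 day')` stands in for DATEADD).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (empID INTEGER, d TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO tableX VALUES (?, ?, ?)",
    [(1, "2012-01-01", "u"), (1, "2012-01-02", "u"),
     (1, "2012-01-03", "u"), (1, "2012-02-02", "u"),
     (4, "2012-01-01", "u"), (4, "2012-01-03", "u"),
     (4, "2012-01-04", "u"), (4, "2012-01-06", "u")],
)

query = """
SELECT x.empID, COUNT(*) AS cnt
FROM tableX AS x
WHERE NOT EXISTS (
    -- a row for the same employee on the day before means x continues a run
    SELECT 1
    FROM tableX AS y
    WHERE y.empID = x.empID
      AND y.d = date(x.d, '-1 day')
)
GROUP BY x.empID
ORDER BY x.empID
"""
result = conn.execute(query).fetchall()
print(result)
```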
try this:
;WITH CTE as
(select *,ROW_NUMBER() over (partition by empID order by date) as rn from test2 t1)
select empID,COUNT(*) as count
from CTE c1
where isnull((DATEDIFF(day,(select date from CTE where c1.rn=rn+1 and empID=c1.empID ),c1.date)),0) <> 1
group by empID