Select oldest two records from group - mysql

I've found a number of examples showing how to select a single oldest/newest row from a grouped set, but am having trouble getting the oldest two rows from a data set.
Here's my sample table:
CREATE TABLE IF NOT EXISTS `orderTable` (
`customer_id` varchar(10) NOT NULL,
`order_id` varchar(4) NOT NULL,
`date_added` date NOT NULL,
PRIMARY KEY (`customer_id`,`order_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `orderTable` (`customer_id`, `order_id`, `date_added`) VALUES
('1234', '5A', '1997-01-22'),
('1234', '88B', '1992-05-09'),
('0487', 'F9', '2002-01-23'),
('5799', 'A12F', '2007-01-23'),
('1234', '3A', '2009-01-22'),
('3333', '7FHS', '2009-01-22'),
('0487', 'Z33', '2004-06-23'),
('3333', 'FF44', '2013-09-11'),
('3333', '44f5', '2013-09-02');
This query returns more than two rows:
SELECT customer_id, order_id, date_added
FROM orderTable T1
WHERE (
select count(*) FROM orderTable T2
where T2.order_id = T1.order_id AND T2.date_added <= T1.date_added
) <= 2;
Since I am not looking for a single row, this is not a standard greatest-n-per-group type query.
What am I missing so that I can get the first two orders for each customer_id?

The best (i.e. most performant) approach is to use a User Defined Variable in the query.
SELECT tmp.customer_id, tmp.date_added
FROM (
SELECT
customer_id, date_added,
IF (@prev <> customer_id, @rownum := 1, @rownum := @rownum+1 ) rank,
@prev := customer_id
FROM orderTable t
JOIN (SELECT @rownum := NULL, @prev := 0) r
ORDER BY t.customer_id, t.date_added
) tmp
WHERE tmp.rank <= 2
ORDER BY customer_id, date_added
Results:
| CUSTOMER_ID | DATE_ADDED |
|-------------|----------------------------------|
| 0487 | January, 23 2002 00:00:00+0000 |
| 0487 | June, 23 2004 00:00:00+0000 |
| 1234 | May, 09 1992 00:00:00+0000 |
| 1234 | January, 22 1997 00:00:00+0000 |
| 3333 | January, 22 2009 00:00:00+0000 |
| 3333 | September, 02 2013 00:00:00+0000 |
| 5799 | January, 23 2007 00:00:00+0000 |
Note that the join is just being used to initialise the variables.

Your original query should correlate on customer_id, not order_id, in the subquery:
SELECT customer_id, order_id, date_added
FROM orderTable T1
WHERE (
select count(*) FROM orderTable T2
where T2.customer_id = T1.customer_id AND T2.date_added <= T1.date_added
) <= 2;
You can also use variables:
SELECT customer_id, order_id, date_added FROM (
SELECT customer_id, order_id, date_added,
@rownum := if(@prev_cust = customer_id, @rownum + 1,1) as rn,
@prev_cust := customer_id cust_var
FROM orderTable T1,
(SELECT @rownum := 0) r,
(SELECT @prev_cust := '') c
order by customer_id, date_added
) o where o.rn < 3;
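The corrected correlated subquery (the one that matches on customer_id) can be checked without a MySQL server. The following is a minimal sketch that runs the same SQL against an in-memory SQLite database through Python's sqlite3 module; the helper name `oldest_two_per_customer` and the added outer ORDER BY are assumptions for illustration, and the rows are copied from the question.

```python
import sqlite3

# Sample rows copied from the question's INSERT statement.
ORDERS = [
    ("1234", "5A", "1997-01-22"),
    ("1234", "88B", "1992-05-09"),
    ("0487", "F9", "2002-01-23"),
    ("5799", "A12F", "2007-01-23"),
    ("1234", "3A", "2009-01-22"),
    ("3333", "7FHS", "2009-01-22"),
    ("0487", "Z33", "2004-06-23"),
    ("3333", "FF44", "2013-09-11"),
    ("3333", "44f5", "2013-09-02"),
]


def oldest_two_per_customer(rows):
    """Hypothetical helper: two oldest orders per customer via a correlated subquery."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orderTable ("
        " customer_id TEXT NOT NULL, order_id TEXT NOT NULL,"
        " date_added TEXT NOT NULL, PRIMARY KEY (customer_id, order_id))"
    )
    conn.executemany("INSERT INTO orderTable VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT customer_id, order_id, date_added
        FROM orderTable T1
        WHERE (
            -- count this customer's orders that are as old or older
            SELECT COUNT(*) FROM orderTable T2
            WHERE T2.customer_id = T1.customer_id
              AND T2.date_added <= T1.date_added
        ) <= 2
        ORDER BY customer_id, date_added
        """
    ).fetchall()
    conn.close()
    return result


result = oldest_two_per_customer(ORDERS)
for row in result:
    print(row)
```

Ties on date_added within one customer change how many rows come back, so real data may need a tie-breaker such as order_id in the comparison.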

Here's another (deliberately incomplete) method, though others may have a point about performance...
SELECT x.*
, COUNT(*) rank
FROM ordertable x
JOIN ordertable y
ON y.customer_id = x.customer_id
AND y.date_added <= x.date_added
GROUP
BY x.customer_id
, x.date_added;

This should produce the results you're after, but the outer SELECT won't be the most efficient as it's filtering on a derived table.
SELECT ranked.*
FROM (
SELECT ot.* ,
@rownum := IF( ot.customer_id = @previous , @rownum +1, 1 ) rank,
@previous := ot.customer_id
FROM orderTable ot,
(SELECT @rownum :=1, @previous := NULL) init
ORDER BY customer_id, date_added
) ranked
WHERE rank <=2

Related

How to group data based on ranking in mysql

I am struggling to write a query that groups data by customer ID and ranks each customer's scores. A customer can have multiple scores, and I want to list each customer's scores together with their rank.
The table structure is below:
CREATE TABLE `score` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`customer_id` varchar(10) DEFAULT NULL,
`score` int(6) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=latin1;
insert into `score`(`id`,`customer_id`,`score`)
values (1,'C1',20), (2,'C1',10),(3,'C3',30),(4,'C1',30),(5,'C2',40),
(6,'C2',50),(7,'C2',20),(8,'C1',50),(9,'C3',20),
(10,'C1',50);
The table looks like this:
id customer_id score
1 C1 20
2 C1 10
3 C3 30
4 C1 30
5 C2 40
6 C2 50
7 C2 20
Desired result:
customer_id score Rank
C1 30 1
C1 20 2
C1 10 3
C2 50 1
C2 40 2
C2 20 3
C3 30 1
Try this:
SELECT
a.score AS score,
@rn := IF(@PREV = customer_id, @rn + 1, 1) AS rank,
@PREV := customer_id AS customerId
FROM score AS a
JOIN (SELECT @PREV := NULL, @rn := 0) AS vars
ORDER BY customer_id, score DESC, id
You can use variables for this:
SELECT id, customer_id, score,
@rnk := IF(@cid = customer_id, @rnk + 1,
IF(@cid := customer_id, 1, 1)) AS rank
FROM score
CROSS JOIN (SELECT @rnk := 0, @cid := '') AS v
ORDER BY customer_id, score DESC
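On MySQL 8.0 or later, window functions are available, so the per-customer rank can be written with ROW_NUMBER() instead of user variables. The sketch below is only an illustration under assumptions: it runs the SQL on SQLite (3.25 or newer, which also supports ROW_NUMBER) through Python's sqlite3 so it is self-contained, the alias `rn` replaces `rank` to avoid the reserved word, and the helper name `rank_scores` is hypothetical. The rows are the ones from the question's INSERT.

```python
import sqlite3

# Rows copied from the question's INSERT: (id, customer_id, score).
SCORES = [
    (1, "C1", 20), (2, "C1", 10), (3, "C3", 30), (4, "C1", 30), (5, "C2", 40),
    (6, "C2", 50), (7, "C2", 20), (8, "C1", 50), (9, "C3", 20), (10, "C1", 50),
]


def rank_scores(rows):
    """Hypothetical helper: rank each customer's scores, highest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE score (id INTEGER PRIMARY KEY, customer_id TEXT, score INTEGER)"
    )
    conn.executemany("INSERT INTO score VALUES (?, ?, ?)", rows)
    ranked = conn.execute(
        """
        SELECT customer_id, score,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id      -- restart numbering per customer
                   ORDER BY score DESC, id       -- id breaks ties between equal scores
               ) AS rn
        FROM score
        ORDER BY customer_id, rn
        """
    ).fetchall()
    conn.close()
    return ranked


ranked = rank_scores(SCORES)
for row in ranked:
    print(row)
```

ROW_NUMBER gives tied scores distinct numbers; RANK() or DENSE_RANK() would be the choice if equal scores should share a rank.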

How would I return the result of SQL math operations?

So I was taking a test recently with some higher level SQL problems. I only have what I would consider "intermediate" experience in SQL and I've been working on this for a day or so now. I just can't figure it out.
Here's the problem:
You have a table with 4 columns as such:
EmployeeID int unique
EmployeeType int
EmployeeSalary int
Created date
Goal: I need to retrieve the difference between the latest two EmployeeSalary for any EmployeeType with more than 1 entry. It has to be done in one statement (nested queries are fine).
Example Data Set: http://sqlfiddle.com/#!9/0dfc7
EmployeeID | EmployeeType | EmployeeSalary | Created
-----------|--------------|----------------|--------------------
1 | 53 | 50 | 2015-11-15 00:00:00
2 | 66 | 20 | 2014-11-11 04:20:23
3 | 66 | 30 | 2015-11-03 08:26:21
4 | 66 | 10 | 2013-11-02 11:32:47
5 | 78 | 70 | 2009-11-08 04:47:47
6 | 78 | 45 | 2006-11-01 04:42:55
So for this data set, the proper return would be:
EmployeeType | EmployeeSalary
-------------|---------------
66 | 10
78 | 25
The 10 comes from subtracting the latest two EmployeeSalary values (30 - 20) for the EmployeeType of 66. The 25 comes from subtracting the latest two EmployeeSalary values (70 - 45) for the EmployeeType of 78. We skip EmployeeType 53 completely because it only has one row.
This one has been destroying my brain. Any clues?
Thanks!
How do you make a really simple query complex?
One funny way (not the best for performance) to do it is:
SELECT final.EmployeeType, SUM(salary) AS difference
FROM (
SELECT b.EmployeeType, b.EmployeeSalary AS salary
FROM tab b
JOIN (SELECT EmployeeType, GROUP_CONCAT(EmployeeSalary ORDER BY Created DESC) AS c
FROM tab
GROUP BY EmployeeType
HAVING COUNT(*) > 1) AS sub
ON b.EmployeeType = sub.EmployeeType
AND FIND_IN_SET(b.EmployeeSalary, sub.c) = 1
UNION ALL
SELECT b.EmployeeType, -b.EmployeeSalary AS salary
FROM tab b
JOIN (SELECT EmployeeType, GROUP_CONCAT(EmployeeSalary ORDER BY Created DESC) AS c
FROM tab
GROUP BY EmployeeType
HAVING COUNT(*) > 1) AS sub
ON b.EmployeeType = sub.EmployeeType
AND FIND_IN_SET(b.EmployeeSalary, sub.c) = 2
) AS final
GROUP BY final.EmployeeType;
EDIT:
The key point is that MySQL (before version 8.0) doesn't support window functions, so you need to use equivalent code.
For example, a solution in SQL Server:
SELECT EmployeeType, SUM(CASE rn WHEN 1 THEN EmployeeSalary
ELSE -EmployeeSalary END) AS difference
FROM (SELECT *,
ROW_NUMBER() OVER(PARTITION BY EmployeeType ORDER BY Created DESC) AS rn
FROM #tab
) AS sub
WHERE rn IN (1,2)
GROUP BY EmployeeType
HAVING COUNT(EmployeeType) > 1
And MySQL equivalent:
SELECT EmployeeType, SUM(CASE rn WHEN 1 THEN EmployeeSalary
ELSE -EmployeeSalary END) AS difference
FROM (
SELECT t1.EmployeeType, t1.EmployeeSalary,
count(t2.Created) + 1 as rn
FROM tab t1
LEFT JOIN tab t2
ON t1.EmployeeType = t2.EmployeeType
AND t1.Created < t2.Created
GROUP BY t1.EmployeeType, t1.EmployeeSalary
) AS sub
WHERE rn IN (1,2)
GROUP BY EmployeeType
HAVING COUNT(EmployeeType) > 1;
The dataset of the fiddle is different from the example above, which is confusing (not to mention a little perverse). Anyway, there's lots of ways to skin this particular cat. Here's one (not the fastest, however):
SELECT a.employeetype, ABS(a.employeesalary-b.employeesalary) diff
FROM
( SELECT x.*
, COUNT(*) rank
FROM employees x
JOIN employees y
ON y.employeetype = x.employeetype
AND y.created >= x.created
GROUP
BY x.employeetype
, x.created
) a
JOIN
( SELECT x.*
, COUNT(*) rank
FROM employees x
JOIN employees y
ON y.employeetype = x.employeetype
AND y.created >= x.created
GROUP
BY x.employeetype
, x.created
) b
ON b.employeetype = a.employeetype
AND b.rank = a.rank+1
WHERE a.rank = 1;
A very similar but faster solution looks like this (although you sometimes need to use different variable names in subqueries a and b, for reasons I still don't fully understand)...
SELECT a.employeetype
, ABS(a.employeesalary-b.employeesalary) diff
FROM
( SELECT x.*
, CASE WHEN @prev = x.employeetype THEN @i:=@i+1 ELSE @i:=1 END i
, @prev := x.employeetype prev
FROM employees x
, (SELECT @prev := 0, @i:=1) vars
ORDER
BY x.employeetype
, x.created DESC
) a
JOIN
( SELECT x.*
, CASE WHEN @prev = x.employeetype THEN @i:=@i+1 ELSE @i:=1 END i
, @prev := x.employeetype prev
FROM employees x
, (SELECT @prev := 0, @i:=1) vars
ORDER
BY x.employeetype
, x.created DESC
) b
ON b.employeetype = a.employeetype
AND b.i = a.i + 1
WHERE a.i = 1;
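The self-join ranking idea from the answers above (number each row by how many newer rows share its EmployeeType, then subtract rank 2 from rank 1) can be tried end to end on the question's sample data. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 rather than MySQL, the table name `tab` follows the first answer, the helper name `latest_salary_diff` is hypothetical, and grouping by EmployeeID is a small change so that two rows with equal salaries are not merged.

```python
import sqlite3

# Sample rows from the question: (EmployeeID, EmployeeType, EmployeeSalary, Created).
EMPLOYEES = [
    (1, 53, 50, "2015-11-15 00:00:00"),
    (2, 66, 20, "2014-11-11 04:20:23"),
    (3, 66, 30, "2015-11-03 08:26:21"),
    (4, 66, 10, "2013-11-02 11:32:47"),
    (5, 78, 70, "2009-11-08 04:47:47"),
    (6, 78, 45, "2006-11-01 04:42:55"),
]


def latest_salary_diff(rows):
    """Hypothetical helper: latest salary minus second-latest, per EmployeeType."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tab (EmployeeID INTEGER PRIMARY KEY, EmployeeType INTEGER,"
        " EmployeeSalary INTEGER, Created TEXT)"
    )
    conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", rows)
    diffs = conn.execute(
        """
        SELECT EmployeeType,
               SUM(CASE rn WHEN 1 THEN EmployeeSalary ELSE -EmployeeSalary END) AS difference
        FROM (
            -- rn = 1 for the newest row of a type, 2 for the next newest, ...
            SELECT t1.EmployeeID, t1.EmployeeType, t1.EmployeeSalary,
                   COUNT(t2.Created) + 1 AS rn
            FROM tab t1
            LEFT JOIN tab t2
              ON t1.EmployeeType = t2.EmployeeType
             AND t1.Created < t2.Created
            GROUP BY t1.EmployeeID, t1.EmployeeType, t1.EmployeeSalary
        ) AS sub
        WHERE rn IN (1, 2)
        GROUP BY EmployeeType
        HAVING COUNT(*) > 1          -- skip types with a single row
        ORDER BY EmployeeType
        """
    ).fetchall()
    conn.close()
    return diffs


diffs = latest_salary_diff(EMPLOYEES)
print(diffs)
```

For the sample data this yields 10 for type 66 and 25 for type 78, matching the expected output stated in the question.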

mysql row number count down and dynamic number of row

I believe this can be solved with a temp table or a stored procedure, but I would like to know whether it can be done in a single SQL statement.
Goal: List all rows with a countdown by year; the number of rows differs from year to year. Rows can be ordered by date.
Desired result:
|-Count Down-|-Date-------|
| 3 | 2013-01-01 | <- Start with number of Row of each year
| 2 | 2013-03-15 |
| 1 | 2013-06-07 |
| 5 | 2014-01-01 | <- Start with number of Row of each year
| 4 | 2014-03-17 |
| 3 | 2014-07-11 |
| 2 | 2014-08-05 |
| 1 | 2014-11-12 |
SQL:
Select @row_number:=@row_number-1 AS CountDown, Date
FROM table JOIN
(Select @row_number:=COUNT(*), year(date) FROM table GROUP BY year(date))
Is there any solution for that?
The subquery that gets the count by year needs to return the year, so you can join it with the main table to get the starting number for the countdown. And you need to detect when the year changes, so you need another variable for that.
SELECT @row_number := IF(YEAR(d.Date) = @prevYear, @row_number-1, y.c) AS CountDown,
d.Date, @prevYear := YEAR(d.Date)
FROM (SELECT Date
FROM Table1
ORDER BY Date) AS d
JOIN
(Select count(*) AS c, year(date) AS year
FROM Table1
GROUP BY year(date)) AS y
ON YEAR(d.Date) = y.year
CROSS JOIN (SELECT @prevYear := NULL) AS x
You can do the count down using variables (or correlated subqueries). The following does the count, but the returned data is not in the order you specify:
select (@rn := if(@y = year(date), @rn + 1,
if(@y := year(date), 1, 1)
)
) as CountDown, t1.*
from table1 cross join
(select @y := 0, @rn := 0) vars
order by date desc;
That is easily fixed with another subquery:
select t.*
from (select (@rn := if(@y = year(date), @rn + 1,
if(@y := year(date), 1, 1)
)
) as CountDown, t1.*
from table1 cross join
(select @y := 0, @rn := 0) vars
order by date desc
) t
order by date;
Note the complicated expression for assigning CountDown. This expression sets both variables (@y and @rn) in a single expression. MySQL does not guarantee the order of evaluation of expressions in a select. If you assign these in different expressions, then they might be executed in the wrong order.
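The correlated-subquery alternative mentioned above avoids variables entirely: each row's countdown value is the number of rows in the same year dated on or after it. Here is a minimal sketch, assuming a single-column table; the column name `event_date` and the helper `countdown_by_year` are hypothetical, and it runs on SQLite via Python's sqlite3, so the year is extracted with strftime('%Y', ...) where MySQL would use YEAR(...).

```python
import sqlite3

# Dates taken from the desired result in the question.
DATES = [
    "2013-01-01", "2013-03-15", "2013-06-07",
    "2014-01-01", "2014-03-17", "2014-07-11", "2014-08-05", "2014-11-12",
]


def countdown_by_year(dates):
    """Hypothetical helper: per-year countdown using a correlated subquery."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (event_date TEXT NOT NULL)")
    conn.executemany("INSERT INTO Table1 VALUES (?)", [(d,) for d in dates])
    rows = conn.execute(
        """
        SELECT (
                 -- rows in the same year that are on or after this row's date
                 SELECT COUNT(*) FROM Table1 t2
                 WHERE strftime('%Y', t2.event_date) = strftime('%Y', t1.event_date)
                   AND t2.event_date >= t1.event_date
               ) AS CountDown,
               t1.event_date
        FROM Table1 t1
        ORDER BY t1.event_date
        """
    ).fetchall()
    conn.close()
    return rows


countdown = countdown_by_year(DATES)
for row in countdown:
    print(row)
```

This form does not depend on evaluation order, at the cost of rescanning the table for every row, so it suits small tables or ones indexed on the date column.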

What's the SQL idiom for zipping — in the functional sense — two queries?

For example, if I have a set of classes and a set of classrooms, and I want to pair the two up with some arbitrary pairing:
> SELECT class_name FROM classes ORDER BY class_name
Calculus
English
History
> SELECT room_name FROM classrooms ORDER BY room_name
Room 101
Room 102
Room 201
I'd like to "zip" them like this:
> SELECT class_name FROM classes ORDER … ZIP SELECT room_name FROM classrooms ORDER …
Calculus | Room 101
English | Room 102
History | Room 201
Currently I'm dealing with MySQL… but possibly — optimistically? — there is a reasonably standards compliant way to do this?
One way to do it in MySQL:
SELECT c.class_name, r.room_name
FROM
(
SELECT class_name, @n := @n + 1 rnum
FROM classes CROSS JOIN (SELECT @n := 0) i
ORDER BY class_name
) c JOIN
(
SELECT room_name, @m := @m + 1 rnum
FROM classrooms CROSS JOIN (SELECT @m := 0) i
ORDER BY room_name
) r
ON c.rnum = r.rnum
Output:
| CLASS_NAME | ROOM_NAME |
|------------|-----------|
| Calculus | Room 101 |
| English | Room 102 |
| History | Room 201 |
Same thing in Postgres will look like
SELECT c.class_name, r.room_name
FROM
(
SELECT class_name,
ROW_NUMBER() OVER (ORDER BY class_name) rnum
FROM classes
) c JOIN
(
SELECT room_name,
ROW_NUMBER() OVER (ORDER BY room_name) rnum
FROM classrooms
) r
ON c.rnum = r.rnum
And in SQLite
SELECT c.class_name, r.room_name
FROM
(
SELECT class_name,
(SELECT COUNT(*)
FROM classes
WHERE c.class_name >= class_name) rnum
FROM classes c
) c JOIN
(
SELECT room_name,
(SELECT COUNT(*)
FROM classrooms
WHERE r.room_name >= room_name) rnum
FROM classrooms r
) r
ON c.rnum = r.rnum
This is a form of join, but you need to create the join key. Alas, though, this requires a full outer join, because you do not know which list is longer.
So, you can do this by using variables to enumerate the rows and then using union all and group by to get the values:
select max(case when which = 'class' then name end) as class_name,
max(case when which = 'room' then name end) as room_name
from ((SELECT class_name as name, @rnc := @rnc + 1 as rn, 'class' as which
FROM classes cross join
(select @rnc := 0) const
ORDER BY class_name
) union all
(select room_name, @rnr := @rnr + 1 as rn, 'room'
from classrooms cross join
(select @rnr := 0) const
ORDER BY room_name
)
) t
group by rn;
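To see why the UNION ALL plus GROUP BY form behaves like a full outer join, it helps to zip two lists of different lengths. The sketch below is an illustration under assumptions: it runs on SQLite through Python's sqlite3, it numbers rows with a correlated COUNT(*) instead of user variables (so names must be unique within each table), the classrooms table deliberately has one row fewer than in the question, and the helper name `zip_tables` is hypothetical.

```python
import sqlite3

CLASSES = ["Calculus", "English", "History"]
ROOMS = ["Room 101", "Room 102"]  # one fewer than classes, on purpose


def zip_tables(classes, rooms):
    """Hypothetical helper: pair rows by position, keeping unmatched rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE classes (class_name TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE classrooms (room_name TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO classes VALUES (?)", [(c,) for c in classes])
    conn.executemany("INSERT INTO classrooms VALUES (?)", [(r,) for r in rooms])
    pairs = conn.execute(
        """
        SELECT MAX(CASE WHEN which = 'class' THEN name END) AS class_name,
               MAX(CASE WHEN which = 'room'  THEN name END) AS room_name
        FROM (
            -- rn is the 1-based position of each name in sorted order
            SELECT c.class_name AS name,
                   (SELECT COUNT(*) FROM classes c2
                    WHERE c2.class_name <= c.class_name) AS rn,
                   'class' AS which
            FROM classes c
            UNION ALL
            SELECT r.room_name,
                   (SELECT COUNT(*) FROM classrooms r2
                    WHERE r2.room_name <= r.room_name),
                   'room'
            FROM classrooms r
        ) t
        GROUP BY rn
        ORDER BY rn
        """
    ).fetchall()
    conn.close()
    return pairs


pairs = zip_tables(CLASSES, ROOMS)
for pair in pairs:
    print(pair)
```

The class without a matching room still appears, with NULL (None in Python) in the room column, which an inner join on the row number would have dropped.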

How to group continuous ranges using MySQL

I have a table that contains categories, dates and rates. Each category can have different rates for different dates, one category can have only one rate at a given date.
Id CatId Date Rate
------ ------ ------------ ---------
000001 12 2009-07-07 1
000002 12 2009-07-08 1
000003 12 2009-07-09 1
000004 12 2009-07-10 2
000005 12 2009-07-15 1
000006 12 2009-07-16 1
000007 13 2009-07-08 1
000008 13 2009-07-09 1
000009 14 2009-07-07 2
000010 14 2009-07-08 1
000010 14 2009-07-10 1
Unique index (catid, Date, Rate)
I would like for each category to group all continuous dates ranges and keep only the begin and the end of the range.
For the previous example, we would have:
CatId Begin End Rate
------ ------------ ------------ ---------
12 2009-07-07 2009-07-09 1
12 2009-07-10 2009-07-10 2
12 2009-07-15 2009-07-16 1
13 2009-07-08 2009-07-09 1
14 2009-07-07 2009-07-07 2
14 2009-07-08 2009-07-08 1
14 2009-07-10 2009-07-10 1
I found a similar solution in the forum which did not exactly give the result
WITH q AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY CatId, Rate ORDER BY [Date]) AS rnd,
ROW_NUMBER() OVER (PARTITION BY CatId ORDER BY [Date]) AS rn
FROM my_table
)
SELECT CatId AS catidd, MIN([Date]) as beginn, MAX([Date])as endd, Rate
FROM q
GROUP BY CatId, rnd - rn, Rate
How can I do the same thing in mysql?
Please help!
MySQL (before version 8.0) doesn't support analytic (window) functions, but you can emulate such behaviour with user-defined variables:
SELECT CatID, Begin, MAX(Date) AS End, Rate
FROM (
SELECT my_table.*,
@f:=CONVERT(
IF(@c<=>CatId AND @r<=>Rate AND DATEDIFF(Date, @d)=1, @f, Date), DATE
) AS Begin,
@c:=CatId, @d:=Date, @r:=Rate
FROM my_table JOIN (SELECT @c:=NULL) AS init
ORDER BY CatId, Rate, Date
) AS t
GROUP BY CatID, Begin, Rate
SELECT catid,min(ddate),max(ddate),rate
FROM (
SELECT
Catid,
Ddate,
rate,
@rn := CASE WHEN (@prev <> rate
or DATEDIFF(ddate, @prev_date)>1) THEN @rn+1 ELSE @rn END AS rn,
@prev := rate,
@prev_id := catid ,
@prev_date :=ddate
FROM (
SELECT CatID,Ddate,rate
FROM rankdate
ORDER BY CatID, Ddate ) AS a ,
(SELECT @prev := -1, @rn := 0, @prev_id:=0 ,@prev_date:=-1) AS vars
) T1 group by catid,rn
Note: The line (SELECT @prev := -1, @rn := 0, @prev_id:=0 ,@prev_date:=-1) AS vars is not necessary in MySQL Workbench, but it is needed when the query is run through the PHP mysql_query function.
I know I am late, still posting a solution that worked for me.
Had the same issue, here's how I got it
Found a good solution using variables
SELECT MIN(id) AS id, MIN(date) AS date, MIN(state) AS state, COUNT(*) cnt
FROM (
SELECT @r := @r + (@state != state OR @state IS NULL) AS gn,
@state := state AS sn,
s.id, s.date, s.state
FROM (
SELECT @r := 0,
@state := NULL
) vars,
t_range s
ORDER BY
date, state
) q
GROUP BY gn
More details at : https://explainextended.com/2009/07/24/mysql-grouping-continuous-ranges/
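Where window functions are available (MySQL 8.0 or later), the continuous-range grouping can also be expressed without variables: within each (CatId, Rate) partition, a day number minus ROW_NUMBER() stays constant across consecutive dates, so it serves as the group key. The sketch below is an illustration under assumptions: it runs on SQLite (3.25 or newer) through Python's sqlite3, uses julianday() for the day number where MySQL would have something like TO_DAYS(), and the column names `cat_id`, `d`, `rate` and the helper `continuous_ranges` are hypothetical. The rows are the category, date and rate values from the question.

```python
import sqlite3

# (CatId, Date, Rate) rows from the question's sample table.
RATES = [
    (12, "2009-07-07", 1), (12, "2009-07-08", 1), (12, "2009-07-09", 1),
    (12, "2009-07-10", 2), (12, "2009-07-15", 1), (12, "2009-07-16", 1),
    (13, "2009-07-08", 1), (13, "2009-07-09", 1),
    (14, "2009-07-07", 2), (14, "2009-07-08", 1), (14, "2009-07-10", 1),
]


def continuous_ranges(rows):
    """Hypothetical helper: collapse consecutive dates per category and rate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (cat_id INTEGER, d TEXT, rate INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    ranges = conn.execute(
        """
        SELECT cat_id, MIN(d) AS begin_date, MAX(d) AS end_date, rate
        FROM (
            SELECT cat_id, d, rate,
                   -- day number minus position: constant within a run of
                   -- consecutive days, different after any gap
                   CAST(julianday(d) AS INTEGER)
                     - ROW_NUMBER() OVER (PARTITION BY cat_id, rate ORDER BY d) AS grp
            FROM my_table
        ) q
        GROUP BY cat_id, rate, grp
        ORDER BY cat_id, begin_date
        """
    ).fetchall()
    conn.close()
    return ranges


ranges = continuous_ranges(RATES)
for row in ranges:
    print(row)
```

Unlike the difference of two row numbers, subtracting from the date itself also splits a range when days are missing, which is why category 14 keeps 2009-07-08 and 2009-07-10 as separate ranges.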