How to group data based on ranking in mysql - mysql

I am struggling to write a query. I want to group the data by customer ID and rank each customer's scores. A customer can have multiple scores, and I want to list them per customer in ranking order.
Below is the table structure:
CREATE TABLE `score` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`customer_id` varchar(10) DEFAULT NULL,
`score` int(6) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=latin1;
insert into `score`(`id`,`customer_id`,`score`)
values (1,'C1',20), (2,'C1',10),(3,'C3',30),(4,'C1',30),(5,'C2',40),
(6,'C2',50),(7,'C2',20),(8,'C1',50),(9,'C3',20),
(10,'C1',50);
The table looks like this:
id customer_id score
1 C1 20
2 C1 10
3 C3 30
4 C1 30
5 C2 40
6 C2 50
7 C2 20
Desired result:
customer_id score Rank
C1 30 1
C1 20 2
C1 10 3
C2 50 1
C2 40 2
C2 20 3
C3 30 1

Try this:
SELECT
a.score AS score,
@rn := IF(@PREV = customer_id, @rn + 1, 1) AS rank,
@PREV := customer_id AS customerId
FROM score AS a
JOIN (SELECT @PREV := NULL, @rn := 0) AS vars
ORDER BY customer_id, score DESC, id

You can use variables for this:
SELECT id, customer_id, score,
@rnk := IF(@cid = customer_id, @rnk + 1,
IF(@cid := customer_id, 1, 1)) AS rank
FROM score
CROSS JOIN (SELECT @rnk := 0, @cid := '') AS v
ORDER BY customer_id, score DESC
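On MySQL 8.0 or later, the same per-customer numbering can probably be written with a window function instead of user variables. The sketch below is only an illustration: it runs the query through Python's sqlite3 module as a stand-in for MySQL (an assumption, and it needs SQLite 3.25+), and the alias rn is chosen here to avoid the reserved word RANK.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL 8.0+ (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE score (id INTEGER PRIMARY KEY, customer_id TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO score (id, customer_id, score) VALUES (?, ?, ?)",
    [(1, "C1", 20), (2, "C1", 10), (3, "C3", 30), (4, "C1", 30), (5, "C2", 40),
     (6, "C2", 50), (7, "C2", 20), (8, "C1", 50), (9, "C3", 20), (10, "C1", 50)],
)

# Number each customer's rows from highest to lowest score; id breaks ties.
ranked = conn.execute("""
    SELECT customer_id, score,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY score DESC, id) AS rn
    FROM score
    ORDER BY customer_id, rn
""").fetchall()

for row in ranked:
    print(row)
```

ROW_NUMBER gives tied scores different numbers (the two 50s for C1 become 1 and 2). If tied scores should share a rank, DENSE_RANK() in the same position may be closer to what is wanted.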

Related

Select multiple tables in mysql

I need to make the following query:
I have 4 tables. The first is the main table, and its 'id' is a foreign key in the other 3 tables. For each of those tables, I need to get the date and description of the rows that reference id_table1. Some tables have more matching records than others.
Is it possible to relate these tables?
Table 1 main
id_table1
Name
Table 2
id_table2
date
description
fk_table1
Table 3
id_table3
date
description
fk_table1
Table 4
id_table4
date
description
fk_table1
I want to get something like this:
This type of operation is a bit of a pain in MySQL. In fact, the result is not particularly "relational", because each column is a separate list. You can't do a join because there is no join key.
You can generate one in MySQL using variables and then use aggregation. Here is an example with two tables:
-- table2 and table3 are assumed to be the names of "Table 2" and "Table 3" from the question
select id_table1,
max(t2_date) as t2_date,
max(t2_desc) as t2_desc,
max(t3_date) as t3_date,
max(t3_desc) as t3_desc
from ((select id_table1, NULL as t2_date, NULL as t2_desc, NULL as t3_date, NULL as t3_desc, 1 as rn
from table1 t1
) union all
(select fk_table1, date as t2_date, description as t2_desc, NULL as t3_date, NULL as t3_desc,
(@rn1 := if(@fk1 = fk_table1, @rn1 + 1,
if(@fk1 := fk_table1, 1, 1)
)
) as rn
from table2 t2 cross join
(select @rn1 := 0, @fk1 := 0) params
order by fk_table1, date
) union all
(select fk_table1, NULL, NULL, date as t3_date, description as t3_desc,
(@rn2 := if(@fk2 = fk_table1, @rn2 + 1,
if(@fk2 := fk_table1, 1, 1)
)
) as rn
from table3 t3 cross join
(select @rn2 := 0, @fk2 := 0) params
order by fk_table1, date
)
) t
group by id_table1, rn;
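To make the "generate a join key, then aggregate" idea concrete, here is a small sketch that numbers the rows of each child table per foreign key and collapses rows with the same number into one output row. It is not the query above: it uses ROW_NUMBER() in place of variables, runs on SQLite through Python's sqlite3 module as a stand-in for MySQL 8.0+, and the table names table2 and table3 plus all sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL 8.0+ (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id_table1 INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id_table2 INTEGER PRIMARY KEY, date TEXT,
                         description TEXT, fk_table1 INTEGER);
    CREATE TABLE table3 (id_table3 INTEGER PRIMARY KEY, date TEXT,
                         description TEXT, fk_table1 INTEGER);
    -- Hypothetical sample rows, not taken from the question.
    INSERT INTO table1 VALUES (1, 'first'), (2, 'second');
    INSERT INTO table2 VALUES (1, '2020-01-01', 't2-a', 1),
                              (2, '2020-01-02', 't2-b', 1);
    INSERT INTO table3 VALUES (1, '2020-02-01', 't3-a', 1),
                              (2, '2020-02-05', 't3-b', 2);
""")

# Each branch contributes its own columns plus a per-parent row number (rn).
# Grouping by (id_table1, rn) lines up the n-th row of every child table.
aligned = conn.execute("""
    SELECT id_table1,
           MAX(t2_date) AS t2_date, MAX(t2_desc) AS t2_desc,
           MAX(t3_date) AS t3_date, MAX(t3_desc) AS t3_desc
    FROM (
        SELECT id_table1, NULL AS t2_date, NULL AS t2_desc,
               NULL AS t3_date, NULL AS t3_desc, 1 AS rn
        FROM table1
        UNION ALL
        SELECT fk_table1, date, description, NULL, NULL,
               ROW_NUMBER() OVER (PARTITION BY fk_table1 ORDER BY date)
        FROM table2
        UNION ALL
        SELECT fk_table1, NULL, NULL, date, description,
               ROW_NUMBER() OVER (PARTITION BY fk_table1 ORDER BY date)
        FROM table3
    ) AS t
    GROUP BY id_table1, rn
    ORDER BY id_table1, rn
""").fetchall()

for row in aligned:
    print(row)
```

Parent 1 has two rows in table2 but only one in table3, so its second output row carries NULLs in the t3 columns. That is the "separate lists side by side" shape the question asks for.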

How to get temporal sequence by mysql

In my table there is an id column, a date column and a status column like this:
ID DATE STATUS
1 0106 A
1 0107 A
1 0112 A
1 0130 B
1 0201 A
2 0102 C
2 0107 C
I want to get a temporal sequence for each ID: when an ID has the same status on consecutive dates, the earlier rows are omitted and only the last row of each run is kept. The query result should look like this:
ID DATE STATUS
1 0112 A
1 0130 B
1 0201 A
2 0107 C
How can I do this in MySQL?
You can use variables to do this:
select `id`, `date`, `status`
from (
select *, @rowno:=if(@grp = `STATUS`, @rowno + 1 , 1) as rowno, @grp := `STATUS`
from yourtable
cross join (select @grp := null, @rowno := 0) t
order by `id`, `date` desc
) t1
where rowno = 1
order by `id`, `date`
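If window functions are available (MySQL 8.0+), another way to keep only the last row of each run is to look at the next row's status with LEAD(). This is a sketch under assumptions, not the method above: it runs on SQLite via Python's sqlite3 module as a stand-in for MySQL, it uses the sample rows from the question, and it assumes status is never NULL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL 8.0+ (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, date TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO yourtable (id, date, status) VALUES (?, ?, ?)",
    [(1, "0106", "A"), (1, "0107", "A"), (1, "0112", "A"), (1, "0130", "B"),
     (1, "0201", "A"), (2, "0102", "C"), (2, "0107", "C")],
)

# A row is the end of a run when the next row for the same id has a
# different status, or when there is no next row at all.
kept = conn.execute("""
    SELECT id, date, status
    FROM (
        SELECT id, date, status,
               LEAD(status) OVER (PARTITION BY id ORDER BY date) AS next_status
        FROM yourtable
    ) AS t
    WHERE next_status IS NULL OR next_status <> status
    ORDER BY id, date
""").fetchall()

for row in kept:
    print(row)
```

Because the window is partitioned by id, a run never continues from one ID into the next, even if the two IDs happen to share a status.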

how to find repeated value and count in mysql

I have a table with 3 columns. How can I find a value that appears again in the next 3 consecutive rows?
For example, the 1st tran_val appears in the next 3 consecutive rows (repeated 4 times), and the same holds for the 2nd and 6th rows. The date column is sorted from A to Z.
date tran_val name
23mar 22 mark
24mar 22 mark
25mar 22 mark
26mar 22 mark
27mar 22 mark
28jan 99 john
29jan 99 john
30jan 99 john
31jan 99 john
output
name trans_value consecutive_count
mark 22 2
john 99 1
We have a query, but it does not give the output above:
SELECT name,
tran_val,
MAX(cnt - 3) AS consecutive_count
FROM
(
SELECT date,
tran_val,
name,
@cnt:=IF(@tran_val=tran_val AND @name=name, @cnt + 1, 1) AS cnt,
@tran_val:=tran_val,
@name:=name
FROM some_table
CROSS JOIN (SELECT @cnt:=0, @tran_val:=0, @name:='') sub0
ORDER BY `date`
) sub1
GROUP BY name,
tran_val
Is there any modification to the above code that will produce the desired output? Thanks.
Try this:
SELECT `tran_val`, `name`, COUNT(*) - 3
FROM (
SELECT `date`, `tran_val`, `name`, rn - seq AS grp
FROM (
SELECT `date`, `tran_val`, `name`,
@rn := @rn + 1 AS rn,
@seq := IF(@name = `name` AND @val = `tran_val`, @seq+1, 1) AS seq,
@name := name,
@val := tran_val
FROM mytable
CROSS JOIN (SELECT @rn := 0, @seq := 0, @name := '', @val := 0) AS vars
ORDER BY `date`) AS t ) AS s
GROUP BY `tran_val`, `name`, grp
HAVING COUNT(*) > 3
You need two separate variables to enumerate sequences:
@rn just enumerates consecutive table rows
@seq enumerates consecutive table rows having the same name, tran_val values.
The difference between these two variables, i.e. @rn - @seq, identifies islands of consecutive table rows having the same name, tran_val values.
Edit: I added a HAVING clause to the query so as to filter out islands having a population of 3 or fewer consecutive rows.
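The "difference of two row numbers" trick described above can be tried out with window functions as well. The sketch below is an illustration under assumptions: it swaps the user variables for two ROW_NUMBER() calls, runs on SQLite via Python's sqlite3 module as a stand-in for MySQL 8.0+, and stores the question's date strings as plain text (they happen to sort A to Z, as the question states).

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL 8.0+ (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (date TEXT, tran_val INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO mytable (date, tran_val, name) VALUES (?, ?, ?)",
    [("23mar", 22, "mark"), ("24mar", 22, "mark"), ("25mar", 22, "mark"),
     ("26mar", 22, "mark"), ("27mar", 22, "mark"),
     ("28jan", 99, "john"), ("29jan", 99, "john"), ("30jan", 99, "john"),
     ("31jan", 99, "john")],
)

# The first ROW_NUMBER plays the role of @rn (position in the whole table),
# the second plays the role of @seq (position within the same name/tran_val).
# Their difference stays constant inside one island of consecutive rows.
islands = conn.execute("""
    SELECT name, tran_val, COUNT(*) - 3 AS consecutive_count
    FROM (
        SELECT date, tran_val, name,
               ROW_NUMBER() OVER (ORDER BY date)
             - ROW_NUMBER() OVER (PARTITION BY name, tran_val
                                  ORDER BY date) AS grp
        FROM mytable
    ) AS s
    GROUP BY name, tran_val, grp
    HAVING COUNT(*) > 3
    ORDER BY MIN(date)
""").fetchall()

for row in islands:
    print(row)
```

With the sample data, mark's five rows form one island (grp = 0) and john's four rows form another (grp = 5), which gives counts of 2 and 1 after subtracting 3.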

Group by IDs and TIMESTAMPDIFF one column in same table

I am trying to find out "how many unique messages have been sent to a person on a specific boat within a timeframe, and what is the minimum number of days between those texts" and display it together with the count.
A person is represented by 'id', boat by 'id2' and message by 'text'.
CREATE TABLE `stacktable` (
`timestamp` DATETIME NOT NULL,
`id` VARCHAR(15) NOT NULL,
`id2` VARCHAR(3) NULL DEFAULT NULL,
`text` VARCHAR(255) NULL DEFAULT NULL,
`id3` INT(10) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id3`)
);
insert into stacktable (timestamp,id,id2,text) VALUES
('2015-01-01 00:00:01',1,10,'ABC'),
('2015-01-01 00:00:01',2,11,'ABC'),
('2015-01-01 00:00:01',3,12,'ABC'),
('2015-01-01 00:00:02',3,12,'ABC'),
('2015-01-01 00:00:02',1,10,'ABC'),
('2015-01-04 00:00:01',1,10,'ABC'),
('2015-01-04 00:00:01',1,10,'BCD'),
('2015-01-04 00:00:01',2,11,'ABC'),
('2015-01-04 00:00:01',2,11,'BCD'),
('2015-01-04 00:00:01',3,12,'ABC'),
('2015-01-04 00:00:01',3,12,'BCD'),
('2015-01-04 00:00:01',3,13,'CDE'),
('2015-01-07 00:00:01',2,11,'BCD'),
('2015-01-07 00:00:01',3,12,'BCD'),
('2015-01-07 00:00:01',3,13,'CDE'),
('2015-01-07 00:00:01',3,13,'DEF'),
('2015-01-08 00:00:01',3,12,'ABC'),
('2015-01-08 00:00:01',4,14,'EFG'),
('2015-01-09 00:00:01',4,14,'EFG'),
('2015-01-09 00:00:02',4,15,'FGH'),
('2015-01-10 00:00:01',4,14,'EFG'),
('2015-01-10 00:00:01',4,14,'FGH'),
('2015-01-10 00:00:01',4,15,'FGH'),
('2015-01-11 00:00:01',4,14,'EFG'),
('2015-01-15 00:00:01',4,14,'EFG');
To show what I am trying to achieve:
select * from stacktable where id = 1
timestamp id id2 text id3
2015-01-01 00:00:01 1 10 ABC 1 First entry for id+id2+text (ABC)
2015-01-01 00:00:02 1 10 ABC 5 Second entry for same keys id+id2+text 1 second later
2015-01-04 00:00:01 1 10 ABC 6 Third entry for same keys id+id2+text 2 days later
2015-01-04 00:00:01 1 10 BCD 7 First entry for id+id2+text (BCD)
I only want to count records that have the "same id, id2 and text within a period of 2 days", but also show the "minimum diffdate in days between the hits".
The output I want from this would be:
id id2 text count(*) mindiffdatebetweenhits
-------------------------------------------
1 10 ABC 3 0 count id3s 1,5 and 6, minimumdaydiff is between id3 1 and 5 = 0 days
3 12 ABC 3 0 count id3s 3,4 and 10, minimumdaydiff is between id3 3 and 4 = 0 days
4 14 EFG 4 1 count id3s 18,19,21 and 24, minimumdaydiff is equal between all hits = 1 day
4 15 FGH 2 0 count id3s 20 and 23, minimumdaydiff is between id3 20 and 23 = 0 days
How can I get the desired output?
This should do it, assuming sequences of only one row are to be discarded:
select id, id2, text, seq, count(id) as total, min(diff) as mindiff
from (
select t1.row, t2.row row2, t1.id, t1.id2, t1.text, t1.id3,
TIMESTAMPDIFF(DAY, t1.timestamp, t2.timestamp) as diff,
IF (TIMESTAMPDIFF(DAY, t1.timestamp, t2.timestamp) > 2, @seq * (1 and @seq := @seq +1), @seq) as seq
from (select (@row := @row + 1) as row, id, id2, text, id3, timestamp
from (select id, id2, text, id3, timestamp
from stacktable
order by id, id2, text) sorted,
(select @row := 0) setup) t1
left join (select (@row2 := @row2 + 1) as row, id, id2, text, id3, timestamp
from (select id, id2, text, id3, timestamp
from stacktable
order by id, id2, text) sorted,
(select @row2 := 0) setup) t2
on (t1.id = t2.id and t1.id2 = t2.id2 and t1.text=t2.text and t1.row = t2.row - 1),
(select @seq := 1) setup_sequence
) t3
group by id, id2, text, seq
having total > 1
To make it easier to read: the query uses the same subquery twice, as t1 and t2, and all that subquery does is sort and then number the rows of the table:
select (@row := @row + 1) as row, id, id2, text, id3, timestamp
from (select id, id2, text, id3, timestamp
from stacktable
order by id, id2, text) sorted,
(select @row := 0) setup
Note that the sequence counter is not unique across all sequences. This is not a bug: it is only unique among sequences with the same id, id2, text.
The sequence counter update is a bit tricky: @seq * (1 and @seq := @seq +1). It relies on the first @seq being read for the multiplication before the assignment updates it. I'm not sure this is deterministic or consistent across engines. However, the query could also be changed to avoid it by joining each record of t1 with the previous record instead of the next record (in t2). (Not tried out.)
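For comparison, here is a sketch of the "compare with the previous record" idea using window functions, which avoids relying on variable evaluation order. It is not the query above and has its own assumptions: it runs on SQLite via Python's sqlite3 module as a stand-in for MySQL 8.0+, the DDL is simplified, and whole days are computed from epoch seconds where MySQL would use TIMESTAMPDIFF(DAY, ...). The CTE names diffs and seqs are made up for this example.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL 8.0+ (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stacktable (
        timestamp TEXT NOT NULL, id VARCHAR(15) NOT NULL,
        id2 VARCHAR(3), text VARCHAR(255),
        id3 INTEGER PRIMARY KEY AUTOINCREMENT);
    INSERT INTO stacktable (timestamp, id, id2, text) VALUES
    ('2015-01-01 00:00:01',1,10,'ABC'), ('2015-01-01 00:00:01',2,11,'ABC'),
    ('2015-01-01 00:00:01',3,12,'ABC'), ('2015-01-01 00:00:02',3,12,'ABC'),
    ('2015-01-01 00:00:02',1,10,'ABC'), ('2015-01-04 00:00:01',1,10,'ABC'),
    ('2015-01-04 00:00:01',1,10,'BCD'), ('2015-01-04 00:00:01',2,11,'ABC'),
    ('2015-01-04 00:00:01',2,11,'BCD'), ('2015-01-04 00:00:01',3,12,'ABC'),
    ('2015-01-04 00:00:01',3,12,'BCD'), ('2015-01-04 00:00:01',3,13,'CDE'),
    ('2015-01-07 00:00:01',2,11,'BCD'), ('2015-01-07 00:00:01',3,12,'BCD'),
    ('2015-01-07 00:00:01',3,13,'CDE'), ('2015-01-07 00:00:01',3,13,'DEF'),
    ('2015-01-08 00:00:01',3,12,'ABC'), ('2015-01-08 00:00:01',4,14,'EFG'),
    ('2015-01-09 00:00:01',4,14,'EFG'), ('2015-01-09 00:00:02',4,15,'FGH'),
    ('2015-01-10 00:00:01',4,14,'EFG'), ('2015-01-10 00:00:01',4,14,'FGH'),
    ('2015-01-10 00:00:01',4,15,'FGH'), ('2015-01-11 00:00:01',4,14,'EFG'),
    ('2015-01-15 00:00:01',4,14,'EFG');
""")

hits = conn.execute("""
    WITH diffs AS (
        -- Whole days since the previous row with the same id, id2, text.
        SELECT id, id2, text, timestamp,
               (CAST(strftime('%s', timestamp) AS INTEGER)
                - CAST(strftime('%s', LAG(timestamp) OVER (
                      PARTITION BY id, id2, text ORDER BY timestamp))
                       AS INTEGER)) / 86400 AS diff
        FROM stacktable
    ),
    seqs AS (
        -- Start a new sequence at the first row or after a gap of > 2 days.
        SELECT id, id2, text, diff,
               SUM(CASE WHEN diff IS NULL OR diff > 2 THEN 1 ELSE 0 END) OVER (
                   PARTITION BY id, id2, text ORDER BY timestamp
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS seq
        FROM diffs
    )
    SELECT id, id2, text, COUNT(*) AS total,
           MIN(CASE WHEN diff <= 2 THEN diff END) AS mindiff
    FROM seqs
    GROUP BY id, id2, text, seq
    HAVING COUNT(*) > 1
    ORDER BY id, id2, text
""").fetchall()

for row in hits:
    print(row)
```

The MIN ignores the gap that opened each sequence (it belongs to the boundary, not to the sequence itself), and sequences of a single row are dropped by the HAVING clause, matching the assumption stated in the answer.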

Select oldest two records from group

I've found a number of examples showing how to select a single oldest/newest row from a grouped set, but am having trouble getting the oldest two rows from a data set.
Here's my sample table:
CREATE TABLE IF NOT EXISTS `orderTable` (
`customer_id` varchar(10) NOT NULL,
`order_id` varchar(4) NOT NULL,
`date_added` date NOT NULL,
PRIMARY KEY (`customer_id`,`order_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `orderTable` (`customer_id`, `order_id`, `date_added`) VALUES
('1234', '5A', '1997-01-22'),
('1234', '88B', '1992-05-09'),
('0487', 'F9', '2002-01-23'),
('5799', 'A12F', '2007-01-23'),
('1234', '3A', '2009-01-22'),
('3333', '7FHS', '2009-01-22'),
('0487', 'Z33', '2004-06-23'),
('3333', 'FF44', '2013-09-11'),
('3333', '44f5', '2013-09-02');
This query returns more than two rows:
SELECT customer_id, order_id, date_added
FROM orderTable T1
WHERE (
select count(*) FROM orderTable T2
where T2.order_id = T1.order_id AND T2.date_added <= T1.date_added
) <= 2;
Since I am not looking for a single row, this is not a standard greatest-n-per-group type query.
What am I missing to get the first two orders for each customer_id?
The best (i.e. most performant) approach is to use a User Defined Variable in the query.
SELECT tmp.customer_id, tmp.date_added
FROM (
SELECT
customer_id, date_added,
IF (@prev <> customer_id, @rownum := 1, @rownum := @rownum+1 ) rank,
@prev := customer_id
FROM orderTable t
JOIN (SELECT @rownum := NULL, @prev := 0) r
ORDER BY t.customer_id, t.date_added
) tmp
WHERE tmp.rank <= 2
ORDER BY customer_id, date_added
Results:
| CUSTOMER_ID | DATE_ADDED |
|-------------|----------------------------------|
| 0487 | January, 23 2002 00:00:00+0000 |
| 0487 | June, 23 2004 00:00:00+0000 |
| 1234 | May, 09 1992 00:00:00+0000 |
| 1234 | January, 22 1997 00:00:00+0000 |
| 3333 | January, 22 2009 00:00:00+0000 |
| 3333 | September, 02 2013 00:00:00+0000 |
| 5799 | January, 23 2007 00:00:00+0000 |
Note that the join is just being used to initialise the variables.
Your original query should correlate on customer_id rather than order_id in the subquery:
SELECT customer_id, order_id, date_added
FROM orderTable T1
WHERE (
select count(*) FROM orderTable T2
where T2.customer_id = T1.customer_id AND T2.date_added <= T1.date_added
) <= 2;
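As a quick check of the corrected subquery, the sketch below loads the sample rows from the question and runs it. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption; the query itself uses no MySQL-specific syntax), and the ORDER BY is added here only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orderTable (
        customer_id TEXT NOT NULL,
        order_id    TEXT NOT NULL,
        date_added  TEXT NOT NULL,
        PRIMARY KEY (customer_id, order_id))
""")
conn.executemany(
    "INSERT INTO orderTable (customer_id, order_id, date_added) VALUES (?, ?, ?)",
    [("1234", "5A", "1997-01-22"), ("1234", "88B", "1992-05-09"),
     ("0487", "F9", "2002-01-23"), ("5799", "A12F", "2007-01-23"),
     ("1234", "3A", "2009-01-22"), ("3333", "7FHS", "2009-01-22"),
     ("0487", "Z33", "2004-06-23"), ("3333", "FF44", "2013-09-11"),
     ("3333", "44f5", "2013-09-02")],
)

# For each row, count the same customer's orders that are not newer.
# The oldest order counts 1, the second oldest counts 2, and so on.
oldest_two = conn.execute("""
    SELECT customer_id, order_id, date_added
    FROM orderTable T1
    WHERE (
        SELECT COUNT(*) FROM orderTable T2
        WHERE T2.customer_id = T1.customer_id
          AND T2.date_added <= T1.date_added
    ) <= 2
    ORDER BY customer_id, date_added
""").fetchall()

for row in oldest_two:
    print(row)
```

One caveat with this counting approach: if a customer has several orders on the same date, the count for each of them includes the others, so tied rows may be dropped together rather than being cut off at exactly two.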
You can also use variables:
SELECT customer_id, order_id, date_added FROM (
SELECT customer_id, order_id, date_added,
@rownum := if(@prev_cust = customer_id, @rownum + 1,1) as rn,
@prev_cust := customer_id cust_var
FROM orderTable T1,
(SELECT @rownum := 0) r,
(SELECT @prev_cust := '') c
order by customer_id, date_added
) o where o.rn < 3;
Here's another (deliberately incomplete) method, though others may have a point about performance...
SELECT x.*
, COUNT(*) rank
FROM ordertable x
JOIN ordertable y
ON y.customer_id = x.customer_id
AND y.date_added <= x.date_added
GROUP
BY x.customer_id
, x.date_added;
This should produce the results you're after, but the outer SELECT won't be the most efficient as it's filtering on a derived table.
SELECT ranked.*
FROM (
SELECT ot.* ,
@rownum := IF( ot.customer_id = @previous , @rownum +1, 1 ) rank,
@previous := ot.customer_id
FROM orderTable ot,
(SELECT @rownum :=1, @previous := NULL) init
ORDER BY customer_id, date_added
) ranked
WHERE rank <=2