MySQL Join with limit? - mysql

I have a table with category, product and count. All integers.
I'm looking for the most efficient query that will give me the top 10 products (highest count) for each category.
I've tried several subselects and joins but couldn't figure out how to do it in a single query. Thanks for your help.

select a.* from t a where 10 > (
select count(*) from t b
where b.category=a.category
and b.count > a.count
)
I think this is what you need.
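To check the idea on a tiny dataset, here is a minimal sketch in Python with sqlite3 (my choice for a runnable demo, not something from the question). A row belongs to the top N of its category when fewer than N rows in the same category have a strictly greater count. The table name t, the column name cnt (renamed from count to keep it clear of COUNT()), the helper top_n_per_category and the sample rows are all assumptions made for illustration.

```python
import sqlite3


def top_n_per_category(rows, n):
    """Return the n highest-cnt rows per category via a correlated subquery."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: cnt stands in for the question's count column.
    conn.execute("CREATE TABLE t (category INTEGER, product INTEGER, cnt INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT a.category, a.product, a.cnt
        FROM t AS a
        WHERE ? > (
            SELECT COUNT(*)
            FROM t AS b
            WHERE b.category = a.category
              AND b.cnt > a.cnt      -- rows that beat the current row
        )
        ORDER BY a.category, a.cnt DESC
        """,
        (n,),
    ).fetchall()
    conn.close()
    return result


# Made-up sample data: three products in category 1, two in category 2.
SAMPLE_ROWS = [
    (1, 101, 50), (1, 102, 40), (1, 103, 30),
    (2, 201, 7), (2, 202, 9),
]
TOP_TWO = top_n_per_category(SAMPLE_ROWS, 2)
```

With n = 2, product 103 drops out because two rows in category 1 have a higher cnt. Rows tied on cnt can push the result above n rows per category, so add a tie-breaker to the comparison if you need exactly n.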

A slightly modified query from this article in my blog:
Advanced row sampling
SELECT l.*
FROM (
SELECT category,
COALESCE(
(
SELECT count
FROM mytable li
WHERE li.category = dlo.category
ORDER BY
li.category DESC, li.count DESC, li.id DESC
LIMIT 9, 1
), CAST(-1 AS DECIMAL)) AS mcount,
COALESCE(
(
SELECT id
FROM mytable li
WHERE li.category = dlo.category
ORDER BY
li.category DESC, li.count DESC, li.id DESC
LIMIT 9, 1
), CAST(-1 AS DECIMAL)) AS mid
FROM (
SELECT DISTINCT category
FROM mytable dl
) dlo
) lo, mytable l
WHERE l.category >= lo.category
AND l.category <= lo.category
AND (l.count, l.id) >= (lo.mcount, lo.mid)
You need to create a composite index on (category, count, id) for this to work efficiently.
Note the use of l.category >= lo.category AND l.category <= lo.category instead of a plain l.category = lo.category.
This is a hack to make MySQL use an efficient range check for each record.

This article addresses your problem I think.
Basically, it says that if your table is small, you can do a self inequality join, like this:
SELECT t1.*, COUNT(*) AS countRank
FROM tbl AS t1
JOIN tbl AS t2 ON t1.category=t2.category AND t1.count <= t2.count
GROUP BY t1.category, t1.count
HAVING countRank <= 10
ORDER BY category,count DESC;
It's an expensive operation, but for a small table you should be fine. If you have a large table, you should forget about doing it with one query and implement a different approach to the solution.
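Here is a small sketch of how the self inequality join behaves, again in Python with sqlite3 as a stand-in for MySQL. The column name cnt, the sample rows and the limit of 2 are assumptions. The data deliberately contains a tie, because this is where the join-based rank can surprise you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; cnt plays the role of the count column.
conn.execute("CREATE TABLE tbl (category INTEGER, product INTEGER, cnt INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, 101, 50), (1, 102, 50), (1, 103, 30), (2, 201, 7)],
)

# Each row is joined to every row of its category with an equal or higher cnt,
# so COUNT(*) per row is its rank (1 = highest).
RANKED = conn.execute(
    """
    SELECT t1.category, t1.product, t1.cnt, COUNT(*) AS countRank
    FROM tbl AS t1
    JOIN tbl AS t2
      ON t1.category = t2.category AND t1.cnt <= t2.cnt
    GROUP BY t1.category, t1.product, t1.cnt
    HAVING COUNT(*) <= 2
    ORDER BY t1.category, t1.cnt DESC, t1.product
    """
).fetchall()
conn.close()
```

Both products with cnt = 50 get rank 2 rather than 1 and 2, since each of them matches itself and the other. The join produces on the order of the square of the rows per category, which is why it only suits small tables.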

SET @row = 0;
SET @category = 0;

SELECT top.*
FROM (
  SELECT IF(@category = p.cId, @row := @row + 1, @row := 1) rowNumber,
    (@category := p.cId) categoryId,
    p.pId
  FROM (
    SELECT c.cId,
      c.pId
    FROM prod pr
      INNER JOIN cat_prod c ON c.pId = pr.id
    GROUP BY c.cId, c.pId
    ) p
  ) top
HAVING top.rowNumber < 4;
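In case the variable trick is hard to follow: the two session variables in the query above act as a running counter that restarts whenever the category changes. The Python sketch below traces the same logic outside the database. The function name number_rows_per_category and the sample pairs are made up for illustration, and the input is assumed to be already grouped by category.

```python
def number_rows_per_category(pairs, limit):
    """Number (category_id, product_id) pairs per category, keeping rows below limit.

    pairs must already be sorted so that equal categories are adjacent.
    """
    current_category = None  # plays the role of the category session variable
    row = 0                  # plays the role of the row counter session variable
    kept = []
    for category_id, product_id in pairs:
        # Same decision as the IF(...) in the query: continue or restart the count.
        row = row + 1 if category_id == current_category else 1
        current_category = category_id
        if row < limit:      # mirrors HAVING rowNumber < 4 when limit is 4
            kept.append((row, category_id, product_id))
    return kept


# Made-up data: four products in category 1, two in category 2.
PAIRS = [(1, 10), (1, 11), (1, 12), (1, 13), (2, 20), (2, 21)]
KEPT = number_rows_per_category(PAIRS, 4)
```

The counter only means something if the rows arrive grouped by category, which is the part the SQL version depends on as well.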

select a.* from `table` a where a.product in (
select b.product from `table` b
where b.category=a.category
order by b.count desc
limit 10
)
I think this is a good approach, but MySQL returns:
#1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'

Related

Converting query from MySQL V8.0 to V5.6

I developed a system on MySQL 8.0 and, unfortunately, I need to downgrade it to the version most commonly offered by web hosts, which is 5.6.
My queries are almost all like this:
SELECT tb_d.*, id_classification
FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY '1' ORDER BY od_category ASC, $order_by_gp_mid) AS row_final, tb_c.*
FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY id_category ORDER BY MOD(`row` * $prime_group, 512)) AS rd_category, tb_b.*
FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY '1' ORDER BY od_category ASC, $order_by_gp_mid) AS `row`, tb_a.*
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY id_category ORDER BY od_category ASC, tb_x.$order_by_gp_mid) AS rn_category, tb_x.*
FROM (
SELECT
1, od_category, id_category, `group`, category,
id_post, thumbnail, head, visibility, created_datetime, `view`, yeslike, save,
lk_post_category_pic_uvw.fk_pic,
ROUND((yeslike + total_save) / view, 3) AS rank_ratio
FROM tb_category
JOIN (
SELECT fk_post, fk_category, fk_pic
FROM lk_post_category_pic
UNION
SELECT fk_post, `from`, fk_pic
FROM lk_post_model_pic
JOIN tb_model ON tb_model.id_model = lk_post_model_pic.fk_model
WHERE `from` != ''
) AS lk_post_category_pic_uvw ON lk_post_category_pic_uvw.fk_category = tb_category.id_category
JOIN tb_post ON lk_post_category_pic_uvw.fk_post = tb_post.id_post
LEFT JOIN vw_total_save_by_post ON vw_total_save_by_post.fk_post = tb_post.id_post
WHERE visibility = 'show'
) AS tb_x
) AS tb_a
WHERE rn_category <= 35
) AS tb_b
) AS tb_c
WHERE rd_category = 1
) AS tb_d
JOIN lk_post_classification_pic ON tb_d.id_post = lk_post_classification_pic.fk_post
JOIN tb_classification ON lk_post_classification_pic.fk_classification = tb_classification.id_classification
The variable $order_by_gp_mid can take these values:
$order_by_gp_mid = 'created_datetime DESC';
$order_by_gp_mid = 'rank_ratio DESC';
$order_by_gp_mid = '`view` DESC';
$order_by_gp_mid = 'created_datetime DESC';
$order_by_gp_mid = 'created_datetime ASC';
So the biggest problem here is the ROW_NUMBER() OVER(PARTITION BY ...) lines.
I need to rewrite these queries, and many others too, for MySQL 5.6.
I have already read many topics about this here on Stack Overflow, but I don't know how to do it because my queries are very complex.
I don't know what to do anymore; I'm absolutely exhausted by computer programming.

MySQL join on substring is slow

I have a query where I do a join on a substring; the problem is that it is really slow to complete. Is there a more efficient way to write this?
SELECT *, SUM(s.pris*s.antall) AS total, SUM(s.antall) AS antall
FROM ecs_statistikk AS s
JOIN butikk_ordre AS bo ON ordreId=bo.ecs_ordre_id AND butikkNr=bo.site_id
JOIN ecs_supplier AS l ON SUBSTRING( s.artikkelId, 1,2 )=l.lev_id
WHERE s.salgsDato>='2016-6-01' AND s.salgsDato<='2016-09-30'
GROUP BY l.lev_id ORDER BY total DESC
First, I would check indexes. For this query:
SELECT *, SUM(s.pris*s.antall) AS total, SUM(s.antall) AS antall
FROM ecs_statistikk s JOIN
butikk_ordre bo
ON s.ordreId = bo.ecs_ordre_id AND
s.butikkNr = bo.site_id JOIN
ecs_supplier l
ON SUBSTRING(s.artikkelId, 1, 2 ) = l.lev_id
WHERE s.salgsDato >= '2016-06-01' AND s.salgsDato <= '2016-09-30'
GROUP BY l.lev_id
ORDER BY total DESC ;
You want indexes on ecs_statistikk(salgsDato, ordreId, butikkNr, artikkelId), butikk_ordre(ecs_ordre_id, site_id), and ecs_supplier(lev_id).
Next, I would question whether you need the last JOIN at all. Does this do what you want?
SELECT LEFT(s.artikkelId, 2) AS lev_id, s.*, bo.*,
SUM(s.pris*s.antall) AS total, SUM(s.antall) AS antall
FROM ecs_statistikk s JOIN
butikk_ordre bo
ON s.ordreId = bo.ecs_ordre_id AND
s.butikkNr = bo.site_id
WHERE s.salgsDato >= '2016-06-01' AND s.salgsDato <= '2016-09-30'
GROUP BY LEFT(s.artikkelId, 2)
ORDER BY total DESC ;
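To illustrate grouping on the prefix directly instead of joining the supplier table, here is a reduced sketch in Python with sqlite3. It is an approximation under assumptions: SQLite has no LEFT(), so substr(artikkelId, 1, 2) stands in for it, the butikk_ordre join is left out to keep the example short, and the rows and integer prices are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical version of ecs_statistikk with only the columns used here.
conn.execute(
    "CREATE TABLE ecs_statistikk "
    "(artikkelId TEXT, pris INTEGER, antall INTEGER, salgsDato TEXT)"
)
conn.executemany(
    "INSERT INTO ecs_statistikk VALUES (?, ?, ?, ?)",
    [
        ("AB001", 10, 2, "2016-06-15"),
        ("AB002", 5, 1, "2016-07-01"),
        ("CD001", 100, 1, "2016-08-20"),
        ("CD002", 1, 1, "2016-10-05"),  # outside the date range, must be ignored
    ],
)

# The supplier id is derived from the first two characters of the article id,
# so no join against the supplier table is needed for the totals.
TOTALS = conn.execute(
    """
    SELECT substr(artikkelId, 1, 2) AS lev_id,
           SUM(pris * antall)       AS total,
           SUM(antall)              AS antall_sum
    FROM ecs_statistikk
    WHERE salgsDato >= '2016-06-01' AND salgsDato <= '2016-09-30'
    GROUP BY substr(artikkelId, 1, 2)
    ORDER BY total DESC
    """
).fetchall()
conn.close()
```

This only gives the same answer as the joined version if every prefix really exists in the supplier table; prefixes without a supplier row would show up here but not in the join.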

SQL: How to get cells by 2 last dates from 3 different tables?

I have 3 tables (stars match the ids from the table before):
product:
prod_id* prod_name prod_a_id prod_b_id prod_user
keywords:
key_id** key_word key_prod* kay_country
data:
id dat_id** dat_date dat_rank_a dat_traffic_a dat_rank_b dat_traffic_b
I want to run a query (in a function that receives a $key_id) that outputs all these columns, but only for the last 2 dates (dat_date) in the 'data' table for the given key_id, so that for every key_word I get the two latest dat_dates plus all the other columns in my SQL query.
This is what I have so far, and I don't know how to get only the latest dates. I tried using "max(dat_date)" in different ways, none of which worked.
SELECT prod_id, prod_name, prod_a_id, prod_b_id, key_id, key_word, kay_country, dat_date, dat_rank_a, dat_rank_b, dat_traffic_a, dat_traffic_b
FROM keywords
INNER JOIN data
ON keywords.key_id = data.dat_id
INNER JOIN prods
ON keywords.key_prod = prods.prod_id
Is there a possibility to do this with only one query?
EDIT (FOR IgorM):
public function newnew() {
$query = $this->db->query('WITH CTE AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY dat_id ORDER BY dat_date ASC) AS
RowNo FROM data
)
SELECT *
FROM CTE
INNER JOIN keywords
ON keywords.key_id = CTE.dat_id
INNER JOIN prods
ON keywords.key_prod = prods.prod_id
WHERE RowNo < 3
');
$result = $query->result();
return $result;
}
This is the error on the output:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'CTE AS ( SELECT *, ROW_NUMBER() OVER (' at line 1
WITH CTE AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY dat_id ORDER BY dat_date ASC) AS RowNo FROM data ) SELECT * FROM CTE INNER JOIN keywords ON keywords.key_id = CTE.dat_id INNER JOIN prods ON keywords.key_prod = prods.prod_id WHERE RowNo < 3
For SQL
WITH CTE AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY dat_id ORDER BY dat_date ASC) AS
RowNo FROM data
)
SELECT *
FROM CTE
INNER JOIN keywords
ON keywords.key_id = CTE.dat_id
INNER JOIN prods
ON keywords.key_prod = prods.prod_id
WHERE RowNo < 3
For MySQL (not tested)
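SET @row_number := 0;
SET @dat_id := '';
SELECT *
FROM (
-- number the rows per dat_id before joining; same ordering as the CTE version,
-- use dat_date DESC to number from the newest date
SELECT d.*,
@row_number := CASE WHEN @dat_id = d.dat_id THEN @row_number + 1 ELSE 1 END AS row_num,
@dat_id := d.dat_id AS dat_id_row_count
FROM data d
ORDER BY d.dat_id, d.dat_date
) d
INNER JOIN keywords
ON keywords.key_id = d.dat_id
INNER JOIN prods
ON keywords.key_prod = prods.prod_id
WHERE d.row_num < 3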
The other approach is a self join. I don't want to take credit for somebody else's work, so please look at the following example:
ROW_NUMBER() in MySQL
Look for the following there:
SELECT a.i, a.j, (
SELECT count(*) from test b where a.j >= b.j AND a.i = b.i
) AS row_number FROM test a
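Applied to the question, the counting trick could look like the sketch below, run through Python's sqlite3 only so it can be tried without a MySQL server. This is an assumption-laden illustration: the data table is cut down to three columns, the rows are invented, and the comparison is flipped so that number 1 is the newest date per dat_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, hypothetical version of the data table.
conn.execute("CREATE TABLE data (dat_id INTEGER, dat_date TEXT, dat_rank_a INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", 5),
        (1, "2020-01-02", 4),
        (1, "2020-01-03", 3),
        (2, "2020-02-01", 9),
    ],
)

# rn counts the rows of the same dat_id with an equal or later date,
# so the newest row gets 1, the one before it 2, and so on.
LATEST_TWO = conn.execute(
    """
    SELECT dat_id, dat_date, rn
    FROM (
        SELECT a.dat_id, a.dat_date,
               (SELECT COUNT(*)
                FROM data b
                WHERE b.dat_id = a.dat_id
                  AND b.dat_date >= a.dat_date) AS rn
        FROM data a
    ) numbered
    WHERE rn <= 2
    ORDER BY dat_id, rn
    """
).fetchall()
conn.close()
```

The numbered subquery can then be joined to keywords and prods like any other table. Two rows with the same dat_id and dat_date would share a number, so this assumes at most one row per key and date.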
If you only want to do this for one key_id at a time (as alluded to in your responses to other answers) and only want two rows, you can just do:
SELECT p.prod_id,
p.prod_name,
p.prod_a_id,
p.prod_b_id,
k.key_id,
k.key_word,
k.kay_country,
d.dat_date,
d.dat_rank_a,
d.dat_rank_b,
d.dat_traffic_a,
d.dat_traffic_b
FROM keywords k
JOIN data d
ON k.key_id = d.dat_id
JOIN prods p
ON k.key_prod = p.prod_id
WHERE k.key_id = :key_id /* Bind in key id */
ORDER BY d.dat_date DESC
LIMIT 2;
Whether you want this depends on your data structure and whether there is more than one key/prod combination per date.
Another option limiting just the data rows would be:
SELECT p.prod_id,
p.prod_name,
p.prod_a_id,
p.prod_b_id,
k.key_id,
k.key_word,
k.kay_country,
d.dat_date,
d.dat_rank_a,
d.dat_rank_b,
d.dat_traffic_a,
d.dat_traffic_b
FROM keywords k
JOIN (
SELECT dat_id,
dat_date,
dat_rank_a,
dat_rank_b,
dat_traffic_a,
dat_traffic_b
FROM data
WHERE dat_id = :key_id /* Bind in key id */
ORDER BY dat_date DESC
LIMIT 2
) d
ON k.key_id = d.dat_id
JOIN prods p
ON k.key_prod = p.prod_id;
If you want some kind of grouped results for all the keywords, you'll need to look at the other answers.
I think a window function is the best way to go. Without knowing much about the structure of the data, you can put the part you want to restrict in a subquery, number its rows, and join that to the rest of the data. Then restrict the rows you pull back in the WHERE clause.
select p.prod_id, p.prod_name, p.prod_a_id, p.prod_b_id,
t.key_id, t.key_word, t.kay_country, t.dat_date,
t.dat_rank_a, t.dat_rank_b, t.dat_traffic_a, t.dat_traffic_b
from
(
select
k.key_id, k.key_word, k.kay_country, k.key_prod, d.dat_date, d.dat_rank_a,
d.dat_rank_b, d.dat_traffic_a, d.dat_traffic_b,
row_number() over (partition by d.dat_id order by d.dat_date desc) as RowNum
from keywords as k
inner join
data as d on k.key_id = d.dat_id
) as t
inner join
prods as p on t.key_prod = p.prod_id
where t.RowNum <= 2
This is a "groupwise max" problem. Reference. CTEs do not exist in MySQL before version 8.0.
I'm not totally clear on how your tables are linked, but here is a stab:
SELECT
*
FROM
( SELECT @prev := '', @n := 0 ) init
JOIN
( SELECT @n := if(k.key_id != @prev, 1, @n + 1) AS n,
@prev := k.key_id,
d.*, k.*, p.*
FROM data d
JOIN keywords k ON k.key_id = d.dat_id
JOIN prods p ON k.key_prod = p.prod_id
ORDER BY
k.key_id ASC,
d.dat_date ASC
) x
WHERE n <= 2
ORDER BY x.key_id, n;
you can use this query:
select prod_id, prod_name, prod_a_id, prod_b_id, key_id, key_word,
kay_country, dat_date, dat_rank_a, dat_rank_b, dat_traffic_a, dat_traffic_b
from keywords where dat_date in (
SELECT MAX(dat_date) FROM keywords temp_1
where temp_1.prod_id = keywords.prod_id
union all
SELECT MAX(dat_date) FROM keywords
WHERE dat_date NOT IN (SELECT MAX(dat_date ) FROM keywords temp_2 where
temp_2.prod_id = keywords.prod_id)
)

Way to reduce execution time of this query in mysql

Some time ago I got a little help here building a custom query, and it has worked fine until now.

When I run the query (in a procedure) I get the error:
Error Code: 2013. Lost connection to MySQL server during query
My access to my.ini via SSH is read-only (my database is on a GoDaddy shared host), so I can't increase the execution time limit (currently 60).
Is there a way to optimize this query to make it faster?
The query is:
SELECT @curRank := @curRank + 1 as rank, p.nick,(kills + ((p.vpos - p.vneg)*5) + (top * 5) - deaths) as score
FROM (SELECT
(SELECT uuid FROM players WHERE players.uuid = p.uuid LIMIT 1) as uuid,
(SELECT nick FROM nicks n WHERE n.pid = p.id ORDER BY id DESC LIMIT 1) as nick,
(SELECT COUNT(*) FROM kills k WHERE k.pid = p.id ) as kills,
(SELECT COUNT(*) FROM deaths d WHERE d.pid = p.id ) as deaths,
(SELECT COUNT(*) FROM headshots h WHERE h.pid = p.id ) as hs,
(SELECT COUNT(*) FROM votos vp WHERE vp.vid = p.id AND tipo="p") as vpos,
(SELECT COUNT(*) FROM votos vn WHERE vn.vid = p.id AND tipo="n") as vneg,
(SELECT COUNT(*) FROM top_rounds t WHERE t.pid = p.id ) as top,
(SELECT @curRank := 0) as rank
FROM players p
) p ORDER BY score DESC LIMIT 30;
Note: all pid's and p.id's already are indexes
Untested (due to lack of sample data):
SELECT p.nick,
(IFNULL(k.cnt, 0)
+ ((IFNULL(vpos.cnt, 0) - IFNULL(vneg.cnt, 0))*5)
+ (IFNULL(t.cnt, 0) * 5) - IFNULL(d.cnt, 0)) AS score
FROM players p
LEFT JOIN (
SELECT pid, COUNT(*) AS cnt
FROM kills
GROUP BY pid
) AS k ON p.id = k.pid
⋮
LEFT JOIN (
SELECT pid, COUNT(*) AS cnt
FROM top_rounds
GROUP BY pid
) AS t ON p.id = t.pid
ORDER BY score DESC
LIMIT 30
The idea is to make sure each inner query runs only once for all the players. Each subquery produces a table that maps a player id to the corresponding count. Since there might be zero matching rows, we have to use LEFT JOIN and translate NULL into 0 using IFNULL(foo.cnt, 0).
If you need to index rows, you can add an extra outer query for that alone, but personally I'd prefer to handle that outside SQL in the application which processes the query result.
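A reduced sketch of the pre-aggregation pattern, run through Python's sqlite3 so it can be tried without the real schema. Everything here is simplified and assumed: only the kills and deaths tables are kept, the score is just kills minus deaths instead of the full formula, and the players and rows are invented. The rank is added in the application afterwards rather than in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down schema with invented rows.
conn.executescript(
    """
    CREATE TABLE players (id INTEGER PRIMARY KEY, nick TEXT);
    CREATE TABLE kills (pid INTEGER);
    CREATE TABLE deaths (pid INTEGER);
    INSERT INTO players VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO kills VALUES (1), (1), (1), (2);
    INSERT INTO deaths VALUES (1), (3), (3);
    """
)

# Each derived table is computed once and maps a player id to a count;
# LEFT JOIN plus IFNULL turns "no rows for this player" into 0.
SCORES = conn.execute(
    """
    SELECT p.nick,
           IFNULL(k.cnt, 0) - IFNULL(d.cnt, 0) AS score
    FROM players p
    LEFT JOIN (SELECT pid, COUNT(*) AS cnt FROM kills GROUP BY pid) AS k
           ON p.id = k.pid
    LEFT JOIN (SELECT pid, COUNT(*) AS cnt FROM deaths GROUP BY pid) AS d
           ON p.id = d.pid
    ORDER BY score DESC, p.nick
    LIMIT 30
    """
).fetchall()
conn.close()

# Rank numbering done outside SQL, on the already sorted result.
RANKED = [(rank, nick, score) for rank, (nick, score) in enumerate(SCORES, start=1)]
```

Player cy has no kills at all, so the kills side of the join is NULL for that row and IFNULL is what keeps the score from becoming NULL.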

SQL Server 2008 performance issue with Count(distinct()) and SUM. How can I avoid this issue?

Below is my query. It takes 12 seconds to run. I have created an index on T.DataViewId, but it still takes a long time because of Count(distinct()) and Sum. Thanks in advance.
;WITH my_cte
AS (SELECT T.name AS name,
T.id AS id,
Count(DISTINCT( DD.dynamictableid )) AS counts,
Round(Sum(D.[employees]), 0) AS measure1
FROM dbo.treehierarchy T
LEFT JOIN dbo.dynamicdatatableid DD
ON T.id = DD.hierarchyid
AND T.dataviewid = DD.dataviewid
LEFT JOIN dbo.demo1 D
ON D.[demo1id] = DD.dynamictableid
WHERE T.dataviewid = 2
AND T.parentid = 0
GROUP BY T.id,
T.name)
SELECT name, id, counts, row_num, measure1
FROM (SELECT name,
id,
counts,
Row_number()
OVER(
ORDER BY counts DESC) AS row_num,
measure1
FROM my_cte) innertable
WHERE ( row_num BETWEEN 1 AND 15 )
It looks as if you only need the top 15 records by descending counts. That can be done simply like this:
SELECT
TOP 15 T.name AS name,
T.id AS id,
Count(DISTINCT( DD.dynamictableid )) AS counts,
Round(Sum(D.[employees]), 0) AS measure1
FROM
dbo.treehierarchy T
LEFT JOIN
dbo.dynamicdatatableid DD
ON
T.id = DD.hierarchyid
AND
T.dataviewid = DD.dataviewid
LEFT JOIN
dbo.demo1 D
ON
D.[demo1id] = DD.dynamictableid
WHERE
T.dataviewid = 2
AND
T.parentid = 0
GROUP BY
T.id,T.name
ORDER BY
3 DESC