group by slows down the query - mysql

I have on the products table the following index: (product_t,productid,forsale). The MySQL manual says:
The GROUP BY names only columns that form a leftmost prefix of the index and no other columns.
http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html
When I do the following query
SELECT z.product_t, COUNT(z.productid)
FROM xcart_products z
JOIN xcart_products_lng w ON z.productid = w.productid
AND w.code = 'US'
WHERE z.forsale = 'Y'
group by z.product_t
which therefore groups on the leftmost index column (product_t), the execution time is still massive:
+-----------+--------------------+
| product_t | COUNT(z.productid) |
+-----------+--------------------+
| B | 4 |
| C | 10521 |
| D | 1 |
| F | 16 |
| G | 363 |
| J | 16 |
| L | 749 |
| M | 22 |
| O | 279 |
| P | 5304 |
| S | 22 |
| W | 662 |
+-----------+--------------------+
12 rows in set (0.81 sec)
When I use the whole index (product_t,productid,forsale), the execution time is blazing fast (0.005 seconds). How should I change it to make the query go faster?
I think the query could somehow be improved through the use of a semi join... However, I'm not sure how...

The slowdown might not be related to the GROUP BY clause. Try adding individual indexes on w.code and z.forsale.
MySQL profiling might also help you in your endeavour.

The most obvious answer would be to create a new index on product_t only. Can you create new indexes?

Thankfully I found a way to increase the speed. I have another table where all the product_t values are defined. Each product_t there also has a 'code' field (specifying the translation of each product_t), and that table has an index on (product_t, code) as well. By changing the query to:
SELECT z.product_t, COUNT(z.productid)
FROM xcart_products z
JOIN xcart_products_lng w ON z.productid = w.productid
JOIN xcart_products_product_t_lng p ON z.product_t = p.product_t
AND p.code = 'US'
WHERE z.forsale = 'Y'
group by p.product_t,p.code
I managed to bring the time down to 0.10 seconds. It's faster because the grouping is now done on a whole index, and the code = 'US' filter has moved from the big products_lng table to the small product_t_lng table. I think this is about as efficient as the query can get...
mysql> SELECT z.product_t, COUNT( z.productid )
-> FROM xcart_products z
-> JOIN xcart_products_lng w ON z.productid = w.productid
-> JOIN xcart_products_product_t_lng p ON z.product_t = p.product_t
-> AND p.code = 'US'
-> WHERE z.forsale = 'Y'
-> GROUP BY p.product_t, p.code
-> ;
+-----------+----------------------+
| product_t | COUNT( z.productid ) |
+-----------+----------------------+
| B | 4 |
| C | 10521 |
| F | 16 |
| G | 363 |
| L | 749 |
| M | 22 |
| O | 279 |
| P | 5304 |
| S | 22 |
| W | 662 |
+-----------+----------------------+
10 rows in set (0.14 sec)
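The leftmost-prefix behaviour is easy to reproduce in miniature. Below is a runnable sketch using SQLite rather than MySQL (so planner details differ), with a cut-down, made-up version of the products table and its composite index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (productid INTEGER, product_t TEXT, forsale TEXT);
CREATE INDEX idx_t_id_sale ON products (product_t, productid, forsale);
INSERT INTO products VALUES
  (1, 'B', 'Y'), (2, 'B', 'Y'), (3, 'C', 'Y'), (4, 'C', 'N');
""")

# Grouping on the leftmost indexed column lets the engine walk the index
# instead of building a temporary table and sorting.
rows = conn.execute("""
    SELECT product_t, COUNT(productid)
    FROM products
    WHERE forsale = 'Y'
    GROUP BY product_t
""").fetchall()
print(rows)  # one (product_t, count) pair per group

# The query plan shows whether the composite index satisfies the grouping.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT product_t, COUNT(productid) FROM products GROUP BY product_t
""").fetchall()
print(plan)
```

The same principle is what the MySQL manual page above describes: the GROUP BY column must be a leftmost prefix of the index for the index to drive the grouping.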

Related

optimizing short text comparisons

I have 2 tables, qs and local.
qs has 2 columns (actually built from several other columns) that are part of the comparison I need to do:
f1 | t1
abcdaa | abcdbb
local just has one column that's part of the comparison:
rangeA
abcd
I am trying to find the entries in qs that do not have a matching substring in local.
I've tried this in about a dozen different ways, and I must be missing something, since it's taking an unusual amount of time.
here is the fastest method I've found so far:
CREATE TEMPORARY TABLE `tempB` SELECT f1, t1,
LEFT(f1,2) AS l2,LEFT(f1,3) AS l3,LEFT(f1,4) AS l4,LEFT(f1,5) AS l5,LEFT(f1,6) AS l6,LEFT(f1,7) AS l7,LEFT(f1,8) AS l8,
LEFT(f1,9) AS l9,LEFT(f1,10) AS l10,LEFT(f1,11) AS l11,LEFT(f1,12) AS l12,LEFT(f1,13) AS l13,
LEFT(t1,2) AS lt2,LEFT(t1,3) AS lt3,LEFT(t1,4) AS lt4,LEFT(t1,5) AS lt5,LEFT(t1,6) AS lt6,LEFT(t1,7) AS lt7,LEFT(t1,8) AS lt8,
LEFT(t1,9) AS lt9,LEFT(t1,10) AS lt10,LEFT(t1,11) AS lt11,LEFT(t1,12) AS lt12,LEFT(t1,13) AS lt13 FROM
(SELECT CONCAT(c1,n1,s1) AS f1, CONCAT(c1,n1,s2) AS t1 FROM qs WHERE c1 ='a')tab0 ORDER BY f1 ASC;
CREATE TEMPORARY TABLE `tempB2` SELECT rangeA FROM local WHERE rangeA LIKE 'a%' ORDER BY rangeA ASC;
CREATE TEMPORARY TABLE `tempB3` SELECT rangeA AS rangeAA FROM local WHERE rangeA LIKE 'a%' ORDER BY rangeA ASC;
SELECT f1,t1, rangeA, rangeAA FROM tempB
LEFT JOIN tempB2 ON rangeA IN(l2,l3,l4,l5,l6,l7,l8,l9,l10,l11,l12,l13)
LEFT JOIN tempB3 ON rangeAA IN(lt2,lt3,lt4,lt5,lt6,lt7,lt8,lt9,lt10,lt11,lt12,lt13)
WHERE rangeA IS NULL OR rangeAA IS NULL
Creating the temp tables is fast, and starting with one character at a time (in this case 'a') significantly reduces the size of the datasets, but this is still very, very slow even with only a few hundred thousand rows in each temp table.
I've tried using just f1 and t1 with a
ON f1 LIKE CONCAT (rangeA,'%')
but that seemed to be even slower.
Any other ideas?
Note that rangeA is at least 2 characters long and at most 13 characters long; hence the LEFTs.
example data:
qs :
c1 | n1 | s1 | s2
ab | cd | aa | bb
bb | bbb | bb | bc
cbc | cc | cdd | ddd
ddd | e | ddf | def
local :
rangeA
abcd
bdddd
cbcccdd
dddedd
expected result:
f1 | t1 | f1match | t1match
bbbbbbb | bbbbbbc | NULL | NULL
cbccccdd | cbcccddd | NULL | cbcccdd
dddeddf | dddedef | dddedd | NULL
Thank you Paul Spiegel for making this work.
Let's set up some test data.
mysql> select * from qs;
+----+---------------+-------------------+
| id | f1 | t1 |
+----+---------------+-------------------+
| 6 | match1 | no match |
| 7 | match1 | match2 |
| 8 | foo match1 | match1 bar |
| 9 | no match | abc match2 123 |
| 10 | no match | no match |
| 11 | also no match | again not a match |
+----+---------------+-------------------+
mysql> select * from local;
+--------+
| rangeA |
+--------+
| match1 |
| match2 |
+--------+
And we expect only those rows which neither f1 nor t1 match any row in local.
+----+---------------+-------------------+
| id | f1 | t1 |
+----+---------------+-------------------+
| 10 | no match | no match |
| 11 | also no match | again not a match |
+----+---------------+-------------------+
UPDATE: Indexing qs(f1,t1) and local(rangeA) will help performance.
create index index_qs_fields on qs(f1,t1);
create index index_local_rangeA on local(rangeA);
INSTR() finds a substring in a string, which simplifies many things.
We can do this with a left excluding join; that is, we keep only the rows on the left side (qs) which have no match on the right (local).
First we do a normal left join to check for matches.
select qs.*, rangeA
from qs
left join local on
instr(f1,rangeA) or
instr(t1,rangeA)
+----+---------------+-------------------+--------+
| id | f1 | t1 | rangeA |
+----+---------------+-------------------+--------+
| 1 | match1 | no match | match1 |
| 2 | match1 | match2 | match1 |
| 3 | foo match1 | match1 bar | match1 |
| 2 | match1 | match2 | match2 |
| 4 | no match | abc match2 123 | match2 |
| 5 | no match | no match | NULL |
| 6 | also no match | again not a match | NULL |
+----+---------------+-------------------+--------+
And turn it into an excluding join by filtering for only those which don't match at all.
select qs.*, rangeA
from qs
left join local on
instr(f1,rangeA) or
instr(t1,rangeA)
where rangeA is null
+----+---------------+-------------------+
| id | f1 | t1 |
+----+---------------+-------------------+
| 5 | no match | no match |
| 6 | also no match | again not a match |
+----+---------------+-------------------+
dbfiddle
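The excluding join can be verified end-to-end against the sample data. A minimal sketch in Python using SQLite, which also provides INSTR():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE qs (id INTEGER, f1 TEXT, t1 TEXT);
CREATE TABLE local (rangeA TEXT);
INSERT INTO qs VALUES
  (6,  'match1',        'no match'),
  (7,  'match1',        'match2'),
  (8,  'foo match1',    'match1 bar'),
  (9,  'no match',      'abc match2 123'),
  (10, 'no match',      'no match'),
  (11, 'also no match', 'again not a match');
INSERT INTO local VALUES ('match1'), ('match2');
""")

# Left excluding join: keep only qs rows where no local.rangeA is a
# substring of either f1 or t1 (the join found nothing, so rangeA is NULL).
rows = conn.execute("""
    SELECT qs.id, qs.f1, qs.t1
    FROM qs
    LEFT JOIN local
      ON instr(qs.f1, local.rangeA) OR instr(qs.t1, local.rangeA)
    WHERE local.rangeA IS NULL
""").fetchall()
print(rows)  # only the rows with no match on either column
```

Only ids 10 and 11 survive, matching the expected output above.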
UPDATE: Lots of entries in local can make this slow. We can try optimizing by combining all the matches into one regular expression, which might be faster.
We construct the regex by GROUP_CONCATing all the matches together, separated by |.
select group_concat(rangeA separator '|')
into @range_re
from local;
select qs.*
from qs
where not f1 regexp @range_re and not t1 regexp @range_re;
Note that you'll need to be careful to escape regex characters in your matches.
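That escaping caveat is worth demonstrating: a metacharacter such as `.` or `+` in a rangeA value silently changes the pattern's meaning. Here is a sketch of building the alternation safely in application code with Python's re module (in MySQL itself you would need to escape the values before the GROUP_CONCAT); the sample values are made up:

```python
import re

# The last entry deliberately contains regex metacharacters.
ranges = ["match1", "match2", "a.b+c"]

# Escape each value, then join into one alternation, mirroring
# GROUP_CONCAT(rangeA SEPARATOR '|').
pattern = re.compile("|".join(re.escape(r) for r in ranges))

print(bool(pattern.search("xx match1 yy")))  # True: a real substring match
print(bool(pattern.search("a.b+c here")))    # True: metacharacters matched literally
print(bool(pattern.search("aXbbc here")))    # False: would match if '.' and '+' were live
```

Without re.escape, the unescaped `a.b+c` would match strings like "aXbbc", producing false positives.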
Original way too complicated answer follows.
That tells us which entries in qs don't match entries in local.
select qs.id, f1, t1, rangeA
from qs
left join local on 1=1
where instr(f1,rangeA) = 0 and instr(t1,rangeA) = 0;
+----+---------------+-------------------+--------+
| id | f1 | t1 | rangeA |
+----+---------------+-------------------+--------+
| 6 | match1 | no match | match2 |
| 8 | foo match1 | match1 bar | match2 |
| 9 | no match | abc match2 123 | match1 |
| 10 | no match | no match | match1 |
| 10 | no match | no match | match2 |
| 11 | also no match | again not a match | match1 |
| 11 | also no match | again not a match | match2 |
+----+---------------+-------------------+--------+
But we want the rows which match none of the entries in local. We can find those by counting how many times a row appears in our list of non-matches.
select qs.id, f1, t1, count(id)
from qs
left join local on 1=1
where instr(f1,rangeA) = 0
and instr(t1,rangeA) = 0
group by qs.id;
+----+---------------+-------------------+-----------+
| id | f1 | t1 | count(id) |
+----+---------------+-------------------+-----------+
| 6 | match1 | no match | 1 |
| 8 | foo match1 | match1 bar | 1 |
| 9 | no match | abc match2 123 | 1 |
| 10 | no match | no match | 2 |
| 11 | also no match | again not a match | 2 |
+----+---------------+-------------------+-----------+
And then select only those whose count equals the total number of rows in local.
mysql> select qs.id, f1, t1
from qs
left join local on 1=1
where instr(f1,rangeA) = 0
and instr(t1,rangeA) = 0
group by qs.id
having count(id) = (select count(*) from local);
+----+---------------+-------------------+
| id | f1 | t1 |
+----+---------------+-------------------+
| 10 | no match | no match |
| 11 | also no match | again not a match |
+----+---------------+-------------------+
dbfiddle
Here's what I have so far, which works pretty well for <50k rows. Thank you to Schwern for the helpful discussion about INSTR().
CREATE TEMPORARY TABLE `tempB` SELECT f1, t1 FROM
(SELECT LEFT(CONCAT(c1,n1,s1),17) AS f1, LEFT(CONCAT(c1,n1,s2),17) AS t1 FROM qs WHERE c1 ='a')tab0 ORDER BY f1 ASC;
CREATE TEMPORARY TABLE `tempB2` SELECT rangeA FROM local WHERE rangeA LIKE 'a%' ORDER BY rangeA ASC;
CREATE TEMPORARY TABLE `tempB3` SELECT rangeA AS rangeAA FROM local WHERE rangeA LIKE 'a%' ORDER BY rangeA ASC;
SELECT f1,t1, rangeA, rangeAA FROM tempB
LEFT JOIN tempB2 ON INSTR(f1,rangeA) =1
LEFT JOIN tempB3 ON INSTR(t1,rangeAA) =1
WHERE rangeA IS NULL OR rangeAA IS NULL
If I understand your question correctly, I think you should look into using LOCATE() or POSITION(). I don't really see the need for all those LEFT()s.
An overly simplified version of what I think you want is this:
CREATE TEMPORARY TABLE `tempB`
SELECT CONCAT(c1,n1,s1) AS f1, CONCAT(c1,n1,s2) AS t1 FROM qs ORDER BY f1 ASC;
CREATE TEMPORARY TABLE `tempB2` SELECT rangeA FROM local ;
SELECT tempB.f1, tempB.t1
from tempB
WHERE (SELECT COUNT(*) from tempB2
WHERE POSITION(rangeA IN tempB.f1) != 0 AND POSITION(rangeA IN tempB.t1) != 0) = 0;

MySQL Query performance improvement for order by before group by

Below is a query I use to get the latest record per serverID. Unfortunately, it takes forever to process. According to the Stack Overflow question below, it should be a very fast solution. Is there any way to speed up this query, or do I have to split it up (first get all serverIDs, then get the last record for each server)?
Retrieving the last record in each group
SELECT s1.performance, s1.playersOnline, s1.serverID, s.name, m.modpack, m.color
FROM stats_server s1
LEFT JOIN stats_server s2
ON (s1.serverID = s2.serverID AND s1.id < s2.id)
INNER JOIN server s
ON s1.serverID=s.id
INNER JOIN modpack m
ON s.modpack=m.id
WHERE s2.id IS NULL
ORDER BY m.id
15 rows in set (34.73 sec)
Explain:
+------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------+
1 row in set, 1 warning (0.00 sec)
Sample Output:
+-------------+---------------+----------+---------------+-------------------------+--------+
| performance | playersOnline | serverID | name | modpack | color |
+-------------+---------------+----------+---------------+-------------------------+--------+
| 99 | 18 | 15 | hub | Lobby | AAAAAA |
| 98 | 12 | 10 | horizons | Horizons | AA00AA |
| 97 | 6 | 11 | m_lobby | Monster | AA0000 |
| 99 | 1 | 12 | m_north | Monster | AA0000 |
| 86 | 10 | 13 | m_south | Monster | AA0000 |
| 87 | 17 | 14 | m_east | Monster | AA0000 |
| 98 | 10 | 16 | m_west | Monster | AA0000 |
| 84 | 7 | 5 | tppi | Test Pack Please Ignore | 55FFFF |
| 95 | 15 | 6 | agrarian_plus | Agrarian Skies | 00AA00 |
| 98 | 23 | 7 | agrarian2 | Agrarian Skies | 00AA00 |
| 74 | 18 | 9 | agrarian | Agrarian Skies | 00AA00 |
| 97 | 37 | 17 | agrarian3 | Agrarian Skies | 00AA00 |
| 99 | 17 | 3 | bteam_pvp | Attack of the B-Team | FFAA00 |
| 73 | 44 | 8 | bteam_pve | Attack of the B-Team | FFAA00 |
| 93 | 11 | 4 | crackpack | Crackpack | EFEFEF |
+-------------+---------------+----------+---------------+-------------------------+--------+
15 rows in set (38.49 sec)
Sample Data:
http://www.mediafire.com/download/n0blj1io0c503ig/mym_bridge.sql.bz2
Edit
OK, I solved it. The EXPLAIN for your original slow query shows it examining a huge number of rows.
And here is a fast query using MAX() with GROUP BY that gives the identical results. Please try it for yourself.
SELECT s1.id
,s1.performance
,s1.playersOnline
,s1.serverID
,s.name
,m.modpack
,m.color
FROM stats_server s1
JOIN (
SELECT MAX(id) as 'id'
FROM stats_server
GROUP BY serverID
) AS s2
ON s1.id = s2.id
JOIN server s
ON s1.serverID = s.id
JOIN modpack m
ON s.modpack = m.id
ORDER BY m.id
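The MAX(id)-per-group rewrite is standard SQL and easy to sanity-check. A runnable sketch with made-up sample rows (SQLite via Python, stats columns trimmed down):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stats_server (id INTEGER PRIMARY KEY, serverID INTEGER,
                           playersOnline INTEGER);
INSERT INTO stats_server VALUES
  (1, 10, 5), (2, 10, 12),   -- serverID 10: latest record is id 2
  (3, 15, 9), (4, 15, 18);   -- serverID 15: latest record is id 4
""")

# Derive the latest id per server once, then join back for the full row.
rows = conn.execute("""
    SELECT s1.serverID, s1.playersOnline
    FROM stats_server s1
    JOIN (SELECT MAX(id) AS id FROM stats_server GROUP BY serverID) s2
      ON s1.id = s2.id
    ORDER BY s1.serverID
""").fetchall()
print(rows)  # latest playersOnline per server
```

The derived table touches each stats_server row once instead of the self-join's s1 × s2 comparison, which is why it scales so much better.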
I would phrase this query using not exists:
SELECT ss.performance, ss.playersOnline, ss.serverID, s.name, m.modpack, m.color
FROM stats_server ss INNER JOIN
server s
ON ss.serverID = s.id INNER JOIN
modpack m
ON s.modpack = m.id
WHERE NOT EXISTS (select 1
from stats_server ss2
where ss2.serverID = ss.serverID AND ss2.id > ss.id
)
Apart from the primary key indexes on server and modpack (which I assume are there), you also want an index on stats_server(ServerId, id). This index should also help your version of the query.
Am I missing something? Why wouldn't a standard uncorrelated subquery work?
SELECT x.id, x.performance, x.playersOnline, s.name, m.modpack, m.color, x.timestamp
FROM stats_server x
JOIN
( SELECT serverid, MAX(id) maxid FROM stats_server GROUP BY serverid ) y
ON y.serverid = x.serverid AND y.maxid = x.id
JOIN server s
ON x.serverID=s.id
JOIN modpack m
ON s.modpack=m.id
I'm guessing that you really want this (notice the order of the joins and the join criteria), and this matches the indexes that you've created:
SELECT s1.performance, s1.playersOnline, s1.serverID, s.name, m.modpack, m.color
FROM server s
INNER JOIN stats_server s1
ON s1.serverID = s.id
LEFT JOIN stats_server s2
ON s2.serverID = s.id AND s2.id > s1.id
INNER JOIN modpack m
ON m.id = s.modpack
WHERE s2.id IS NULL
ORDER BY m.id
MySQL doesn't always inner join the tables in the order that you write them in the query since the order doesn't really matter for the result set (though it can affect index use).
With no usable index specified in the WHERE clause, MySQL might want to start with the table with the least number of rows (maybe stats_server in this case). With the ORDER BY clause, MySQL might want to start with modpack so it doesn't have to order the results later.
MySQL picks the execution plan and then sees whether it has the proper indexes for joining, rather than first looking at which indexes it could join on and then picking the plan. MySQL doesn't automatically pick the plan that matches your indexes.
STRAIGHT_JOIN tells MySQL in what order to join the tables so that it uses the indexes that you expect it to use:
SELECT s1.performance, s1.playersOnline, s1.serverID, s.name, m.modpack, m.color
FROM server s
STRAIGHT_JOIN stats_server s1
ON s1.serverID = s.id
LEFT JOIN stats_server s2
ON s2.serverID = s.id AND s2.id > s1.id
STRAIGHT_JOIN modpack m
ON m.id = s.modpack
WHERE s2.id IS NULL
ORDER BY m.id
I don't know what indexes you've defined since you've not provided an EXPLAIN result or shown your indexes, but this should give you some idea on how to improve the situation.

How can I filter a SQL query using the MAX of a column without subqueries?

I have a query that works something like this (MySQL):
Query:
SELECT a, b, c, d FROM xTable WHERE d=(
SELECT MAX(d) FROM xTable WHERE uid=1
) AND c=0
Sample:
--xTable--
| a | b | c | d | uid |
| 30 | str | 20 | 32 | 1 |
| 36 | abc | 0 | 32 | 1 |
| 20 | ... | 40 | 33 | 1 |
| 19 | dsi | 0 | 34 | 1 |
| 68 |aeasd| 0 | 34 | 1 |
| 112 |3eids| 224 | 34 | 1 |
I want only the rows with the biggest d (34) and uid=X, but only the ones with c=0.
Is there a way to replicate this query without subqueries?
It can be rewritten into a JOIN but I don't see why that would improve anything (although with MySQL's limited query optimizer you never know...)
SELECT t1.a,
t1.b,
t1.c,
t1.d
FROM xTable t1
JOIN (
SELECT MAX(d) as max_d
FROM xTable
WHERE uid=1
) t2 ON t2.max_d = t1.d
WHERE t1.c=0
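The subquery form and the JOIN rewrite can be checked against the question's sample data to confirm they return the same rows (sketch in SQLite via Python):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE xTable (a INTEGER, b TEXT, c INTEGER, d INTEGER, uid INTEGER);
INSERT INTO xTable VALUES
  (30,  'str',   20,  32, 1),
  (36,  'abc',    0,  32, 1),
  (20,  '...',   40,  33, 1),
  (19,  'dsi',    0,  34, 1),
  (68,  'aeasd',  0,  34, 1),
  (112, '3eids', 224, 34, 1);
""")

# Original form: scalar subquery for the max d.
sub = conn.execute("""
    SELECT a, b, c, d FROM xTable
    WHERE d = (SELECT MAX(d) FROM xTable WHERE uid = 1) AND c = 0
""").fetchall()

# Rewritten form: derived table joined on the max d.
joined = conn.execute("""
    SELECT t1.a, t1.b, t1.c, t1.d
    FROM xTable t1
    JOIN (SELECT MAX(d) AS max_d FROM xTable WHERE uid = 1) t2
      ON t2.max_d = t1.d
    WHERE t1.c = 0
""").fetchall()

print(sub)
print(joined)
```

Both return exactly the rows with the biggest d (34) and c = 0.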
I don't know if this will help you, but since your worry is about the data changing due to frequent inserts, you can try the solution below, which removes the dynamism of getting the max: it selects the max beforehand and uses that as the comparison. It returns the rows matching the max that existed at the moment you fired the query.
with m as (select max(d) as d from xTable)
select a, b, c, d
from xTable
where d = (select d from m);
So here, you have your max fixed for the current query.

Inexplicably slow query in MySQL

Given this result-set:
mysql> EXPLAIN SELECT c.cust_name, SUM(l.line_subtotal) FROM customer c
-> JOIN slip s ON s.cust_id = c.cust_id
-> JOIN line l ON l.slip_id = s.slip_id
-> JOIN vendor v ON v.vend_id = l.vend_id WHERE v.vend_name = 'blahblah'
-> GROUP BY c.cust_name
-> HAVING SUM(l.line_subtotal) > 49999
-> ORDER BY c.cust_name;
+----+-------------+-------+--------+---------------------------------+---------------+---------+----------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------------------+---------------+---------+----------------------+------+----------------------------------------------+
| 1 | SIMPLE | v | ref | PRIMARY,idx_vend_name | idx_vend_name | 12 | const | 1 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | l | ref | idx_vend_id | idx_vend_id | 4 | csv_import.v.vend_id | 446 | |
| 1 | SIMPLE | s | eq_ref | PRIMARY,idx_cust_id,idx_slip_id | PRIMARY | 4 | csv_import.l.slip_id | 1 | |
| 1 | SIMPLE | c | eq_ref | PRIMARY,cIndex | PRIMARY | 4 | csv_import.s.cust_id | 1 | |
+----+-------------+-------+--------+---------------------------------+---------------+---------+----------------------+------+----------------------------------------------+
4 rows in set (0.04 sec)
I'm a bit baffled as to why the query referenced by this EXPLAIN statement is still taking about a minute to execute. Isn't it true that this query only has to search through 449 rows? Anyone have any idea as to what could be slowing it down so much?
I think the HAVING SUM() is the root of all evil. It forces MySQL to make two joins (from customer to slip and then to line) to get the value of the sum. After that it has to retrieve all the data and filter on the SUM() value to get a meaningful result.
It might be optimized to the following and probably get better response times:
select c.cust_name,
grouping.line_subtotal
from customer c join
(select s.cust_id,
l.vend_id,
sum(l.line_subtotal) as line_subtotal
from slip s join line l on s.slip_id = l.slip_id
group by s.cust_id, l.vend_id) grouping
on c.cust_id = grouping.cust_id
join vendor v on v.vend_id = grouping.vend_id
where v.vend_name = 'blahblah'
and grouping.line_subtotal > 49999
group by c.cust_name
order by c.cust_name;
In other words, create a sub-select that does all the necessary grouping before making the real query.
You can run your select vendor query first, and then join the results with the rest:
SELECT c.cust_name, SUM(l.line_subtotal) FROM customer c
JOIN slip s ON s.cust_id = c.cust_id
JOIN line l ON l.slip_id = s.slip_id
JOIN (SELECT * FROM vendor WHERE vend_name='blahblah') v ON v.vend_id = l.vend_id
GROUP BY c.cust_name
HAVING SUM(l.line_subtotal) > 49999
ORDER BY c.cust_name;
Also, do vend_name and/or cust_name have an index? That might be an issue here.

Fastest way to select min row with join

In this example, I have a listing of users (main_data), a pass list (pass_list), and a corresponding priority for each pass code type (pass_code). The query I am constructing looks for a list of users and, for each, the pass code type with the lowest priority. The query below works, but it seems like there may be a faster way to construct it that I'm missing. SQL Fiddle: http://sqlfiddle.com/#!2/2ec8d/2/0 or see below for table details.
SELECT md.first_name, md.last_name, pl.*
FROM main_data md
JOIN pass_list pl on pl.main_data_id = md.id
AND
pl.id =
(
SELECT pl2.id
FROM pass_list pl2
JOIN pass_code pc2 on pl2.pass_code_type = pc2.type
WHERE pl2.main_data_id = md.id
ORDER BY pc2.priority
LIMIT 1
)
Results:
+------------+-----------+----+--------------+----------------+
| first_name | last_name | id | main_data_id | pass_code_type |
+------------+-----------+----+--------------+----------------+
| Bob | Smith | 1 | 1 | S |
| Mary | Vance | 8 | 2 | M |
| Margret | Cough | 5 | 3 | H |
| Mark | Johnson | 9 | 4 | H |
| Tim | Allen | 13 | 5 | M |
+------------+-----------+----+--------------+----------------+
users (main_data)
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
| 1 | Bob | Smith |
| 2 | Mary | Vance |
| 3 | Margret | Cough |
| 4 | Mark | Johnson |
| 5 | Tim | Allen |
+----+------------+-----------+
pass list (pass_list)
+----+--------------+----------------+
| id | main_data_id | pass_code_type |
+----+--------------+----------------+
| 1 | 1 | S |
| 3 | 2 | E |
| 4 | 2 | H |
| 5 | 3 | H |
| 7 | 4 | E |
| 8 | 2 | M |
| 9 | 4 | H |
| 10 | 4 | H |
| 11 | 5 | S |
| 12 | 3 | S |
| 13 | 5 | M |
| 14 | 1 | E |
+----+--------------+----------------+
Table which specifies priority (pass_code)
+----+------+----------+
| id | type | priority |
+----+------+----------+
| 1 | M | 1 |
| 2 | H | 2 |
| 3 | S | 3 |
| 4 | E | 4 |
+----+------+----------+
Due to MySQL's unique extension to GROUP BY, it's simple:
SELECT * FROM
(SELECT md.first_name, md.last_name, pl.*
FROM main_data md
JOIN pass_list pl ON pl.main_data_id = md.id
JOIN pass_code pc ON pc.type = pl.pass_code_type
ORDER BY pc.priority) x
GROUP BY x.main_data_id
This returns only the first row encountered for each unique value of main_data_id, so by using an inner query to order the rows before applying the GROUP BY, you get only the rows you want. (Note that this relies on nonstandard behavior and fails under ONLY_FULL_GROUP_BY.)
A version that will get the details as required, and should also work across different flavours of SQL
SELECT md.first_name, md.last_name, MinId, pl.main_data_id, pl.pass_code_type
FROM main_data md
INNER JOIN pass_list pl
ON md.id = pl.main_data_id
INNER JOIN pass_code pc
ON pl.pass_code_type = pc.type
INNER JOIN
(
SELECT pl.main_data_id, pl.pass_code_type, Sub0.MinPriority, MIN(pl.id) AS MinId
FROM pass_list pl
INNER JOIN pass_code pc
ON pl.pass_code_type = pc.type
INNER JOIN
(
SELECT main_data_id, MIN(priority) AS MinPriority
FROM pass_list a
INNER JOIN pass_code b
ON a.pass_code_type = b.type
GROUP BY main_data_id
) Sub0
ON pl.main_data_id = Sub0.main_data_id
AND pc.priority = Sub0.MinPriority
GROUP BY pl.main_data_id, pl.pass_code_type, Sub0.MinPriority
) Sub1
ON pl.main_data_id = Sub1.main_data_id
AND pl.id = Sub1.MinId
AND pc.priority = Sub1.MinPriority
ORDER BY pl.main_data_id
This does not rely on the flexibility of MySQLs GROUP BY functionality.
I'm not familiar with the special behavior of MySQL's GROUP BY, but my solution for these types of problems is to express it directly: select the rows for which there doesn't exist a row with a lower priority. This is standard SQL, so it should work on any DB.
select distinct u.id, u.first_name, u.last_name, pl.pass_code_type, pc.id, pc.priority
from main_data u
inner join pass_list pl on pl.main_data_id = u.id
inner join pass_code pc on pc.type = pl.pass_code_type
where not exists (select 1
from pass_list pl2
inner join pass_code pc2 on pc2.type = pl2.pass_code_type
where pl2.main_data_id = u.id and pc2.priority < pc.priority);
How well this performs is going to depend on having the proper indexes (assuming that main_data and pass_list are somewhat large). In this case indexes on the primary (should be automatically created) and foreign keys should be sufficient. There may be other queries that are faster, I would start by comparing this to your query.
Also, I had to add DISTINCT because you have duplicate rows in pass_list (ids 9 & 10). If you ensure that duplicates can't exist (unique index on main_data_id, pass_code_type), you can save some time by removing the DISTINCT, which forces a final sort of the result set. The savings would be more noticeable the larger the result set is.
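The NOT EXISTS approach can be run as-is against the sample tables above (sketch in SQLite via Python; the DISTINCT collapses the duplicate (4, H) passes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_data (id INTEGER, first_name TEXT, last_name TEXT);
CREATE TABLE pass_list (id INTEGER, main_data_id INTEGER, pass_code_type TEXT);
CREATE TABLE pass_code (id INTEGER, type TEXT, priority INTEGER);
INSERT INTO main_data VALUES
  (1,'Bob','Smith'),(2,'Mary','Vance'),(3,'Margret','Cough'),
  (4,'Mark','Johnson'),(5,'Tim','Allen');
INSERT INTO pass_list VALUES
  (1,1,'S'),(3,2,'E'),(4,2,'H'),(5,3,'H'),(7,4,'E'),(8,2,'M'),
  (9,4,'H'),(10,4,'H'),(11,5,'S'),(12,3,'S'),(13,5,'M'),(14,1,'E');
INSERT INTO pass_code VALUES (1,'M',1),(2,'H',2),(3,'S',3),(4,'E',4);
""")

# Keep a pass row only if no other pass for the same user has lower priority.
rows = conn.execute("""
    SELECT DISTINCT u.id, u.first_name, pl.pass_code_type
    FROM main_data u
    JOIN pass_list pl ON pl.main_data_id = u.id
    JOIN pass_code pc ON pc.type = pl.pass_code_type
    WHERE NOT EXISTS (
        SELECT 1
        FROM pass_list pl2
        JOIN pass_code pc2 ON pc2.type = pl2.pass_code_type
        WHERE pl2.main_data_id = u.id AND pc2.priority < pc.priority)
    ORDER BY u.id
""").fetchall()
print(rows)  # lowest-priority pass code type per user
```

The output matches the Results table in the question: S for Bob, M for Mary, H for Margret and Mark, M for Tim.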