Optimize MySQL query for GROUP_CONCAT function

SELECT SQL_NO_CACHE link.stop, stop.common_name, locality.name, stop.bearing, stop.latitude, stop.longitude
FROM service
JOIN pattern ON pattern.service = service.code
JOIN link ON link.section = pattern.section
JOIN naptan.stop ON stop.atco_code = link.stop
JOIN naptan.locality ON locality.code = stop.nptg_locality_ref
GROUP BY link.stop
The above query takes roughly 800ms - 1000ms to run.
If I append a group_concat statement the query then takes 8 - 10 seconds:
SELECT SQL_NO_CACHE link.stop, stop.common_name, locality.name, stop.bearing, stop.latitude, stop.longitude, group_concat(service.line) lines
How can I change this query so that it runs in less than 2 seconds with the group_concat statement?
SQL Fiddle: http://sqlfiddle.com/#!9/414fe
EXPLAIN statements for both queries: http://i.imgur.com/qrURgzV.png

How long does this query take?
SELECT p.section, GROUP_CONCAT(s.line)
FROM pattern p join
service s
ON p.service = s.code
GROUP BY p.section
I am thinking that you can do the group_concat() in a subquery, so the outer query does not need an aggregation. This can speed up queries when there is only one table in the subquery. In your case, there are two.
The final result would be something like this, with the subquery correlated on link.section = pattern.section:
SELECT SQL_NO_CACHE . . .,
(SELECT GROUP_CONCAT(s.line)
FROM pattern p join
service s
ON p.service = s.code
WHERE p.section = link.section
) as lines
FROM link JOIN
naptan.stop
ON stop.atco_code = link.stop JOIN
naptan.locality
ON locality.code = stop.nptg_locality_ref;
For this query, you want the following additional indexes: pattern(section, service) and service(code, line).
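As a sketch, the DDL for those two indexes might look like this (the index names are invented for illustration):

-- Covering indexes for the correlated subquery suggested above
ALTER TABLE pattern ADD INDEX pattern_section_service (section, service);
ALTER TABLE service ADD INDEX service_code_line (code, line);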
I don't know if this will work, but it is worth a try.
Note: this is assuming that you really don't need the group by for the rest of the columns.

A remark: You're using the nonstandard MySQL extension to GROUP BY. It happens to work for you because link.stop is joined to stop.atco_code, which itself is a primary key. But you need to be very careful with this.
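For illustration only, assuming MySQL 5.7+ (where ANY_VALUE() exists and ONLY_FULL_GROUP_BY is the default), a standards-friendlier version of the first query would wrap each non-grouped column explicitly. In 5.7+ the optimizer can often prove functional dependency through the primary-key join, but ANY_VALUE() makes the intent explicit:

-- Sketch, assuming MySQL 5.7+: make the nondeterminism explicit
SELECT link.stop,
       ANY_VALUE(stop.common_name) AS common_name,
       ANY_VALUE(locality.name)    AS locality_name,
       ANY_VALUE(stop.bearing)     AS bearing,
       ANY_VALUE(stop.latitude)    AS latitude,
       ANY_VALUE(stop.longitude)   AS longitude
FROM service
JOIN pattern ON pattern.service = service.code
JOIN link ON link.section = pattern.section
JOIN naptan.stop ON stop.atco_code = link.stop
JOIN naptan.locality ON locality.code = stop.nptg_locality_ref
GROUP BY link.stop;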
I suggest you add some compound indexes. You join in to pattern on service and join out based on section. So add this index.
ALTER TABLE pattern ADD INDEX service_section (service, section, line);
This will let the query use just the index, without having to hit the table itself to retrieve the information needed for the JOIN or your GROUP_CONCAT() operation. (You might also delete the index on just service, since this new index makes it redundant.)
Similarly, you want to create an index (section, stop) on the link table, and get rid of the index on just section.
On stop, you're using most of the columns, and you already have an index (PK) on atco_code, so let this one be.
Finally, on locality put an index on (code,name).
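A sketch of the corresponding DDL (index names invented; the DROP assumes the old single-column index on link is actually named section):

ALTER TABLE link ADD INDEX section_stop (section, stop);
ALTER TABLE link DROP INDEX section;  -- only if the old index carries this name
ALTER TABLE naptan.locality ADD INDEX code_name (code, name);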
All this indexing monkey business should cut down the amount of work MySQL must do to satisfy your query.
Now look, as soon as you add WHERE anything = anything to the query, you may need to add a column to one or more of these indexes. You definitely should read up on multi-column indexing and grouping; good indexing is a critical success factor for your kind of data.
You should also run ANALYZE TABLE xxxx on each of your tables after inserting lots of rows, to make sure the query optimizer can see appropriate information about the content of the table and indexes.
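For the tables in this query, that would be:

ANALYZE TABLE service, pattern, link, naptan.stop, naptan.locality;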


MySQL, PHP, Laravel

I have the following tables in production, with their respective record counts:
people count= '565367'
donors count= '556325'
telerec_recipients count= '115147'
person_addresses count= '563183'
person_emails count= '106958'
person_phones count= '676474'
person_sms count= '22275'
On the UI end I want to display some data by applying joins or left joins, then grouping, ordering, and paginating to generate a JSON response.
To achieve this I tried two approaches: first, creating a view for the join queries and applying the WHERE, GROUP BY, and ORDER BY clauses in the controller; second, building the joins directly with Laravel's query syntax in the controller.
Because there is a lot of data in all the tables, the join query performs well, but as soon as the GROUP BY is added it takes much longer to execute.
How can I optimize or speed up this process?
My Example Query is:
SELECT people.id as person_id,donors.donor_id,telerec_recipients.id as telerec_recipient_id,people.first_name,people.last_name,people.birth_date,
person_addresses.address_id,person_emails.email_id,person_phones.phone_id,person_sms.sms_id
FROM `people`
inner join `donors`
on `people`.`id` = `donors`.`person_id`
left join `telerec_recipients`
on `people`.`id` = `telerec_recipients`.`person_id`
left join `person_addresses`
on `people`.`id` = `person_addresses`.`person_id`
left join `person_emails`
on `people`.`id` = `person_emails`.`person_id`
left join `person_phones`
on `people`.`id` = `person_phones`.`person_id`
left join `person_sms`
on `people`.`id` = `person_sms`.`person_id`
GROUP BY `people`.`id`, `telerec_recipients`.`id`
ORDER BY `last_name` ASC LIMIT 25 offset 0;
Result of the EXPLAIN: (screenshot)
Donors table indexes: (screenshot)
The problem is that MySQL uses the wrong index on the donors table; you need to instruct MySQL, using the FORCE INDEX index hint, to use the primary key instead.
Explanation
From the output of the EXPLAIN it is clear that something really goes wrong with the donors table, since you can see "using temporary; using filesort" in the Extra column.
The key column tells you that MySQL uses a unique index on the donors.donor_id field. However, the donor_id field is not used anywhere in the query for joining, grouping, or filtering purposes; the donors.person_id field is used in the join. Since the primary key of the donors table is on the person_id field, MySQL should use that for the join. The FORCE INDEX hint tells MySQL that you firmly believe a certain index should be used.
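A sketch of what that looks like, trimmed to the relevant join (this assumes, as stated above, that the primary key of donors is on person_id):

-- Hypothetical sketch: same query shape as above, with the hint on donors
SELECT people.id AS person_id, donors.donor_id, people.first_name, people.last_name
FROM `people`
INNER JOIN `donors` FORCE INDEX (PRIMARY)
    ON `people`.`id` = `donors`.`person_id`
GROUP BY `people`.`id`
ORDER BY `last_name` ASC
LIMIT 25 OFFSET 0;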
Note 1:
You have several duplicate indexes. Remove all duplicates; they are only slowing your system down.
Note 2:
I think your query is against the SQL standard, because you have fields in the select list that are neither in the GROUP BY list, nor the subject of an aggregate function such as SUM(), nor functionally dependent on the grouping fields. In MySQL, under certain sql_mode settings, such queries are allowed to run, but they may not produce exactly the output you would like.
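You can check the active setting, and enforce the standard behaviour for the current session, with:

SELECT @@sql_mode;
-- Add the strict grouping mode for this session only
SET SESSION sql_mode = CONCAT(@@sql_mode, ',ONLY_FULL_GROUP_BY');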

Improve indexing to speed up slow query

SELECT SQL_NO_CACHE TIME_FORMAT(ADDTIME(journey.departure
, SEC_TO_TIME(SUM(link2.elapsed))), '%H:%i') AS departure
FROM journey
JOIN journey_day
ON journey_day.journey = journey.code
JOIN pattern
ON pattern.code = journey.pattern
JOIN service
ON service.code = pattern.service
JOIN link
ON link.section = pattern.section
AND link.stop = "370023591"
JOIN link link2
ON link2.section = pattern.section
AND link2.id <= link.id
WHERE journey_day.day = 6
GROUP BY journey.id
ORDER BY journey.departure
The above query takes 1-2 seconds to run. I need to reduce this to roughly 100ms. Please note that I understand the service table hasn't been used in the query, but that is just to simplify the question.
Any ideas how I can speed this up? I can see that the link table is using filesort, is this causing the slowness in the query?
One thought is that you could explicitly optimize the selection of the "link" table record with the minimum "id" value.
Using a temporary table or a materialized WITH statement are two approaches that would work to produce the result set. Two ways to get the minimum "id" are: 1) ordering by id, adding a row number, and selecting the first value; or 2) using a windowed ROW_NUMBER() ordered by id, and again selecting the row numbered 1.
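A sketch of the windowed approach, assuming MySQL 8.0+ (for WITH and ROW_NUMBER()) and assuming the minimum id is wanted per section:

-- Hypothetical sketch: pick the lowest-id link row per section for this stop
WITH ranked AS (
    SELECT link.section, link.id,
           ROW_NUMBER() OVER (PARTITION BY link.section ORDER BY link.id) AS rn
    FROM link
    WHERE link.stop = '370023591'
)
SELECT section, id
FROM ranked
WHERE rn = 1;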
Well-planned indexes can be crucial to performance. From what you have presented, I would start with the following specific indexes. These are all covering indexes that qualify all the joins and criteria you will be working with; covering indexes help because the engine can get all the data it needs from the index without having to go to the raw data pages.
Most specifically, starting with your journey table, I would explicitly have the composite index based on all 3 fields in the order I have them... Day first as that is your WHERE criteria, then ID as that is the GROUP BY and finally DEPARTURE for your ORDER BY clause.
The LINK table: section and stop first, as those are the criteria joined from the pattern table; ID next, as it is the basis of joining to LINK2; and finally ELAPSED for your selected field.
table     index
journey   (day, id, departure)
link      (section, stop, id, elapsed)
pattern   (code, service, section)
service   (code)
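Spelled out as DDL, those might look as follows (index names invented; note that in the query shown, the day column actually lives on journey_day, so the first index may belong on that table instead):

ALTER TABLE journey ADD INDEX idx_day_id_departure (day, id, departure);
ALTER TABLE link    ADD INDEX idx_section_stop_id_elapsed (section, stop, id, elapsed);
ALTER TABLE pattern ADD INDEX idx_code_service_section (code, service, section);
ALTER TABLE service ADD INDEX idx_code (code);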

Can you index subqueries?

I have a table and a query that look like the ones below. For a working example, see this SQL Fiddle.
SELECT o.property_B, SUM(o.score1), w.score
FROM o
INNER JOIN
(
SELECT o.property_B, SUM(o.score2) AS score FROM o GROUP BY property_B
) w ON w.property_B = o.property_B
WHERE o.property_A = 'specific_A'
GROUP BY property_B;
With my real data, this query takes 27 seconds. However, if I first create w as a temporary table and index property_B, the whole thing takes ~1 second.
CREATE TEMPORARY TABLE w AS
SELECT o.property_B, SUM(o.score2) AS score FROM o GROUP BY property_B;
ALTER TABLE w ADD INDEX `property_B_idx` (property_B);
SELECT o.property_B, SUM(o.score1), w.score
FROM o
INNER JOIN w ON w.property_B = o.property_B
WHERE o.property_A = 'specific_A'
GROUP BY property_B;
DROP TABLE IF EXISTS w;
Is there a way to combine the best of these two queries? I.e. a single query with the speed advantages of the indexing in the subquery?
EDIT
After Mehran's answer below, I read this piece of explanation in the MySQL documentation:
As of MySQL 5.6.3, the optimizer more efficiently handles subqueries in the FROM clause (that is, derived tables):
...
For cases when materialization is required for a subquery in the FROM clause, the optimizer may speed up access to the result by adding an index to the materialized table. If such an index would permit ref access to the table, it can greatly reduce amount of data that must be read during query execution. Consider the following query:
SELECT * FROM t1
JOIN (SELECT * FROM t2) AS derived_t2 ON t1.f1=derived_t2.f1;
The optimizer constructs an index over column f1 from derived_t2 if doing so would permit the use of ref access for the lowest cost execution plan. After adding the index, the optimizer can treat the materialized derived table the same as a usual table with an index, and it benefits similarly from the generated index. The overhead of index creation is negligible compared to the cost of query execution without the index. If ref access would result in higher cost than some other access method, no index is created and the optimizer loses nothing.
First of all, you need to know that creating a temporary table is absolutely a feasible solution, but only for cases where no other choice is applicable, which is not true here!
In your case, you can easily boost your query as FrankPl pointed out because your sub-query and main-query are both grouping by the same field. So you don't need any sub-queries. I'm going to copy and paste FrankPl's solution for the sake of completeness:
SELECT o.property_B, SUM(o.score1), SUM(o.score2)
FROM o
GROUP BY property_B;
Yet it doesn't mean it's impossible to come across a scenario in which you wish you could index a sub-query. In such cases you've got two choices. The first is using a temporary table, as you pointed out yourself, to hold the results of the sub-query. This solution is advantageous since it has been supported by MySQL for a long time; it's just not feasible if there's a huge amount of data involved.
The second solution is using MySQL version 5.6 or above. In recent versions of MySQL, new algorithms were incorporated so that the optimizer can add an index to the materialized result of a sub-query and use it in the outer query (see the documentation excerpt above).
[UPDATE]
For the edited version of the question I would recommend the following solution:
SELECT o.property_B, SUM(IF(o.property_A = 'specific_A', o.score1, 0)), SUM(o.score2)
FROM o
GROUP BY property_B
HAVING SUM(IF(o.property_A = 'specific_A', o.score1, 0)) > 0;
But you need to work on the HAVING part. You might need to change it according to your actual problem.
I am not really that familiar with MySQL; I have mostly worked with Oracle.
If you want a WHERE clause inside the SUM, you can use DECODE or CASE.
It would look something like this:
SELECT o.property_B, SUM(decode(property_A, 'specific_A', o.score1, 0)), SUM(o.score2)
FROM o
GROUP BY property_B;
or with case
SELECT o.property_B, SUM(CASE
WHEN property_A = 'specific_A' THEN o.score1
ELSE 0
END ),
SUM(o.score2)
FROM o
GROUP BY property_B;
I do not see why you would need the join at all. I would assume that
SELECT o.property_B, SUM(o.score1), SUM(o.score2)
FROM o
GROUP BY property_B;
should give what you want, but with a much simpler statement that is hence easier to optimize.
It should be the duty of MySQL to optimise your query, and I don't think there is a way to create an index on the fly. However, you can try to force the use of the index of property_o (if you have it). See http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
Also, you can merge the create and alter statements, if you prefer.

Mysql left join very slow

I have a left join:
$query = "SELECT a.`id`, a.`documenttitle`, a.`committee`, a.`issuedate`, b.`tagname`
FROM `#__document_management_documents` AS a
LEFT JOIN `#__document_managment_tags` AS b
ON a.id = b.documentid
".$tagexplode."
".$issueDateText."
AND a.committee in (".$committeeQueryTextExplode.")
AND a.documenttitle LIKE '".$documentNameFilter."%'
GROUP BY a.id ORDER BY a.documenttitle ASC
";
It's really slow: about 7 seconds on 4000 records.
Any ideas what I might be doing wrong?
SELECT a.`id`, a.`documenttitle`, a.`committee`, a.`issuedate`, b.`tagname`
FROM `w4c_document_management_documents` AS a
LEFT JOIN `document_managment_tags` AS b
ON a.id = b.documentid WHERE a.issuedate >= ''
AND a.committee in ('1','8','9','10','11','12','13','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31','32','33','34','35','36','37','38','39','40','41','42','43','44','45','46','47')
AND a.documenttitle LIKE '%' GROUP BY a.id ORDER BY a.documenttitle ASC
I would put an index on a.committee, and a full-text index on the documenttitle column. The IN and LIKE are immediate flags to me. issuedate should also have an index, because you are filtering with >= on it.
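A sketch of those suggestions as DDL, using the table name from the expanded query (index names invented; FULLTEXT needs MyISAM, or InnoDB on MySQL 5.6+, and note that a leading-wildcard LIKE still cannot use a plain B-tree index):

ALTER TABLE w4c_document_management_documents
    ADD INDEX idx_committee (committee),
    ADD INDEX idx_issuedate (issuedate),
    ADD FULLTEXT INDEX ft_documenttitle (documenttitle);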
Try running the following commands in a MySQL client:
show index from #__document_management_documents;
show index from #__document_managment_tags;
Check to see if there are keys/indexes on the id and documentid fields of the respective tables. If there aren't, MySQL will be doing a full table scan to look up the values. Creating indexes on these fields makes the search time logarithmic, because they are sorted in a B-tree which is stored in the index file. Even better is to use primary keys (if possible), because that way the row data is stored in the leaf, which saves MySQL another I/O operation to look up the data.
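If the SHOW INDEX output reveals that documentid is unindexed, a minimal fix would be (index name invented; table name spelled as in the question's query):

ALTER TABLE `#__document_managment_tags` ADD INDEX idx_documentid (documentid);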
It could also simply be that the IN and >= operators have bad performance, in which case you might have to rewrite your queries or redesign your tables.
As mentioned above, check whether your columns have indexes. You can also put the EXPLAIN command at the start of your query in your MySQL client to see whether the query is actually using indexes; look at the 'key' column and the 'Extra' column.
This will help you optimize your query. Also, GROUP BY causes "Using temporary; Using filesort", which makes MySQL create a temporary table and go through each row. If you could do the grouping in PHP, it would be faster.

MySQL indexing in an "or" statement

I have 3 tables that I need to join; these join together fine using indexes. However, we are transitioning from using one legacy field as the identifier to another field in another table.
LEGACYID is the legacy field, while NEWID is the new field. Both fields are varchars.
Both fields are indexed exclusively with a btree index, both tables are MyISAM.
SELECT Username
FROM CUST C use index(primary,NEWID)
JOIN TBLSHP S ON S.CUSID = C.CUSID
JOIN TBLQ Q ON Q.SHPID = S.SHPID
WHERE C.LEGACYID = '692041'
OR Q.NEWID = '692041'
This query takes 5.147 seconds; that's 5 seconds longer than I expect.
When doing an EXPLAIN EXTENDED query, the index type for NEWID is ALL (i.e. a full table scan); possible keys are (primary, NEWID) and key is (null). If I remove LEGACYID from the OR statement, EXPLAIN says key (NEWID) will now be used. If I remove NEWID from the OR statement, the following changes occur:
the type of the table joins for (S, C) changes from ref to eq_ref
key_len changes from 4 to 5 (on both)
extra changes from empty to "Using where"
With either one of the conditions removed from the OR statement, the query runs at expected speeds.
Table Q has 183k records; C:115k; S:169k.
One last point: if I restructure the query like this:
SELECT Username
FROM CUST C use index(primary,NEWID)
JOIN TBLSHP S ON S.CUSID = C.CUSID
LEFT JOIN TBLQ Q ON Q.SHPID = S.SHPID
AND Q.NEWID = '692041'
WHERE C.LEGACYID = '692041'
Although it's not the same query, given the way the data works it will provide the results I need, and the speed is back down to under 0.1 seconds.
I did want to clarify that I don't really need a "query that works" solution; Ponies below has already provided one. What I need to know is whether anyone else has run into this problem, can explain why it is happening, and what I can do to make this simple OR statement use both indexes.
If you know there won't be duplicates, change UNION to UNION ALL (UNION ALL is faster because it doesn't remove duplicates). Otherwise, use:
SELECT Username
FROM CUST C use index(primary,NEWID)
JOIN TBLSHP S ON S.CUSID = C.CUSID
JOIN TBLQ Q ON Q.SHPID = S.SHPID
WHERE C.LEGACYID = '692041'
UNION
SELECT Username
FROM CUST C use index(primary,NEWID)
JOIN TBLSHP S ON S.CUSID = C.CUSID
JOIN TBLQ Q ON Q.SHPID = S.SHPID
WHERE Q.NEWID = '692041'
ORs are notoriously bad performers because they splinter the execution path. The UNION alleviates that splintering and then combines the two result sets. That said, IN is preferable to OR: though they are logically the same, the execution of IN is generally better optimized.
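A hypothetical single-column example of that last point (the second ID value is invented for illustration; the OR in the question spans two tables, so IN does not apply there directly):

-- OR form
SELECT Username FROM CUST C
WHERE C.LEGACYID = '692041' OR C.LEGACYID = '692042';

-- IN form: logically the same, generally optimized better
SELECT Username FROM CUST C
WHERE C.LEGACYID IN ('692041', '692042');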
UNION isn't always the answer
Investigate many options, comparing the EXPLAIN PLAN output before determining a solution. I've come across a couple recently that perform better using a cursor than a single query using esoteric functionality.
Also, make sure foreign key columns (what you're using in the ON clause when JOINing) are indexed. MySQL has started (v5.5+?) to automatically do this when a foreign key constraint is made, but that's only for InnoDB tables.