Slow count query with where clause - mysql

I am trying to run a count to get the total number of results for pagination, but the query is too slow (2.12 s):
+-------+
| size |
+-------+
| 50000 |
+-------+
1 row in set (2.12 sec)
My count query:
select count(appeloffre0_.ID_APPEL_OFFRE) as size
from ao.appel_offre appeloffre0_
inner join ao.acheteur acheteur1_
on appeloffre0_.ID_ACHETEUR=acheteur1_.ID_ACHETEUR
where
(exists (select 1 from ao.lot lot2_ where lot2_.ID_APPEL_OFFRE=appeloffre0_.ID_APPEL_OFFRE and lot2_.ESTIMATION_COUT>=1))
and (exists (select 1 from ao.lieu_execution lieuexecut3_ where lieuexecut3_.appel_offre=appeloffre0_.ID_APPEL_OFFRE and lieuexecut3_.region=1))
and (exists (select 1 from ao.ao_activite aoactivite4_ where aoactivite4_.ID_APPEL_OFFRE=appeloffre0_.ID_APPEL_OFFRE and (aoactivite4_.ID_ACTIVITE=1)))
and appeloffre0_.DATE_OUVERTURE_PLIS>'2015-01-01'
and (appeloffre0_.CATEGORIE='fournitures' or appeloffre0_.CATEGORIE='travaux' or appeloffre0_.CATEGORIE='services')
and acheteur1_.ID_ENTITE_MERE=2
EXPLAIN output:
+----+--------------------+--------------+------+---------------------------------------------+--------------------+---------+--------------------------------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------+------+---------------------------------------------+--------------------+---------+--------------------------------+-------+--------------------------+
| 1 | PRIMARY | acheteur1_ | ref | PRIMARY,acheteur_ibfk_1 | acheteur_ibfk_1 | 5 | const | 3 | Using where; Using index |
| 1 | PRIMARY | appeloffre0_ | ref | appel_offre_ibfk_2 | appel_offre_ibfk_2 | 4 | ao.acheteur1_.ID_ACHETEUR | 31061 | Using where |
| 4 | DEPENDENT SUBQUERY | aoactivite4_ | ref | ao_activites_activite_fk,ao_activites_ao_fk | ao_activites_ao_fk | 4 | ao.appeloffre0_.ID_APPEL_OFFRE | 3 | Using where |
| 3 | DEPENDENT SUBQUERY | lieuexecut3_ | ref | fk_ao_lieuex,fk_region_lieuex | fk_ao_lieuex | 4 | ao.appeloffre0_.ID_APPEL_OFFRE | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | lot2_ | ref | FK_LOT_AO | FK_LOT_AO | 4 | ao.appeloffre0_.ID_APPEL_OFFRE | 5 | Using where |
+----+--------------------+--------------+------+---------------------------------------------+--------------------+---------+--------------------------------+-------+--------------------------+
The index acheteur_ibfk_1 is a foreign key referencing the ENTITE_MERE table; it is used because of the acheteur1_.ID_ENTITE_MERE=2 condition in the WHERE clause.

You can have multiple conditions on your joins by using ON condition1 AND condition2 etc.
SELECT COUNT(appeloffre0_.ID_APPEL_OFFRE) as size
FROM ao.appel_offre appeloffre0_
JOIN ao.acheteur acheteur1_ ON appeloffre0_.ID_ACHETEUR=acheteur1_.ID_ACHETEUR
JOIN ao.lot lot2_ ON appeloffre0_.ID_APPEL_OFFRE=lot2_.ID_APPEL_OFFRE AND lot2_.ESTIMATION_COUT>=1
JOIN ao.lieu_execution lieuexecut3_ ON appeloffre0_.ID_APPEL_OFFRE=lieuexecut3_.appel_offre AND lieuexecut3_.region=1
JOIN ao.ao_activite aoactivite4_ ON appeloffre0_.ID_APPEL_OFFRE=aoactivite4_.ID_APPEL_OFFRE AND aoactivite4_.ID_ACTIVITE=1
WHERE appeloffre0_.DATE_OUVERTURE_PLIS>'2015-01-01'
AND (appeloffre0_.CATEGORIE='fournitures' OR appeloffre0_.CATEGORIE='travaux' OR appeloffre0_.CATEGORIE='services')
AND acheteur1_.ID_ENTITE_MERE=2;

You can try:
select count(aa.ID_APPEL_OFFRE) as size
from (
select ID_APPEL_OFFRE, ID_ACHETEUR from ao.appel_offre appeloffre0_
inner join ao.acheteur acheteur1_
on appeloffre0_.ID_ACHETEUR=acheteur1_.ID_ACHETEUR
where appeloffre0_.DATE_OUVERTURE_PLIS>'2015-01-01'
and (appeloffre0_.CATEGORIE in ('fournitures','travaux','services'))
and (acheteur1_.ID_ENTITE_MERE=2)) aa
inner join ao.lot lot2_ on lot2_.ID_APPEL_OFFRE=aa.ID_APPEL_OFFRE
inner join ao.lieu_execution lieuexecut3_ on lieuexecut3_.appel_offre=aa.ID_APPEL_OFFRE
inner join ao.ao_activite aoactivite4_ on aoactivite4_.ID_APPEL_OFFRE=aa.ID_APPEL_OFFRE
where
aoactivite4_.ID_ACTIVITE=1
and lot2_.ESTIMATION_COUT>=1
and lieuexecut3_.region=1;
I haven't seen your tables, though, so I can't be sure the joins won't produce duplicates.
Some low-hanging fruit: make sure appeloffre0_.CATEGORIE and appeloffre0_.DATE_OUVERTURE_PLIS are indexed.
The joined columns should be indexed as well: ao.lot.ID_APPEL_OFFRE, ao.lieu_execution.appel_offre, ao.ao_activite.ID_APPEL_OFFRE and ao.appel_offre.ID_ACHETEUR.
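The duplicate concern can be checked on a toy data set. The sketch below is only an illustration: it uses SQLite through Python's sqlite3 module as a stand-in for MySQL, and the ID_LOT column and the sample rows are made up. It compares a plain join count, a COUNT(DISTINCT ...) and an EXISTS count when one tender has several qualifying lots.

```python
import sqlite3

# SQLite in memory, used only so the sketch runs anywhere (the question uses MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE appel_offre (ID_APPEL_OFFRE INTEGER PRIMARY KEY);
    -- ID_LOT is a hypothetical primary key, not taken from the question
    CREATE TABLE lot (ID_LOT INTEGER PRIMARY KEY, ID_APPEL_OFFRE INTEGER, ESTIMATION_COUT INTEGER);
    INSERT INTO appel_offre VALUES (1), (2), (3);
    -- tender 1 has two qualifying lots, tender 2 has one, tender 3 has none
    INSERT INTO lot VALUES (1, 1, 10), (2, 1, 20), (3, 2, 5), (4, 3, 0);
""")

# Plain join: one output row per matching lot, so tender 1 is counted twice.
join_count = conn.execute("""
    SELECT COUNT(a.ID_APPEL_OFFRE)
    FROM appel_offre a
    JOIN lot l ON l.ID_APPEL_OFFRE = a.ID_APPEL_OFFRE AND l.ESTIMATION_COUT >= 1
""").fetchone()[0]

# Same join, but each tender is counted once.
distinct_count = conn.execute("""
    SELECT COUNT(DISTINCT a.ID_APPEL_OFFRE)
    FROM appel_offre a
    JOIN lot l ON l.ID_APPEL_OFFRE = a.ID_APPEL_OFFRE AND l.ESTIMATION_COUT >= 1
""").fetchone()[0]

# EXISTS: a tender either qualifies or it does not, so no duplicates arise.
exists_count = conn.execute("""
    SELECT COUNT(a.ID_APPEL_OFFRE)
    FROM appel_offre a
    WHERE EXISTS (SELECT 1 FROM lot l
                  WHERE l.ID_APPEL_OFFRE = a.ID_APPEL_OFFRE AND l.ESTIMATION_COUT >= 1)
""").fetchone()[0]

print(join_count, distinct_count, exists_count)
```

On this data the plain join reports 3 while the other two report 2, so a join-based rewrite only matches the original count if duplicates are removed.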

I would put the following indexes on your tables, if they are not there already. These are covering indexes for your query, meaning each index contains every column the query needs from that table, so the results can be computed without going to the actual raw data pages.
table            index
appel_offre      ( DATE_OUVERTURE_PLIS, CATEGORIE, ID_APPEL_OFFRE, ID_ACHETEUR )
lot              ( ID_APPEL_OFFRE, ESTIMATION_COUT )
lieu_execution   ( appel_offre, region )
ao_activite      ( ID_APPEL_OFFRE, ID_ACTIVITE )
Indexes on individual columns alone won't do much for this query. Also, I count DISTINCT ID_APPEL_OFFRE values so that, if any of the JOINed tables has more than one matching record, the Cartesian multiplication of rows does not inflate the count:
select
count(distinct AOF.ID_APPEL_OFFRE) as size
from
ao.appel_offre AOF
JOIN ao.acheteur ACH
on AOF.ID_ACHETEUR = ACH.ID_ACHETEUR
and ACH.ID_ENTITE_MERE = 2
JOIN ao.lot
ON AOF.ID_APPEL_OFFRE = lot.ID_APPEL_OFFRE
and lot.ESTIMATION_COUT >= 1
JOIN ao.lieu_execution EX
ON AOF.ID_APPEL_OFFRE = EX.appel_offre
and EX.region = 1
JOIN ao.ao_activite ACT
ON AOF.ID_APPEL_OFFRE = ACT.ID_APPEL_OFFRE
and ACT.ID_ACTIVITE = 1
where
AOF.DATE_OUVERTURE_PLIS > '2015-01-01'
and ( AOF.CATEGORIE = 'fournitures'
or AOF.CATEGORIE = 'travaux'
or AOF.CATEGORIE = 'services')
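
What "covering" means can be seen in a query plan. The following is a rough sketch, not MySQL output: it uses SQLite's EXPLAIN QUERY PLAN (through Python's sqlite3) because it runs without a server, and the ID_LOT and LIBELLE columns and the index name are hypothetical. In MySQL the analogous sign is "Using index" in the Extra column of EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- ID_LOT and LIBELLE are hypothetical columns added for the illustration
    CREATE TABLE lot (ID_LOT INTEGER PRIMARY KEY, ID_APPEL_OFFRE INTEGER,
                      ESTIMATION_COUT INTEGER, LIBELLE TEXT);
    CREATE INDEX idx_lot_ao_cout ON lot (ID_APPEL_OFFRE, ESTIMATION_COUT);
""")

def plan(sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Only indexed columns are touched: the index alone can answer the query.
covered_plan = plan(
    "SELECT 1 FROM lot WHERE ID_APPEL_OFFRE = 42 AND ESTIMATION_COUT >= 1")

# LIBELLE is not in the index, so the table rows must be read as well.
uncovered_plan = plan(
    "SELECT LIBELLE FROM lot WHERE ID_APPEL_OFFRE = 42 AND ESTIMATION_COUT >= 1")

print(covered_plan)
print(uncovered_plan)
```

The first plan mentions a covering index, the second only an ordinary index lookup followed by a read of the table row.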

As @FuzzyTree said in his comment, EXISTS is faster than an inner join when the relationship is not 1:1, because it stops as soon as it finds one match, whereas the join fetches every matching row.
But what solved it for me was using IN instead of EXISTS:
where ( appeloffre0_.ID_APPEL_OFFRE IN (select lot2_.ID_APPEL_OFFRE from ao.lot lot2_
where lot2_.ESTIMATION_COUT>=1)
)
With this change the query runs much faster than with EXISTS or joins:
select count(appeloffre0_.ID_APPEL_OFFRE) as size
from ao.appel_offre appeloffre0_
inner join ao.acheteur acheteur1_
on appeloffre0_.ID_ACHETEUR=acheteur1_.ID_ACHETEUR
where
( appeloffre0_.ID_APPEL_OFFRE IN (select lot2_.ID_APPEL_OFFRE from ao.lot lot2_ where lot2_.ESTIMATION_COUT>=1))
and (appeloffre0_.ID_APPEL_OFFRE IN (select lieuexecut3_.appel_offre from ao.lieu_execution lieuexecut3_ where lieuexecut3_.region=1))
and (appeloffre0_.ID_APPEL_OFFRE IN (select aoactivite4_.ID_APPEL_OFFRE from ao.ao_activite aoactivite4_ where aoactivite4_.ID_ACTIVITE=1 ))
and appeloffre0_.DATE_OUVERTURE_PLIS>'2015-01-01'
and (appeloffre0_.CATEGORIE='fournitures' or appeloffre0_.CATEGORIE='travaux' or appeloffre0_.CATEGORIE='services')
and acheteur1_.ID_ENTITE_MERE=2
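
For a positive test like this one, IN and EXISTS select the same rows, so the rewrite changes only the execution plan, not the count. A small sketch to confirm that on made-up data (SQLite via Python's sqlite3 as a stand-in for MySQL; the rows are invented and the tables are reduced to the columns involved):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE appel_offre (ID_APPEL_OFFRE INTEGER PRIMARY KEY);
    CREATE TABLE lot (ID_APPEL_OFFRE INTEGER, ESTIMATION_COUT INTEGER);
    CREATE TABLE lieu_execution (appel_offre INTEGER, region INTEGER);
    INSERT INTO appel_offre VALUES (1), (2), (3), (4);
    INSERT INTO lot VALUES (1, 10), (1, 20), (2, 5), (3, 0), (4, 7);
    INSERT INTO lieu_execution VALUES (1, 1), (2, 2), (4, 1), (4, 1);
""")

exists_count = conn.execute("""
    SELECT COUNT(a.ID_APPEL_OFFRE) FROM appel_offre a
    WHERE EXISTS (SELECT 1 FROM lot l
                  WHERE l.ID_APPEL_OFFRE = a.ID_APPEL_OFFRE AND l.ESTIMATION_COUT >= 1)
      AND EXISTS (SELECT 1 FROM lieu_execution e
                  WHERE e.appel_offre = a.ID_APPEL_OFFRE AND e.region = 1)
""").fetchone()[0]

in_count = conn.execute("""
    SELECT COUNT(a.ID_APPEL_OFFRE) FROM appel_offre a
    WHERE a.ID_APPEL_OFFRE IN (SELECT l.ID_APPEL_OFFRE FROM lot l
                               WHERE l.ESTIMATION_COUT >= 1)
      AND a.ID_APPEL_OFFRE IN (SELECT e.appel_offre FROM lieu_execution e
                               WHERE e.region = 1)
""").fetchone()[0]

print(exists_count, in_count)
```

Only tenders 1 and 4 satisfy both conditions, and both forms count them once each. Which form is faster depends on the MySQL version and its optimizer, so it is worth comparing EXPLAIN output for both on the real data.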

Related

How to match more than one rows using INNER JOIN with MySQL?

What is the right way to select films whose labels are 'Action' AND 'Drama' using INNER JOIN?
I've tried this query. The result should be 'Taken' and 'The Godfather', but no rows are returned.
SELECT
f.film_guid,
f.film_name
FROM
films as f
INNER JOIN
film_labels as l ON l.film_guid = f.film_guid
WHERE
l.label = 'Action' AND l.label = 'Drama'
Table: films
+------------+----------------+
| film_guid | film_name |
+------------+----------------+
| filmguid_1 | Taken |
| filmguid_2 | Matrix |
| filmguid_3 | The Godfather |
+------------+----------------+
Table: film_labels
+------------+----------------+
| film_guid | label |
+------------+----------------+
| filmguid_1 | Action |
| filmguid_1 | Drama |
| filmguid_1 | Family |
| filmguid_2 | Action |
| filmguid_3 | Action |
| filmguid_3 | Drama |
+------------+----------------+
You are looking for a single row in film_labels that contains both Action and Drama, which cannot happen. You need to search across all the labels that belong to a given film, which suggests aggregation:
SELECT f.film_guid, f.film_name
FROM films as f
INNER JOIN film_labels as l ON l.film_guid = f.film_guid
WHERE l.label IN ('Action', 'Drama') -- either one, or the other
GROUP BY f.film_guid, f.film_name
HAVING COUNT(*) = 2 -- both match
Note that you could also use exists with correlated subquery. It is a bit longer to type but could be more efficient (with the right indexes in place), since it avoids the need for aggregation:
SELECT f.*
FROM films as f
WHERE
EXISTS (SELECT 1 FROM film_labels l WHERE l.film_guid = f.film_guid AND l.label = 'Action')
AND EXISTS (SELECT 1 FROM film_labels l WHERE l.film_guid = f.film_guid AND l.label = 'Drama')
For performance with the second query, you want an index on film_labels(film_guid , label).
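
Both queries can be run against the sample data from the question. This is a quick check rather than a benchmark, using SQLite through Python's sqlite3 as a stand-in for MySQL; an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE films (film_guid TEXT, film_name TEXT);
    CREATE TABLE film_labels (film_guid TEXT, label TEXT);
    INSERT INTO films VALUES
        ('filmguid_1', 'Taken'), ('filmguid_2', 'Matrix'), ('filmguid_3', 'The Godfather');
    INSERT INTO film_labels VALUES
        ('filmguid_1', 'Action'), ('filmguid_1', 'Drama'), ('filmguid_1', 'Family'),
        ('filmguid_2', 'Action'),
        ('filmguid_3', 'Action'), ('filmguid_3', 'Drama');
""")

# Aggregation: keep films that matched both labels.
# This relies on each (film_guid, label) pair appearing at most once.
by_group = conn.execute("""
    SELECT f.film_guid, f.film_name
    FROM films AS f
    INNER JOIN film_labels AS l ON l.film_guid = f.film_guid
    WHERE l.label IN ('Action', 'Drama')
    GROUP BY f.film_guid, f.film_name
    HAVING COUNT(*) = 2
    ORDER BY f.film_guid
""").fetchall()

# Correlated EXISTS: one subquery per required label.
by_exists = conn.execute("""
    SELECT f.film_guid, f.film_name
    FROM films AS f
    WHERE EXISTS (SELECT 1 FROM film_labels l
                  WHERE l.film_guid = f.film_guid AND l.label = 'Action')
      AND EXISTS (SELECT 1 FROM film_labels l
                  WHERE l.film_guid = f.film_guid AND l.label = 'Drama')
    ORDER BY f.film_guid
""").fetchall()

print(by_group)
print(by_exists)
```

Both return Taken and The Godfather. If film_labels could contain the same label twice for one film, HAVING COUNT(DISTINCT l.label) = 2 would be the safer condition.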

Can query be optimized: Get a records max date then join the max date's values

I've created a query that returns the results I want but I feel there must be a better way to do this. Any guidance would be appreciated.
I am trying to get all items for a specific meeting and join their max meeting date < X and join the max date's committee acronym. X is the current meeting date.
I've tried a few different queries but none, other than the one below, returned the expected results all the time.
You can see this query in action by going to rextester.
DROP TABLE IF EXISTS `committees`;
CREATE TABLE committees
(`id` int, `acronym` varchar(4))
;
INSERT INTO committees
(`id`, `acronym`)
VALUES
(1, 'Com1'),
(2, 'Com2'),
(3, 'Com3')
;
DROP TABLE IF EXISTS `meetings`;
CREATE TABLE meetings
(`id` int, `date` datetime, `committee_id` int)
;
INSERT INTO meetings
(`id`, `date`, `committee_id`)
VALUES
(1, '2017-01-01 00:00:00', 1),
(2, '2017-02-02 00:00:00', 2),
(3, '2017-03-03 00:00:00', 2)
;
DROP TABLE IF EXISTS `agenda_items`;
CREATE TABLE agenda_items
(`id` int, `name` varchar(6))
;
INSERT INTO agenda_items
(`id`, `name`)
VALUES
(1, 'Item 1'),
(2, 'Item 2'),
(3, 'Item 3')
;
DROP TABLE IF EXISTS `join_agenda_items_meetings`;
CREATE TABLE join_agenda_items_meetings
(`id` int, `agenda_item_id` int, `meeting_id` int)
;
INSERT INTO join_agenda_items_meetings
(`id`, `agenda_item_id`, `meeting_id`)
VALUES
(1, 1, 1),
(2, 1, 2),
(3, 2, 1),
(4, 3, 2),
(5, 2, 1),
(6, 1, 3)
;
SELECT agenda_items.id,
meetings.id,
meetings.date,
sub_one.max_date,
sub_two.acronym
FROM agenda_items
LEFT JOIN (SELECT ai.id AS ai_id,
me.id AS me_id,
Max(me.date) AS max_date
FROM agenda_items AS ai
JOIN join_agenda_items_meetings AS jaim
ON jaim.agenda_item_id = ai.id
JOIN meetings AS me
ON me.id = jaim.meeting_id
WHERE me.date < '2017-02-02'
GROUP BY ai_id) sub_one
ON sub_one.ai_id = agenda_items.id
LEFT JOIN (SELECT agenda_items.id AS age_id,
meetings.date AS meet_date,
committees.acronym AS acronym
FROM agenda_items
JOIN join_agenda_items_meetings
ON join_agenda_items_meetings.agenda_item_id = agenda_items.id
JOIN meetings
ON meetings.id = join_agenda_items_meetings.meeting_id
JOIN committees
ON committees.id = meetings.committee_id
WHERE meetings.date) sub_two
ON sub_two.age_id = agenda_items.id
AND sub_one.max_date = sub_two.meet_date
JOIN join_agenda_items_meetings
ON agenda_items.id = join_agenda_items_meetings.agenda_item_id
JOIN meetings
ON meetings.id = join_agenda_items_meetings.meeting_id
WHERE meetings.id = 2;
REVIEW / TESTING OF ANSWERS (REVISED):
I've revised the testing based on the comments made.
Since I put a bounty on this question I felt I should show how I'm evaluating the answers and give some feedback. Overall I'm very grateful to all who have helped out, thank you.
For testing, I reviewed the queries against:
the initial rextester
a modified version of the initial rextester with all 4 queries for 2 separate datasets
a larger data set from my actual database
My Original Query with EXPLAIN
+----+-------------+---------------------------+------+----------------------------------------------+
| id | select_type | table | rows | Extra |
+----+-------------+---------------------------+------+----------------------------------------------+
| 1 | PRIMARY | meetings | 1 | |
| 1 | PRIMARY | join_agenda_item_meetings | 1976 | Using where; Using index |
| 1 | PRIMARY | agenda_items | 1 | Using index |
| 1 | PRIMARY | <derived2> | 1087 | |
| 1 | PRIMARY | <derived3> | 2202 | |
| 3 | DERIVED | join_agenda_item_meetings | 1976 | Using index |
| 3 | DERIVED | meetings | 1 | Using where |
| 3 | DERIVED | committees | 1 | |
| 3 | DERIVED | agenda_items | 1 | Using index |
| 2 | DERIVED | jaim | 1976 | Using index; Using temporary; Using filesort |
| 2 | DERIVED | me | 1 | Using where |
| 2 | DERIVED | ai | 1 | Using index |
+----+-------------+---------------------------+------+----------------------------------------------+
12 rows in set (0.02 sec)
Paul Spiegel's answers.
The initial answer works and seems to be the most efficient option presented, much more than mine.
Paul Spiegel's first query pulls the fewest rows, is shorter and more readable than mine. It also doesn't need to reference a date which will be nicer when writing it as well.
+----+--------------------+-------+------+--------------------------+
| id | select_type | table | rows | Extra |
+----+--------------------+-------+------+--------------------------+
| 1 | PRIMARY | m1 | 1 | |
| 1 | PRIMARY | am1 | 1976 | Using where; Using index |
| 1 | PRIMARY | am2 | 1 | Using index |
| 1 | PRIMARY | m2 | 1 | |
| 2 | DEPENDENT SUBQUERY | am3 | 1 | Using index |
| 2 | DEPENDENT SUBQUERY | m3 | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | c3 | 1 | Using where |
+----+--------------------+-------+------+--------------------------+
7 rows in set (0.00 sec)
This query also returns the correct results when adding DISTINCT to the select statement. This query does not perform as well as the first though (but it is close).
+----+-------------+------------+------+--------------------------+
| id | select_type | table | rows | Extra |
+----+-------------+------------+------+--------------------------+
| 1 | PRIMARY | <derived2> | 5 | Using temporary |
| 1 | PRIMARY | am | 1 | Using index |
| 1 | PRIMARY | m | 1 | |
| 1 | PRIMARY | c | 1 | Using where |
| 2 | DERIVED | m1 | 1 | |
| 2 | DERIVED | am1 | 1787 | Using where; Using index |
| 2 | DERIVED | am2 | 1 | Using index |
| 2 | DERIVED | m2 | 1 | |
+----+-------------+------------+------+--------------------------+
8 rows in set (0.00 sec)
Stefano Zanini's answer
This query does return the expected results when DISTINCT is used. Judging by EXPLAIN and the number of rows examined, it is more efficient than my original query, but Paul Spiegel's is slightly better.
+----+-------------+------------+------+---------------------------------+
| id | select_type | table | rows | Extra |
+----+-------------+------------+------+---------------------------------+
| 1 | PRIMARY | me | 1 | Using temporary; Using filesort |
| 1 | PRIMARY | rel | 1787 | Using where; Using index |
| 1 | PRIMARY | <derived2> | 1087 | |
| 1 | PRIMARY | rel2 | 1 | Using index |
| 1 | PRIMARY | me2 | 1 | Using where |
| 1 | PRIMARY | co | 1 | |
| 2 | DERIVED | t1 | 1787 | Using index |
| 2 | DERIVED | t2 | 1 | Using where |
+----+-------------+------------+------+---------------------------------+
8 rows in set (0.00 sec)
EoinS' answer
As noted in the comments, this answer works if meetings are sequential, but they may not be unfortunately.
This one is a bit crazy.. Let's do it step by step:
The first step is a basic join
set @meeting_id = 2;
select am1.meeting_id,
am1.agenda_item_id,
m1.date as meeting_date
from meetings m1
join join_agenda_items_meetings am1 on am1.meeting_id = m1.id
where m1.id = @meeting_id;
We select the meeting (id = 2) and the corresponding agenda_item_ids. This will already return the rows we need with the first three columns.
The next step is to get the last meeting date for every agenda item. We join the first query with the join table and the corresponding meetings, excluding the meeting with id = 2 (am2.meeting_id <> am1.meeting_id). We only want meetings dated before the current meeting (m2.date < m1.date). Of all those meetings we only want the latest date for each agenda item, so we group by agenda item and select max(m2.date):
select am1.meeting_id,
am1.agenda_item_id,
m1.date as meeting_date,
max(m2.date) as max_date
from meetings m1
join join_agenda_items_meetings am1 on am1.meeting_id = m1.id
left join join_agenda_items_meetings am2
on am2.agenda_item_id = am1.agenda_item_id
and am2.meeting_id <> am1.meeting_id
left join meetings m2
on m2.id = am2.meeting_id
and m2.date < m1.date
where m1.id = @meeting_id
group by m1.id, am1.agenda_item_id;
This way we get the fourth column (max_date).
Last step is to select the acronym of the meeting with the last date (max_date). And this is the crazy part - We can use a correlated subquery in the SELECT clause. And we can use max(m2.date) for the correlation:
select c3.acronym
from meetings m3
join join_agenda_items_meetings am3 on am3.meeting_id = m3.id
join committees c3 on c3.id = m3.committee_id
where am3.agenda_item_id = am2.agenda_item_id
and m3.date = max(m2.date)
The final query would be:
select am1.meeting_id,
am1.agenda_item_id,
m1.date as meeting_date,
max(m2.date) as max_date,
( select c3.acronym
from meetings m3
join join_agenda_items_meetings am3 on am3.meeting_id = m3.id
join committees c3 on c3.id = m3.committee_id
where am3.agenda_item_id = am2.agenda_item_id
and m3.date = max(m2.date)
) as acronym
from meetings m1
join join_agenda_items_meetings am1 on am1.meeting_id = m1.id
left join join_agenda_items_meetings am2
on am2.agenda_item_id = am1.agenda_item_id
and am2.meeting_id <> am1.meeting_id
left join meetings m2
on m2.id = am2.meeting_id
and m2.date < m1.date
where m1.id = @meeting_id
group by m1.id, am1.agenda_item_id;
http://rextester.com/JKK60222
To be honest, I was surprised that you can use max(m2.date) in the subquery.
Another solution - Use the second query in a subquery (derived table). Join committees over meetings and the join table using max_date. Only keep rows with an acronym and rows without a max_date.
select t.*, c.acronym
from (
select am1.meeting_id,
am1.agenda_item_id,
m1.date as meeting_date,
max(m2.date) as max_date
from meetings m1
join join_agenda_items_meetings am1 on am1.meeting_id = m1.id
left join join_agenda_items_meetings am2
on am2.agenda_item_id = am1.agenda_item_id
and am2.meeting_id <> am1.meeting_id
left join meetings m2
on m2.id = am2.meeting_id
and m2.date < m1.date
where m1.id = @meeting_id
group by m1.id, am1.agenda_item_id
) t
left join join_agenda_items_meetings am
on am.agenda_item_id = t.agenda_item_id
and t.max_date is not null
left join meetings m
on m.id = am.meeting_id
and m.date = t.max_date
left join committees c on c.id = m.committee_id
where t.max_date is null or c.acronym is not null;
http://rextester.com/BBMDFL23101
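The derived-table variant can also be checked locally against the sample data from the question. A minimal sketch, assuming SQLite (through Python's sqlite3) as a stand-in for MySQL: the user variable is replaced by a bound parameter, the date column is stored as text, and an ORDER BY is added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE committees (id INT, acronym TEXT);
    INSERT INTO committees VALUES (1, 'Com1'), (2, 'Com2'), (3, 'Com3');
    CREATE TABLE meetings (id INT, date TEXT, committee_id INT);
    INSERT INTO meetings VALUES
        (1, '2017-01-01 00:00:00', 1),
        (2, '2017-02-02 00:00:00', 2),
        (3, '2017-03-03 00:00:00', 2);
    CREATE TABLE join_agenda_items_meetings (id INT, agenda_item_id INT, meeting_id INT);
    INSERT INTO join_agenda_items_meetings VALUES
        (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 2), (5, 2, 1), (6, 1, 3);
""")

query = """
select t.*, c.acronym
from (
    select am1.meeting_id,
           am1.agenda_item_id,
           m1.date as meeting_date,
           max(m2.date) as max_date
    from meetings m1
    join join_agenda_items_meetings am1 on am1.meeting_id = m1.id
    left join join_agenda_items_meetings am2
        on am2.agenda_item_id = am1.agenda_item_id
        and am2.meeting_id <> am1.meeting_id
    left join meetings m2
        on m2.id = am2.meeting_id
        and m2.date < m1.date
    where m1.id = ?              -- bound parameter instead of the user variable
    group by m1.id, am1.agenda_item_id
) t
left join join_agenda_items_meetings am
    on am.agenda_item_id = t.agenda_item_id
    and t.max_date is not null
left join meetings m
    on m.id = am.meeting_id
    and m.date = t.max_date
left join committees c on c.id = m.committee_id
where t.max_date is null or c.acronym is not null
order by t.agenda_item_id
"""

rows = conn.execute(query, (2,)).fetchall()
for row in rows:
    print(row)
```

For meeting 2 this yields one row per agenda item: item 1 with max_date 2017-01-01 and acronym Com1, and item 3 with NULL for both, since it has no earlier meeting.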
Using your schema I used the below query, assuming that all meetings entries are sequential:
set @mymeeting = 2;
select j.agenda_item_id, m.id, m.date, mp.date, c.acronym
from meetings m
left join join_agenda_items_meetings j on j.meeting_id = m.id
left join join_agenda_items_meetings jp on jp.meeting_id = m.id - 1 and jp.agenda_item_id = j.agenda_item_id
left join meetings mp on mp.id = jp.meeting_id
left join committees c on mp.committee_id = c.id
where m.id = @mymeeting;
I create a variable just to make it easy to change meetings on the fly.
Here is a functional example in Rextester
Thanks for making your schema so easy to reproduce!
I found this problem quite challenging, and the results I achieved are not jaw-dropping, but I managed to get rid of one of the sub-queries and maybe a few joins. This is the result:
select distinct me.ID, me.DATE, rel.AGENDA_ITEM_ID, sub.MAX_DATE, co.ACRONYM
from MEETINGS me
join JOIN_AGENDA_ITEMS_MEETINGS rel /* Note 1*/
on me.ID = rel.MEETING_ID
left join (
select t1.AGENDA_ITEM_ID, max(t2.DATE) MAX_DATE
from JOIN_AGENDA_ITEMS_MEETINGS t1
join MEETINGS t2
on t2.ID = t1.MEETING_ID
where t2.DATE < '2017-02-02'
group by t1.AGENDA_ITEM_ID
) sub
on rel.AGENDA_ITEM_ID = sub.AGENDA_ITEM_ID /* Note 2 */
left join JOIN_AGENDA_ITEMS_MEETINGS rel2
on rel2.AGENDA_ITEM_ID = rel.AGENDA_ITEM_ID /* Note 3 */
left join MEETINGS me2
on rel2.MEETING_ID = me2.ID and
sub.MAX_DATE = me2.DATE /* Note 4 */
left join COMMITTEES co
on co.ID = me2.COMMITTEE_ID
where me.ID = 2 and
(sub.MAX_DATE is null or me2.DATE is not null) /* Note 5 */
order by rel.AGENDA_ITEM_ID, rel2.MEETING_ID;
Notes
1. You don't need the join with AGENDA_ITEMS, since the ID is already available in the relationship table.
2. Up to this point we have the current meeting, its agenda items and their "calculated" max date.
3. We get all meetings of each agenda item...
4. ...so that we can pick the meeting whose date matches the max date we calculated previously.
5. This condition is needed because all the joins from rel2 onward have to be left joins (some agenda items may have no previous meeting, and hence MAX_DATE = null), but without the condition me2 would attach undesired meetings to some agenda items.

Mysql query: fastest way to find users followed by or following another user

I have the following tables:
Relationships
id, follower_id, followee_id, status
Users
id, name, email
I want to find all users who are either following or followed by a specific user.
This is what I have so far but it is very slow:
SELECT DISTINCT
`users`.*
FROM
`users`
INNER JOIN
`relationships` ON ((`users`.`id` = `relationships`.`follower_id`
AND `relationships`.`followee_id` = 1)
OR (`users`.`id` = `relationships`.`followee_id`
AND `relationships`.`follower_id` = 1))
WHERE
`relationships`.`status` = 'following'
ORDER BY `users`.`id`
What I mean by slow
I have one user who has roughly 600 followers and 600 following and it takes about 5 seconds for this query to run which seems insanely slow for those numbers!
The explain method shows the following:
+----+-------------+---------------+------+-----------------------------------------------------------------------+------+---------+------+------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+------+-----------------------------------------------------------------------+------+---------+------+------+------------------------------------------------+
| 1 | SIMPLE | relationships | ALL | index_relationships_on_followed_id,index_relationships_on_follower_id | NULL | NULL | NULL | 727 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | users | ALL | PRIMARY | NULL | NULL | NULL | 767 | Range checked for each record (index map: 0x1) |
+----+-------------+---------------+------+-----------------------------------------------------------------------+------+---------+------+------+------------------------------------------------+
Try breaking this into two queries, with a union:
SELECT u.*
FROM `users` u INNER JOIN
`relationships` r
ON u.`id` = r.`follower_id` AND r.`followee_id` = 1
WHERE r.`status` = 'following'
UNION
SELECT u.*
FROM `users` u INNER JOIN
`relationships` r
ON u.`id` = r.`followee_id` AND r.`follower_id` = 1
WHERE r.`status` = 'following'
ORDER BY id;
This may be a case where a more complicated query has better performance. These queries will also benefit from indexes: relationships(status, follower_id, followee_id) and relationships(status, followee_id, follower_id).

inner join on multiple tables, count & distinct

I have 3 tables and I am trying to join them with inner joins. However, when I use count(distinct column_id), MySQL throws this error:
SQL syntax : check
for the right syntax to use near '(DISTINCT as_ticket.vehicle_id) FROM as_vehicle INNER JOIN as_ticket
My Query
SELECT
`as_vehicle`.`make`, `as_vehicle`.`model`, `as_odometer`.`value`
COUNT (DISTINCT `as_ticket`.`vehicle_id`)
FROM `as_vehicle`
INNER JOIN `as_ticket`
ON `as_vehicle`.`vehicle_id` = `as_ticket`.`vehicle_id`
INNER JOIN `as_odometer`
ON `as_odometer`.`vehicle_id` = `as_vehicle`.`vehicle_id`
WHERE `as_ticket`.`vehicle_id` = 7
ORDER BY `as_odometer`.`value`
DESC
Tbl as_vehicle
+------------+-------------+---------+
| vehicle_id | make | model |
+------------+-------------+---------+
| 1 | HYUNDAI | SOLARIS |
| 2 | A638EA15 | ACCENT |
+------------+-------------+---------+
Tbl as_odometer;
+------------+-------+
| vehicle_id | value |
+------------+-------+
| 1 | 10500 |
| 5 | 20000 |
| 1 | 20000 |
+------------+-------+
Tbl service
+-----------+------------+
| ticket_id | vehicle_id |
+-----------+------------+
| 1 | 1 |
| 2 | 1 |
+-----------+------------+
You forgot a comma before count.
SELECT `as_vehicle`.`make`, `as_vehicle`.`model`, `as_odometer`.`value`,
count(DISTINCT `as_ticket`.`vehicle_id`) -- the missing comma goes at the end of the line above
First, you should not have a space between COUNT and the opening parenthesis, and you have a missing comma (as already noted). More importantly, you don't have a GROUP BY, so your query will return one row.
And, because of the WHERE clause, the value will always be "1": you have restricted the query to just one vehicle id.
I suspect the query you want is more like:
SELECT `as_vehicle`.`make`, `as_vehicle`.`model`, `as_odometer`.`value`,
COUNT(*)
FROM `as_vehicle` INNER JOIN
`as_ticket`
ON `as_vehicle`.`vehicle_id` = `as_ticket`.`vehicle_id` INNER JOIN
`as_odometer`
ON `as_odometer`.`vehicle_id` = `as_vehicle`.`vehicle_id`
WHERE `as_ticket`.`vehicle_id` = 7
GROUP BY `as_vehicle`.`make`, `as_vehicle`.`model`, `as_odometer`.`value`
ORDER BY `as_odometer`.`value` DESC;
Also, you should learn to use table aliases; all those backquotes don't make the query any easier to read.
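
Here is how the grouped query behaves on the sample data, written with table aliases. It is a sketch under a few assumptions: SQLite (via Python's sqlite3) stands in for MySQL, the table shown as "service" in the question is taken to be as_ticket, and the filter uses vehicle_id = 1 because the sample data has no vehicle 7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE as_vehicle (vehicle_id INTEGER, make TEXT, model TEXT);
    CREATE TABLE as_odometer (vehicle_id INTEGER, value INTEGER);
    CREATE TABLE as_ticket (ticket_id INTEGER, vehicle_id INTEGER);
    INSERT INTO as_vehicle VALUES (1, 'HYUNDAI', 'SOLARIS'), (2, 'A638EA15', 'ACCENT');
    INSERT INTO as_odometer VALUES (1, 10500), (5, 20000), (1, 20000);
    INSERT INTO as_ticket VALUES (1, 1), (2, 1);
""")

rows = conn.execute("""
    SELECT v.make, v.model, o.value, COUNT(*) AS tickets
    FROM as_vehicle v
    INNER JOIN as_ticket t ON v.vehicle_id = t.vehicle_id
    INNER JOIN as_odometer o ON o.vehicle_id = v.vehicle_id
    WHERE t.vehicle_id = 1
    GROUP BY v.make, v.model, o.value
    ORDER BY o.value DESC
""").fetchall()

for row in rows:
    print(row)
```

Vehicle 1 has two tickets and two odometer readings, so the join yields four rows; grouping by the odometer value gives one output row per reading, each with a ticket count of 2.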

How to optimize join which causes very slow performance

This query runs for more than 12 seconds, even though all the tables are relatively small, about 2 thousand rows.
SELECT attr_73206_ AS attr_73270_
FROM object_73130_ f1
LEFT OUTER JOIN (
SELECT id_field, attr_73206_ FROM (
SELECT m.id_field, t0.attr_73102_ AS attr_73206_ FROM object_73200_ o
INNER JOIN master_slave m ON (m.id_object = 73130 OR m.id_object = 73290) AND (m.id_master = 73200 OR m.id_master = 73354) AND m.id_slave_field = o.id
INNER JOIN object_73101_ t0 ON t0.id = o.attr_73206_
ORDER BY o.id_order
) AS o GROUP BY o.id_field
) AS o ON f1.id = o.id_field
Both tables have an id field as primary key. In addition, id_field, id_order, attr_73206_ and all fields in master_slave are indexed.
As for the logic, the query is of the master-detail kind: object_73130_ is the master table, object_73200_ is the detail table, and they are linked by the master_slave table. object_73101_ is an ad-hoc table used to look up the real value of the field attr_73206_ by its id. For each row in the master table, the query returns a field from the very first row of its detail table.
The query originally looked different, but here on Stack Overflow I was advised to use this more optimized structure instead of the subquery I had before, and it did start to run much faster. I observe that the subquery in the first JOIN block runs very fast on its own but returns a number of rows comparable to the number of rows in the master table. Either way, I do not know how to optimize it further, and I wonder why a simple, fast-running join causes so much trouble.
The main observation is that if I remove the ad-hoc table object_73101_ from the query and return just the id instead of the real value, the query runs as quick as a flash. So all attention should be focused on this part of the query:
INNER JOIN object_73101_ t0 ON t0.id = o.attr_73206_
Why does it slow down the whole query so terribly?
EDIT
In this way it runs super-fast
SELECT t0.attr_73102_ AS attr_73270_
FROM object_73130_ f1
LEFT OUTER JOIN (
SELECT id_field, attr_73206_ FROM (
SELECT m.id_field, attr_73206_ FROM object_73200_ o
INNER JOIN master_slave m ON (m.id_object = 73130 OR m.id_object = 73290) AND (m.id_master = 73200 OR m.id_master = 73354) AND m.id_slave_field = o.id
ORDER BY o.id_order
) AS o GROUP BY o.id_field
) AS o ON f1.id = o.id_field
LEFT JOIN object_73101_ t0 ON t0.id = o.attr_73206_
So, as you can see, I just moved the ad-hoc join outside of the subquery. The problem is that the subquery is generated automatically: I have access to the part of the algorithm that creates the subquery and can modify it, but I do not have access to the part that builds the whole query, so the only thing I can do is fix the subquery somehow. Anyway, I still can't understand why an INNER JOIN inside a subquery can slow down the whole query by a factor of hundreds.
EDIT
A new version of query with different aliases for each table. This has no effect on the performance:
SELECT attr_73206_ AS attr_73270_
FROM object_73130_ f1
LEFT OUTER JOIN (
SELECT id_field, attr_73206_ FROM (
SELECT m.id_field, t0.attr_73102_ AS attr_73206_ FROM object_73200_ a
INNER JOIN master_slave m ON (m.id_object = 73130 OR m.id_object = 73290) AND (m.id_master = 73200 OR m.id_master = 73354) AND m.id_slave_field = a.id
INNER JOIN object_73101_ t0 ON t0.id = a.attr_73206_
ORDER BY a.id_order
) AS b GROUP BY b.id_field
) AS c ON f1.id = c.id_field
EDIT
This is the result of EXPLAIN command:
| id | select_type | TABLE | TYPE | possible_keys | KEY | key_len | ROWS | Extra |
| 1 | PRIMARY | f1 | INDEX | NULL | PRIMARY | 4 | 1570 | USING INDEX
| 1 | PRIMARY | derived2| ALL | NULL | NULL | NULL | 1564 |
| 2 | DERIVED | derived3| ALL | NULL | NULL | NULL | 1575 | USING TEMPORARY; USING filesort
| 3 | DERIVED | m | RANGE | id_object,id_master,..| id_object | 4 | 1356 | USING WHERE; USING TEMPORARY; USING filesort
| 3 | DERIVED | a | eq_ref | PRIMARY,attr_73206_ | PRIMARY | 4 | 1 |
| 3 | DERIVED | t0 | eq_ref | PRIMARY | PRIMARY | 4 | 1 |
What is wrong with that?
EDIT
Here is the result of EXPLAIN command for the "super-fast" query
| id | select_type | TABLE | TYPE | possible_keys | KEY | key_len | ROWS | Extra
| 1 | PRIMARY | f1 | INDEX | NULL | PRIMARY | 4 | 1570 | USING INDEX
| 1 | PRIMARY | derived2| ALL | NULL | NULL | NULL | 1570 |
| 1 | PRIMARY | t0 | eq_ref| PRIMARY | PRIMARY | 4 | 1 |
| 2 | DERIVED | derived3| ALL | NULL | NULL | NULL | 1581 | USING TEMPORARY; USING filesort
| 3 | DERIVED | m | RANGE | id_object,id_master,| id_object | 4 | 1356 | USING WHERE; USING TEMPORARY; USING filesort
| 3 | DERIVED | a | eq_ref | PRIMARY | PRIMARY | 4 | 1 |
CLOSED
I will use my own "super-fast" query, which I presented above. I don't think it can be optimized any further.
Without knowing the exact nature of the data/query, there are a couple things that I'm seeing:
MySQL is notoriously bad at handling sub-selects, as it requires the creation of derived tables. In fact, some versions of MySQL also ignore indexes when using sub-selects. Typically, it's better to use JOINs instead of sub-selects, but if you need to use sub-selects, it's best to make that sub-select as lean as possible.
Unless you have a very specific reason for putting the ORDER BY in the sub-select, it may be a good idea to move it to the "main" query portion because the result set may be smaller (allowing for quicker sorting).
With all that said, I tried to rewrite your query using JOIN logic, but I was wondering which table the final value (attr_73102_) comes from. Is it the result of the sub-select, or does it come from table object_73130_? If it comes from the sub-select, then I don't see why you're bothering with the original LEFT JOIN, as you will only be returning the list of values from the sub-select, and NULL for any non-matching rows from object_73130_.
Regardless, without knowing the answer, I think the query below MAY be logically equivalent:
SELECT t0.attr_73102_ AS attr_73270_
FROM object_73130_ f1
LEFT JOIN (object_73200_ o
INNER JOIN master_slave m ON m.id_slave_field = o.id
INNER JOIN object_73101_ t0 ON t0.id = o.attr_73206_)
ON f1.id = m.id_field
WHERE m.id_object IN (73130,73290)
AND m.id_master IN (73200,73354)
GROUP BY m.id_field
ORDER BY o.id_order;