Sorting left join results on large open schema tables - mysql

I am designing an open schema database with the following table definitions:
mysql> desc orders;
+-------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| json | text | NO | | NULL | |
+-------+---------+------+-----+---------+----------------+
mysql> desc ordersnames;
+-------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(330) | NO | UNI | NULL | |
+-------+--------------+------+-----+---------+----------------+
with an index on name
mysql> desc orderskeys;
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| reference | int(11) | NO | MUL | NULL | |
| nameref | int(11) | NO | MUL | NULL | |
| value | varchar(330) | NO | | NULL | |
+-----------+--------------+------+-----+---------+----------------+
with indices on:
reference,nameref,value
nameref,value
reference
Every json field (one dimension only) has one entry in the orderskeys table per existing field, where nameref is a reference to the field name as defined in ordersnames.
I would typically query like this:
SELECT
orderskeysdeliveryPostcode.value deliveryPostcode,
orders.ID,
orderskeysCN.value CN
FROM
orders
JOIN ordersnames as ordersnamesCN
on ordersnamesCN.name = 'CN'
JOIN orderskeys as orderskeysCN
on orderskeysCN.nameref = ordersnamesCN.ID
and orderskeysCN.reference = orders.ID
and orderskeysCN.value = '10094'
JOIN ordersnames as ordersnamesdeliveryPostcode
on ordersnamesdeliveryPostcode.name = 'deliveryPostcode'
JOIN orderskeys as orderskeysdeliveryPostcode
on orderskeysdeliveryPostcode.nameref = ordersnamesdeliveryPostcode.ID
and orderskeysdeliveryPostcode.reference = orders.ID
order by deliveryPostcode
limit 0,1000
yielding a result set like this
+------------------+--------+-------+
| deliveryPostcode | ID | CN |
+------------------+--------+-------+
| NULL | 251018 | 10094 |
| NULL | 157153 | 10094 |
| NULL | 95419 | 10094 |
| B-5030 | 172944 | 10094 |
+------------------+--------+-------+
-> lightning fast even with 400k + orders records
However, not all records contain all fields, so the above query will not return the records that lack a 'deliveryPostcode' field. I therefore have to query like this:
SELECT
orderskeysdeliveryPostcode.value deliveryPostcode,
orders.ID,
orderskeysCN.value CN
FROM
orders
JOIN ordersnames as ordersnamesCN
on ordersnamesCN.name = 'CN'
JOIN orderskeys as orderskeysCN
on orderskeysCN.nameref = ordersnamesCN.ID
and orderskeysCN.reference = orders.ID
and orderskeysCN.value = '10094'
JOIN ordersnames as ordersnamesdeliveryPostcode
on ordersnamesdeliveryPostcode.name = 'deliveryPostcode'
LEFT JOIN orderskeys as orderskeysdeliveryPostcode
on orderskeysdeliveryPostcode.nameref = ordersnamesdeliveryPostcode.ID
and orderskeysdeliveryPostcode.reference = orders.ID
limit 0,1000
-> equally fast, but as soon as I add an ORDER BY clause on the key value from a left-joined table, MySQL wants to do the sorting externally (temporary, filesort) instead of using an existing index.
SELECT
orderskeysdeliveryPostcode.value deliveryPostcode,
orders.ID,
orderskeysCN.value CN
FROM
orders
JOIN ordersnames as ordersnamesCN
on ordersnamesCN.name = 'CN'
JOIN orderskeys as orderskeysCN
on orderskeysCN.nameref = ordersnamesCN.ID
and orderskeysCN.reference = orders.ID
and orderskeysCN.value = '10094'
JOIN ordersnames as ordersnamesdeliveryPostcode
on ordersnamesdeliveryPostcode.name = 'deliveryPostcode'
LEFT JOIN orderskeys as orderskeysdeliveryPostcode
on orderskeysdeliveryPostcode.nameref = ordersnamesdeliveryPostcode.ID
and orderskeysdeliveryPostcode.reference = orders.ID
ORDER BY deliveryPostCode
limit 0,1000
-> very slow ...
In fact, the sorting operation itself is not much different, as all NULL values for column deliveryPostcode would be at the beginning (ASC) or the end (DESC), while the rest of the dataset would have the same order as with JOIN instead of LEFT JOIN.
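The observation above can be checked on toy data. This is only a minimal sketch, not the original setup: it uses Python's sqlite3 module instead of MySQL (SQLite also places NULL first in ascending order), and the rows and the short aliases (o, cn, pc, n_cn, n_pc) are made up for illustration.

```python
import sqlite3

# Toy version of the orders / ordersnames / orderskeys schema; all data is invented.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (ID INTEGER PRIMARY KEY);
CREATE TABLE ordersnames (ID INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE orderskeys (ID INTEGER PRIMARY KEY, reference INT, nameref INT, value TEXT);
INSERT INTO orders (ID) VALUES (1), (2), (3), (4);
INSERT INTO ordersnames (ID, name) VALUES (1, 'CN'), (2, 'deliveryPostcode');
INSERT INTO orderskeys (reference, nameref, value) VALUES
  (1, 1, '10094'), (2, 1, '10094'), (3, 1, '10094'), (4, 1, '10094'),
  (2, 2, 'B-5030'), (4, 2, 'A-1000');
""")

# Same shape as the queries in the question; only the join type on pc varies.
QUERY = """
SELECT pc.value AS deliveryPostcode, o.ID
FROM orders o
JOIN ordersnames n_cn ON n_cn.name = 'CN'
JOIN orderskeys cn
  ON cn.nameref = n_cn.ID AND cn.reference = o.ID AND cn.value = '10094'
JOIN ordersnames n_pc ON n_pc.name = 'deliveryPostcode'
{join} orderskeys pc
  ON pc.nameref = n_pc.ID AND pc.reference = o.ID
ORDER BY deliveryPostcode, o.ID
"""

inner_rows = con.execute(QUERY.format(join="JOIN")).fetchall()
left_rows = con.execute(QUERY.format(join="LEFT JOIN")).fetchall()

# Orders without a deliveryPostcode show up only in the LEFT JOIN result.
null_rows = [row for row in left_rows if row[0] is None]
print(null_rows)                   # orders lacking the field, sorted first
print(left_rows[len(null_rows):])  # remainder, in the same order as inner_rows
```

On this data the LEFT JOIN result is the NULL rows followed by exactly the inner-join result, which is the behaviour described above. It says nothing about how MySQL chooses to execute the sort.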
How can I query (and order) such tables efficiently? Do I need different relations or indices ?
Much obliged ...

With INNER JOINs, to reduce the number of lookups, MySQL is going to start with the table with the fewest rows (see the EXPLAIN result to see which table MySQL starts with).
If you order by anything other than a column in that first table, or there is no index to satisfy the ORDER BY clause on that first table, MySQL is going to have to do a filesort.
The use of a temporary table is much more likely when text columns are involved, and not just an in-memory temporary table, but a dreadful on-disk temporary table.
Use STRAIGHT_JOIN to force the order that MySQL performs inner joins.

I am not sure what logic you have in some parts of your query.
I think it can still be optimized.
But just to resolve the issue you have, try switching it to a RIGHT JOIN for now:
SELECT
orderskeysdeliveryPostcode.value deliveryPostcode,
o.id,
o.CN
FROM orderskeys as orderskeysdeliveryPostcode
INNER JOIN ordersnames as ord_n
on ord_n.id = orderskeysdeliveryPostcode.nameref
AND ord_n.name = 'deliveryPostcode'
RIGHT JOIN (
SELECT
orders.ID,
orderskeysCN.CN
FROM
orders
LEFT JOIN
(SELECT
orderskeys.value as CN,
orderskeys.reference
FROM
orderskeys
INNER JOIN ordersnames as ordersnamesCN
ON ordersnamesCN.id = orderskeys.nameref
AND ordersnamesCN.name = 'CN'
WHERE orderskeys.value = '12209'
) as orderskeysCN
ON
orderskeysCN.reference = orders.ID
limit 0,1000
) as o
on
orderskeysdeliveryPostcode.reference = o.ID
ORDER BY deliveryPostCode;
and here is sqlfiddle we can play with. Just need you to add data inserts there.

Related

How can I efficiently store a 2-way "like" system similar to Tinder?

On Tinder, when 2 members like each other, they are a "match" and are able to communicate. If only one member likes another, then it's not a match.
I'm trying to store this "Like" system in MySQL but can't figure out the best way to do it that's efficient. This is my setup right now.
mysql> desc likes_likes;
+--------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+----------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| from_user_id | int(11) | NO | MUL | NULL | |
| to_user_id | int(11) | NO | MUL | NULL | |
| value | int(11) | NO | | NULL | |
| created_at | datetime | NO | | NULL | |
| updated_at | datetime | YES | | NULL | |
+--------------+----------+------+-----+---------+----------------+
6 rows in set (0.00 sec)
To find my matches, I would query something like...
SELECT to_user_id FROM likes_likes WHERE from_user_id = my_id AND value = 1 AND ....
I don't know how to join the same table from here.
How do I perform the query on this table? If it's not efficient, what's a better structure to store this model?
1 is like, 0 is not like. Those are the only 2 values.
SELECT A.from_user_id AS userA, B.from_user_id AS userB
FROM likes_likes A
JOIN likes_likes B
ON A.from_user_id = B.to_user_id
AND A.to_user_id = B.from_user_id
AND A.id <> B.id
WHERE A.value = 1
AND B.value = 1
To find matches you can use a regular join with alias:
SELECT l1.from_user_id user1, l2.from_user_id user2
FROM likes_likes l1
INNER JOIN likes_likes l2 ON
l2.from_user_id = l1.to_user_id AND
l2.to_user_id = l1.from_user_id AND
l1.value = 1 AND l2.value = 1
The first condition pairs each row in which user1 has rated (liked or not liked) user2 with the rows in which user2 has rated somebody.
The second condition completes the check so that we now have two persons who have expressed an opinion about each other.
The last two checks make sure that they both like each other :)
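A quick way to try the self-join is on a handful of rows. The sketch below is an illustration under assumptions: it runs on Python's sqlite3 rather than MySQL, the sample rows are invented, and matches_for is a hypothetical helper name. It restricts the join to one user, which is what the question asks for.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE likes_likes (
  id INTEGER PRIMARY KEY,
  from_user_id INT NOT NULL,
  to_user_id INT NOT NULL,
  value INT NOT NULL
);
INSERT INTO likes_likes (from_user_id, to_user_id, value) VALUES
  (1, 2, 1), (2, 1, 1),   -- mutual like: a match
  (1, 3, 1), (3, 1, 0),   -- user 3 did not like user 1 back
  (4, 1, 1);              -- one-sided like
""")

def matches_for(user_id):
    """Return the ids of users who like user_id and are liked back (hypothetical helper)."""
    rows = con.execute(
        """
        SELECT l1.to_user_id
        FROM likes_likes l1
        JOIN likes_likes l2
          ON l2.from_user_id = l1.to_user_id   -- the other user rated someone
         AND l2.to_user_id = l1.from_user_id   -- and that someone is me
        WHERE l1.from_user_id = ?
          AND l1.value = 1
          AND l2.value = 1
        ORDER BY l1.to_user_id
        """,
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]

print(matches_for(1))
print(matches_for(4))
```

Only the pair (1, 2) has a like in both directions here, so user 1 matches user 2 and user 4 matches nobody.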
Here's a way using group by least(),greatest() to get each unique pair of users into a group and then checking if there are 2 rows per group
select least(from_user_id,to_user_id), greatest(from_user_id,to_user_id)
from likes_likes
where value = 1
-- and my_id in (from_user_id,to_user_id)
group by least(from_user_id,to_user_id), greatest(from_user_id,to_user_id)
having count(*) = 2
If it's possible to have multiple likes from the same user to another user (i.e. user 'A' likes user 'B' twice) then use having count(distinct from_user_id) = 2
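The difference between the two HAVING clauses shows up as soon as a like is stored twice. This sketch makes some assumptions: it uses Python's sqlite3, where the two-argument scalar min() and max() stand in for MySQL's LEAST() and GREATEST(), and the sample rows and the pairs helper are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE likes_likes (
  id INTEGER PRIMARY KEY,
  from_user_id INT NOT NULL,
  to_user_id INT NOT NULL,
  value INT NOT NULL
);
INSERT INTO likes_likes (from_user_id, to_user_id, value) VALUES
  (1, 2, 1), (2, 1, 1),   -- mutual like
  (1, 3, 1), (3, 1, 0),   -- not mutual
  (5, 6, 1), (5, 6, 1);   -- the same one-sided like stored twice
""")

# min(a, b) / max(a, b) are SQLite's scalar equivalents of LEAST / GREATEST.
PAIRS = """
SELECT min(from_user_id, to_user_id) AS user_a,
       max(from_user_id, to_user_id) AS user_b
FROM likes_likes
WHERE value = 1
GROUP BY min(from_user_id, to_user_id), max(from_user_id, to_user_id)
HAVING {having}
ORDER BY user_a, user_b
"""

def pairs(having):
    """Run the grouping query with the given HAVING condition (hypothetical helper)."""
    return con.execute(PAIRS.format(having=having)).fetchall()

print(pairs("count(*) = 2"))                      # also reports the duplicated like
print(pairs("count(DISTINCT from_user_id) = 2"))  # only the true mutual pair
```

With count(*) the duplicated like from 5 to 6 is reported as a match; counting distinct senders avoids that.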
Do you actually need value? If there is no row there is no like. From this query you should get 1 for a match and 0 for no mutual match.
SELECT
COUNT(*)
FROM
likes_likes i_like_you
JOIN likes_likes you_like_me ON i_like_you.to_user_id = you_like_me.from_user_id
AND you_like_me.to_user_id = i_like_you.from_user_id
WHERE
i_like_you.from_user_id = @my_id
AND you_like_me.from_user_id = @your_id
Is there any reason for id? It seems like the pair (from_user_id, to_user_id) should be UNIQUE, hence could be the 'natural' PRIMARY KEY.
I have yet to see any good argument for needing value.
So the table has shrunk to
CREATE TABLE likes_likes (
from_user_id ...,
to_user_id ...,
created_at ...,
updated_at ...,
PRIMARY KEY(from_user_id, to_user_id) -- serves as the necessary INDEX.
) ENGINE=InnoDB;
SELECT A.from_user_id AS userA,
B.from_user_id AS userB
FROM likes_likes A
JOIN likes_likes B
ON A.from_user_id = B.to_user_id
AND A.to_user_id = B.from_user_id
(I'm assuming you disallow a person liking himself.)
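As a rough check of the slimmed-down table, here is a sketch on Python's sqlite3 in place of MySQL, so ENGINE=InnoDB is dropped and the column types are assumptions (the original leaves them elided). The A.from_user_id < B.from_user_id filter is my addition to list each pair only once.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE likes_likes (
  from_user_id INT NOT NULL,
  to_user_id   INT NOT NULL,
  created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (from_user_id, to_user_id)  -- a row existing means "like"
)
""")

insert = "INSERT INTO likes_likes (from_user_id, to_user_id) VALUES (?, ?)"
con.execute(insert, (1, 2))

# The composite primary key rejects a second like for the same ordered pair.
try:
    con.execute(insert, (1, 2))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

con.execute(insert, (2, 1))  # the like in the other direction is a different key

matches = con.execute("""
SELECT A.from_user_id AS userA, B.from_user_id AS userB
FROM likes_likes A
JOIN likes_likes B
  ON A.from_user_id = B.to_user_id
 AND A.to_user_id = B.from_user_id
WHERE A.from_user_id < B.from_user_id  -- added here: report each pair once
""").fetchall()

print(duplicate_rejected, matches)
```

Without the value column there is nothing left to filter on, and the uniqueness of (from_user_id, to_user_id) is enforced by the key itself.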

Mysql query: fastest way to find users followed by or following another user

I have the following tables:
Relationships
id, follower_id, followee_id, status
Users
id, name, email
I want to find all users who are either following or followed by a specific user.
This is what I have so far but it is very slow:
SELECT DISTINCT
`users`.*
FROM
`users`
INNER JOIN
`relationships` ON ((`users`.`id` = `relationships`.`follower_id`
AND `relationships`.`followee_id` = 1)
OR (`users`.`id` = `relationships`.`followee_id`
AND `relationships`.`follower_id` = 1))
WHERE
`relationships`.`status` = 'following'
ORDER BY `users`.`id`
What I mean by slow
I have one user who has roughly 600 followers and 600 following and it takes about 5 seconds for this query to run which seems insanely slow for those numbers!
The explain method shows the following:
+----+-------------+---------------+------+-----------------------------------------------------------------------+------+---------+------+------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+------+-----------------------------------------------------------------------+------+---------+------+------+------------------------------------------------+
| 1 | SIMPLE | relationships | ALL | index_relationships_on_followed_id,index_relationships_on_follower_id | NULL | NULL | NULL | 727 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | users | ALL | PRIMARY | NULL | NULL | NULL | 767 | Range checked for each record (index map: 0x1) |
+----+-------------+---------------+------+-----------------------------------------------------------------------+------+---------+------+------+------------------------------------------------+
Try breaking this into two queries, with a union:
SELECT u.*
FROM `users` u INNER JOIN
`relationships` r
ON u.`id` = r.`follower_id` AND r.`followee_id` = 1
WHERE r.`status` = 'following'
UNION
SELECT u.*
FROM `users` u INNER JOIN
`relationships` r
ON u.`id` = r.`followee_id` AND r.`follower_id` = 1
WHERE r.`status` = 'following'
ORDER BY id;
This may be a case where a more complicated query has better performance. These queries will also benefit from indexes: relationships(status, follower_id, followee_id) and relationships(status, followee_id, follower_id).
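To see that the UNION form returns the same users as the OR join, one can compare both on a few rows. The sketch below is illustrative only: it runs on Python's sqlite3 rather than MySQL, so it shows equivalence of results and not the MySQL execution plan, and the sample data and index names (idx_rel_follower, idx_rel_followee) are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE relationships (
  id INTEGER PRIMARY KEY, follower_id INT, followee_id INT, status TEXT
);
-- The two composite indexes suggested above (names are hypothetical).
CREATE INDEX idx_rel_follower ON relationships (status, follower_id, followee_id);
CREATE INDEX idx_rel_followee ON relationships (status, followee_id, follower_id);
INSERT INTO users (id, name, email) VALUES
  (1, 'ann', 'ann@example.com'), (2, 'bob', 'bob@example.com'),
  (3, 'cy', 'cy@example.com'), (4, 'dee', 'dee@example.com');
INSERT INTO relationships (follower_id, followee_id, status) VALUES
  (2, 1, 'following'), (1, 2, 'following'),
  (1, 3, 'following'), (4, 1, 'blocked');
""")

OR_JOIN = """
SELECT DISTINCT u.id AS id
FROM users u
JOIN relationships r
  ON (u.id = r.follower_id AND r.followee_id = ?)
  OR (u.id = r.followee_id AND r.follower_id = ?)
WHERE r.status = 'following'
ORDER BY id
"""

UNION_QUERY = """
SELECT u.id AS id
FROM users u JOIN relationships r
  ON u.id = r.follower_id AND r.followee_id = ?
WHERE r.status = 'following'
UNION
SELECT u.id AS id
FROM users u JOIN relationships r
  ON u.id = r.followee_id AND r.follower_id = ?
WHERE r.status = 'following'
ORDER BY id
"""

or_ids = [row[0] for row in con.execute(OR_JOIN, (1, 1))]
union_ids = [row[0] for row in con.execute(UNION_QUERY, (1, 1))]
print(or_ids, union_ids)
```

UNION (as opposed to UNION ALL) removes the duplicate for a user who both follows and is followed by user 1, so no separate DISTINCT is needed.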

Slow count query with where clause

I am trying to run a count to get the total number of results for pagination, but the query is too slow (2.12 s):
+-------+
| size |
+-------+
| 50000 |
+-------+
1 row in set (2.12 sec)
My count query:
select count(appeloffre0_.ID_APPEL_OFFRE) as size
from ao.appel_offre appeloffre0_
inner join ao.acheteur acheteur1_
on appeloffre0_.ID_ACHETEUR=acheteur1_.ID_ACHETEUR
where
(exists (select 1 from ao.lot lot2_ where lot2_.ID_APPEL_OFFRE=appeloffre0_.ID_APPEL_OFFRE and lot2_.ESTIMATION_COUT>=1))
and (exists (select 1 from ao.lieu_execution lieuexecut3_ where lieuexecut3_.appel_offre=appeloffre0_.ID_APPEL_OFFRE and lieuexecut3_.region=1))
and (exists (select 1 from ao.ao_activite aoactivite4_ where aoactivite4_.ID_APPEL_OFFRE=appeloffre0_.ID_APPEL_OFFRE and (aoactivite4_.ID_ACTIVITE=1)))
and appeloffre0_.DATE_OUVERTURE_PLIS>'2015-01-01'
and (appeloffre0_.CATEGORIE='fournitures' or appeloffre0_.CATEGORIE='travaux' or appeloffre0_.CATEGORIE='services')
and acheteur1_.ID_ENTITE_MERE=2
EXPLAIN output:
+----+--------------------+--------------+------+---------------------------------------------+--------------------+---------+--------------------------------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------+------+---------------------------------------------+--------------------+---------+--------------------------------+-------+--------------------------+
| 1 | PRIMARY | acheteur1_ | ref | PRIMARY,acheteur_ibfk_1 | acheteur_ibfk_1 | 5 | const | 3 | Using where; Using index |
| 1 | PRIMARY | appeloffre0_ | ref | appel_offre_ibfk_2 | appel_offre_ibfk_2 | 4 | ao.acheteur1_.ID_ACHETEUR | 31061 | Using where |
| 4 | DEPENDENT SUBQUERY | aoactivite4_ | ref | ao_activites_activite_fk,ao_activites_ao_fk | ao_activites_ao_fk | 4 | ao.appeloffre0_.ID_APPEL_OFFRE | 3 | Using where |
| 3 | DEPENDENT SUBQUERY | lieuexecut3_ | ref | fk_ao_lieuex,fk_region_lieuex | fk_ao_lieuex | 4 | ao.appeloffre0_.ID_APPEL_OFFRE | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | lot2_ | ref | FK_LOT_AO | FK_LOT_AO | 4 | ao.appeloffre0_.ID_APPEL_OFFRE | 5 | Using where |
+----+--------------------+--------------+------+---------------------------------------------+--------------------+---------+--------------------------------+-------+--------------------------+
The index acheteur_ibfk_1 is a foreign key referencing table ENTITE_MERE; it is used because I have acheteur1_.ID_ENTITE_MERE=2 in the WHERE clause.
You can have multiple conditions on your joins by using ON condition1 AND condition2 etc.
SELECT COUNT(appeloffre0_.ID_APPEL_OFFRE) as size
FROM ao.appel_offre appeloffre0_
JOIN ao.acheteur acheteur1_ ON appeloffre0_.ID_ACHETEUR=acheteur1_.ID_ACHETEUR
JOIN ao.lot lot2_ ON appeloffre0_.ID_APPEL_OFFRE=lot2_.ID_APPEL_OFFRE AND lot2_.ESTIMATION_COUT>=1
JOIN ao.lieu_execution lieuexecut3_ ON appeloffre0_.ID_APPEL_OFFRE=lieuexecut3_.appel_offre AND lieuexecut3_.region=1
JOIN ao.ao_activite aoactivite4_ ON appeloffre0_.ID_APPEL_OFFRE=aoactivite4_.ID_APPEL_OFFRE AND aoactivite4_.ID_ACTIVITE=1
WHERE appeloffre0_.DATE_OUVERTURE_PLIS>'2015-01-01'
AND (appeloffre0_.CATEGORIE='fournitures' OR appeloffre0_.CATEGORIE='travaux' OR appeloffre0_.CATEGORIE='services')
AND acheteur1_.ID_ENTITE_MERE=2;
You can try:
select count(aa.ID_APPEL_OFFRE) as size
from (
select appeloffre0_.ID_APPEL_OFFRE, appeloffre0_.ID_ACHETEUR from ao.appel_offre appeloffre0_
inner join ao.acheteur acheteur1_
on appeloffre0_.ID_ACHETEUR=acheteur1_.ID_ACHETEUR
where appeloffre0_.DATE_OUVERTURE_PLIS>'2015-01-01'
and (appeloffre0_.CATEGORIE in ('fournitures','travaux','services'))
and (acheteur1_.ID_ENTITE_MERE=2)) aa
inner join ao.lot lot2_ on lot2_.ID_APPEL_OFFRE=aa.ID_APPEL_OFFRE
inner join ao.lieu_execution lieuexecut3_ on lieuexecut3_.appel_offre=aa.ID_APPEL_OFFRE
inner join ao.ao_activite aoactivite4_ on aoactivite4_.ID_APPEL_OFFRE=aa.ID_APPEL_OFFRE
where
aoactivite4_.ID_ACTIVITE=1
and lot2_.ESTIMATION_COUT>=1
and lieuexecut3_.region=1;
But I haven't seen your tables so I am not 100% sure that you won't get duplicates because of joins.
A couple of low-hanging fruits might also be found by ensuring that your appeloffre0_.CATEGORIE and appeloffre0_.DATE_OUVERTURE_PLIS have indexes on them.
Other fields which should have indexes on them are ao.lot.ID_APPEL_OFFRE, ao.lieu_execution.ID_APPEL_OFFRE and ao.ao_activite.ID_APPEL_OFFRE, and ao.appel_offre.ID_ACHETEUR (all the joined fields).
I would have the following indexes on your tables if not already... These are covering indexes for your query meaning the index has the applicable column to get your results without having to go to the actual raw data pages.
table index
appel_offre ( DATE_OUVERTURE_PLIS, CATEGORIE, ID_APPEL_OFFRE, ID_ACHETEUR )
lot ( ID_APPEL_OFFRE, ESTIMATION_COUT )
lieu_execution ( appel_offre, region )
ao_activite ( ID_APPEL_OFFRE, ID_ACTIVITE )
Having indexes on just individual columns won't really help optimize what you are looking for. Also, I am counting DISTINCT ID_APPEL_OFFRE values, so that if any of the JOINed tables has more than one matching record, the count is not inflated by the resulting Cartesian product:
select
count(distinct AOF.ID_APPEL_OFFRE) as size
from
ao.appel_offre AOF
JOIN ao.acheteur ACH
on AOF.ID_ACHETEUR = ACH.ID_ACHETEUR
and ACH.ID_ENTITE_MERE = 2
JOIN ao.lot
ON AOF.ID_APPEL_OFFRE = lot.ID_APPEL_OFFRE
and lot.ESTIMATION_COUT >= 1
JOIN ao.lieu_execution EX
ON AOF.ID_APPEL_OFFRE = EX.appel_offre
and EX.region = 1
JOIN ao.ao_activite ACT
ON AOF.ID_APPEL_OFFRE = ACT.ID_APPEL_OFFRE
and ACT.ID_ACTIVITE = 1
where
AOF.DATE_OUVERTURE_PLIS > '2015-01-01'
and ( AOF.CATEGORIE = 'fournitures'
or AOF.CATEGORIE = 'travaux'
or AOF.CATEGORIE = 'services')
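The concern about joins multiplying the count can be reproduced on a tiny example. This is a sketch under assumptions: it uses Python's sqlite3 instead of MySQL, only two of the tables, invented rows, and a hypothetical ID_LOT key column on lot.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE appel_offre (ID_APPEL_OFFRE INTEGER PRIMARY KEY);
CREATE TABLE lot (
  ID_LOT INTEGER PRIMARY KEY,   -- hypothetical key column
  ID_APPEL_OFFRE INT,
  ESTIMATION_COUT INT
);
INSERT INTO appel_offre (ID_APPEL_OFFRE) VALUES (1), (2), (3);
INSERT INTO lot (ID_APPEL_OFFRE, ESTIMATION_COUT) VALUES
  (1, 10), (1, 20), (1, 30),   -- three qualifying lots for tender 1
  (2, 5),                      -- one qualifying lot for tender 2
  (3, 0);                      -- no qualifying lot for tender 3
""")

def scalar(sql):
    """Run a query that returns a single value (hypothetical helper)."""
    return con.execute(sql).fetchone()[0]

JOIN_ON = """
FROM appel_offre a
JOIN lot l ON l.ID_APPEL_OFFRE = a.ID_APPEL_OFFRE AND l.ESTIMATION_COUT >= 1
"""

# A plain join counts one row per matching lot.
join_count = scalar("SELECT count(a.ID_APPEL_OFFRE) " + JOIN_ON)

# DISTINCT collapses the repeated tender ids again.
distinct_count = scalar("SELECT count(DISTINCT a.ID_APPEL_OFFRE) " + JOIN_ON)

# EXISTS counts each tender at most once, however many lots match.
exists_count = scalar("""
SELECT count(a.ID_APPEL_OFFRE)
FROM appel_offre a
WHERE EXISTS (SELECT 1 FROM lot l
              WHERE l.ID_APPEL_OFFRE = a.ID_APPEL_OFFRE
                AND l.ESTIMATION_COUT >= 1)
""")

print(join_count, distinct_count, exists_count)
```

Here the plain join counts tender 1 three times, while count(DISTINCT ...) and EXISTS both count each tender once. Which form is fastest on the real data is a separate question that only EXPLAIN and timing on the actual server can answer.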
As @FuzzyTree said in his comment, EXISTS is faster than an inner join if it's not a 1:1 relationship, because it terminates as soon as it finds one match, whereas the join will fetch every matching row.
But the solution was to use IN instead of EXISTS:
where ( appeloffre0_.ID_APPEL_OFFRE IN (select lot2_.ID_APPEL_OFFRE from ao.lot lot2_
where lot2_.ESTIMATION_COUT>=1)
)
With this, the query runs much faster than with EXISTS or joins:
select count(appeloffre0_.ID_APPEL_OFFRE) as size
from ao.appel_offre appeloffre0_
inner join ao.acheteur acheteur1_
on appeloffre0_.ID_ACHETEUR=acheteur1_.ID_ACHETEUR
where
( appeloffre0_.ID_APPEL_OFFRE IN (select lot2_.ID_APPEL_OFFRE from ao.lot lot2_ where lot2_.ESTIMATION_COUT>=1))
and (appeloffre0_.ID_APPEL_OFFRE IN (select lieuexecut3_.appel_offre from ao.lieu_execution lieuexecut3_ where lieuexecut3_.region=1))
and (appeloffre0_.ID_APPEL_OFFRE IN (select aoactivite4_.ID_APPEL_OFFRE from ao.ao_activite aoactivite4_ where aoactivite4_.ID_ACTIVITE=1 ))
and appeloffre0_.DATE_OUVERTURE_PLIS>'2015-01-01'
and (appeloffre0_.CATEGORIE='fournitures' or appeloffre0_.CATEGORIE='travaux' or appeloffre0_.CATEGORIE='services')
and acheteur1_.ID_ENTITE_MERE=2

How to optimize a join which causes very slow performance

This query runs for more than 12 seconds, even though all tables are relatively small (about 2 thousand rows).
SELECT attr_73206_ AS attr_73270_
FROM object_73130_ f1
LEFT OUTER JOIN (
SELECT id_field, attr_73206_ FROM (
SELECT m.id_field, t0.attr_73102_ AS attr_73206_ FROM object_73200_ o
INNER JOIN master_slave m ON (m.id_object = 73130 OR m.id_object = 73290) AND (m.id_master = 73200 OR m.id_master = 73354) AND m.id_slave_field = o.id
INNER JOIN object_73101_ t0 ON t0.id = o.attr_73206_
ORDER BY o.id_order
) AS o GROUP BY o.id_field
) AS o ON f1.id = o.id_field
Both tables have the field id as primary key. In addition, id_field, id_order, attr_73206_ and all fields in master_slave are indexed.
As for the logic, this is on the whole a master-detail query. Table object_73130_ is the master table and object_73200_ is the detail table; they are linked by the master_slave table. object_73101_ is an ad-hoc table used to get the real value of the field attr_73206_ by its id. For each row in the master table, the query returns a field from the very first row of its detail table.
The query originally looked different, but here at Stack Overflow I was advised to use this more optimized structure instead of the subquery I used previously (and the query did start to run much faster). I observe that the subquery in the first JOIN block runs very fast but returns a number of rows comparable to the number of rows in the master table. In any case, I do not know how to optimize it, and I wonder why a simple, fast-running join causes so much trouble.
The main observation is that if I remove the ad-hoc table object_73101_ from the query and return just an id instead of the real value, the query runs as quick as a flash. So all attention should be focused on this part of the query:
INNER JOIN object_73101_ t0 ON t0.id = o.attr_73206_
Why does it slow down the whole query so terribly?
EDIT
In this way it runs super-fast
SELECT t0.attr_73102_ AS attr_73270_
FROM object_73130_ f1
LEFT OUTER JOIN (
SELECT id_field, attr_73206_ FROM (
SELECT m.id_field, attr_73206_ FROM object_73200_ o
INNER JOIN master_slave m ON (m.id_object = 73130 OR m.id_object = 73290) AND (m.id_master = 73200 OR m.id_master = 73354) AND m.id_slave_field = o.id
ORDER BY o.id_order
) AS o GROUP BY o.id_field
) AS o ON f1.id = o.id_field
LEFT JOIN object_73101_ t0 ON t0.id = o.attr_73206_
So, as you can see, I just moved the ad-hoc join outside of the subquery. The problem is that the subquery is generated automatically: I have access to the part of the algorithm that creates it and can modify that part, but I do not have access to the part that builds the whole query, so the only thing I can do is fix the subquery somehow. Anyway, I still can't understand why an INNER JOIN inside a subquery can slow down the whole query by a factor of hundreds.
EDIT
A new version of query with different aliases for each table. This has no effect on the performance:
SELECT attr_73206_ AS attr_73270_
FROM object_73130_ f1
LEFT OUTER JOIN (
SELECT id_field, attr_73206_ FROM (
SELECT m.id_field, t0.attr_73102_ AS attr_73206_ FROM object_73200_ a
INNER JOIN master_slave m ON (m.id_object = 73130 OR m.id_object = 73290) AND (m.id_master = 73200 OR m.id_master = 73354) AND m.id_slave_field = a.id
INNER JOIN object_73101_ t0 ON t0.id = a.attr_73206_
ORDER BY a.id_order
) AS b GROUP BY b.id_field
) AS c ON f1.id = c.id_field
EDIT
This is the result of EXPLAIN command:
| id | select_type | TABLE | TYPE | possible_keys | KEY | key_len | ROWS | Extra |
| 1 | PRIMARY | f1 | INDEX | NULL | PRIMARY | 4 | 1570 | USING INDEX
| 1 | PRIMARY | derived2| ALL | NULL | NULL | NULL | 1564 |
| 2 | DERIVED | derived3| ALL | NULL | NULL | NULL | 1575 | USING TEMPORARY; USING filesort
| 3 | DERIVED | m | RANGE | id_object,id_master,..| id_object | 4 | 1356 | USING WHERE; USING TEMPORARY; USING filesort
| 3 | DERIVED | a | eq_ref | PRIMARY,attr_73206_ | PRIMARY | 4 | 1 |
| 3 | DERIVED | t0 | eq_ref | PRIMARY | PRIMARY | 4 | 1 |
What is wrong with that?
EDIT
Here is the result of EXPLAIN command for the "super-fast" query
| id | select_type | TABLE | TYPE | possible_keys | KEY | key_len | ROWS | Extra
| 1 | PRIMARY | f1 | INDEX | NULL | PRIMARY | 4 | 1570 | USING INDEX
| 1 | PRIMARY | derived2| ALL | NULL | NULL | NULL | 1570 |
| 1 | PRIMARY | t0 | eq_ref| PRIMARY | PRIMARY | 4 | 1 |
| 2 | DERIVED | derived3| ALL | NULL | NULL | NULL | 1581 | USING TEMPORARY; USING filesort
| 3 | DERIVED | m | RANGE | id_object,id_master,| id_object | 4 | 1356 | USING WHERE; USING TEMPORARY; USING filesort
| 3 | DERIVED | a | eq_ref | PRIMARY | PRIMARY | 4 | 1 |
CLOSED
I will use my own "super-fast" query, which I presented above. I think it is impossible to optimize it anymore.
Without knowing the exact nature of the data/query, there are a couple things that I'm seeing:
MySQL is notoriously bad at handling sub-selects, as it requires the creation of derived tables. In fact, some versions of MySQL also ignore indexes when using sub-selects. Typically, it's better to use JOINs instead of sub-selects, but if you need to use sub-selects, it's best to make that sub-select as lean as possible.
Unless you have a very specific reason for putting the ORDER BY in the sub-select, it may be a good idea to move it to the "main" query portion because the result set may be smaller (allowing for quicker sorting).
So, all that being said, I tried to rewrite your query using JOIN logic, but I was wondering which table the final value (attr_73102_) comes from. Is it the result of the sub-select, or does it come from table object_73130_? If it comes from the sub-select, then I don't see why you're bothering with the original LEFT JOIN, as you will only be returning the list of values from the sub-select, and NULL for any non-matching rows from object_73130_.
Regardless, not knowing this answer, I think the query below MAY be functionally equivalent:
SELECT t0.attr_73102_ AS attr_73270_
FROM object_73130_ f1
LEFT JOIN (object_73200_ o
INNER JOIN master_slave m ON m.id_slave_field = o.id
INNER JOIN object_73101_ t0 ON t0.id = o.attr_73206_)
ON f1.id = o.id_field
WHERE m.id_object IN (73130,73290)
AND m.id_master IN (73200,73354)
GROUP BY o.id_field
ORDER BY o.id_order;

Select distinct records on a join

I have two mysql tables - a sales table:
+----------------+------------------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+------------------------------+------+-----+---------+-------+
| StoreId | bigint(20) unsigned | NO | PRI | NULL | |
| ItemId | bigint(20) unsigned | NO | | NULL | |
| SaleWeek | int(10) unsigned | NO | PRI | NULL | |
+----------------+------------------------------+------+-----+---------+-------+
and an items table:
+--------------------+------------------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+------------------------------+------+-----+---------+-------+
| ItemId | bigint(20) unsigned | NO | PRI | NULL | |
| ItemName | varchar(100) | NO | | NULL | |
+--------------------+------------------------------+------+-----+---------+-------+
The sales table contains multiple records for each ItemID - one for each SaleWeek. I want to select all items sold by joining the two tables like so:
SELECT items.ItemName, items.ItemId FROM items
JOIN sales ON items.ItemId = sales.ItemId
WHERE sales.StoreID = ? ORDER BY sales.SaleWeek DESC;
However, this is returning multiple ItemId values based on the multiple entries for each SaleWeek. Can I do a distinct select to only return one ItemID - I don't want to have to query for the latest SaleWeek because some items may not have an entry for the latest SaleWeek so I need to get the last sale. Do I need to specify DISTINCT or use a LEFT OUTER JOIN or something?
A DISTINCT should do what you're looking for:
SELECT DISTINCT items.ItemName, items.ItemId FROM items
JOIN sales ON items.ItemId = sales.ItemId
WHERE sales.StoreID = ? ORDER BY sales.SaleWeek DESC;
That would return only distinct items.ItemName, items.ItemId tuples.
You also mentioned the sale week. If you want the most recent week for each item, you may want to try a GROUP BY:
SELECT
items.ItemName,
items.ItemId,
max( Sales.SaleWeek ) MostRecentSaleWeek
FROM
items JOIN sales ON items.ItemId = sales.ItemId
WHERE
sales.StoreID = ?
GROUP BY
items.ItemID,
items.ItemName
ORDER BY
MostRecentSaleWeek, -- ordinal column number 3 via the MAX() call
items.ItemName
You may have to change the ORDER BY to reference the third column by its ordinal if you want to sort on that column. This query will give you each distinct item AND the most recent week it was sold.
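The GROUP BY approach can be tried out on a few rows. The following is a minimal sketch with assumptions: it uses Python's sqlite3 rather than MySQL, the table definitions are simplified (no keys on sales), the data is invented, and last_sales is a hypothetical helper. It sorts descending, as in the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE items (ItemId INTEGER PRIMARY KEY, ItemName TEXT NOT NULL);
CREATE TABLE sales (StoreId INT NOT NULL, ItemId INT NOT NULL, SaleWeek INT NOT NULL);
INSERT INTO items (ItemId, ItemName) VALUES (10, 'apple'), (11, 'pear'), (12, 'plum');
INSERT INTO sales (StoreId, ItemId, SaleWeek) VALUES
  (1, 10, 1), (1, 10, 2), (1, 10, 3),   -- apple sold in three weeks
  (1, 11, 2),                           -- pear sold once, not in the latest week
  (2, 12, 5);                           -- plum sold in another store
""")

def last_sales(store_id):
    """One row per item sold in the store, with its latest SaleWeek (hypothetical helper)."""
    return con.execute(
        """
        SELECT i.ItemName, i.ItemId, MAX(s.SaleWeek) AS MostRecentSaleWeek
        FROM items i
        JOIN sales s ON i.ItemId = s.ItemId
        WHERE s.StoreId = ?
        GROUP BY i.ItemId, i.ItemName
        ORDER BY MostRecentSaleWeek DESC, i.ItemName
        """,
        (store_id,),
    ).fetchall()

print(last_sales(1))
print(last_sales(2))
```

Each item appears once, including the pear that has no entry for the latest week, which is the case the question was worried about.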
SELECT u.user_name,u.user_id, u.user_country,u.user_phone_no,ind.Industry_name,inv.id,u.user_email
FROM invitations inv
LEFT JOIN users u
ON inv.sender_id = u.user_id
LEFT JOIN employee_info ei
ON inv.sender_id=ei.employee_fb_id
LEFT JOIN industries ind
ON ei.industry_id=ind.id
WHERE inv.receiver_id='XXX'
AND inv.invitation_status='0'
AND inv.invitation_status_desc='PENDING'
GROUP BY (user_id)
We can use this:
INSERT INTO `test_table` (`id`, `name`) SELECT DISTINCT
a.`employee_id`, b.`first_name` FROM `employee_leave_details` as a INNER JOIN
`employee_register` as b ON a.`employee_id` = b.`employee_id`