I need some advice. I'm building an app that has to run queries on rather large tables, possibly very frequently, so I'm looking for the best approach performance-wise.
I have the following 2 tables:
Albums:
+---------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| eventid | int(11) | NO | MUL | NULL | |
| album | varchar(200) | NO | | NULL | |
| filename | varchar(200) | NO | | NULL | |
| obstacle_time | time | NO | | NULL | |
+---------------+--------------+------+-----+---------+----------------+
and keywords:
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| eventid | int(11) | NO | MUL | NULL | |
| filename | varchar(200) | NO | | NULL | |
| bibnumbers | varchar(200) | NO | | NULL | |
| gender | varchar(20) | YES | | NULL | |
| top_style | varchar(20) | YES | | NULL | |
| pants_style | varchar(20) | YES | | NULL | |
| other | varchar(20) | YES | | NULL | |
| cap | varchar(200) | NO | | NULL | |
| tshirt | varchar(200) | NO | | NULL | |
| pants | varchar(200) | NO | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
Both tables have a unique index (unique_index) on the eventid + filename columns.
Both tables contain information about the same images, but the albums table is filled instantly (as soon as I have the images), while the keywords table usually becomes available several days later, once manual tagging of the images is complete.
Once tagging is enabled, people will search for all kinds of things. Since the results can be HUGE (10,000 rows or more), I only show them in small chunks so the browser isn't killed trying to load a huge number of images. As a result my server will be hit with lots of queries: every time the visitor scrolls to the bottom of the page, an AJAX request returns the next chunk of images.
Now my question is, which of the following queries is better performance wise:
SELECT `albums`.`filename`,`basket`.`id`,`albums`.`id`,`obstacle_time`
FROM `albums`
LEFT JOIN `basket`
ON `basket`.`eventid` = `albums`.`eventid`
AND `basket`.`fileid` = `albums`.`id`
AND `basket`.`visitor_id` = 1
LEFT JOIN `keywords`
ON `keywords`.`eventid` = `albums`.`eventid`
AND `albums`.`filename` = `keywords`.`filename`
WHERE
`albums`.`eventid` = 1
AND `album` LIKE '%string%'
AND `obstacle_time` >= '08:00:00'
AND `obstacle_time` <= '14:11:10'
AND `gender` = 1
AND `top_style` REGEXP '[[:<:]]0[[:>:]]|[[:<:]]1[[:>:]]'
AND `cap` = '2'
AND `tshirt` = '1'
AND `pants` = '3'
ORDER BY `obstacle_time`
LIMIT X, 10
Or using an IN clause inside WHERE, like:
SELECT `albums`.`filename`,`basket`.`id`,`albums`.`id`,`obstacle_time`
FROM `albums`
LEFT JOIN `basket`
ON `basket`.`eventid` = `albums`.`eventid`
AND `basket`.`fileid` = `albums`.`id`
AND `basket`.`visitor_id` = 1
WHERE
`albums`.`eventid` = 1
AND `album` LIKE '%string%'
AND `obstacle_time` >= '08:00:00'
AND `obstacle_time` <= '14:11:10'
AND `filename` IN (
SELECT `filename`
FROM `keywords`
WHERE
`eventid` = 1
AND `gender` = 1
AND `top_style` REGEXP '[[:<:]]0[[:>:]]|[[:<:]]1[[:>:]]'
AND `cap` = '2'
AND `tshirt` = '1'
AND `pants` = '3'
)
ORDER BY `obstacle_time`
LIMIT X, 10
I have looked at similar questions but wasn't able to figure out which is the best course of action.
My understanding so far is that:
Using LEFT JOIN takes advantage of indexes, BUT if I use it I get a full join of the tables even when I only need a significantly smaller result set, so it seems a waste to join thousands of rows just to filter most of them out.
Using IN with a subquery isn't indexed? I'm not 100% sure about this. I'm using MySQL 5.6, and to the best of my understanding, since 5.6 even subqueries get indexed automatically by MySQL. I think this method has benefits when the result is significantly filtered; I'm not sure there will be any benefit if the subquery returns all the possible filenames.
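One detail worth checking before comparing the two forms: in the first query the WHERE clause filters on `keywords` columns (`gender`, `cap`, and so on). A row that the LEFT JOIN padded with NULLs can never pass those tests, so the join effectively behaves as an inner join. A minimal sketch of the equivalent query, with the `basket` join left out for brevity and `LIMIT 0, 10` standing in for the `LIMIT X, 10` placeholder:

```sql
-- Sketch only: the same filters as the first query, written as the
-- INNER JOIN it effectively is. Aliases a and k are introduced here.
SELECT a.filename, a.id, a.obstacle_time
FROM albums AS a
INNER JOIN keywords AS k
        ON k.eventid  = a.eventid
       AND k.filename = a.filename
WHERE a.eventid = 1
  AND a.album LIKE '%string%'
  AND a.obstacle_time BETWEEN '08:00:00' AND '14:11:10'
  AND k.gender = 1
  AND k.top_style REGEXP '[[:<:]]0[[:>:]]|[[:<:]]1[[:>:]]'
  AND k.cap    = '2'
  AND k.tshirt = '1'
  AND k.pants  = '3'
ORDER BY a.obstacle_time
LIMIT 0, 10;
```

Because (eventid, filename) is unique in `keywords`, each album row matches at most one keywords row, so this should return the same rows as the IN form.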
As footnote questions:
Should I consider returning the whole result to the client on the first query and use client side (HTML) techniques to load the images gradually rather than re-querying the server each time?
Should I consider merging the 2 tables into 1, how much of a performance impact will that have? (can be tricky due to various reasons, which have no place in the question)
Thanks.
EDIT 1
Explain for JOIN query:
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+
| 1 | SIMPLE | albums_2015 | ref | unique_index | unique_index | 4 | const | 6475 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | basket | ALL | NULL | NULL | NULL | NULL | 2 | Using where; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE | keywords_2015 | eq_ref | unique_index | unique_index | 206 | const,mybibnumber.albums_2015.filename | 1 | Using index |
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+
Using WHERE IN:
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+
| 1 | SIMPLE | albums_2015 | ref | unique_index | unique_index | 4 | const | 6475 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | keywords_2015 | eq_ref | unique_index | unique_index | 206 | const,mybibnumber.albums_2015.filename | 1 | Using where |
| 1 | SIMPLE | basket | ALL | NULL | NULL | NULL | NULL | 2 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+
EDIT 2
I wasn't able to set up an SQL Fiddle (I keep getting a "something went wrong" error), so I have created a test database on one of my servers.
Address: http://188.165.217.185/phpmyadmin/, user: temp_test, pass: test_temp
I'm still building the whole thing and I don't have all the values filled in yet, like top_style, pants_style, etc, so a more appropriate query for the test scenario will be:
WHERE IN:
SELECT `albums_2015`.`filename`,
`albums_2015`.`id`,
`obstacle_time`
FROM `albums_2015`
WHERE `albums_2015`.`eventid` = 1
AND `album` LIKE '%'
AND `obstacle_time` >= '08:00:00'
AND `obstacle_time` <= '14:11:10'
AND `filename` IN (SELECT `filename`
FROM `keywords_2015`
WHERE eventid = 1
AND
`bibnumbers` REGEXP '[[:<:]]113[[:>:]]|[[:<:]]106[[:>:]]')
ORDER BY `obstacle_time`
LIMIT 0, 10
LEFT JOIN:
SELECT `albums_2015`.`filename`,`albums_2015`.`id`,`obstacle_time`
FROM `albums_2015`
LEFT JOIN `keywords_2015`
ON `keywords_2015`.`eventid` = `albums_2015`.`eventid`
AND `albums_2015`.`filename` = `keywords_2015`.`filename`
WHERE
`albums_2015`.`eventid` = 1
AND `album` LIKE '%'
AND `obstacle_time` >= '08:00:00'
AND `obstacle_time` <= '14:11:10'
AND `bibnumbers` REGEXP '[[:<:]]113[[:>:]]|[[:<:]]106[[:>:]]'
ORDER BY `obstacle_time`
LIMIT 0, 10
A bunch of tips:
Joins that use an index are the best option when you have to deal with multi-table queries.
Don't hesitate to add indexes to speed up your queries (an index takes space, but on an INT field that cost is negligible and you gain far more than you lose).
With big tables, caching data from the distant table is usually a good idea.
An insert trigger on TAG_table that caches the displayed part in the distant table (like the tag name for the album overview) can help you keep the frequency of your join queries at a decent level.
Be careful with REGEXP: it hurts performance badly. Adding a new table to split the data is a better idea (and it can use indexing, which is a native optimisation).
Every field in the WHERE clause of a big, frequent query should have an index on it. If you can't put one there, then your DB model is f**cked up and needs to be changed.
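To make the index and "split the data" tips concrete, here is a rough sketch against the test tables from the question. Everything new in it is an assumption: the index name, the `keyword_bibs` table, its `album_id` column (taken to hold `albums_2015`.`id`) and the idea that `bibnumbers` holds a list of integer bib numbers.

```sql
-- Hypothetical index: lets MySQL locate one event's rows and read them
-- already ordered by obstacle_time.
ALTER TABLE albums_2015
  ADD INDEX idx_event_time (eventid, obstacle_time);

-- Hypothetical lookup table: one row per (image, bib number), so bib
-- searches become indexed equality lookups instead of a REGEXP scan.
CREATE TABLE keyword_bibs (
  eventid  INT NOT NULL,
  bib      INT NOT NULL,
  album_id INT NOT NULL,
  PRIMARY KEY (eventid, bib, album_id)
);

-- The bib search from the test query, without REGEXP.
-- DISTINCT is needed because one image may carry both bib numbers.
SELECT DISTINCT a.filename, a.id, a.obstacle_time
FROM albums_2015 AS a
JOIN keyword_bibs AS b
  ON b.eventid  = a.eventid
 AND b.album_id = a.id
WHERE a.eventid = 1
  AND b.bib IN (113, 106)
  AND a.obstacle_time BETWEEN '08:00:00' AND '14:11:10'
ORDER BY a.obstacle_time
LIMIT 0, 10;
```

Running EXPLAIN on the result would show whether the optimizer actually picks these indexes for your data.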
I have found an inefficient query in our system. The content table holds versions of slides, and this query is supposed to select the highest version of a slide by id.
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901'
group by `slide_id`
) c ON `c`.`version` = `content`.`version`;
EXPLAIN
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | content | NULL | ref | PRIMARY,version | PRIMARY | 8 | const | 9703 | 100.00 | NULL |
| 2 | DERIVED | content | NULL | ref | PRIMARY,fk_content_slides_idx,thumbnail_asset_id,version,slide_id | fk_content_slides_idx | 8 | const | 1 | 100.00 | Using where; Using index |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
One big issue is that it returns almost all the slides in the system as the outer query does not filter by slide id. After adding that I get...
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901' group by `slide_id`
) c ON `c`.`version` = `content`.`version`
WHERE `slide_id` = '16901';
EXPLAIN
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | content | NULL | const | PRIMARY,fk_content_slides_idx,version,slide_id | PRIMARY | 16 | const,const | 1 | 100.00 | NULL |
| 2 | DERIVED | content | NULL | ref | PRIMARY,fk_content_slides_idx,thumbnail_asset_id,version,slide_id | fk_content_slides_idx | 8 | const | 1 | 100.00 | Using where; Using index |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
That correctly reduces the number of rows to one, but doesn't really speed things up.
There are indexes on version, slide_id and a unique key on version AND slide_id.
Is there anything else I can do to speed this up?
Use ORDER BY version DESC LIMIT 1 instead of MAX?
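A sketch of that idea in MySQL syntax (MySQL has no TOP keyword; ORDER BY ... LIMIT 1 plays that role). It is untested against the real table:

```sql
-- Highest version of one slide, without the derived table.
SELECT `content`.*
FROM `content`
WHERE `slide_id` = '16901'
ORDER BY `version` DESC
LIMIT 1;
```

With an index whose first column is slide_id, MySQL can usually answer this by reading a single index entry, but EXPLAIN is the way to confirm that.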
MySQL seems to take an index (version, slide_id) to join the tables. You should get a better result with
SELECT `content`.*
FROM `content`
FORCE INDEX FOR JOIN (fk_content_slides_idx)
join (
SELECT `slide_id`, max(version) as `version` from `content`
WHERE `slide_id` = '16901' group by `slide_id`
) c ON `c`.`slide_id` = `content`.`slide_id` and `c`.`version` = `content`.`version`
You need an index that has slide_id as its first column. I just guessed that's fk_content_slides_idx; if not, use another one.
The FORCE INDEX FOR JOIN (fk_content_slides_idx) part is just there to enforce it; you should check whether MySQL picks that index by itself without forcing (it should).
You might get a slightly better result still with an index on (slide_id, version). Whether you see a difference depends on the amount of data (e.g. the number of versions per id). You shouldn't spam indexes, and you already have a lot on this table, but you can try it for fun.
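If you want to try the (slide_id, version) index, something along these lines should do. The index name is made up for the example:

```sql
-- Hypothetical index name. Column order matters: slide_id first, then version.
ALTER TABLE `content`
  ADD INDEX `idx_slide_version` (`slide_id`, `version`);

-- Check whether the optimizer now picks it without a hint.
EXPLAIN
SELECT `content`.*
FROM `content`
JOIN (
    SELECT `slide_id`, MAX(`version`) AS `version`
    FROM `content`
    WHERE `slide_id` = '16901'
    GROUP BY `slide_id`
) c ON `c`.`slide_id` = `content`.`slide_id`
   AND `c`.`version`  = `content`.`version`;
```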
Just a suggestion: I think you should drop the GROUP BY slide_id, because you are filtering on a single slide_id (16901).
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901'
) c ON `c`.`version` = `content`.`version`
WHERE `slide_id` = '16901';
There exists a table:
CREATE TABLE person
(
id INT(10) PRIMARY KEY AUTO_INCREMENT,
nameFirst VARCHAR(255) DEFAULT '?',
nameSecond VARCHAR(255) DEFAULT '',
fatherNameFirst VARCHAR(255) DEFAULT NULL
);
Note: There are actually other columns there, 18 in total, but they are not being used here.
The goal is to fill in the father's first name using the child's second name. In Russian it can be predicted from the second name (patronymic), though not always correctly, so I plan to do what can be done automatically and fix the rest by hand later.
The UPDATE is done as follows:
UPDATE person AS child
LEFT JOIN
(SELECT DISTINCT nameFirst FROM person) AS parent
ON CONCAT(parent.nameFirst,'овна')=child.nameSecond OR CONCAT(parent.nameFirst,'ович')=child.nameSecond
SET child.fatherNameFirst=parent.nameFirst;
Eventually it will need to run on a table with more than 2 million entries; for now I have tried it with 400k rows of sample data. The problem is that after about an hour of my computer running one of its cores at 100%, the query had still not finished.
So I was wondering whether I can break it up into smaller queries that run one after another, each taking 5-10 minutes. That way, if I need to do something else, I can terminate the one currently running and not lose a day of CPU time.
I tried adding WHERE child.id<1000, but either it still took far too long or it had little impact (perhaps I misunderstand how MariaDB executes this update).
In case sample data helps somebody understand it better:
select id, nameFirst, nameSecond from person limit 10;
+----+--------------------+----------------------------+
| id | nameFirst | nameSecond |
+----+--------------------+----------------------------+
| 1 | Туликович | |
| 2 | Август | Михайлович |
| 3 | Август | Христианович |
| 4 | Александр | Александрович |
| 5 | Александр | Христьянович |
| 6 | Альберт | Викторович |
| 7 | Альбрехт | Александрович |
| 8 | Амалия | Андреевна |
| 9 | Амалия | Ивановна |
| 10 | Ангелина | Андреевна |
+----+--------------------+----------------------------+
fatherNameFirst is empty at this time.
You could break this down alphabetically by adding WHERE nameFirst LIKE 'A%' to the update query, and then run the query multiple times, once per letter.
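One way to read that suggestion, as a sketch: restrict the candidate father names to a single initial letter per run. The letter below is the Cyrillic 'А', since the data is Cyrillic. The extra filter on child.nameSecond is an addition of mine; it does not change the result, because a patronymic built from a name starting with 'А' also starts with 'А'.

```sql
-- One batch: only father names starting with Cyrillic 'А'.
-- Repeat with the next letter ('Б', 'В', ...) until all are covered.
UPDATE person AS child
JOIN (
    SELECT DISTINCT nameFirst
    FROM person
    WHERE nameFirst LIKE 'А%'
) AS parent
  ON CONCAT(parent.nameFirst, 'овна') = child.nameSecond
  OR CONCAT(parent.nameFirst, 'ович') = child.nameSecond
SET child.fatherNameFirst = parent.nameFirst
WHERE child.nameSecond LIKE 'А%';
```

Unlike the original LEFT JOIN, this inner join leaves rows without a match untouched, which is fine as long as fatherNameFirst starts out NULL.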
Given this sample data:
CREATE TABLE person
(
id INT(10) PRIMARY KEY AUTO_INCREMENT,
nameFirst VARCHAR(255) DEFAULT '?',
nameSecond VARCHAR(255) DEFAULT '',
fatherNameFirst VARCHAR(255) DEFAULT NULL
) DEFAULT CHARSET=utf8;
INSERT INTO person
(`id`, `nameFirst`, `nameSecond`)
VALUES
(1, 'Туликович', NULL),
(2, 'Август', 'Михайлович'),
(3, 'Август', 'Христианович'),
(4, 'Александр', 'Александрович'),
(5, 'Александр', 'Христьянович'),
(6, 'Альберт', 'Викторович'),
(7, 'Альбрехт', 'Александрович'),
(8, 'Амалия', 'Андреевна'),
(9, 'Амалия', 'Ивановна'),
(10, 'Ангелина', 'Андреевна')
;
with your query you get this EXPLAIN output:
+----+-------------+------------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+------+----------------------------------------------------+
| 1 | PRIMARY | child | ALL | NULL | NULL | NULL | NULL | 10 | NULL |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 10 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED | person | ALL | NULL | NULL | NULL | NULL | 10 | Using temporary |
+----+-------------+------------+------+---------------+------+---------+------+------+----------------------------------------------------+
That's probably the worst you can get.
Let's see if we can rewrite this. First, there's absolutely no need for this subquery and the DISTINCT.
mysql > explain UPDATE person AS child
-> LEFT JOIN person parent
-> ON CONCAT(parent.nameFirst,'овна')=child.nameSecond OR CONCAT(parent.nameFirst,'ович')=child.nameSecond
-> SET child.fatherNameFirst=parent.nameFirst;
+----+-------------+--------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+------+----------------------------------------------------+
| 1 | SIMPLE | child | ALL | NULL | NULL | NULL | NULL | 10 | NULL |
| 1 | SIMPLE | parent | ALL | NULL | NULL | NULL | NULL | 10 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+--------+------+---------------+------+---------+------+------+----------------------------------------------------+
This eliminates the Using temporary. That's good.
With an index on nameFirst, we can speed this up further.
CREATE INDEX idx_person_nameFirst ON person(nameFirst);
Then explaining again:
+----+-------------+--------+-------+---------------+----------------------+---------+------+------+-----------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+----------------------+---------+------+------+-----------------------------------------------------------------+
| 1 | SIMPLE | child | ALL | NULL | NULL | NULL | NULL | 10 | NULL |
| 1 | SIMPLE | parent | index | NULL | idx_person_nameFirst | 768 | NULL | 10 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+--------+-------+---------------+----------------------+---------+------+------+-----------------------------------------------------------------+
Not yet perfect, but it's using the index. This should speed things up a lot.
From here on, it gets hard to optimize further. You can experiment a bit by adjusting the join buffer size, but I recommend you do this in a session only.
SET SESSION join_buffer_size = <whatever value>;
Every thread connecting to your server uses its own join buffer. That's why you should only test it in a session. With very many connections on your server, memory consumption could get out of hand.
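A further idea that goes beyond the steps above, offered as an untested sketch: wrapping parent.nameFirst in CONCAT() stops MySQL from looking names up in the index, so it has to compare every child with every parent. Both suffixes ('ович' and 'овна') are four characters long, so the suffix can be stripped from the child's side instead, leaving a plain equality on the indexed column.

```sql
-- Assumes the idx_person_nameFirst index created above exists.
-- Stripping the last 4 characters of the patronymic yields the
-- candidate father's first name, which is then looked up directly.
UPDATE person AS child
JOIN person AS parent
  ON parent.nameFirst =
     LEFT(child.nameSecond, CHAR_LENGTH(child.nameSecond) - 4)
SET child.fatherNameFirst = parent.nameFirst
WHERE CHAR_LENGTH(child.nameSecond) > 4
  AND (child.nameSecond LIKE '%ович' OR child.nameSecond LIKE '%овна');
```

Rows without a matching first name are left untouched rather than set to NULL, which gives the same end state while fatherNameFirst is still empty. Check with EXPLAIN that parent is now accessed with type ref before running it on the full table.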
I have a big table with 670k rows, and I'm running a SELECT with a lot of WHERE conditions to search and filter useful results. Sometimes there are NO results for the selected filters, and the query scans the whole table and takes a long time. I'd like to stop the query if no results are found within, say, 30 seconds.
This is my query:
SELECT date, s.name, l.id, l.title,ratingsum,numvotes,keyword,tag
from news_links l
LEFT JOIN sources s on s.id = l.source
WHERE
l.date BETWEEN STR_TO_DATE(?,'%Y-%m-%d')
AND STR_TO_DATE(?,'%Y-%m-%d')
AND s.name like ?
AND ((numvotes-1) *?) <= l.ratingsum
AND numvotes > ?
AND matches = 1
AND tag >= ?
AND tag <= ?
AND (l.title like ? or l.keyword like ?)
AND category >= ?
AND category <= ?
order by date desc
limit ?,15
I tried running a sub-query instead of joining but it didn't speed up the query.
News table(640k rows)
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | UNI | NULL | auto_increment |
| link | varchar(450) | NO | PRI | NULL | |
| date | datetime | NO | MUL | NULL | |
| title | varchar(145) | NO | MUL | NULL | |
| source | int(11) | NO | MUL | NULL | |
| text | mediumtext | YES | | NULL | |
| numvotes | int(3) | NO | MUL | 0 | |
| ratingsum | int(3) | NO | | 0 | |
| matches | int(1) | NO | | 0 | |
| keyword | varchar(45) | YES | | NULL | |
| tag | int(1) | NO | | 0 | |
+-----------+--------------+------+-----+---------+----------------+
I have indexes set up on date,title,source,numvotes as well as the primary key on link
670k rows should run VERY fast in MySQL. You should have a closer look at your indices. Start by adding a combined HASH index on news_links.source and news_links.matches:
ALTER TABLE news_links ADD INDEX myIdx1 USING HASH (source, matches)
What does EXPLAIN SELECT ... give you with that?
After that you can try to improve performance further by including more information in your index (note that MySQL will generally use only one index per table in a query). Add a BTREE index:
ALTER TABLE news_links ADD INDEX myIdx2 USING BTREE (source, matches, `date`)
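BTREE is good for range queries (e.g. with a BETWEEN in them). HASH is good for equality conditions, but only the MEMORY and NDB storage engines support it; on InnoDB or MyISAM, USING HASH is accepted and a BTREE index is created instead. If you want to index several columns with mixed conditions (range and equality), use BTREE.
What does EXPLAIN SELECT ... give you now?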
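On the literal question of giving up after 30 seconds: assuming the server is MySQL 5.7.8 or later (the question does not say), a SELECT can carry an execution-time limit as an optimizer hint. The sketch below uses a shortened filter list, with `LIMIT 0, 15` standing in for the parameterised limit:

```sql
-- MAX_EXECUTION_TIME is in milliseconds and applies to read-only SELECTs.
SELECT /*+ MAX_EXECUTION_TIME(30000) */
       l.date, s.name, l.id, l.title
FROM news_links AS l
LEFT JOIN sources AS s ON s.id = l.source
WHERE l.matches = 1
ORDER BY l.date DESC
LIMIT 0, 15;

-- Alternative: apply the same limit to every SELECT in this session.
SET SESSION max_execution_time = 30000;
```

When the limit is hit, the statement fails with an error instead of returning an empty result, so the application has to treat that error as "no results in time". It limits the damage but does not replace proper indexing.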
Here's the query:
SELECT
u.uid as UID,
fuo.uid as FUO_UID,
fo.prid as FO_NAME
FROM
users u
LEFT OUTER JOIN firstpoint_users_organisations fuo ON (u.uid=fuo.uid)
LEFT OUTER JOIN firstpoint_organisations fo ON (fo.nid=fuo.nid)
WHERE
u.status=1 AND u.uid>1
ORDER BY u.uid
LIMIT 3;
And the tables:
users
+------------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------------+------+-----+---------+----------------+
| uid | int(10) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(60) | NO | UNI | | |
| status | tinyint(4) | NO | | 0 | |
+------------------+------------------+------+-----+---------+----------------+
firstpoint_users_organisations
+-------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+-------+
| nid | int(10) unsigned | NO | PRI | 0 | |
| uid | int(10) unsigned | NO | PRI | 0 | |
+-------+------------------+------+-----+---------+-------+
firstpoint_organisations
+----------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+-------+
| nid | int(10) unsigned | NO | PRI | 0 | |
| prid | varchar(32) | NO | | | |
+----------+------------------+------+-----+---------+-------+
I wish to show users.uid and firstpoint_organisations.prid for every row in users, even though some users won't have a prid, in which case I show NULL (hence the left outer joins). The connection should be as follows:
users
uid -  firstpoint_users_organisations
  \---->uid
        nid -  firstpoint_organisations
          \-------->nid
                    prid
So each user (users) has a user id (uid). The association with their organisation is stored in firstpoint_users_organisations, which holds the user id and the organisation's node id (nid). The organisation's details are then stored in firstpoint_organisations.
So most users will have a prid; for those that don't, I want to show NULL.
Now, if I do an INNER JOIN on firstpoint_users_organisations and then on firstpoint_organisations, I get a good query speed (the above query runs in 0.02 seconds). But, when I switch both to LEFT OUTER JOIN, so I can get all users, prid or no prid, the above query takes ~90 seconds to run.
Is there anything I can do to speed this query up? There are approx. 70,000 rows in the users table, but even with LIMIT 3, changing the INNER JOIN to a LEFT OUTER JOIN makes the query take a horrible amount of time. Interestingly, the query takes the same amount of time to run with LIMIT 30, so I think there's something fundamentally wrong with my query.
EXPLAIN as requested:
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+-------+----------------------------------------------+
| 1 | SIMPLE | u | range | PRIMARY | PRIMARY | 4 | NULL | 13152 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | fuo | index | NULL | PRIMARY | 8 | NULL | 3745 | Using index |
| 1 | SIMPLE | fo | eq_ref | PRIMARY | PRIMARY | 4 | dbdb-dbdb_uat.fuo.nid | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+-------+----------------------------------------------+
3 rows in set (0.00 sec)
Your query is pointlessly using the index on uid (uid > 1 matches all but one of the users), so use the IGNORE INDEX hint for that index. The index on uid is the primary key, so the hint has to name it PRIMARY:
SELECT
u.uid as UID,
fuo.uid as FUO_UID,
fo.prid as FO_NAME
FROM users u IGNORE INDEX (PRIMARY)
LEFT JOIN firstpoint_users_organisations fuo ON u.uid=fuo.uid
LEFT JOIN firstpoint_organisations fo ON fo.nid=fuo.nid
WHERE u.status=1
AND u.uid > 1
ORDER BY u.uid
LIMIT 3
You should put an index on users(status), which may give you some benefit if there are enough rows with status != 1.
It is to be expected that changing the LIMIT has no effect: all 70,000 rows must be sorted before the limit is applied, in order to know which rows come first. The limit has little effect, except that fewer rows are returned to the client (less communication I/O).
I'm a believer in "less code is good", so purely from a style point of view I have removed non-essential code from your query:
removed OUTER, because there is no other kind of left join
removed the brackets around the join conditions, because you don't need 'em
I would use a unique index on (u.status, u.uid) for that, because I think MySQL otherwise has to do a full scan to find which entries have status = 1.
I hope it's faster afterwards ;)
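As a sketch, the composite index could look like this (a plain index is enough; the names are made up). The second statement is an extra guess based on the EXPLAIN output in the question: fuo shows possible_keys NULL and a full scan of PRIMARY, which suggests the primary key starts with nid and so cannot serve a join on uid alone.

```sql
-- Hypothetical index names.
-- Lets MySQL find status = 1 rows already ordered by uid.
ALTER TABLE users
  ADD INDEX idx_status_uid (status, uid);

-- Assumption drawn from the EXPLAIN output: the join on fuo.uid has no
-- usable index, so give it one.
ALTER TABLE firstpoint_users_organisations
  ADD INDEX idx_uid (uid);
```

Re-running EXPLAIN afterwards should show whether fuo is now accessed with type ref instead of a full index scan.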
I'm hoping you can help me get on the right track to start optimising my queries. I've never thought too much about optimisation before, but I have a few queries similar to the one below and want to start concentrating on improving their efficiency. An example of a query which I badly need to optimise is as follows:
SELECT COUNT(*) AS `records_found`
FROM (`records_owners` AS `ro`, `records` AS `r`)
WHERE r.reg_no = ro.contact_no
AND `contacted_email` <> "0000-00-00"
AND `contacted_post` <> "0000-00-00"
AND `ro`.`import_date` BETWEEN "2010-01-01" AND "2010-07-11"
AND `r`.`pa_date_of_birth` > "2010-01-01"
AND EXISTS ( SELECT `number` FROM `roles` WHERE `roles`.`number` = r.`reg_no` )
Running EXPLAIN on the above produces the following:
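+----+--------------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+-------------+
| 1 | PRIMARY | r | ALL | NULL | NULL | NULL | NULL | 21533 | Using where |
| 1 | PRIMARY | ro | eq_ref | PRIMARY | PRIMARY | 4 | r.reg_no | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | roles | ALL | NULL | NULL | NULL | NULL | 189 | Using where |
+----+--------------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+-------------+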
As you can see, you have a dependent subquery, which is one of the worst things performance-wise in MySQL. See here for tips:
http://dev.mysql.com/doc/refman/5.0/en/select-optimization.html
http://dev.mysql.com/doc/refman/5.0/en/in-subquery-optimization.html
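A rough sketch of one way to get rid of the dependent subquery: index the column it searches, and turn the EXISTS test into a join against the distinct role numbers. The index name and the alias rl are made up; the unqualified contacted_email and contacted_post columns are left as in the original, since the question does not say which table they belong to.

```sql
-- Hypothetical index name. The EXPLAIN output shows roles being scanned
-- in full (type ALL) for every candidate row.
ALTER TABLE `roles` ADD INDEX `idx_roles_number` (`number`);

-- The EXISTS test rewritten as a join on the distinct role numbers.
SELECT COUNT(*) AS `records_found`
FROM `records` AS `r`
JOIN `records_owners` AS `ro`
  ON `ro`.`contact_no` = `r`.`reg_no`
JOIN (SELECT DISTINCT `number` FROM `roles`) AS `rl`
  ON `rl`.`number` = `r`.`reg_no`
WHERE `contacted_email` <> '0000-00-00'
  AND `contacted_post`  <> '0000-00-00'
  AND `ro`.`import_date` BETWEEN '2010-01-01' AND '2010-07-11'
  AND `r`.`pa_date_of_birth` > '2010-01-01';
```

The DISTINCT keeps the count the same as with EXISTS when a number appears in roles more than once. The EXPLAIN also shows a full scan of records (r), so an index on a filtered column such as pa_date_of_birth may be worth testing as well.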