The following query works as expected and uses an index.
The query takes 0.0481 sec:
SELECT
geodb_locations.name,
geodb_locations.name_url,
COUNT(user.uid) AS useranzahl
FROM
user
LEFT JOIN
geodb_locations ON geodb_locations.id=user.plz
WHERE
user.freigeben=1 AND
geodb_locations.adm0='AT'
GROUP BY user.plz
ORDER BY useranzahl DESC
LIMIT 25
EXPLAIN output (screenshot not included).
If only the country code within the query is changed from AT to DE,
the query takes about 2.5 sec and does not use the index:
SELECT
geodb_locations.name,
geodb_locations.name_url,
COUNT(user.uid) AS useranzahl
FROM
user
LEFT JOIN
geodb_locations ON geodb_locations.id=user.plz
WHERE
user.freigeben=1 AND
geodb_locations.adm0='DE'
GROUP BY user.plz
ORDER BY useranzahl DESC
LIMIT 25
EXPLAIN output (screenshot not included).
Why is the index not used by the optimizer for the second query, and how can the query be improved?
2.5 sec is too long.
If user.uid cannot be NULL, use COUNT(*) instead of COUNT(user.uid).
As already pointed out, remove LEFT.
Add these indexes:
user: (freigeben, plz)
geodb_locations: (adm0, name_url, name)
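A minimal sketch of the suggested changes applied together (the index names here are placeholders):
ALTER TABLE user ADD INDEX idx_freigeben_plz (freigeben, plz);
ALTER TABLE geodb_locations ADD INDEX idx_adm0_nameurl_name (adm0, name_url, name);
SELECT
geodb_locations.name,
geodb_locations.name_url,
COUNT(*) AS useranzahl
FROM
user
JOIN
geodb_locations ON geodb_locations.id=user.plz
WHERE
user.freigeben=1 AND
geodb_locations.adm0='DE'
GROUP BY user.plz
ORDER BY useranzahl DESC
LIMIT 25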
As for why the EXPLAIN changed: it is quite normal (but somewhat rare) for the distribution of the constants to determine the order in which the tables are touched (Austria is less common than Germany?) or which index to use.
Regardless of optimizations, this query will have to scan a lot more rows for DE than for AT; this has to happen before the sort (ORDER BY) and LIMIT.
Two things prevent much optimization:
The WHERE references both tables.
The ORDER BY depends on a computed value.
I have a problem with MySQL ORDER BY: it slows down the query and I really don't know why. My query was a little more complex, so I simplified it to a light query with no joins, but it still runs really slow.
Query:
SELECT
W.`oid`
FROM
`z_web_dok` AS W
WHERE
W.`sent_eRacun` = 1 AND W.`status` IN(8, 9) AND W.`Drzava` = 'BiH'
ORDER BY W.`oid` ASC
LIMIT 0, 10
The table has 946,566 rows and takes about 500 MB. The fields I am selecting on are all indexed, as follows:
oid - INT PRIMARY KEY AUTO_INCREMENT
status - INT INDEXED
sent_eRacun - TINYINT INDEXED
Drzava - VARCHAR(3) INDEXED
(Screenshots of the EXPLAIN output, the executed query, and the timing after removing ORDER BY are not included.)
I have also tried sorting by a DATETIME field, which is also indexed, but I get the same slow query as when ordering by the primary key. This started today; it has always been fast and light before.
What can cause something like this?
The kind of query you use here calls for a composite covering index. This one should handle your query very well.
CREATE INDEX someName ON z_web_dok (Drzava, sent_eRacun, status, oid);
Why does this work? You're looking for equality matches on the first three columns, and sorting on the fourth column. The query planner will use this index to satisfy the entire query. It can random-access the index to find the first row matching your query, then scan through the index in order to get the rows it needs.
Pro tip: indexes on single columns are generally harmful to performance unless they happen to match the requirements of particular queries in your application, or are used for primary or foreign keys. You generally choose your indexes to match your most active, or your slowest, queries.
Edit: You asked whether it's better to create specific indexes for each query in your application. The answer is yes.
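To confirm the optimizer's choice, run EXPLAIN on the query (a sketch; someName is the index created above, and the exact plan depends on your data):
EXPLAIN SELECT
W.`oid`
FROM
`z_web_dok` AS W
WHERE
W.`sent_eRacun` = 1 AND W.`status` IN(8, 9) AND W.`Drzava` = 'BiH'
ORDER BY W.`oid` ASC
LIMIT 0, 10
With the composite index in place, you would expect to see someName in the key column.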
There may be an even faster way. (Or it may not be any faster.)
The IN(8, 9) gets in the way of easily handling the WHERE..ORDER BY..LIMIT completely efficiently. The possible solution is to treat that as OR, then convert to UNION and do some tricks with the LIMIT, especially if you might also be using OFFSET.
( SELECT W.`oid` FROM `z_web_dok` AS W
WHERE W.`Drzava` = 'BiH' AND W.`sent_eRacun` = 1 AND W.`status` = 8
ORDER BY W.`oid` LIMIT 10 )
UNION ALL
( SELECT W.`oid` FROM `z_web_dok` AS W
WHERE W.`Drzava` = 'BiH' AND W.`sent_eRacun` = 1 AND W.`status` = 9
ORDER BY W.`oid` LIMIT 10 )
ORDER BY oid LIMIT 10
This will allow the covering index described by OJones to be fully used in each of the subqueries. Furthermore, each will provide up to 10 rows without any temp table or filesort. Then the outer part will sort up to 20 rows and deliver the 'correct' 10.
For OFFSET, see http://mysql.rjweb.org/doc.php/index_cookbook_mysql#or
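A sketch of that OFFSET trick applied here: for LIMIT 20, 10, each subquery must fetch offset+count rows, and the outer query applies the offset once:
( SELECT W.`oid` FROM `z_web_dok` AS W
WHERE W.`Drzava` = 'BiH' AND W.`sent_eRacun` = 1 AND W.`status` = 8
ORDER BY W.`oid` LIMIT 30 )
UNION ALL
( SELECT W.`oid` FROM `z_web_dok` AS W
WHERE W.`Drzava` = 'BiH' AND W.`sent_eRacun` = 1 AND W.`status` = 9
ORDER BY W.`oid` LIMIT 30 )
ORDER BY oid LIMIT 20, 10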
I have executed the query below, but it takes 72 sec to complete:
SELECT ci.id, ci.city_real, co.country_name FROM cities ci
LEFT JOIN countries co ON(ci.country_id=co.country_id)
WHERE city_real in ('Delhi','Bangalore','Mumbai') ORDER BY population DESC
Then I created indexes based on the EXPLAIN plan, but it made no difference: the query still takes 72 sec.
Then I changed the query to use USE INDEX:
SELECT ci.id, ci.city_real, co.country_name FROM cities ci
LEFT JOIN countries co use index (idx_name) ON(ci.country_id=co.country_id)
WHERE city_real in ('Delhi','Bangalore','Mumbai') ORDER BY population DESC
This time I was also disappointed, because it takes 68 to 73 sec to complete.
Finally, I used FORCE INDEX in my query:
SELECT ci.id, ci.city_real, co.country_name FROM cities ci
LEFT JOIN countries co force index (idx_name) ON(ci.country_id=co.country_id)
WHERE city_real in ('Delhi','Bangalore','Mumbai') ORDER BY population DESC
Now I can see the result in just 7 sec.
From this example, my questions are:
Why didn't USE INDEX work?
Will there be any side effects when I use FORCE INDEX?
The number of rows in the table countries: 1,200,000
The number of rows in the table cities: 45,000,000
Thanks in advance...
With USE INDEX you give MySQL the chance to select another index if it calculates that a different one is better.
With FORCE INDEX, you force MySQL to use the named index, and it will do that.
As for side effects: the query could be slower than with the plan MySQL would have chosen itself. But when you force an index because MySQL chose the wrong one, getting slower is the only risk.
Did you run each test twice? I strongly suspect that the difference (10x) is not due to the use of an index hint, but to the caching of data.
You ran the USE with nothing in the buffer_pool. So there was a lot of I/O to fetch the 45M rows from one table and 1.2M rows from the other.
Then you ran with FORCE, which worked the same way. But this time, it found all the data in RAM. So it ran 10 times as fast.
Further evidence can be found with EXPLAIN SELECT; it will say which index was actually used.
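For example (keeping the LEFT JOIN from the question; the key column of the output shows the chosen index):
EXPLAIN SELECT ci.id, ci.city_real, co.country_name FROM cities ci
LEFT JOIN countries co ON(ci.country_id=co.country_id)
WHERE city_real in ('Delhi','Bangalore','Mumbai') ORDER BY population DESC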
I have this query
SELECT `PR_CODIGO`, `PR_EXIBIR`, `PR_NOME`, `PRC_DETALHES` FROM `PROPRIETARIOS` LEFT JOIN `PROPRIETARIOSCONTATOS` ON `PROPRIETARIOSCONTATOS`.`PRC_COD_CAD` = `PROPRIETARIOS`.`PR_CODIGO` WHERE `PR_EXIBIR` = 'T' LIMIT 20
It runs very fast, less than 1 second.
If I add a GROUP BY, it takes several seconds (5+) to run, even though the GROUP BY field is indexed.
I'm using GROUP BY because the query above returns repeated rows (I search for a name and its contacts in another table, and the same name shows up 4 times).
How do i fix this?
With the GROUP BY clause, the LIMIT clause isn't applied until after the rows are collapsed by the group by operation.
To get an understanding of the operations that MySQL is performing and which indexes are being considered and chosen by the optimizer, we use EXPLAIN.
Unstated in the question is what "field" (columns or expressions) are in the GROUP BY clause. So we are only guessing.
Based on the query shown in the question...
SELECT pr.pr_codigo
, pr.pr_exibir
, pr.pr_nome
, prc.prc_detalhes
FROM `PROPRIETARIOS` pr
LEFT
JOIN `PROPRIETARIOSCONTATOS` prc
ON prc.prc_cod_cad = pr.pr_codigo
WHERE pr.pr_exibir = 'T'
LIMIT 20
Our guess at the most appropriate indexes...
... ON PROPRIETARIOSCONTATOS (prc_cod_cad, prc_detalhes)
... ON PROPRIETARIOS (pr_exibir, pr_codigo, pr_nome)
Our guess is going to change depending on what column(s) are listed in the GROUP BY clause. And we might also suggest an alternative query to return an equivalent result.
But without knowing the GROUP BY clause, without knowing if our guesses about which table each column is from are correct, without knowing the column datatypes, without any estimates of cardinality, and without example data and expected output, ... we're flying blind and just making guesses.
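For illustration only, if the GROUP BY is on pr_codigo and the intent is one contact row per owner, a correlated subquery keeps the LIMIT effective before any large join (a sketch built on those guesses):
SELECT pr.pr_codigo
     , pr.pr_exibir
     , pr.pr_nome
     , ( SELECT prc.prc_detalhes
           FROM `PROPRIETARIOSCONTATOS` prc
          WHERE prc.prc_cod_cad = pr.pr_codigo
          LIMIT 1
       ) AS prc_detalhes
  FROM `PROPRIETARIOS` pr
 WHERE pr.pr_exibir = 'T'
 LIMIT 20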
Say I have an Order table that has 100+ columns and 1 million rows. It has a PK on OrderID and FK constraint StoreID --> Store.StoreID.
1) select * from `Order` order by OrderID desc limit 10;
The above takes a few milliseconds.
2) select * from `Order` o join `Store` s on s.StoreID = o.StoreID order by OrderID desc limit 10;
This somehow can take many seconds, and the more inner joins I add, the slower it gets.
3) select OrderID, column1 from `Order` o join `Store` s on s.StoreID = o.StoreID order by OrderID desc limit 10;
This seems to speed the execution up by limiting the columns we select.
There are a few points that I don't understand here, and I would really appreciate it if anyone more knowledgeable with MySQL (or RDBMS query execution in general) could enlighten me.
Query 1 is fast since it's just a reverse lookup by PK and DB only needs to return the first 10 rows it encountered.
I don't see why query 2 should take forever. Shouldn't the operation be the same? I.e., get the first 10 rows by PK and then join them with the other tables. Since there's a FK constraint, it is guaranteed that the relationship will be satisfied, so the DB doesn't need to join more rows than necessary and then trim the result, right? Unless the FK constraint allows a null FK, in which case I guess a left join would make this much faster than an inner join?
Lastly, I guess query 3 is simply faster because fewer columns are used in those unnecessary joins. But why would the query execution need the other columns while joining? Shouldn't it just join using PKs first, and then fetch the columns for just the 10 rows?
Thanks!
My understanding is that the MySQL engine applies LIMIT after any joins happen.
From http://dev.mysql.com/doc/refman/5.0/en/select.html, The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization. (LIMIT is applied after HAVING.)
EDIT: You could try using this query to take advantage of the PK speed.
select * from (select * from `Order` order by OrderID desc limit 10) o
join `Store` s on s.StoreID = o.StoreID;
All of your examples are asking for table scans of the existing tables, so none of them will be more or less performant except to the degree that MySQL can cache the data or results. Some of your queries have ORDER BY or join criteria, which can take advantage of indexes purely to make the joining process more efficient; however, that still is not the same as having a set of criteria that will trigger the use of indexes.
LIMIT is not a criterion; it can be thought of as filtration once a result set is determined. You save time on the client, once the result set is prepared, but not on the server.
Really, the only way to get the answers you are seeking is to become familiar with:
EXPLAIN EXTENDED your_sql_statement
The output of EXPLAIN will show you how many rows are being looked at by mysql, as well as whether or not any indexes are being used.
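For example, for query 2 (in older MySQL versions, EXPLAIN EXTENDED followed by SHOW WARNINGS also prints the query as the optimizer rewrote it):
EXPLAIN EXTENDED
select * from `Order` o join `Store` s on s.StoreID = o.StoreID order by OrderID desc limit 10;
SHOW WARNINGS;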
I have a huge table with 170,000 records.
What is the difference between this query:
Showing rows 0 - 299 (1,422 total, Query took 1.9008 sec)
SELECT 1 FROM `p_apartmentbuy` p
where
p.price between 500000000 and 900000000
and p.yard = 1
and p.dateadd between 1290000000 and 1320000000
ORDER BY `p`.`id` desc
limit 1669
EXPLAIN output (screenshot not included).
And this one:
Showing rows 0 - 299 (1,422 total, Query took 0.2625 sec)
SELECT 1 FROM `p_apartmentbuy` p
where
p.price between 500000000 and 900000000
and p.yard = 1
and p.dateadd between 1290000000 and 1320000000
ORDER BY `p`.`id` desc
limit 1670
EXPLAIN output (screenshot not included).
Both of these queries use the same table with the same data and have the same WHERE clause; only the LIMIT row count differs.
MySQL has a buffer for sorting. When the stuff to be sorted is too big, it sorts chunks, then merge-sorts them. This is called "filesort". Your 1670th row apparently just overflows the sort buffer.
Now, why it picks another key for the in-memory sort, I am not too sure; but apparently its strategy is not quite good, since it ends up being slower.
Recap: it is odd that the query returning more rows runs much faster.
This is not related to buffer vs. file sort; sorting 1,400 records takes well under 1 second.
The first EXPLAIN shows the query optimizer doing a linear scan; the second shows it using an index. Even a partially helpful index is usually much better than none at all.
Internally, MySQL maintains stats about the size of indexes and tries to guess which index to use, or whether a linear scan would be faster. This estimate is data-specific. I've seen MySQL use the right index 99 times out of 100, but every now and then it picks a different one and runs the query 50x slower.
You can override the built-in query optimizer and specify the index to use manually, with SELECT ... FROM ... FORCE INDEX (...).
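For this query, that might look like the following sketch (the index name is a placeholder; use the key reported by the faster EXPLAIN):
SELECT 1 FROM `p_apartmentbuy` p FORCE INDEX (idx_yard_price_dateadd)
where
p.price between 500000000 and 900000000
and p.yard = 1
and p.dateadd between 1290000000 and 1320000000
ORDER BY `p`.`id` desc
limit 1670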