Compound index required to speed up joined query? - mysql

A colleague asked me to explain how indexes (indices?) boost performance; I tried to do so, but got confused myself.
I used the model below for explanation (an error/diagnostics logging database). It consists of three tables:
List of business systems, table "System" containing their names
List of different types of traces, table "TraceTypes", defining what kinds of error messages can be logged
Actual trace messages, table "Traces", with foreign keys referencing the System and TraceTypes tables
I used MySQL for the demo; I don't recall which storage engine I used, but I think it was InnoDB.
System
------------------
| ID | Name      |
------------------
| 1  | billing   |
| 2  | hr        |
------------------

TraceTypes
------------------------------------------
| ID | Code    | Description             |
------------------------------------------
| 1  | Info    | Informational message   |
| 2  | Warning | Warning only            |
| 3  | Error   | Failure                 |
------------------------------------------

Traces (System_ID and TraceTypes_ID are the foreign keys)
----------------------------------------------------
| ID | System_ID | TraceTypes_ID | Message         |
----------------------------------------------------
| 1  | 1         | 1             | Job starting    |
| 2  | 1         | 3             | System.nullr..  |
----------------------------------------------------
First, I added some records to all of the tables and demonstrated that the query below executed in 0.005 seconds:
select count(*) from Traces
inner join System on Traces.System_ID = System.ID
inner join TraceTypes on Traces.TraceTypes_ID = TraceTypes.ID
where System.Name = 'billing' and TraceTypes.Code = 'Info';
Then I generated more data (no indexes yet):
"System" contained about 100 entries
"TraceTypes" contained about 50 entries
"Traces" contained ~10 million records.
Now the previous query took 8-10 seconds.
I created an index on the Traces.System_ID column and another on the Traces.TraceTypes_ID column. Now this query executed in milliseconds:
select count(*) from Traces where System_ID=1 and TraceTypes_ID=1;
This was also fast:
select count(*) from Traces
inner join System on Traces.System_ID = System.ID
where System.Name='billing' and TraceTypes_ID=1;
but the previous query, which joined all three tables, still took 8-10 seconds to complete.
Only when I created a compound index (with both the System_ID and TraceTypes_ID columns included) did the speed drop to milliseconds.
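For reference, here is roughly what I ran (the index names are invented for this write-up):

CREATE INDEX idx_traces_system ON Traces (System_ID);
CREATE INDEX idx_traces_tracetype ON Traces (TraceTypes_ID);
-- the compound index that finally made the three-table join fast:
CREATE INDEX idx_traces_system_tracetype ON Traces (System_ID, TraceTypes_ID);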
The rule I was taught earlier is "all the columns you use for joining must be indexed".
However, in my scenario I had indexes on both System_ID and TraceTypes_ID, yet MySQL didn't use them. The question is: why? My bet is that the row-count ratio of 100 : 10,000,000 : 50 makes the single-column indexes too large to be used. But is that true?

First, the correct, and the easiest, way to analyze a slow SQL statement is to run EXPLAIN on it. Find out how the optimizer chose its plan, and ponder why, and how to improve it. I'd suggest studying the EXPLAIN results with only the 2 separate indexes in place to see how MySQL executes your statement.
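For example (a sketch, reusing the query from the question):

EXPLAIN
SELECT count(*) FROM Traces
INNER JOIN System ON Traces.System_ID = System.ID
INNER JOIN TraceTypes ON Traces.TraceTypes_ID = TraceTypes.ID
WHERE System.Name = 'billing' AND TraceTypes.Code = 'Info';

The type, key, and Extra columns of the output show, per table, which index (if any) the optimizer picked.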
I'm not very familiar with MySQL, but it seems there was a restriction in MySQL 4 that only one index per table could be used in a query. There have been improvements since MySQL 5 (index merge), but I'm not sure whether they apply to your case. Again, EXPLAIN should tell you the truth.
Even where using 2 indexes per table is allowed (MySQL 5), using 2 separate indexes is generally slower than a compound index: the 2 separate indexes require an extra index merge step, compared to the single pass over a compound index.
The article "Multi Column indexes vs Index Merge" (which uses MySQL 5.4.2) might be helpful.

It's not the size of the indexes so much as the selectivity that determines whether the optimizer will use them.

My guess would be that it uses one index, then does traditional lookups against the other index, and filters afterwards; in short, you might be looping through two indexes in a nested loop. Please check the execution plan. In my understanding, we should create a composite index on the columns used for filtering or joining, and then use an INCLUDE clause for the columns that appear only in the SELECT list. I have never worked with MySQL, so this understanding is based on SQL Server 2005.
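For what it's worth, in SQL Server that suggestion would look something like this (a sketch with an invented index name; MySQL has no INCLUDE clause, so there you would simply append the selected columns to the index itself):

CREATE NONCLUSTERED INDEX idx_traces_filter
ON Traces (System_ID, TraceTypes_ID)
INCLUDE (Message);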

Related

Sorting query performance in PHP+MYSQL

I have a table with a huge number of records. When I query that table, especially when using ORDER BY, it takes too much execution time.
How can I optimize this table for sorting and searching?
Here is an example schema of my table (jobs):
+----+-----------------+---------------------+
| id | title           | created_at          |
+----+-----------------+---------------------+
| 1  | Web Developer   | 2018-04-12 10:38:00 |
| 2  | QA Engineer     | 2018-04-15 11:10:00 |
| 3  | Network Admin   | 2018-04-17 11:15:00 |
| 4  | System Analyst  | 2018-04-19 11:19:00 |
| 5  | UI/UX Developer | 2018-04-20 12:54:00 |
+----+-----------------+---------------------+
I have been searching for a while and learned that creating an INDEX can help improve performance; can someone please elaborate on how the performance is increased?
Add "explain" word before ur query, and check result
explain select ....
There u can see what u need to improve, then add index on ur search and/or sorting field and run explain query again
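For example, against the jobs table from the question (a sketch; the index name is invented):

EXPLAIN SELECT id, title FROM jobs ORDER BY created_at DESC LIMIT 20;
ALTER TABLE jobs ADD INDEX idx_jobs_created_at (created_at);
EXPLAIN SELECT id, title FROM jobs ORDER BY created_at DESC LIMIT 20;

Comparing the two EXPLAIN outputs shows whether the new index removed the filesort.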
If you want to improve the performance of your query, one approach is to paginate it. Put a LIMIT on it (as large as you want) and specify the page you want to display.
For example: SELECT * FROM your_table LIMIT 50 OFFSET 0.
I don't know if this will solve your problem, but you can try it ;)
Indexes are the database's way of creating lookup trees (B-trees in most cases) to sort, filter, and find rows more efficiently.
Indexes are used to find rows with specific column values quickly.
Without an index, MySQL must begin with the first row and then read
through the entire table to find the relevant rows. The larger the
table, the more this costs. If the table has an index for the columns
in question, MySQL can quickly determine the position to seek to in
the middle of the data file without having to look at all the data.
This is much faster than reading every row sequentially.
https://dev.mysql.com/doc/refman/5.5/en/mysql-indexes.html
You can use EXPLAIN to help identify how the query is currently running, and identify areas of improvement. It's important to not over-index a table, for reasons probably beyond the scope of this question, so it'd be good to do some research on efficient uses of indexes.
ALTER TABLE jobs
ADD INDEX(created_at);
(Yes, there is a CREATE INDEX syntax that does the equivalent.)
Then, in the query, do
ORDER BY created_at DESC
However, with 15M rows, it may still take a long time. Will you be filtering (WHERE)? LIMITing?
If you really want to return to the user 15M rows -- well, that is a lot of network traffic, and that will take a long time.
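For example, serving one page of 20 rows at a time (a sketch) keeps both the sort work and the network traffic manageable once the index above is in place:

SELECT id, title, created_at
FROM jobs
ORDER BY created_at DESC
LIMIT 20 OFFSET 0;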
MySQL details
Regardless of the index declaration or MySQL version, the ASC/DESC in ORDER BY will be honored. However, it may require a "filesort" instead of taking advantage of the ordering built into the index's BTree.
In some cases, the WHERE or GROUP BY is too messy for the Optimizer to make use of any index. But if it can, then...
(Before MySQL 8.0) While it is possible to declare an index DESC, the attribute is ignored. However, ORDER BY .. DESC is honored; it scans the data backwards. This also works for ORDER BY a DESC, b DESC, but not if you have a mixture of ASC and DESC in the ORDER BY.
MySQL 8.0 does create DESC indexes; that is, the BTree containing the index is stored 'backwards'.
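For illustration (a sketch; the index name is invented), on 8.0 you could declare:

ALTER TABLE jobs ADD INDEX idx_created_desc (created_at DESC);

so that ORDER BY created_at DESC reads the BTree in its native order instead of scanning backwards.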

Can't optimise MySQL query

I am running a query to retrieve some game levels from a MySQL database. The query itself takes around 0.00025 seconds to execute on a database that contains 40 level strings. I thought that was satisfactory, until I got a message from the website host telling me to optimise the query below, or the script would be removed because it was putting a lot of strain on their servers.
I tried optimising by using EXPLAIN and EXPLAIN EXTENDED and adjusting the columns accordingly (adding indexes), but I always get the same performance. I also noticed that MySQL didn't use indexes where they were available, but instead did a full-table scan.
Results from EXPLAIN EXTENDED:
+---------+----+-------------+------+----------------+---------+---------+---------------+------+---------------------------------+
| table   | id | select_type | type | possible_keys  | key     | key_len | ref           | rows | Extra                           |
+---------+----+-------------+------+----------------+---------+---------+---------------+------+---------------------------------+
| users   | 1  | SIMPLE      | ALL  | PRIMARY,id     | NULL    | NULL    | NULL          | 7    | Using temporary; Using filesort |
| AllTime | 1  | SIMPLE      | ref  | PRIMARY,userid | PRIMARY | 4       | Test.users.id | 1    |                                 |
+---------+----+-------------+------+----------------+---------+---------+---------------+------+---------------------------------+
The query:
SELECT users.nickname, AllTime.userid, AllTime.id, AllTime.levelname, AllTime.levelstr
FROM AllTime
INNER JOIN users
ON AllTime.userid=users.id
ORDER BY AllTime.id DESC
LIMIT ($value_from_php),20;
The tables:
users
| id(int) | nickname(varchar) |
| (Primary, Auto_increment) | |
|---------------------------|-------------------|
| 1 | username1 |
| 2 | username2 |
| 3 | username3 |
| ... | ... |
and AllTime
| id(int) | userid(int) | levelname(varchar) | levelstr(text) |
| (Primary, Auto_increment) | (index) | | |
|---------------------------|-------------|--------------------|----------------|
| 1 | 2 | levelname1 | levelstr1 |
| 2 | 2 | levelname2 | levelstr2 |
| 3 | 3 | levelname3 | levelstr3 |
| 4 | 1 | levelname4 | levelstr4 |
| 5 | 1 | levelname5 | levelstr5 |
| 6 | 1 | levelname6 | levelstr6 |
| 7 | 2 | levelname7 | levelstr7 |
Is there a way to optimise this query, or would I be better off making two consecutive queries from PHP just to avoid the warning?
I am just learning MySQL, so please take that information into account when replying, thank you :)
I'm assuming you're using InnoDB.
For an INNER JOIN, MySQL typically starts with the table with the fewest rows, in this case users. However, since you just want the latest 20 AllTime records joined with the corresponding user records, you actually should start with AllTime since with the LIMIT, it will be the smaller data set.
Use STRAIGHT_JOIN to force the join order:
SELECT users.nickname, AllTime.userid, AllTime.id, AllTime.levelname,
AllTime.levelstr
FROM AllTime
STRAIGHT_JOIN users
ON users.id = AllTime.userid
ORDER BY AllTime.id DESC
LIMIT ($value_from_php),20;
It should be able to use the primary key on the AllTime table and follow it in descending order. It'll grab all the data on the same pages as it goes.
It should also use the primary key on the users table to grab the id and nickname. If there are more than just two columns, you might add a multi-column covering index on (id, nickname) to improve the speed.
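A sketch of that covering index (the name is my own):

ALTER TABLE users ADD INDEX idx_users_id_nickname (id, nickname);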
If you can, convert the levelstr column to VARCHAR so that the data is stored on the same page as the rest of the data, otherwise, it has to go fetch the text columns separately. This assumes that your columns are under the 8000 byte row limit for InnoDB. There is no way to avoid the USING TEMPORARY unless you get rid of the text column.
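A sketch of that conversion (the length here is an assumption; pick one that fits your data and keeps the row under the limit):

ALTER TABLE AllTime MODIFY levelstr VARCHAR(2000);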
Most likely, your host has identified this query by using the slow query log, which can identify all queries that don't use an index, or they may have red flagged it because of the Using temporary.
It doesn't look like the query itself has a problem. Review the application code; most likely the issue is there. Beyond that:
Check the MySQL query execution plan; possibly you are missing an index.
Make sure you cache data in the application and the database (FYI, sometimes you can load the whole database into application memory).
Make sure you use a connection pool.
Create a view (a very small chance of improvement).
Try removing the ORDER BY clause (again, a very small chance it will improve performance).
The query itself takes around 0.00025 seconds ... I got a message from the website host telling me to optimise the below-mentioned query, or the script will be removed since it is pushing a lot of strain onto their servers.
Ask the website host for more details about why this query has been flagged for attention. A query that trivial is not going to cause strain on anything unless it is being called very frequently.
Find out how many times that query is being run. I will bet you a nickel that your site is getting hammered by a bot and being executed hundreds or thousands of times per minute. If so, then that's your real problem.
LIMIT ($value_from_php),20; -- if $value_from_php is huge, then the query is slow. This is because all the 'old' pages need to be scanned before getting to the 20 rows you need.
By "remembering where you left off" you can make every page equally fast. See this for further details: http://mysql.rjweb.org/doc.php/pagination

MySQL query with 2 joins, large keylen leads to 'Copying to tmp table on disk' process hanging forever

I'm sure I must be doing something stupid, but as is often the case I can't figure out what it is.
I'm trying to run this query:
SELECT `f`.`FrenchWord`, `f`.`Pronunciation`, `e`.`EnglishWord`
FROM (`FrenchWords` f)
INNER JOIN `FrenchEnglishMappings` m ON `m`.`FrenchForeignKey`=`f`.`id`
INNER JOIN `EnglishWords` e ON `e`.`id`=`m`.`EnglishForeignKey`
WHERE `f`.`Pronunciation` = '[whatever]';
When I run it, what happens seems quite weird to me. I get the results of the query fine, 2 rows in about 0.002 seconds.
However, I also get a huge spike in CPU and SHOW PROCESSLIST shows two identical processes for that query with state 'Copying to tmp table on disk'. These seem to keep running endlessly until I kill them or the system freezes.
None of the tables involved is big - between 100k and 600k rows each. tmp_table_size and max_heap_table_size are both 16777216.
Edit: EXPLAIN on the statement gives:
(Later edit: I reduced the keylen of Pronunciation to 112.)
+----+-------------+-------+--------+-------------------------------------------------------------+-----------------+---------+----------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------------------------------------------+-----------------+---------+----------------------------+------+----------------------------------------------+
| 1 | SIMPLE | f | ref | PRIMARY,Pronunciation | Pronunciation | 112 | const | 2 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | m | ref | tmpindex,CombinedIndex,FrenchForeignKey,EnglishForeignKey | tmpindex | 4 | dict.f.id | 1 | Using index |
| 1 | SIMPLE | e | eq_ref | PRIMARY,id | PRIMARY | 4 | dict.m.EnglishForeignKey | 1 | |
+----+-------------+-------+--------+-------------------------------------------------------------+-----------------+---------+----------------------------+------+----------------------------------------------+
I'd be grateful if someone could point out what might be causing this. What I really don't understand is what MySQL is doing - surely if the query is complete then it doesn't need to do anything else?
UPDATE
Thanks for all the responses. I learnt something from all of them. This query was made massively faster after following the advice of nrathaus. I added a PronunciationHash binary(16) column to FrenchWords that contains unhex( md5 ( Pronunciation ) ). That is indexed with a keylen of 16 (vs 600+ for the varchar index on Pronunciation), and queries are much faster now.
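Concretely, the change looked something like this (a sketch; the index name is made up):

ALTER TABLE FrenchWords
ADD COLUMN PronunciationHash BINARY(16),
ADD INDEX idx_pronunciation_hash (PronunciationHash);

UPDATE FrenchWords SET PronunciationHash = UNHEX(MD5(Pronunciation));

-- then query through the 16-byte hash instead of the long varchar index:
SELECT f.FrenchWord, f.Pronunciation, e.EnglishWord
FROM FrenchWords f
INNER JOIN FrenchEnglishMappings m ON m.FrenchForeignKey = f.id
INNER JOIN EnglishWords e ON e.id = m.EnglishForeignKey
WHERE f.PronunciationHash = UNHEX(MD5('[whatever]'));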
As the EXPLAIN shows, your key size is HUGE: 602. This forces MySQL to write the data to disk while processing.
You need to reduce the keylen greatly; I believe the recommendation is to stay below 128.
I suggest you create a column called MD5_FrenchWord which contains the MD5 value of FrenchWord, then use this column for the GROUP BY. This assumes you are grouping for similarity rather than on the actual value.
You are misusing GROUP BY. This clause is entirely pointless unless you also have a summary function such as MAX(something) or COUNT(*) in your SELECT clause.
Try removing GROUP BY and see if it helps.
It's not clear what you're trying to do with GROUP BY. But you might try SELECT DISTINCT if you're trying to dedup your result set.
Looking further at this question, it seems like you might benefit from a couple of compound indexes.
First, can you make sure your table declarations have NOT NULL in as many columns as possible?
Second, you're retrieving Pronunciation, FrenchWord, and id from your FrenchWords table, so try the compound index below on that table. Your query will then be able to get what it needs directly from the index, saving a bunch of disk I/O. Notice that Pronunciation is mentioned first in the compound index declaration, because that's the value you're searching on. This allows MySQL to do a lookup on the index and get the other information it needs directly from the index, without thrashing back to the table itself.
(Pronunciation, FrenchWord, id)
You're retrieving EnglishWord from EnglishWords, looking it up by id. So the same reasoning applies to this compound index:
(id, EnglishWord)
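A sketch of those two indexes (names invented):

ALTER TABLE FrenchWords ADD INDEX idx_pron_word_id (Pronunciation, FrenchWord, id);
ALTER TABLE EnglishWords ADD INDEX idx_id_word (id, EnglishWord);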
Finally, I can't tell what your ORDER BY is for, once you use SELECT DISTINCT. You might try getting rid of it. But it probably makes no difference.
Give this a try. If your MySQL server is still thrashing after you make these changes, you have some kind of configuration problem.

Query performance; not sure what's happening

I had issues with my query that took 17 seconds to execute (350k rows):
SELECT idgps_unit, MAX(dt)
FROM gps_unit_location
GROUP BY 1
Explain:
+----+-------------+-------------------+-------+---------+---------+--------+
| id | select_type | table             | type  | key     | key_len | rows   |
+----+-------------+-------------------+-------+---------+---------+--------+
| 1  | SIMPLE      | gps_unit_location | index | fk_gps2 | 5       | 422633 |
+----+-------------+-------------------+-------+---------+---------+--------+
After playing with it, I came up with this solution that takes 1 second:
Select idgps_unit, MAX(dt) from (
SELECT idgps_unit, dt
FROM gps_unit_location
) d1
Group by 1
Explain:
+----+-------------+-------------------+-------+----------------------------+---------+--------+---------------------------------+
| id | select_type | table             | type  | key                        | key_len | rows   | Extra                           |
+----+-------------+-------------------+-------+----------------------------+---------+--------+---------------------------------+
| 1  | PRIMARY     | <derived2>        | ALL   | NULL                       | NULL    | 423344 | Using temporary; Using filesort |
| 2  | DERIVED     | gps_unit_location | index | gps_unit_location_dt_gpsid | 10      | 422617 | Using index                     |
+----+-------------+-------------------+-------+----------------------------+---------+--------+---------------------------------+
And now I am confused: why is query #2 fast, while query #1, which seems to be the same query written more efficiently, is slow?
The indexes are: Index1: dt; Index2: idgps_unit; Index3: (idgps_unit, dt).
The execution times are consistent: query #1 always takes 17-19 seconds, while query #2 takes under 1 second.
I am using a GoDaddy VPS running Windows Server 2008 (Economy plan).
Table example:
id | idgps_unit | dt | location
1 | 1 | 2012-01-01 | 1
2 | 1 | 2012-01-02 | 2
3 | 2 | 2012-01-03 | 3
4 | 2 | 2012-01-04 | 4
5 | 3 | 2012-01-05 | 5
First, I'm assuming gps_unit_location is really a table and not a view. Second, I'm also assuming that you have run both queries multiple times, so caching is not the explanation. (Caching would be: you run the first query, it loads the table into the page cache, and the second query reads from memory rather than disk.)
Do you have an index on gps_unit_location(idgps_unit)? Are the records very wide? If the answers to these questions are "yes", then the following may be happening.
If so, you might have a curious problem with indexing. You would think that an index would speed up such a query. What it does, though, is look up the values of idgps_unit in order. If the index does not contain the date, then the database needs to fetch the data from each page. If the table does not fit into memory, this will often result in a cache miss - that is, time spent loading the page.
By contrast, if the table is wide and the engine does a full table scan, then it can zip through the table and extract the two fields of interest. It puts them on the side. If they are small relative to the full table, then sorting them might take very little time. Voila, the query finishes faster.
My guess would be that the second structure removes the use of an index.
By the way, you can fix this by changing the index to gps_unit_location(idgps_unit, dt). By including the field in the index, the query does not have to load the data.
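A sketch of that compound index (the name is invented):

ALTER TABLE gps_unit_location ADD INDEX idx_unit_dt (idgps_unit, dt);
-- the original query can then be answered from the index alone:
SELECT idgps_unit, MAX(dt) FROM gps_unit_location GROUP BY idgps_unit;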
I would say your indexes are not set up properly. Your second query is a kind of inner query, which is effectively creating its own inner index group, if that makes sense!

mysql fast select query without reading all db

I have a large database with two tables: stat and total.
An example of the relation is the following:
STAT:
+----+-------------+
| ID | total event |
+----+-------------+
| 7  | 2           |
| 8  | 1           |
+----+-------------+
TOTAL:
+----+-------------+
| ID | Event       |
+----+-------------+
| 7  | "hello"     |
| 7  | "everybody" |
| 8  | "hi"        |
+----+-------------+
This is a very simplified version; also consider that the STAT table could have 500K records, and for each STAT row I can have about 200 TOTAL rows.
Currently, if I run a simple SELECT query on table TOTAL, the system is terribly slow.
Could anyone give me some advice on the design of the TOTAL table? Is it possible to tell MySQL that the ID column is already sorted, so that there is no reason to scan all the rows until the end where, for example, id=7?
Add INDEX(ID) to both of your tables, if you have not already.
SELECT COUNT(*) FROM TOTAL WHERE ID=7 -> if ID is indexed, this will be fast.
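A sketch of those indexes (names invented):

ALTER TABLE STAT ADD INDEX idx_stat_id (ID);
ALTER TABLE TOTAL ADD INDEX idx_total_id (ID);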
You can add an index, and furthermore you can partition your table.
As per #ypercube's comment, tables are not stored in a sorted state, so one cannot "tell" this to the database. However you can add an index on tables to make them faster to search.
One important thing to check: it looks like TOTAL.ID is intended as a foreign key. If so, the table TOTAL should have its own primary key called ID; rename the existing column to STAT_ID instead, so it is obvious what it is a foreign key for, and then add an index on STAT_ID.
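A sketch of that restructuring (the column types are assumptions, and this assumes TOTAL currently has no primary key):

ALTER TABLE TOTAL CHANGE ID STAT_ID INT NOT NULL;
ALTER TABLE TOTAL ADD COLUMN ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
ALTER TABLE TOTAL ADD INDEX idx_total_stat_id (STAT_ID);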
Lastly, as a point of style, I recommend that you make your table and column names case-insensitive, and write them in lower-case. It makes it easier to read SQL when keywords are in upper case, and database objects are in lower.