So here is a very simple table 'tbl':
+---------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------------------+------+-----+---------+----------------+
| val | varchar(45) | YES | MUL | NULL | |
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
+---------+---------------------+------+-----+---------+----------------+
And indexes for it:
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tbl | 0 | PRIMARY | 1 | id | A | 201826018 | NULL | NULL | | BTREE | |
| tbl | 1 | val | 1 | val | A | 881336 | NULL | NULL | YES | BTREE | |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
I'm trying this simple select:
select val from tbl where val = 'iii';
result: 86208 rows in set (0.08 sec)
But when I want to slightly modify it:
select id, val from tbl where val = 'iii';
the result is: 86208 rows in set (47.30 sec)
I have an index right on the coumn that where points to, all I'm modifying is the result rows representation.
Why there is so terrifying delay? (I have to say that I can't reproduce this delay every time I want: even after 'reset query cache' or setting 'query_cache_type=off' command it can be done quickly).
Without actually examining your server configuration it is hard to tell, but here's an educated guess. In the first instance, MySQL is able to satisfy your query without actually reading the table data. All the information you have asked for can be retrieved from the index alone. Notice that the cardinality of the val index is only on the order of 106 rows, and the rows are going to be very short in the index.
In the second case you have asked for data NOT in the index on val. Now the engine has to actually find and read rows from the data. Here the cardinality is about 250 times larger, and since the index will retrieve rows ordered by val, finding the corresponding id values will require A LOT of jumping around in several hundred gigs of data on disk. This is going to be very much slower.
Try to add a ORDER BY and `LIMIT to the query. That should help a lot.
I think if you change the query to this it will be faster:
select id, val from tbl where val = 'iii' order by val limit 10;
You're doing a select based on two columns, but no index of both exists. Try adding a new index consisting of both id and val.
Related
I have a simple InnoDB table with 1M+ rows and some simple indexes.
I need to sort this table by first_public and id columns and get some of them, this is why I've indexed first_public column.
first_public is unique at the moment, but in real life it might be not.
mysql> desc table;
+--------------+-------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------------------+------+-----+---------+----------------+
| id | bigint unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | | NULL | |
| id_category | int | NO | MUL | NULL | |
| active | smallint | NO | | NULL | |
| status | enum('public','hidden') | NO | | NULL | |
| first_public | datetime | YES | MUL | NULL | |
| created_at | timestamp | YES | | NULL | |
| updated_at | timestamp | YES | | NULL | |
+--------------+-------------------------+------+-----+---------+----------------+
8 rows in set (0.06 sec)
it works well while I'm working with rows before 130000+
mysql> explain select id from table where active = 1 and status = 'public' order by first_public desc, id desc limit 24 offset 130341;
+----+-------------+--------+------------+-------+---------------+---------------------+---------+------+--------+----------+----------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+---------------------+---------+------+--------+----------+----------------------------------+
| 1 | SIMPLE | table | NULL | index | NULL | firstPublicDateIndx | 6 | NULL | 130365 | 5.00 | Using where; Backward index scan |
+----+-------------+--------+------------+-------+---------------+---------------------+---------+------+--------+----------+----------------------------------+
1 row in set, 1 warning (0.00 sec)
but when I try to get some next rows (with offset 140000+), it looks like MySQL don't use first_public column index at all.
mysql> explain select id from table where active = 1 and status = 'public' order by first_public desc, id desc limit 24 offset 140341;
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+
| 1 | SIMPLE | table | NULL | ALL | NULL | NULL | NULL | NULL | 1133533 | 5.00 | Using where; Using filesort |
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+
1 row in set, 1 warning (0.00 sec)
I tried to add first_public column in to select clause, but nothing changed.
What I'm doing wrong?
MySQL's optimizer tries to estimate the cost of doing your query, to decide if it's worth using an index. Sometimes it compares the cost of using the index versus just reading the rows in order, and discarding the ones that don't belong in the result.
In this case, it decided that if you use an OFFSET greater than 140k, it gives up on using the index.
Keep in mind how OFFSET works. There's no way of looking up the location of an offset by an index. Indexes help to look up rows by value, not by position. So to do an OFFSET query, it has to examine all the rows from the first matching row on up. Then it discards the rows it examined up to the offset, and then counts out the enough rows to meet the LIMIT and returns those.
It's like if you wanted to read pages 500-510 in a book, but to do this, you had to read pages 1-499 first. Then when someone asks you to read pages 511-520, and you have to read pages 1-510 over again.
Eventually the offset gets to be so large that it's less expensive to read 14000 rows in a table-scan, than to read 14000 index entries + 14000 rows.
What?!? Is OFFSET really so expensive? Yes, it is. It's much more common to look up rows by value, so MySQL is optimized for that usage.
So if you can reimagine your pagination queries to look up rows by value instead of using LIMIT/OFFSET, you'll be much happier.
For example, suppose you read "page" 1000, and you see that the highest id value on that page is 13999. When the client requests the next page, you can do the query:
SELECT ... FROM mytable WHERE id > 13999 LIMIT 24;
This does the lookup by the value of id, which is optimized because it utilizes the primary key index. Then it reads just 24 rows and returns them (MySQL is at least smart enough to stop reading after it reaches the OFFSET + LIMIT rows).
The best index is
INDEX(active, status, first_public, id)
Using huge offsets is terribly inefficient -- it must scan over 140341 + 24 rows to perform the query.
If you are trying to "walk through" the table, use the technique of "remembering where you left off". More discussion of this: http://mysql.rjweb.org/doc.php/pagination
The reason for the Optimizer to abandon the index: It decided that the bouncing back and forth between the index and the table was possibly worse than simply scanning the entire table. (The cutoff is about 20%, but varies widely.)
I have a simple table (created by django) - engine InnoDB:
+-------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| correlation | double | NO | | NULL | |
| gene1_id | int(10) unsigned | NO | MUL | NULL | |
| gene2_id | int(10) unsigned | NO | MUL | NULL | |
+-------------+------------------+------+-----+---------+----------------+
The table has more than 411 million rows.
(The target table will have around 461M rows, 21471*21470 rows)
My main query looks like this, there might be up to 10 genes specified at most.
SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation
WHERE gene2_id IN (176829, 176519, 176230)
GROUP BY gene1_id ORDER BY NULL
This query is very slow, it takes almost 2 mins to run:
21471 rows in set (1 min 11.03 sec)
Indexes (cardinality looks strange - too small?):
Non_unique| Key_name | Seq_in_index | Column_name | Collation | Cardinality |
0 | PRIMARY | 1 | id | A | 411512194 |
1 | c_gene1_id_6b1d81605661118_fk_genes_gene_entrez | 1 | gene1_id | A | 18 |
1 | c_gene2_id_2d0044eaa6fd8c0f_fk_genes_gene_entrez | 1 | gene2_id | A | 18 |
I just run select count(*) on that table and it took 22 mins:
select count(*) from predictions_genescorrelation;
+-----------+
| count(*) |
+-----------+
| 411512002 |
+-----------+
1 row in set (22 min 45.05 sec)
What could be wrong?
I suspect that mysql configuration is not set up right.
During the import of data I experienced problem with space, so that might also affected the database, although I ran check table later - it took 2hours and stated OK.
Additionally - the cardinality of the indexes look strange. I have set up smaller database locally and there values are totally different (254945589,56528,17).
Should I redo indexes?
What params should I check of MySQL?
My tables are set up as InnoDB, would MyISAM make any difference?
Thanks,
matali
https://www.percona.com/blog/2006/12/01/count-for-innodb-tables/
SELECT COUNT(*) queries are very slow without WHERE clause or without SELECT COUNT(id) ... USE INDEX (PRIMARY).
to speedup this:
SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation
WHERE gene2_id IN (176829, 176519, 176230)
GROUP BY gene1_id ORDER BY NULL
you should have composite key on (gene2_id, gene1_id, correlation) in that order. try
About index-cardinality: stats of Innodb tables are approximate, not accurate (sometimes insane). there even was (IS?) a bug-report https://bugs.mysql.com/bug.php?id=58382
Try to ANALIZE table and watch cardinality again
I'm working on an application that needs to get the latest values from a table with currently > 3 million rows and counting. The latest values need to be grouped by two columns/attributes, so it runs the following query:
SELECT
m1.type,
m1.cur,
ROUND(m1.val, 2) AS val
FROM minuteCharts m1
JOIN
(SELECT
cur,
type,
MAX(id) id,
ROUND(val) AS val
FROM minuteCharts
GROUP BY cur, type) m2
ON m1.cur = m2.cur AND m1.id = m2.id;
The database server is quite the heavyweight, but the above query takes 3,500ms to complete and this number is rising. I suspect this wasn't a real problem when the application was just launched (as the database was pretty much empty back then), but it's becoming a problem and I haven't found a better solution. In fact, similar questions on SO actually had something like the above as their answers (which is probably where the developer got it from).
Is there anyone out there who knows how to get the same results more efficiently?
UPDATE: I submitted this too early.
EXPLAIN minuteCharts;
Field Type Null Key Default Extra
id int(255) NO PRI NULL auto_increment
time datetime NO MUL NULL
cur enum('EUR','USD') NO NULL
type enum('GOLD','SILVER','PLATINUM') NO NULL
val varchar(80) NO NULL
id is the primary index and there's an index on time.
The subquery with GROUP BY is doing a table-scan and a temporary table, because there's no index to support it.
mysql> EXPLAIN SELECT m1.type, m1.cur, ROUND(m1.val, 2) AS val FROM minuteCharts m1 JOIN (SELECT cur, type, MAX(id) id, ROUND(val) AS val FROM minuteCharts GROUP BY cur, type) m2 ON m1.cur = m2.cur AND m1.id = m2.id;
+----+-------------+--------------+------+---------------+-------------+---------+------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+-------------+---------+------------------------+------+---------------------------------+
| 1 | PRIMARY | m1 | ALL | PRIMARY | NULL | NULL | NULL | 1 | NULL |
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 6 | test.m1.cur,test.m1.id | 2 | NULL |
| 2 | DERIVED | minuteCharts | ALL | NULL | NULL | NULL | NULL | 1 | Using temporary; Using filesort |
+----+-------------+--------------+------+---------------+-------------+---------+------------------------+------+---------------------------------+
You can improve this with the following index, sorted first by your GROUP BY columns, then also including the other columns for the subquery to make it a covering index:
mysql> ALTER TABLE minuteCharts ADD KEY (cur,type,id,val);
The table-scans turn into index scans (still not great but better), and the temp table goes away.
mysql> EXPLAIN ...
+----+-------------+--------------+-------+---------------+-------------+---------+------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+---------------+-------------+---------+------------------------+------+-------------+
| 1 | PRIMARY | m1 | index | PRIMARY,cur | cur | 88 | NULL | 1 | Using index |
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 6 | test.m1.cur,test.m1.id | 2 | NULL |
| 2 | DERIVED | minuteCharts | index | cur | cur | 88 | NULL | 1 | Using index |
+----+-------------+--------------+-------+---------------+-------------+---------+------------------------+------+-------------+
Best results will be if the index fits in your buffer pool. If it's larger than the buffer pool, the query will have to push pages in and out repeatedly during the index scan, which will greatly degrade performance.
Re your comment:
The answer to how long it'll take to add the index depends on the version of MySQL you have, the storage engine for this table, your server hardware, the number of rows in the table, the level of concurrent load on the database, etc. In other words, I have no way of telling.
I'd suggest using pt-online-schema-change, so you will have no downtime.
Another suggestion would be to try it on a staging server with a clone of your database, so you can get a rough estimate how long it'll take (although testing on an idle server is often a lot quicker than running the same change on a busy server).
I'm new to query optimizations so I accept I don't understand everything yet but I do not understand why even this simple query isn't optimized as expected.
My table:
+------------------+-----------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+-----------+------+-----+-------------------+----------------+
| tasktransitionid | int(11) | NO | PRI | NULL | auto_increment |
| taskid | int(11) | NO | MUL | NULL | |
| transitiondate | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
+------------------+-----------+------+-----+-------------------+----------------+
My indexes:
+-----------------+------------+-------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+-------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tasktransitions | 0 | PRIMARY | 1 | tasktransitionid | A | 952 | NULL | NULL | | BTREE | | |
| tasktransitions | 1 | transitiondate_ix | 1 | transitiondate | A | 952 | NULL | NULL | | BTREE | | |
+-----------------+------------+-------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
My query:
SELECT taskid FROM tasktransitions WHERE transitiondate>'2013-09-31 00:00:00';
gives this:
+----+-------------+-----------------+------+-------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+-------------------+------+---------+------+------+-------------+
| 1 | SIMPLE | tasktransitions | ALL | transitiondate_ix | NULL | NULL | NULL | 1082 | Using where |
+----+-------------+-----------------+------+-------------------+------+---------+------+------+-------------+
If I understand everything correctly Using where and ALL means that all rows are retrieved from the storage engine and filtered at server layer. This is sub-optimal. Why does it refuse to use the index and only retrieve the requested range from the storage engine (innoDB)?
Cheers
MySQL will not use the index if it estimates that it would select a significantly large portion of the table, and it thinks that a table-scan is actually more efficient in those cases.
By analogy, this is the reason the index of a book doesn't contain very common words like "the" -- because it would be a waste of time to look up the word in the index and find the list of page numbers is a very long list, even every page in the book. It would be more efficient to simply read the book cover to cover.
My experience is that this happens in MySQL if a query's search criteria would match greater than 20% of the table, and this is usually the right crossover point. There could be some variation based on the data types, size of table, etc.
You can give a hint to MySQL to convince it that a table-scan would be prohibitively expensive, so it would be much more likely to use the index. This is not usually necessary, but you can do it like this:
SELECT taskid FROM tasktransitions FORCE INDEX (transitiondate_ix)
WHERE transitiondate>'2013-09-31 00:00:00';
I once was trying to join two tables and MySQL was refusing to use an index, resulting in >500ms queries, sometimes a few seconds. Turns out the column I was joining on had different encodings on each table. Changing both to the same encoding sped up the query to consistently less than 100ms.
Just in case, it helps somebody.
I have a table with a varchar column _id (long int coded as string). I added an index for this column, but query was still slow. I was executing this query:
select * from table where (_id = 2221835089) limit 1
I realized that the _id column wasn't been generated as string (I'm Laravel as DB framework). Well, if query is executed with the right data type in the where clause everything worked like a charm:
select * from table where (_id = '2221835089') limit 1
I am new at my MySQL 8.0, have finished 2 simple tutorials completely, and there is only two subjects that has not worked for me, one of them is indexing. I read the section labeled "2 Answers" and found that using
the statement suggested at the end of said section, seems to defeat the
purpose of the original USE INDEX or FORCE INDEX statement below. The suggested statement is like getting a table sorted via a WHERE statement instead of MySQL using USE INDEX or FORCE INDEX. It works, but seems to me it is not the same as using the natural USE INDEX or FORCE INDEX. Does any one knows why MySQL is ignoring my simple request to index a 10 row table on the Lname column?
Field
Type
Null
Key
Default
Extra
ID
int
NO
PRI
Null
auto_increment
Lname
varchar(20)
NO
MUL
Null
Fname
varchar(20)
NO
Mul
Null
City
varchar(15)
NO
Null
Birth_Date
date
NO
Null
CREATE INDEX idx_Lname ON TestTable (Lname);
SELECT * FROM TestTable USE INDEX (idx_Lname);
SELECT * From Testtable FORCE INDEX (idx_LastFirst);
I'm working on "online streaming" project and I need some help in constructing a DB for best performance. Currently I have one table containing all relevant information for the player including file, poster image, post_id etc.
+---------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| post_id | int(11) | YES | | NULL | |
| file | mediumtext | NO | | NULL | |
| thumbs_img | mediumtext | YES | | NULL | |
| thumbs_size | mediumtext | YES | | NULL | |
| thumbs_points | mediumtext | YES | | NULL | |
| poster_img | mediumtext | YES | | NULL | |
| type | int(11) | NO | | NULL | |
| uuid | varchar(40) | YES | | NULL | |
| season | int(11) | YES | | NULL | |
| episode | int(11) | YES | | NULL | |
| comment | text | YES | | NULL | |
| playlistName | text | YES | | NULL | |
| time | varchar(40) | YES | | NULL | |
| mini_poster | mediumtext | YES | | NULL | |
+---------------+-------------+------+-----+---------+----------------+
With 100k records it takes around 0.5 sec for a query and performance constantly degrading as I have more records.
+----------+------------+----------------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+------------+----------------------------------------------------------------------+
| 1 | 0.04630675 | SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1' |
+----------+------------+----------------------------------------------------------------------+
explain SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1';
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | dle_playerFiles | ALL | NULL | NULL | NULL | NULL | 61777 | Using where |
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
How can I improve DB structure? How big websites like youtube construct their database?
Generally when query time is directly proportional to the number of rows, that suggests a table scan, which means for a query like
SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1'
The database is executing that literally, as in, iterate over every single row and check if it meets criteria.
The typical solution to this is an index, which is a precomputed list of values for a column (or set of columns) and a list of rows which have said value.
If you create an index on the post_id column on dle_playerFiles, then the index would essentially say
1: <some row pointer>, <some row pointer>, <some row pointer>
2: <some row pointer>, <some row pointer>, <some row pointer>
...
100: <some row pointer>, <some row pointer>, <some row pointer>
...
7000: <some row pointer>, <some row pointer>, <some row pointer>
250000: <some row pointer>, <some row pointer>, <some row pointer>
Therefore, with such an index in place, the above query would simply look at node 7000 of the index and know which rows contain it.
Then the database only needs to read the rows where post_id is 7000 and check if their type is 1.
This will be much quicker because the database never needs to look at every row to handle a query. The costs of an index:
Storage space - this is more data and it has to be stored somewhere
Update time - databases keep indexes in sync with changes to the table automatically, which means that INSERT, UPDATE and DELETE statements will take longer because they need to update the data. For small and efficient indexes, this tradeoff is usually worth it.
For your query, I recommend you create an index on 2 columns. Make them part of the same index, not 2 separate indexes:
create index ix_dle_playerFiles__post_id_type on dle_playerFiles (post_id, type)
Caveats to this working efficiently:
SELECT * is bad here. If you are returning every column, then the database must go to the table to read the columns because the index only contains the columns for filtering. If you really only need one or two of the columns, specify them explicitly in the SELECT clause and add them to your index. Do NOT do this for many columns as it just bloats the index.
Functions and type conversions tend to prevent index usage. Your SQL wraps the integer types post_id and type in quotes so they are interpreted as strings. The database may feel that an index can't be used because it has to convert everything. Remove the quotes for good measure.
If I read your Duration correctly, it appears to take 0.04630675 (seconds?) to run your query, not 0.5s.
Regardless, proper indexing can decrease the time required to return query results. Based on your query SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1', an index on post_id and type would be advisable.
Also, if you don't absolutely require all the fields to be returned, use individual column references of the fields you require instead of the *. The fewer fields, the quicker the query will return.
Another way to optimize a query is to ensure that you use the smallest data types possible - especially in primary/foreign key and index fields. Never use a bigint or an int when a mediumint, smallint or better still, a tinyint will do. Never, ever use a text field in a PK or FK unless you have no other choice (this one is a DB design sin that is committed far too often IMO, even by people with enough training and experience to know better) - you're far better off using the smallest exact numeric type possible. All this has positive impacts on storage size too.