Select from information_schema tables very slow - mysql

A SELECT * from information_schema.tables is very slow.
innodb_stats_on_metadata is OFF, and select table_name from tables is fast; it is only selecting more fields that is very slow (12 minutes!).
mysql> select * from tables limit 1;
+---------------+--------------------+----------------+-------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME | UPDATE_TIME | CHECK_TIME | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------------+----------------+-------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| def | information_schema | CHARACTER_SETS | SYSTEM VIEW | MEMORY | 10 | Fixed | NULL | 384 | 0 | 32869632 | 0 | 0 | NULL | 2016-12-19 23:55:46 | NULL | NULL | utf8_general_ci | NULL | max_rows=87381 | |
+---------------+--------------------+----------------+-------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
1 row in set (12 min 27.02 sec)
Additional information:
mysql> select count(*) from tables;
+----------+
| count(*) |
+----------+
| 194196 |
+----------+
1 row in set (0.57 sec)
mysql> show global variables like '%innodb%metada%';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| innodb_stats_on_metadata | OFF |
+--------------------------+-------+
1 row in set (0.00 sec)

Selecting more columns means the server has to do more work -- interrogating the storage engines for all of the tables in all of the schemas to obtain what you requested.
The tables in information_schema are not real tables. They are server internals, exposed via an SQL interface, in some cases allowing you to query information the server doesn't store and must calculate or gather because you asked. The server code knows what columns you ask for, and only gathers that information.
LIMIT 1 doesn't help, because information_schema doesn't handle LIMIT as you would expect -- the entire table is rendered in memory before the first row is returned and the rest are discarded.
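Since the server only gathers what the query asks for, one way to keep such queries cheap is to name only the columns you need and pin the schema (and, where possible, the table) to a constant. The following is a minimal sketch assuming a pre-8.0 server; 'mydb' and 'orders' are placeholder names that do not come from the question.

```sql
-- Assumption: 'mydb' and 'orders' are placeholder names; substitute your own.

-- Cheap: only names are requested, and the schema is a constant.
SELECT TABLE_NAME
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb';

-- The more expensive columns, but restricted to a single table,
-- so the server does not have to visit every table in every schema.
SELECT TABLE_NAME, ENGINE, TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME = 'orders';
```

The constant filters matter more than LIMIT here, because they cut down the work done before any row is produced.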

Even in 5.7, the information about the tables is scattered across files on disk. Reading 200K files takes a lot of time.
That is one reason why 200K tables is not a good design. Other reasons have to do with caching -- there are practical limits on how many tables the caches can hold.
You will see variations in the timings of I_S queries because of caching.
Advice: Re-think your schema design.
8.0 stores all that info in InnoDB tables (the data dictionary), so it will be enormously faster.


Physical disk rewrite of mysql data

I am using mysql for the first time in years to help a friend out. The issue: a mysql table that gets updated a lot with INT and CHAR values. This web app site is hosted on a large generic provider, so I have no direct control of setup/parameters/etc. The performance has gotten really, really bad for this table, to the point where processing a data page that should take a max of 10 seconds is sometimes taking 15 minutes.
I initially tried running all updates as a single transaction, rather than the 50ish statements in a php loop in the web app (written several years ago). The problem, at least what I think, is that this app is running on a giant mysql instance with many other generic websites, and the disk speed just isn't able to handle so many updates.
I am able to use cron/batch jobs on this provider. The web app is mainly used during work hours, so I could limit access to the web app during overnight hours.
I normally work with postgresql or ms sql server, so my knowledge of mysql is fairly limited.
Would performance be increased if I force the table to be dropped and rewritten overnight? Is there some mysql function like postgres's vacuum? I have tried to search for information, but unfortunately using words like rewrite table just brings up references to sql syntax helpers or performance tuning.
Alternately, I guess that I could create a new storage mechanism in mysql, as long as it could be done via a php script. Would there be a better storage mode than the default storage engine for something frequently updated?
MySQL performance depends on so many factors that it is hard to give one clear answer for every case. The following steps may help you figure out what to improve when inserting or updating data.
Database Engine.
There are 5 engines you can choose from, depending on your purpose: MyISAM, Memory, InnoDB, Archive, NDB.
An engine with table-level locking granularity will be slower than one with row-level locking, because it locks the whole table against changes when you insert or update a single record, while row-level locking locks only the rows you insert or update.
When you INSERT or UPDATE a record, an engine with B-tree indexes has to update those indexes as well; that is the price of faster SELECT queries. The more indexes a table has, the slower inserts and updates become.
Indexes on CHAR columns will be slower than indexes on INT columns, because it takes more time to find the right node in which to store the data.
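As a rough sketch of the engine check described above: the statements below show which engine a table uses and, if it turns out to be a table-locking engine such as MyISAM, rebuild it as InnoDB. The table name page_data is a placeholder, not something from the question, and on shared hosting the account may lack the privileges for ALTER TABLE. OPTIMIZE TABLE is included because it is the closest MySQL counterpart to rewriting a table; for InnoDB it is carried out as a table rebuild.

```sql
-- Assumption: `page_data` is a hypothetical table name; substitute your own.

-- Which storage engine does the table use? (See the Engine column.)
SHOW TABLE STATUS WHERE Name = 'page_data';

-- Rebuild the table as InnoDB to get row-level locking.
-- This copies the table, so run it while the app is idle.
ALTER TABLE page_data ENGINE = InnoDB;

-- Rewrite the table and its indexes, reclaiming unused space.
OPTIMIZE TABLE page_data;
```

Both statements can be issued from a PHP script through the normal query interface, so they would fit the overnight batch window mentioned in the question.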
MySQL Statement
MySQL can show you how it plans to execute a query, which helps you assess its performance: add EXPLAIN before your statement.
Example
EXPLAIN SELECT SQL_NO_CACHE * FROM Table_A WHERE id = 1;
I worked on a web application where we used mysql (it's really good!) to scale to really large data.
In addition to what @Lam Nguyen said in his answer, here are a few things to consider.
Check which mysql engine you are using, to see which locks it obtains during select, insert and update. Here is a sample query that shows which engine a table uses.
mysql> show table status where name="<your_table_name>";
+-------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+-------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
| Login | InnoDB | 10 | Dynamic | 2 | 8192 | 16384 | 0 | 0 | 0 | NULL | 2019-04-28 12:16:59 | NULL | NULL | utf8mb4_general_ci | NULL | | |
+-------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
The default engine that comes with a mysql installation is InnoDB. InnoDB does not lock the table when inserting a row; it only sets an exclusive lock on the inserted row.
SELECT ... FROM is a consistent read, reading a snapshot of the database and setting no locks unless the transaction isolation level is set to SERIALIZABLE.
A locking read, an UPDATE, or a DELETE generally set record locks on every index record that is scanned in the processing of the SQL statement.
InnoDB lock sets
Check which columns you are indexing. Index the columns that you really query a lot. Avoid indexing char columns.
To check which columns of your table are indexed, run:
mysql> show index from BookStore2;
+------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Bookstore2 | 0 | PRIMARY | 1 | ISBN_NO | A | 0 | NULL | NULL | | BTREE | | | YES | NULL |
| Bookstore2 | 1 | SHORT_DESC_IND | 1 | SHORT_DESC | A | 0 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Bookstore2 | 1 | SHORT_DESC_IND | 2 | PUBLISHER | A | 0 | NULL | NULL | YES | BTREE | | | YES | NULL |
+------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
3 rows in set (0.03 sec)
Do not run an inner query on a large data set in a table. To see what your query actually does, run explain on it and look at the number of rows it iterates over.
mysql> explain select * from login;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
| 1 | SIMPLE | login | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
1 row in set, 1 warning (0.03 sec)
Avoid joining too many tables.
Make sure you are querying with the primary key in the criteria, or at least on an indexed column.
When your table grows too big, make sure you split it across clusters.
With a few tweaks like these, you should still be able to get query results in minimal time.
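To illustrate the point about filtering on the primary key or an indexed column, here is a small sketch against the BookStore2 indexes listed earlier (PRIMARY on ISBN_NO, and SHORT_DESC_IND on SHORT_DESC, PUBLISHER). The literal values are made up, and the column types are not shown in the answer, so treat the comments as expectations to check with EXPLAIN rather than guaranteed plans.

```sql
-- Assumption: the literal values below are invented for illustration.

-- Filters on the primary key: expect a lookup via PRIMARY.
EXPLAIN SELECT * FROM BookStore2 WHERE ISBN_NO = '12345';

-- Filters on the leading column of SHORT_DESC_IND: the index is usable.
EXPLAIN SELECT * FROM BookStore2 WHERE SHORT_DESC = 'example';

-- Filters only on PUBLISHER, the second column of SHORT_DESC_IND:
-- the index cannot be used for a lookup, so expect a full scan.
EXPLAIN SELECT * FROM BookStore2 WHERE PUBLISHER = 'example';
```

Comparing the key and rows columns across the three plans shows whether a query is actually served by an index.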

MySQL: Slow avg query for 411M rows

I have a simple table (created by django) - engine InnoDB:
+-------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| correlation | double | NO | | NULL | |
| gene1_id | int(10) unsigned | NO | MUL | NULL | |
| gene2_id | int(10) unsigned | NO | MUL | NULL | |
+-------------+------------------+------+-----+---------+----------------+
The table has more than 411 million rows.
(The target table will have around 461M rows, 21471*21470 rows)
My main query looks like this, there might be up to 10 genes specified at most.
SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation
WHERE gene2_id IN (176829, 176519, 176230)
GROUP BY gene1_id ORDER BY NULL
This query is very slow, it takes almost 2 mins to run:
21471 rows in set (1 min 11.03 sec)
Indexes (cardinality looks strange - too small?):
| Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality |
| 0 | PRIMARY | 1 | id | A | 411512194 |
| 1 | c_gene1_id_6b1d81605661118_fk_genes_gene_entrez | 1 | gene1_id | A | 18 |
| 1 | c_gene2_id_2d0044eaa6fd8c0f_fk_genes_gene_entrez | 1 | gene2_id | A | 18 |
I just ran select count(*) on that table and it took 22 mins:
select count(*) from predictions_genescorrelation;
+-----------+
| count(*) |
+-----------+
| 411512002 |
+-----------+
1 row in set (22 min 45.05 sec)
What could be wrong?
I suspect that mysql configuration is not set up right.
During the import of the data I ran out of disk space, so that might also have affected the database, although I ran check table later - it took 2 hours and reported OK.
Additionally - the cardinality of the indexes looks strange. I have set up a smaller database locally, and there the values are totally different (254945589,56528,17).
Should I redo indexes?
What params should I check of MySQL?
My tables are set up as InnoDB, would MyISAM make any difference?
Thanks,
matali
https://www.percona.com/blog/2006/12/01/count-for-innodb-tables/
SELECT COUNT(*) queries on InnoDB are very slow without a WHERE clause, unless you use SELECT COUNT(id) ... USE INDEX (PRIMARY).
To speed up this:
SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation
WHERE gene2_id IN (176829, 176519, 176230)
GROUP BY gene1_id ORDER BY NULL
you should have a composite key on (gene2_id, gene1_id, correlation), in that order. Try it.
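A sketch of how that composite key could be created and checked. The index name idx_gene2_gene1_corr and the alias avg_correlation are arbitrary choices, and the table name is taken from the query in the question (the COUNT(*) output uses predictions_genescorrelation, so adjust as needed). On a table with more than 400 million rows the index build will take a long time and needs free disk space.

```sql
-- Assumption: index name is arbitrary; table name as used in the question's query.
ALTER TABLE genescorrelation
  ADD INDEX idx_gene2_gene1_corr (gene2_id, gene1_id, correlation);

-- Check the plan afterwards. "Using index" in the Extra column would
-- indicate that the query is answered from the index alone.
EXPLAIN
SELECT gene1_id, AVG(correlation) AS avg_correlation
FROM genescorrelation
WHERE gene2_id IN (176829, 176519, 176230)
GROUP BY gene1_id
ORDER BY NULL;
```

The column order matters: gene2_id comes first because it is the filter, and gene1_id and correlation follow so that the grouping and the average can be computed without touching the table rows.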
About index cardinality: the stats of InnoDB tables are approximate, not accurate (sometimes insane). There even was (is?) a bug report: https://bugs.mysql.com/bug.php?id=58382
Try to ANALYZE the table and look at the cardinality again.
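In statement form, the suggestion looks roughly like this (the table name is again the one from the question's query). The refreshed values are still estimates based on sampling, so they may change between runs.

```sql
-- Recompute the index statistics for the table.
ANALYZE TABLE genescorrelation;

-- Then look at the Cardinality column again.
SHOW INDEX FROM genescorrelation;
```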

MySQL indexing columns vs joining tables

I am trying to figure out the most efficient way to extract values from database that has the structure similar to this:
table test:
int id (primary, auto increment)
varchar(50) stuff,
varchar(50) important_stuff;
where I need to do a query like
select * from test where important_stuff like 'prefix%';
The size of the entire table is approximately 10 million rows; however, there are only about 500-1000 distinct values for important_stuff. My current solution is indexing important_stuff, but the performance is not satisfactory. Would it be better to create a separate table that maps each distinct important_stuff to an id, store that id in the 'test' table, and then do
select b.* from (select id from stuff_lookup where important_stuff like 'prefix%') a join test b on b.stuff_id = a.id;
or this:
select * from test where stuff_id in (select id from stuff_lookup where important_stuff like 'prefix%');
What is the best way to optimize things like that?
How big is innodb_buffer_pool_size? How much RAM is available? The former should be about 70% of the latter. You'll see in a minute why I bring up this setting.
Based on your 3 suggested SELECTs, the original one will work as well as the two complex ones. In some other case, the complex formulation might work better.
INDEX(important_stuff) is the 'best' index for
select * from test where important_stuff like 'prefix%';
Now, let's study how that query works with that index:
Reach into the BTree index, starting at 'prefix'. (Effort: Virtually instantaneous)
Scan forward for, say, 1000 entries. That will be about 10 InnoDB blocks (16KB each). Each entry will have the PRIMARY KEY (id). (Effort: <= 10 disk hits)
For each entry, look up the row (so you can get "*"). That's 1000 PK lookups in the BTree that contains both the PK and the data. At best, they might all be in 10 blocks. At worst, they could be in 1000 separate blocks. (Effort: 10-1000 blocks)
Total Effort: ~1010 blocks (worst case).
A standard spinning disk can handle ~100 reads/second, so we are looking at about 10 seconds.
Now, run the query again. Guess what; all those blocks are now in RAM (cached in the "buffer_pool", which is hopefully big enough for all of them). And it runs in less than 1 second.
OPTIMIZE TABLE was not necessary! It was not a statistics refresh, but rather caching that sped up the query.
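One way to check the caching explanation on your own server is to watch the buffer pool's disk-read counter around the query. This is a sketch, assuming an InnoDB table and an otherwise quiet server; the counter is global, so activity from other sessions will inflate it.

```sql
-- The estimate from above: ~1010 block reads at ~100 reads per second.
SELECT (10 + 1000) / 100 AS worst_case_seconds;

-- How large is the buffer pool, in MB?
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;

-- Reads that could not be served from the buffer pool (before).
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';

SELECT * FROM test WHERE important_stuff LIKE 'prefix%';

-- Same counter afterwards. Run the SELECT a second time and compare:
-- if the counter barely moves, the blocks came from RAM.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
```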
I'm not a MySQL user, but I ran some tests on my local database. I added 10 million rows, as you described, and the distinct values from the third column load quite fast. These are my results.
mysql> describe bigtable;
+-----------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| stuff | varchar(50) | NO | | NULL | |
| important_stuff | varchar(50) | NO | MUL | NULL | |
+-----------------+-------------+------+-----+---------+----------------+
3 rows in set (0.03 sec)
mysql> select count(*) from bigtable;
+----------+
| count(*) |
+----------+
| 10000089 |
+----------+
1 row in set (2.87 sec)
mysql> select count(distinct important_stuff) from bigtable;
+---------------------------------+
| count(distinct important_stuff) |
+---------------------------------+
| 1000 |
+---------------------------------+
1 row in set (0.01 sec)
mysql> select distinct important_stuff from bigtable;
....
| is_987 |
| is_988 |
| is_989 |
| is_99 |
| is_990 |
| is_991 |
| is_992 |
| is_993 |
| is_994 |
| is_995 |
| is_996 |
| is_997 |
| is_998 |
| is_999 |
+-----------------+
1000 rows in set (0.15 sec)
One important detail: I refreshed the statistics on this table first (before this operation I needed ~10 seconds to load this data).
mysql> optimize table bigtable;

Mysql: Why are these queries being 'logged' as 'not using indexes' when they

So, I'm trying to find joins that aren't properly using indexes, but the log is being filled with queries which appear to have indexes to me.
I turn on slow_query_log and turn on log_queries_not_using_indexes and set the long_query_time to 10 seconds.
The log starts flooding with lines like this...
Query_time: 0.320889 Lock_time: 0.000030 Rows_sent: 0 Rows_examined: 338336
SET timestamp=1422564398;
select * from fversions where author=155669 order by entryID desc limit 40;
The query time is below 10 seconds, and from this explain, it seems to be using the primary key as the index.
Why is this query being logged? I can't see the problem queries to add indexes to them. Too much noise.
Thanks in advance!
PS. The answer for this doesn't seem to apply as I have a 'where'. MySQL why logged as slow query/log-queries-not-using-indexes when have indexes?
mysql> explain select * from fversions where author=155669 order by entryID desc limit 40;
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | fversions | index | NULL | PRIMARY | 8 | NULL | 40 | Using where |
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> show variables like 'slow_query_log';
+----------------+-------+
| Variable_name | Value |
+----------------+-------+
| slow_query_log | ON |
+----------------+-------+
mysql> show variables like 'long_query_time';
+-----------------+-----------+
| Variable_name | Value |
+-----------------+-----------+
| long_query_time | 10.000000 |
+-----------------+-----------+
mysql> show variables like 'log_queries_not_using_indexes';
+-------------------------------+-------+
| Variable_name | Value |
+-------------------------------+-------+
| log_queries_not_using_indexes | ON |
+-------------------------------+-------+
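A possible reading of the output above, offered as a sketch rather than a confirmed diagnosis: type=index in the EXPLAIN means the whole PRIMARY index is scanned (the log shows 338336 rows examined), and as far as I understand the documented behaviour, log_queries_not_using_indexes logs such statements even when they finish well under long_query_time. Under that assumption, the statements below show two ways to cut the noise. The index name and the numeric limits are arbitrary, and log_throttle_queries_not_using_indexes only exists from MySQL 5.6.5 onwards.

```sql
-- Assumption: index name and numeric limits are arbitrary examples.

-- Let the query find rows by author and read them in entryID order,
-- instead of scanning the primary key and filtering.
ALTER TABLE fversions ADD INDEX idx_author_entry (author, entryID);

-- Do not log statements that examine fewer than 1000 rows.
-- (SET GLOBAL only affects connections opened afterwards.)
SET GLOBAL min_examined_row_limit = 1000;

-- Log at most 10 "not using indexes" statements per minute.
SET GLOBAL log_throttle_queries_not_using_indexes = 10;
```

Re-running the EXPLAIN after adding the index should show whether the new key is picked and how many rows are expected to be examined.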

more records takes less time

This is almost driving me insane
I do the following query:
SELECT * FROM `photo_person` WHERE photo_person.photo_id IN (SELECT photo_id FROM photo_person WHERE `photo_person`.`person_id` ='1')
When I change the id, I get different processing times, although the queries and tables are exactly the same.
By changing the person_id I get the following:
-- person_id=1 ( 3 total, Query took 0.4523 sec)
-- person_id=2 ( 99 total, Query took 0.1340 sec)
-- person_id=3 ( 470 total, Query took 0.0194 sec)
-- person_id=4 ( 1,869 total, Query took 0.0024 sec)
I do not understand why the query time goes down as the number of records/results goes up.
The table structure is very straightforward.
UPDATE: I have already disabled the mysql query cache, so every time I run the query I get exactly the same timing (of course it varies at the millisecond level, but this can be neglected)
UPDATE: table is MyISAM
CREATE TABLE IF NOT EXISTS `photo_person` (
`entry_id` int(11) NOT NULL AUTO_INCREMENT,
`photo_id` int(11) NOT NULL DEFAULT '0',
`person_id` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`entry_id`),
UNIQUE KEY `PhotoID` (`photo_id`,`person_id`),
KEY `photo_id` (`photo_id`),
KEY `person_id` (`person_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=182072 ;
Here are the results of the profiling
+----------+------------+-----------------------------+
| Query_ID | Duration | Query |
+----------+------------+-----------------------------+
| 1 | 0.45541200 | SELECT ...`person_id` ='1') |
| 2 | 0.44833700 | SELECT ...`person_id` ='2') |
| 3 | 0.45587800 | SELECT ...`person_id` ='3') |
| 4 | 0.45074900 | SELECT ...`person_id` ='4') |
+----------+------------+-----------------------------+
Now, since the numbers are the same, it must be the caching :(
So apparently the caching kicks in at a certain number of records or bytes
mysql> SHOW VARIABLES LIKE "%cac%";
+------------------------------+------------+
| Variable_name | Value |
+------------------------------+------------+
| binlog_cache_size | 32768 |
| have_query_cache | YES |
| key_cache_age_threshold | 300 |
| key_cache_block_size | 1024 |
| key_cache_division_limit | 100 |
| max_binlog_cache_size | 4294963200 |
| query_cache_limit | 1024 |
| query_cache_min_res_unit | 4096 |
| query_cache_size | 1024 |
| query_cache_type | ON |
| query_cache_wlock_invalidate | OFF |
| table_definition_cache | 256 |
| table_open_cache | 64 |
| thread_cache_size | 8 |
+------------------------------+------------+
14 rows in set (0.00 sec)
How are you testing the query speeds? I suspect it's not an appropriate way. The more you query the table, the more likely MySQL is to do some aggressive pre-fetching on it, meaning further queries on the table will be faster even though they require scanning more data. The reason is that MySQL does not have to load the pages from disk, since it has already pre-fetched them into memory.
As other people have stated, the query cache could also mess up your test results, especially if the test involved re-running the query several times in a row to get an "average" runtime.
Add SQL_NO_CACHE to your query to see whether it is the cache that is tricking you.
To see what is taking time try to use PROFILING like this:
mysql> SET profiling = 1;
mysql> Your select goes here;
mysql> SHOW PROFILES;
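Put together with the query from the question, the profiling session might look like the sketch below. It assumes a server version that still supports SHOW PROFILE and the query cache (both belong to older MySQL releases); the Query_ID to inspect is whatever SHOW PROFILES reports, and 1 is only what a fresh session would typically give.

```sql
SET profiling = 1;

-- SQL_NO_CACHE keeps the query cache out of the measurement.
SELECT SQL_NO_CACHE *
FROM photo_person
WHERE photo_id IN (SELECT photo_id FROM photo_person WHERE person_id = 1);

-- One line per profiled statement, with its total duration.
SHOW PROFILES;

-- Per-stage breakdown for one statement; replace 1 with the
-- Query_ID shown by SHOW PROFILES.
SHOW PROFILE FOR QUERY 1;
```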
Also, try to use the simpler query:
SELECT * FROM photo_person WHERE `photo_person`.`person_id` ='1'
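I don't know whether MySQL optimises your query or not, but be aware that the two are only equivalent if you want just this person's own rows; your version, with the subquery, also returns the other people who appear in the same photos. In general, avoid subqueries where possible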
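If the full result of the original query is needed (every row for each photo that the person appears in), a self-join is one way to drop the subquery. This is a sketch, not a measured improvement; the aliases pp and mine are arbitrary. Because of the UNIQUE KEY on (photo_id, person_id), each photo matches at most one row for the given person, so the join should not introduce duplicates.

```sql
-- All rows for photos in which person 1 appears, without a subquery.
SELECT pp.*
FROM photo_person AS pp
JOIN photo_person AS mine
  ON mine.photo_id = pp.photo_id
WHERE mine.person_id = 1;
```

Running EXPLAIN on both forms shows whether the server treats them differently on your version.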