Selecting all rows (700,000) takes a very long time (hours) - MySQL

I use MariaDB (Server version: 10.3.20-MariaDB-1:10.3.20+maria~stretch, mariadb.org binary distribution).
I have ~700,000 records with these columns:
id
html (mediumtext), with a very large average length per field: ~150,000
date
+2 other small columns
The html column holds very long text (HTML documents).
Now I need to run select * from table; to analyse this HTML, but it takes ~0.03819 s per row fetched (I tested on a smaller part), so: 700,000 rows * 0.03819 s = (700,000 * 0.03819 s)/60/60 = over 7 hours of selecting!
I have 8 cores and 60 GB of RAM. Profiling the query shows that almost all of the time goes to transferring data.
How can I speed this up? Is it possible, or is this much data too much for MySQL and do I need MongoDB?
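To put a number on how much data a full scan has to transfer, a quick sizing query helps (a sketch; the table is called raw, as in the EXPLAIN further down):

-- Rough size of the payload that SELECT * would have to send to the client
SELECT COUNT(*)                         AS total_rows,
       AVG(LENGTH(html))                AS avg_html_bytes,
       SUM(LENGTH(html))/1024/1024/1024 AS total_html_gb
FROM raw;

My current configuration: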
query_cache_limit = 64M
query_cache_size = 1024M
max_allowed_packet = 64M
net_buffer_length = 16384
max_connect_errors = 1000
thread_concurrency = 32
concurrent_insert = 2
read_rnd_buffer_size = 8M
bulk_insert_buffer_size = 8M
query_cache_limit = 64M
query_cache_size = 1024M
query_cache_type = 1
query_prealloc_size = 262144
query_alloc_block_size = 65536
transaction_alloc_block_size = 8192
transaction_prealloc_size = 4096
max_write_lock_count = 16
innodb_buffer_pool_size=30G
innodb_flush_log_at_trx_commit=2
innodb_thread_concurrency=16
innodb_flush_method=O_DIRECT
innodb_read_io_threads = 64
innodb_write_io_threads = 16
innodb_buffer_pool_instances = 20
MariaDB [db]> explain select id, href, html from raw limit 10;
+------+-------------+-------+------+---------------+------+---------+------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+---------------+------+---------+------+--------+-------+
| 1 | SIMPLE | raw | ALL | NULL | NULL | NULL | NULL | 658793 | |
+------+-------------+-------+------+---------------+------+---------+------+--------+-------+
1 row in set (0.227 sec)
After playing with indexes:
MariaDB [db]> show index from raw;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| raw | 0 | PRIMARY | 1 | id | A | 658793 | NULL | NULL | | BTREE | | |
| raw | 1 | id | 1 | id | A | 658793 | NULL | NULL | | BTREE | | |
| raw | 1 | href | 1 | href | A | 658793 | NULL | NULL | YES | BTREE | | |
| raw | 1 | date | 1 | date | A | 131758 | NULL | NULL | YES | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (3.724 sec)

38 ms to fetch ~150 KB from a spinning disk is quite fast.
query_cache_size = 1024M -- This is much too high. Stop at about 50M.
A PRIMARY KEY is a unique index. So, if id is the primary key, do not also say KEY(id).
Is it possible, or is this much data too much for MySQL and do I need MongoDB?
Assuming you are running at disk speed, you cannot expect any other product to run faster.
What will the client do with 100GB of data in one batch? MySQL will be happy to deliver it, but the client will probably choke to death.
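If the analysis can work in batches, one option (a sketch, not from the original answer) is to stream the table in primary-key order in chunks rather than as one giant result set; the chunk size of 1000 is arbitrary:

-- Fetch the next chunk; the client remembers the last id it received
-- and substitutes it for the 0 on the next round.
SELECT id, href, html
FROM raw
WHERE id > 0
ORDER BY id
LIMIT 1000;

Each chunk is a short, index-driven query, so the client never has to hold 100 GB in memory at once.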


Slow query log - Rows_sent record count differs from actual records

Recently, in my application which uses MariaDB 10.6, I have been facing some weird issues where the same query takes more than the expected time and consumes more IO at random times.
I enabled the slow query log to trace this, and it shows a query stuck for more than 9 minutes and consuming a lot of IO.
# Time: 230119 15:25:02
# User@Host: user[user] @ [192.*.*.*]
# Thread_id: 125616 Schema: DB QC_hit: No
# Query_time: 567.099806 Lock_time: 0.000500 Rows_sent: 48 Rows_examined: 10859204
# Rows_affected: 0 Bytes_sent: 0
SET timestamp=1674152702;
select column1,column2....columnN where column1=v1 and column2=v2 and column3=v3 and column4=v4 and column5=v5;
Looking at the DB processlist, a large number of queries are in the state "Waiting for table metadata lock", and this ends up causing bigger issues.
| 106804 | userx | IP | DB | Query | 4239 | Sending data | Q1 | 0.000 |
| 106838 | userx | IP | DB | Query | 1980 | Waiting for table metadata lock | Q2 | 0.000 |
| 107066 | userx | IP | DB | Sleep | 0 | | NULL | 0.000 |
| 107196 | userx | IP | DB | Sleep | 1 | | NULL | 0.000 |
| 107223 | userx | IP | DB | Query | 4363 | Sending data | Q3 | 0.000 |
| 107277 | userx | IP | DB | Query | 3221 | Sending data | Q4 | 0.000 |
| 107299 | userx | IP | DB | Sleep | 26 | | NULL | 0.000 |
| 107324 | userx | IP | DB | Sleep | 54 | | NULL | 0.000 |
| 107355 | userx | IP | DB | Sleep | 0 | | NULL | 0.000 |
| 107357 | userx | IP | DB | Sleep | 1 | | NULL | 0.000 |
| 107417 | userx | IP | DB | Query | 1969 | Waiting for table metadata lock | | 0.000 |
| 107462 | userx | IP | DB | Sleep | 55 | | NULL | 0.000 |
| 107489 | userx | IP | DB | Query | 1979 | Waiting for table metadata lock | Q5 | 0.000 |
| 107492 | userx | IP | DB | Sleep | 25 | | NULL | 0.000 |
| 107519 | userx | IP | DB | Query | 1981 | Waiting for table metadata lock | Q6 | 0.000 |
Currently, manually killing the suspected query with the KILL command unblocks the other queries so they can complete, and via the MariaDB setting max_statement_time we can terminate long-running queries automatically.
But is there a way to check what was killed by max_statement_time? I am unable to find any traces of it in error.log.
The actual query should return around 1765 records, while the slow query log reports Rows_sent as 48.
Is it a problem with scanning the table, or did the record fetch get stuck after some time?
Or am I misinterpreting the Rows_sent record count in the slow query log output?
127.0.0.1:3307> select column1,column2....columnN where column1=v1 and column2=v2 and column3=v3 and column4=v4 and column5=v5;
+----------+
| count(*) |
+----------+
| 1756 |
+----------+
1 row in set (0.006 sec)
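For reference, max_statement_time is specified in seconds and can be set globally or per session (a sketch; the 600-second value is arbitrary):

-- Abort any statement running longer than 600 seconds (0, the default, means no limit)
SET GLOBAL max_statement_time = 600;
-- or only for the current connection:
SET SESSION max_statement_time = 600;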
---EDITED---
I missed adding column5; it is now included in the query.
The table is indexed; here is the EXPLAIN for the statement:
127.0.0.1:3307> explain extended select..... from Tablename where column1=v1 and column2=v2 and column3=v3 and column4=v4 and column5=v5;
+------+-------------+-------+------+---------------+---------+---------+-------------------------------+------+----------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+------+---------------+---------+---------+-------------------------------+------+----------+-------+
| 1 | SIMPLE | s | ref | PRIMARY | PRIMARY | 7 | const,const,const,const,const | 73 | 100.00 | |
+------+-------------+-------+------+---------------+---------+---------+-------------------------------+------+----------+-------+
1 row in set, 1 warning (0.007 sec)
Assuming that v1..v4 are literal constants (numeric, string, date, etc), and assuming you really mean = in all 4 cases, then
INDEX(col1, col2, col3, col4)
(in any order) is optimal for that one SELECT. This will also minimize I/O for that query.
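Concretely, that could be applied like this (a sketch; the table and column names follow the placeholders used in the question):

-- Composite index covering all the equality conditions in the WHERE clause
ALTER TABLE Tablename
    ADD INDEX idx_c1_c2_c3_c4 (column1, column2, column3, column4);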
There are many reasons for "high" I/O. I'll cover a few of them.
Data is bigger than the cache, the size of which is controlled by innodb_buffer_pool_size. How big is the table?
That setting should be configured to be about 70% of available RAM. How much RAM do you have?
If any items in the query are TEXT or BLOB, there may be extra I/O.
Currently you do not have a good index, and it had to check 10859204 rows to find the 48 you needed. With the index, above, only 49 rows need to be fetched.
"Waiting for table metadata lock" -- This implies that you are doing some serious blocking or LOCKing or ALTERing. What other queries are going on?
Since the buffer_pool is shared among all MySQL processes, some other query may be at fault.
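To see what else is running (and possibly holding the metadata lock), here is a sketch using information_schema.PROCESSLIST, which is available in both MySQL and MariaDB; the 60-second threshold is arbitrary:

-- Long-running statements that may be blocking others
SELECT id, user, db, time, state, LEFT(info, 100) AS query_start
FROM information_schema.PROCESSLIST
WHERE command = 'Query' AND time > 60
ORDER BY time DESC;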
Adding the index should eliminate the 48 vs 1765 puzzle and any need for max-query-time.
You should use ENGINE=InnoDB; MyISAM has more locking/blocking issues.
If you provide more specifics, I may have more tips.
(after your edit)
The "Rows" in EXPLAIN is an estimate; it can easily be off by a factor of 2 either way.
Rows_sent in the slowlog is exact.

Connection lost while building primary key. Fix or punt?

This question is about possible future improvements to a task I'm almost done with.
I have loaded a MySQL database with a subset of the Universal Medical Language System's Metathesaurus. I used a Java application called MetaMorphoSys, which generated a Bash wrapper, one SQL script for defining tables and importing data from text files, and another for indexing.
Loading and indexing a small UMLS subset (3.3 M rows in table MRSAT) goes to completion without errors. Loading a larger subset (39.4 M rows in MRSAT) is also successful, but then the indexing fails at this step after 1500 to 1800 seconds:
ALTER TABLE MRSAT ADD CONSTRAINT X_MRSAT_PK PRIMARY KEY BTREE (ATUI)
Error Code: 2013. Lost connection to MySQL server during query
My only use for the MySQL database is converting the relational rows to RDF triples. This conversion is performed by a single python script, which does seem to access the MRSAT table, but doesn't appear to use the ATUI column. At this point, I have extracted almost all of the data I want.
How can I tell if the absence of the primary key is detrimental to the performance of the RDF-generation queries?
I have increased some timeouts but haven't made all of the changes suggested in other answers to that question.
The documentation from the provider suggests MySQL 5.5 over 5.6 due to disk space usage issues. I am using 5.6 anyway (as I have done in the past) on a generous AWS x1e.2xlarge instance running Ubuntu 18.
The documentation provides tuning suggestions for 5.5, but I don't see equivalent settings names in the 5.6 documentation. I have applied these:
bulk_insert_buffer_size = 100M
join_buffer_size = 100M
myisam_sort_buffer_size = 200M
query_cache_limit = 3M
query_cache_size = 100M
read_buffer_size = 200M
sort_buffer_size = 500M
For key_buffer = 600M I used key_buffer_size = 600M. I didn't do anything for table_cache = 300.
The primary key is supposed to be set on the alphanumerical column ATUI
mysql> select * from MRSAT limit 9;
+----------+----------+----------+-----------+-------+---------+-------------+-------+--------+-----+------------+----------+------+
| CUI | LUI | SUI | METAUI | STYPE | CODE | ATUI | SATUI | ATN | SAB | ATV | SUPPRESS | CVF |
+----------+----------+----------+-----------+-------+---------+-------------+-------+--------+-----+------------+----------+------+
| C0000005 | L0000005 | S0007492 | A26634265 | AUI | D012711 | AT212456753 | NULL | TH | MSH | UNK (19XX) | N | NULL |
| C0000005 | L0000005 | S0007492 | A26634265 | AUI | D012711 | AT212480766 | NULL | TERMUI | MSH | T037573 | N | NULL |
| C0000005 | L0000005 | S0007492 | A26634265 | SCUI | D012711 | AT60774257 | NULL | RN | MSH | 0 | N | NULL |
| C0000005 | L0270109 | S0007491 | A26634266 | AUI | D012711 | AT212327137 | NULL | TERMUI | MSH | T037574 | N | NULL |
| C0000005 | L0270109 | S0007491 | A26634266 | AUI | D012711 | AT212456754 | NULL | TH | MSH | UNK (19XX) | N | NULL |
| C0000005 | NULL | NULL | NULL | CUI | NULL | AT00368929 | NULL | DA | MTH | 19900930 | N | NULL |
| C0000005 | NULL | NULL | NULL | CUI | NULL | AT01344283 | NULL | MR | MTH | 20020910 | N | NULL |
| C0000005 | NULL | NULL | NULL | CUI | NULL | AT02319637 | NULL | ST | MTH | R | N | NULL |
| C0000039 | L0000035 | S0007560 | A26674543 | AUI | D015060 | AT212481191 | NULL | TH | MSH | UNK (19XX) | N | NULL |
+----------+----------+----------+-----------+-------+---------+-------------+-------+--------+-----+------------+----------+------+
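One way to check whether the missing primary key matters is to EXPLAIN the statements the Python script actually issues against MRSAT. A sketch with a hypothetical lookup by CUI (the real query shape may differ):

-- Hypothetical query shape; adjust to match what the RDF-generation script runs
EXPLAIN SELECT CUI, ATN, ATV, SAB
FROM MRSAT
WHERE CUI = 'C0000005';
-- type = ALL with rows in the tens of millions means a full scan (an index on the filter column would help);
-- if no query filters or joins on ATUI, the absent primary key should not hurt these reads.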

RDS High CPU utilization

I am facing a high CPU utilization issue. Can too many concurrent CREATE TEMPORARY TABLE statements cause high CPU utilization?
Is there any query through which we can capture the queries that are causing high CPU utilization?
Variables we set:
tmp_table_size = 1G
max_heap_table_size = 1G
innodb_buffer_pool_size = 145 G
innodb_buffer_pool_instance = 8
innodb_page_cleaner = 8
Status variables:
mysql> show global status like '%tmp%';
+-------------------------+-----------+
| Variable_name | Value |
+-------------------------+-----------+
| Created_tmp_disk_tables | 60844516 |
| Created_tmp_files | 135751 |
| Created_tmp_tables | 107643364 |
+-------------------------+-----------+
mysql> show global status like '%innodb_buffer%';
+---------------------------------------+--------------------------------------------------+
| Variable_name | Value |
+---------------------------------------+--------------------------------------------------+
| Innodb_buffer_pool_dump_status | Dumping of buffer pool not started |
| Innodb_buffer_pool_load_status | Buffer pool(s) load completed at 170917 19:11:45 |
| Innodb_buffer_pool_resize_status | |
| Innodb_buffer_pool_pages_data | 8935464 |
| Innodb_buffer_pool_bytes_data | 146398642176 |
| Innodb_buffer_pool_pages_dirty | 18824 |
| Innodb_buffer_pool_bytes_dirty | 308412416 |
| Innodb_buffer_pool_pages_flushed | 122454921 |
| Innodb_buffer_pool_pages_free | 188279 |
| Innodb_buffer_pool_pages_misc | 377817 |
| Innodb_buffer_pool_pages_total | 9501560 |
| Innodb_buffer_pool_read_ahead_rnd | 0 |
| Innodb_buffer_pool_read_ahead | 585245 |
| Innodb_buffer_pool_read_ahead_evicted | 14383 |
| Innodb_buffer_pool_read_requests | 304878851665 |
| Innodb_buffer_pool_reads | 10537188 |
| Innodb_buffer_pool_wait_free | 0 |
| Innodb_buffer_pool_write_requests | 14749510186 |
+---------------------------------------+--------------------------------------------------+
Step 1 -
SHOW PROCESSLIST
Find whether any process is locking a table; if yes, then change that table to MyISAM.
Step 2 -
Check your RAM and your database size.
Step 3 -
EXPLAIN the complex queries and check whether a filesort is happening or a very large number of rows is being scanned; fix that either by flattening the table or by using no more than 4 subqueries.
Step 4 -
Use joins efficiently.
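To capture candidate statements, here is a sketch using performance_schema (assuming it is enabled on the RDS instance; the LIMIT is arbitrary):

-- Top statements by total execution time, with their on-disk temp-table counts
SELECT digest_text,
       count_star,
       ROUND(sum_timer_wait/1e12, 1) AS total_exec_seconds,
       sum_created_tmp_disk_tables
FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC
LIMIT 10;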

Huge Temp table going on disk In mysql

How can I avoid temporary table creation on disk in MySQL when a table has multiple TEXT columns?
I have set tmp_table_size to 1 GB and max_heap_table_size to 1 GB.
Still, temporary tables are going to disk.
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 70742 |
| Created_tmp_files | 6 |
| Created_tmp_tables | 71076 |
Thanks :-)

Why does information_schema.tables.data_free always show 8388608?

Does anyone know why information_schema.tables.data_free in InnoDB is always 8388608, no matter how many rows are in the tables?
+--------------+------------+------------+-----------+--------+
| table_schema | table_name | table_rows | data_free | engine |
+--------------+------------+------------+-----------+--------+
| g33v1        | appraise   |          0 |   8388608 | InnoDB |
| g33v1        | areatype   |      12403 |   8388608 | InnoDB |
| g33v1        | atype      |     581982 |   8388608 | InnoDB |
| g33v1        | atype2     |     579700 |   8388608 | InnoDB |
+--------------+------------+------------+-----------+--------+
thanks.
I think it is the maximum temporary table size allocated for sorting and other operations that require space on the HDD, since the same value appears in other places as well:
mysql> show global variables like '%tmp%';
+----------------+-----------------------+
| Variable_name | Value |
+----------------+-----------------------+
| bdb_tmpdir | /usr/local/mysql/tmp/ |
| max_tmp_tables | 32 |
| tmp_table_size | 8388608 |
| tmpdir | /usr/local/mysql/tmp |
+----------------+-----------------------+
mysql> show global variables like '%myisam%';
+---------------------------------+---------------+
| Variable_name | Value |
+---------------------------------+---------------+
| myisam_data_pointer_size | 4 |
| myisam_max_extra_sort_file_size | 2147483648 |
| myisam_max_sort_file_size | 2147483647 |
| myisam_recover_options | OFF |
| myisam_repair_threads | 1 |
| myisam_sort_buffer_size | 4194304 |
| myisam_stats_method | nulls_unequal |
+---------------------------------+---------------+