I was checking the MySQL slow query log and found the following entry:
# Time: 131108 4:16:34
# Query_time: 14.726425 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 1
SET timestamp=1383884194;
UPDATE `Artist` SET ImageFilename = NULL, Title = 'Elton John', PopularityRating = 657, UniqueID = NULL, Description = NULL, IsFeatured = 0, FeaturedText = '', MetaDescription = '', MetaTitle = NULL, _Temporary_LastUpdOn = '2013-11-08 04:15:58 ', _Temporary_Flag = 0, _Deleted = 0, _DeletedOn = NULL, Priority = 0 WHERE ID = 3449748;
As you can see, it took a staggering 14.72 sec to perform this query, even though it is a simple update with just a WHERE on the primary key. I've tried re-executing the query, but now it executes in 0.095 sec, which is much more reasonable.
Any ideas how I can debug why at that specific time it took so long?
Edit 1: query_cache% variables
mysql> SHOW variables where variable_name like 'query_cache%';
+------------------------------+-----------+
| Variable_name | Value |
+------------------------------+-----------+
| query_cache_limit | 1048576 |
| query_cache_min_res_unit | 4096 |
| query_cache_size | 210763776 |
| query_cache_type | ON |
| query_cache_wlock_invalidate | OFF |
+------------------------------+-----------+
Edit 2: Artist table info
CREATE TABLE `artist` (
`ID` bigint(20) NOT NULL,
`ImageFilename` mediumtext,
`Title` varchar(1000) DEFAULT NULL,
`PopularityRating` int(11) DEFAULT '0',
`UniqueID` mediumtext,
`Description` mediumtext,
`IsFeatured` tinyint(1) DEFAULT '0',
`FeaturedText` mediumtext,
`_Temporary_LastUpdOn` datetime DEFAULT '0001-01-01 00:00:00',
`_Temporary_Flag` tinyint(1) DEFAULT '0',
`_Deleted` tinyint(1) DEFAULT '0',
`_DeletedOn` datetime DEFAULT NULL,
`Priority` int(11) DEFAULT '0',
`MetaDescription` varchar(2000) DEFAULT NULL,
`MetaTitle` mediumtext,
PRIMARY KEY (`ID`),
KEY `_Temporary_Flag` (`_Temporary_Flag`),
KEY `_Deleted` (`_Deleted`),
KEY `Priority` (`Priority`),
KEY `PopularityRating` (`PopularityRating`),
KEY `Title` (`Title`(255)),
KEY `IsFeatured` (`IsFeatured`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Given the output you have provided, my suggestion would be to reduce your query cache size. It is of course only my best guess that this is what caused the update to take almost 15 seconds, because the query itself is optimal, using a WHERE on the PRIMARY KEY.
Since you haven't been able to reproduce the problem, it's hard to determine anything for sure.
I went back to the query cache documentation for more information:
When tables are modified, any relevant entries in the query cache are flushed.
This could be the reason the UPDATE was slow: it had to invalidate the relevant cached entries, and with a ~200MB cache that can take a while.
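You can see how much work the cache is doing by comparing its status counters before and after such an UPDATE; these are standard status variables, so this is only a diagnostic suggestion:
SHOW GLOBAL STATUS LIKE 'Qcache%';
-- Qcache_inserts, Qcache_hits and Qcache_lowmem_prunes show how busy the cache is;
-- a large, fragmented cache makes invalidation on UPDATE more expensive.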
Another part of the documentation:
Be cautious about sizing the query cache excessively large, which
increases the overhead required to maintain the cache, possibly beyond
the benefit of enabling it. Sizes in tens of megabytes are usually
beneficial. Sizes in the hundreds of megabytes might not be.
Either way, since you have the query cache enabled, I think that's a good starting point.
To set a new query cache size while in production:
SET GLOBAL query_cache_size = 1000000;
MySQL will automatically align the size to the nearest 1024-byte block.
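You can verify what was actually applied; for example, 1000000 is not a multiple of 1024, so it gets rounded down:
SET GLOBAL query_cache_size = 1000000;
SHOW VARIABLES LIKE 'query_cache_size';
-- reports 999424 (= 976 * 1024), the nearest 1024-byte multiple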
Read that documentation thoroughly; it's very helpful for understanding this. The query cache can be your best setting and your worst setting at the same time:
http://dev.mysql.com/doc/refman/5.1/en/query-cache.html
There's a problem with your table: several of your indexes are on fields that this UPDATE modifies, so MySQL has to maintain each of those indexes on every such update.
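If profiling shows that some of those indexes are rarely used by your SELECTs, dropping them reduces the write cost. A sketch only; whether IsFeatured and _Deleted are expendable is an assumption you would have to verify:
-- each dropped index is one less structure to maintain on every UPDATE
ALTER TABLE `Artist`
    DROP INDEX `IsFeatured`,
    DROP INDEX `_Deleted`;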
I think you have not tuned the MySQL server variables. It is important to tune server variables to increase performance. I recommend having a look at the key_buffer_size and table_cache variables.
The key_buffer_size variable controls the amount of memory available for the MySQL index buffer. The higher this value, the more memory available for indexes and better the performance.
The table_cache variable controls the amount of memory available for the table cache, and thus the total number of tables MySQL can hold open at any given time. For busy servers with many databases and tables, this value should be increased so that MySQL can serve all requests reliably.
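As an illustration only (the right values depend on your RAM and workload, and key_buffer_size matters mainly for MyISAM tables, while your artist table is InnoDB):
[mysqld]
key_buffer_size = 256M   # MyISAM index buffer
table_cache     = 512    # called table_open_cache in MySQL 5.5+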
In case someone missed the comment above:
Maybe the table was locked at that time.
Since you could not reproduce the problem, this was likely the case.
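If it happens again, the quickest check is to look at what else is running and waiting at that moment; these are standard commands, not specific to your setup:
SHOW FULL PROCESSLIST;        -- long-running statements touching `Artist`?
SHOW ENGINE INNODB STATUS\G   -- the TRANSACTIONS section lists lock waits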
I am working on Debian 11 (Bullseye), first with the distribution's MariaDB 10.5 and now with version 10.6.7 from the MariaDB repositories.
I'm failing to get correct indexes for some big tables from the dump of a genetics database (ensembl homo_sapiens_variation_106_37) from here: ftp://ftp.ensembl.org/pub/grch37/release-106/mysql/homo_sapiens_variation_106_37/
One of the tables is variation_feature:
CREATE TABLE `variation_feature` (
`variation_feature_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`seq_region_id` int(10) unsigned NOT NULL,
`seq_region_start` int(11) NOT NULL,
`seq_region_end` int(11) NOT NULL,
`seq_region_strand` tinyint(4) NOT NULL,
`variation_id` int(10) unsigned NOT NULL,
`allele_string` varchar(50000) DEFAULT NULL,
`ancestral_allele` varchar(50) DEFAULT NULL,
`variation_name` varchar(255) DEFAULT NULL,
`map_weight` int(11) NOT NULL,
`flags` set('genotyped') DEFAULT NULL,
`source_id` int(10) unsigned NOT NULL,
`consequence_types` set('intergenic_variant','splice_acceptor_variant','splice_donor_variant','stop_lost','coding_sequence_variant','missense_variant','stop_gained','synonymous_variant','frameshift_variant','non_coding_transcript_variant','non_coding_transcript_exon_variant','mature_miRNA_variant','NMD_transcript_variant','5_prime_UTR_variant','3_prime_UTR_variant','incomplete_terminal_codon_variant','intron_variant','splice_region_variant','downstream_gene_variant','upstream_gene_variant','start_lost','stop_retained_variant','inframe_insertion','inframe_deletion','transcript_ablation','transcript_fusion','transcript_amplification','transcript_translocation','TFBS_ablation','TFBS_fusion','TFBS_amplification','TFBS_translocation','regulatory_region_ablation','regulatory_region_fusion','regulatory_region_amplification','regulatory_region_translocation','feature_elongation','feature_truncation','regulatory_region_variant','TF_binding_site_variant','protein_altering_variant','start_retained_variant') NOT NULL DEFAULT 'intergenic_variant',
`variation_set_id` set('1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31','32','33','34','35','36','37','38','39','40','41','42','43','44','45','46','47','48','49','50','51','52','53','54','55','56','57','58','59','60','61','62','63','64') NOT NULL DEFAULT '',
`class_attrib_id` int(10) unsigned DEFAULT '0',
`somatic` tinyint(1) NOT NULL DEFAULT '0',
`minor_allele` varchar(50) DEFAULT NULL,
`minor_allele_freq` float DEFAULT NULL,
`minor_allele_count` int(10) unsigned DEFAULT NULL,
`alignment_quality` double DEFAULT NULL,
`evidence_attribs` set('367','368','369','370','371','372','418','421','573','585') DEFAULT NULL,
`clinical_significance` set('uncertain significance','not provided','benign','likely benign','likely pathogenic','pathogenic','drug response','histocompatibility','other','confers sensitivity','risk factor','association','protective','affects') DEFAULT NULL,
`display` int(1) DEFAULT '1',
PRIMARY KEY (`variation_feature_id`),
KEY `pos_idx` (`seq_region_id`,`seq_region_start`,`seq_region_end`),
KEY `variation_idx` (`variation_id`),
KEY `variation_set_idx` (`variation_set_id`),
KEY `consequence_type_idx` (`consequence_types`),
KEY `source_idx` (`source_id`)
) ENGINE=MyISAM AUTO_INCREMENT=743963234 DEFAULT CHARSET=latin1;
It has over 700,000,000 records and occupies this much space on disk:
# ls -lh variation_feature.*
-rw-rw---- 1 mysql mysql 56K Mai 3 09:41 variation_feature.frm
-rw-rw---- 1 mysql mysql 55G Mai 2 20:44 variation_feature.MYD
-rw-rw---- 1 mysql mysql 61G Mai 2 22:27 variation_feature.MYI
Despite not getting any errors while importing variation_feature.txt, some essential indexes are not working.
In this case, selecting a known row of data based on variation_id won't return anything, e.g.
SELECT *
FROM variation_feature
WHERE variation_id = 617544728;
--> nothing
The value 617544728 seems not to be in the index, because even this index-only query returns nothing:
SELECT variation_id
FROM variation_feature
WHERE variation_id = 617544728;
--> nothing
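For diagnosis, EXPLAIN shows which index the optimizer picks for this lookup (if it lists variation_idx, the corrupt index is the one being read); a suggestion, output omitted:
EXPLAIN SELECT variation_id
FROM variation_feature
WHERE variation_id = 617544728;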
Disabling the index and waiting for the long table scan returns the row:
ALTER TABLE variation_feature ALTER INDEX variation_idx IGNORED;
SELECT *
FROM variation_feature
WHERE variation_id = 617544728;
variation_feature_id: 632092737
seq_region_id: 27511
seq_region_start: 230845794
seq_region_end: 230845794
seq_region_strand: 1
variation_id: 617544728
allele_string: A/G
ancestral_allele: G
variation_name: rs699
map_weight: 1
flags: <null>
source_id: 1
consequence_types: missense_variant
variation_set_id: 2,5,6,9,10,11,12,13,15,16,17,23,24,25,26,30,40,42,43,44,45,47,48,49,50,51,52,53,54,55,56,57
class_attrib_id: 2
somatic: false
minor_allele: A
minor_allele_freq: 0.2949
minor_allele_count: 1477
alignment_quality: <null>
evidence_attribs: 368,370,371,372,418,421,573,585
clinical_significance: benign,risk factor
display: 1
myisamchk fixes the indexes without error, but the index "variation_idx" still won't work.
DROPping and re-CREATing that one index also runs without error, but the index still won't work.
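For reference, the rebuild was of this form (a sketch using the names from the schema above):
DROP INDEX variation_idx ON variation_feature;
CREATE INDEX variation_idx ON variation_feature (variation_id);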
The other indexes are OK.
I have the same problem with another genome from this database (ensembl homo_sapiens_variation_106_38, slightly bigger) from here: ftp://ftp.ensembl.org/pub/release-106/mysql/homo_sapiens_variation_106_38/ - on another computer, but with the same software installed.
With one difference: there, the PRIMARY KEY (variation_feature_id) doesn't work either.
myisamchk is also running without error, but to no avail.
mysqlcheck (which in version 10.6 runs very slowly compared to 10.5) then returns this on the first computer:
homo_sapiens_variation_106_37.variation_feature
error : Key in wrong position at page 22405134336
error : Corrupt
Now, this much we already knew, but no repair tool can actually repair it or give a hint as to what's wrong.
I've CREATEd an index on variation_name: it's working.
My changes to mariadb.cnf, made to accommodate the huge databases and the MySQL version ensembl uses:
[mysqld]
## set some time ago - because of a bug, MySQL/MariaDB didn't take the time zone from the system
default-time-zone = Europe/Berlin
## ensembl has mysql version 5.6
## because of table creation scripts I need compatibility:
show_compatibility_56 = ON
performance_schema
## ensembl writes in DATETIME fields: "0000-00-00 00:00:00"
## the new sql_mode in 5.7 doesn't allow it any more
## SHOW VARIABLES LIKE 'sql_mode' ;
## ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
## so change sql_mode deleting NO_ZERO_IN_DATE,NO_ZERO_DATE
sql_mode = ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
datadir = /mnt/SRVDATA/var/lib/mysql
tmpdir = /mnt/WORK/tmp
[mariadb]
[mariadb-10.6]
## MyISAM for building ensembl homo_sapiens
lower_case_table_names=1
bulk_insert_buffer_size = 1G
myisam_sort_buffer_size = 56G
sort_buffer_size = 56G
Thank you for your patience in reading all this.
sort_buffer_size should be limited to 1% of RAM. The huge setting you have could cause swapping, which is terrible for performance. Ditto for myisam_sort_buffer_size, though that one is probably not relevant to the question.
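As a concrete illustration, assuming a machine with 64GB of RAM (adjust to your hardware):
[mariadb-10.6]
sort_buffer_size        = 640M  # about 1% of 64G; allocated per sort, per connection
myisam_sort_buffer_size = 4G    # used for index builds/repairs; 56G invites swapping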
You should consider moving from MyISAM to InnoDB, if for no other reason than avoiding corrupt indexes. It will, however, increase the disk footprint (data + indexes) to upwards of 300GB.
InnoDB is much better at concurrent access to huge tables.
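The conversion itself is a single statement, though on a 55GB table it will run for hours and needs enough free disk space for the rebuild (a sketch):
ALTER TABLE variation_feature ENGINE=InnoDB;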
There are several Q&As about "Why is InnoDB (much) slower than MyISAM?", but I could not find any topic on the opposite.
I had a table defined as InnoDB, in which I store file contents in a BLOB field. Because supposedly MyISAM should be used for that, I switched the table over. Here is its structure:
CREATE TABLE `liv_fx_files_files` (
`fid` int(11) NOT NULL AUTO_INCREMENT,
`filedata` longblob NOT NULL,
`filetype` varchar(255) NOT NULL,
`filename` varchar(255) NOT NULL,
`filesize` int(11) NOT NULL,
`context` varchar(1) NOT NULL DEFAULT '',
`saveuser` varchar(32) NOT NULL,
`savetime` int(11) NOT NULL,
`_state` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`fid`),
KEY `_state` (`_state`)
) ENGINE=MyISAM AUTO_INCREMENT=4550 DEFAULT CHARSET=utf8;
There are 4549 records stored in it so far, with filedata ranging from 0 to 48MB. The sum of all files is about 6GB.
So whenever I need to know the current total file size, I issue the query:
SELECT SUM(filesize) FROM liv_fx_files_files;
The problem is that since I switched from InnoDB to MyISAM, this simple query takes really long (about 30 sec and more), whereas on InnoDB it finished in under one second.
But aggregations are not the only queries which are very slow; it's almost every query.
I guess I could fix it by adapting the config (which is currently optimized for InnoDB-only use), but I don't know which settings to adjust. Does anyone have a hint for me, please?
current mysql server config (SHOW VARIABLES as csv)
Here is an example of another query fired against both table types (both contain exactly the same data and have the same definition). All other tested queries behave the same way, i.e. they run much longer against the MyISAM table than against the InnoDB one!
SELECT sql_no_cache `fxfilefile`.`filename` AS `filename` FROM `myisamtable`|`innodbtable` AS `fxfilefile` WHERE `fxfilefile`.`filename` LIKE '%foo%';
Executive Summary: Use InnoDB, and change the my.cnf settings accordingly.
Details:
"MyISAM is faster" -- This is an old wives' tale. Today, InnoDB is faster in most situations.
Assuming you have at least 4GB of RAM...
If all-MyISAM, key_buffer_size should be about 20% of RAM; innodb_buffer_pool_size should be 0.
If all-InnoDB, key_buffer_size should be, say, only 20MB; innodb_buffer_pool_size should be about 70% of RAM.
If a mixture, do something in between. More discussion.
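As a sketch, assuming a dedicated all-InnoDB server with 8GB of RAM, those guidelines would look like:
[mysqld]
key_buffer_size         = 20M   # only the MyISAM system tables need it
innodb_buffer_pool_size = 5G    # roughly 70% of 8G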
Let's look at how things are handled differently by the two Engines.
MyISAM puts the entire BLOB 'inline' with the other columns.
InnoDB puts most or all of each blob in other blocks.
Conclusion:
A table scan in a MyISAM table spends a lot of time stepping over cow paddies; InnoDB is much faster if you don't touch the BLOB.
This makes InnoDB a clear winner for SELECT SUM(x) FROM tbl; when there is no index on x. With INDEX(x), either engine will be fast (see the sketch after these points).
Because of the BLOB being inline, MyISAM has fragmentation issues if you update records in the table; InnoDB has much less fragmentation. This impacts all operations, making InnoDB the winner again.
The order of the columns in the CREATE TABLE has no impact on performance in either engine.
Because the BLOB dominates the size of each row, the tweaks to the other columns will have very little impact on performance.
If you decide to go with MyISAM, I would recommend a 'parallel' table ('vertical partitioning'). Put the BLOB and the id in a separate table (again, see the sketch below). This would help MyISAM come closer to InnoDB's model and performance, but would add complexity to your code.
For "point queries" (looking up a single row via an index), there won't be much difference in performance between the engines.
Your my.cnf seems antique; set-variable has not been necessary in a long time.
Try editing your MySQL config file, usually /etc/mysql/my.cnf, and using the "huge" preset:
# The MySQL server
[mysqld]
port = 3306
socket = /var/run/mysqld/mysqld.sock
skip-locking
set-variable = key_buffer=384M
set-variable = max_allowed_packet=1M
set-variable = table_cache=512
set-variable = sort_buffer=2M
set-variable = record_buffer=2M
set-variable = thread_cache=8
# Try number of CPU's*2 for thread_concurrency
set-variable = thread_concurrency=8
set-variable = myisam_sort_buffer_size=64M
Certainly 30 seconds to read 4500 records is very slow. Assuming there is plenty of room for I/O caching, the first thing I would try is to change the order of the fields: if records are written to the table in the order the fields are declared, the DBMS has to seek past the large BLOB in each record before it can read the size value. (I'd also recommend capping the size of those varchar(255) columns, and that varchar(1) NOT NULL should be CHAR(1).)
CREATE TABLE `liv_fx_files_files2` (
`fid` int(11) NOT NULL AUTO_INCREMENT,
`filesize` int(11) NOT NULL,
`context` char(1) NOT NULL DEFAULT '',
`saveuser` varchar(32) NOT NULL,
`savetime` int(11) NOT NULL,
`_state` int(11) NOT NULL DEFAULT '0',
`filetype` varchar(255) NOT NULL,
`filename` varchar(255) NOT NULL,
`filedata` longblob NOT NULL,
PRIMARY KEY (`fid`),
KEY `_state` (`_state`)
) ENGINE=MyISAM AUTO_INCREMENT=4550 DEFAULT CHARSET=utf8;
INSERT INTO liv_fx_files_files2
(fid, filesize, context, saveuser, savetime, _state, filetype, filename, filedata)
SELECT fid, filesize, context, saveuser, savetime, _state, filetype, filename, filedata
FROM liv_fx_files_files;
But ideally I'd split the data and metadata into separate tables.
I am running a CMS, but this has nothing to do with it.
I have a simple query which is:
UPDATE e107_online SET `online_location` = 'http://page.com/something.php?', `online_pagecount` = 133 WHERE `online_ip` = '175.44.*.*' AND `online_user_id` = '0' LIMIT 1;
but the same query as reported by my website's support gives this:
# User@Host: cosyclim_website[cosyclim_website] @ localhost []
# Thread_id: 7493739  Schema: cosyclim_website
# Query_time: 12.883518  Lock_time: 0.000028  Rows_sent: 0  Rows_examined: 0  Rows_affected: 1  Rows_read: 1
It takes 12 (almost 13) seconds for a simple update query? Is there a way I could optimize it somehow? If I run it through PhpMyAdmin it takes 0.0003s.
The table:
CREATE TABLE IF NOT EXISTS `e107_online` (
`online_timestamp` int(10) unsigned NOT NULL default '0',
`online_flag` tinyint(3) unsigned NOT NULL default '0',
`online_user_id` varchar(100) NOT NULL default '',
`online_ip` varchar(15) NOT NULL default '',
`online_location` varchar(255) NOT NULL default '',
`online_pagecount` tinyint(3) unsigned NOT NULL default '0',
`online_active` int(10) unsigned NOT NULL default '0',
KEY `online_ip` (`online_ip`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Your query is updating one row which meets certain criteria:
UPDATE e107_online
SET `online_location` = 'http://page.com/something.php?', `online_pagecount` = 133
WHERE `online_ip` = '175.44.*.*' AND `online_user_id` = '0'
LIMIT 1;
Given that you have ip addresses, I'm guessing that this table is pretty big. Millions and millions and millions of rows. There are many reasons why an update can take a long time -- such as server load, blocking transactions, and log file performance. In this case, let's make the assumption that the problem is finding one of the rows. You can test this just by doing a select with the same conditions and see how long that takes.
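For instance:
-- if this is also slow, the time is going into finding the row, not updating it
SELECT *
FROM e107_online
WHERE `online_ip` = '175.44.*.*' AND `online_user_id` = '0';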
Assuming the select is consistently slow, then the problem can probably be fixed with indexes. If the table has no indexes -- or if MySQL cannot use existing indexes -- then it needs to do a full table scan. And, perhaps the one record that matches is at the end of the table. It takes a while to find it.
I would suggest adding an index on either e107_online(online_ip) or e107_online(online_user_id, online_ip) to help it find the record faster. The index needs to be a b-tree index, as explained here.
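For example (a sketch; the index name is arbitrary):
CREATE INDEX online_user_ip ON e107_online (`online_user_id`, `online_ip`);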
One consequence of using an index is that the ip with the lowest matching value will probably be the one chosen. I don't know if this lack of randomness makes a difference in your application.
Is it just this query that is slow, or are queries from your website generally slower? phpMyAdmin is most likely running queries directly on the machine your database lives on, which means network latency is effectively 0ms. I would have suggested adding an index including the two columns in your WHERE clause, but with <50 rows that doesn't make any sense. This is going to come down to a blockage between your website and database server.
Also make sure you're not doing anything silly like running without connection pooling turned on (or creating a ton of connections unnecessarily). I've seen connection pools that had run out of space cause problems similar to this.
I have a large table in MySQL (running within MAMP); it has 28 million rows and is 3.1GB in size. Here is its structure:
CREATE TABLE `termusage` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`termid` bigint(20) DEFAULT NULL,
`date` datetime DEFAULT NULL,
`dest` varchar(255) DEFAULT NULL,
`cost_type` tinyint(4) DEFAULT NULL,
`cost` decimal(10,3) DEFAULT NULL,
`gprsup` bigint(20) DEFAULT NULL,
`gprsdown` bigint(20) DEFAULT NULL,
`duration` time DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `termid_idx` (`termid`),
KEY `date_idx` (`date`),
KEY `cost_type_idx` (`cost_type`),
CONSTRAINT `termusage_cost_type_cost_type_cost_code` FOREIGN KEY (`cost_type`) REFERENCES `cost_type` (`cost_code`),
CONSTRAINT `termusage_termid_terminal_id` FOREIGN KEY (`termid`) REFERENCES `terminal` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=28680315 DEFAULT CHARSET=latin1
Here is the output from SHOW TABLE STATUS :
Name:            termusage
Engine:          InnoDB
Version:         10
Row_format:      Compact
Rows:            29656469
Avg_row_length:  87
Data_length:     2605711360
Max_data_length: 0
Index_length:    2156920832
Data_free:       545259520
Auto_increment:  28680315
Create_time:     2011-08-16 15:16:08
Update_time:     NULL
Check_time:      NULL
Collation:       latin1_swedish_ci
Checksum:        NULL
Create_options:  ''
Comment:         ''
I'm trying to run the following select statement:
select u.id from termusage u
where u.date between '2010-11-01' and '2010-12-01'
It takes 35 minutes to return the result (approx 14 million rows) - this is using MySQL Workbench.
I have the following MySQL config setup :
Variable_name Value
bulk_insert_buffer_size 8388608
innodb_buffer_pool_instances 1
innodb_buffer_pool_size 3221225472
innodb_change_buffering all
innodb_log_buffer_size 8388608
join_buffer_size 131072
key_buffer_size 8388608
myisam_sort_buffer_size 8388608
net_buffer_length 16384
preload_buffer_size 32768
read_buffer_size 131072
read_rnd_buffer_size 262144
sort_buffer_size 2097152
sql_buffer_result OFF
Eventually I'm trying to run a larger query that joins a couple of tables and groups some data, all based on one variable, the customer id:
select c.id,u.termid,u.cost_type,count(*) as count,sum(u.cost) as cost,(sum(u.gprsup) + sum(u.gprsdown)) as gprsuse,sum(time_to_sec(u.duration)) as duration
from customer c
inner join terminal t
on (c.id = t.customer)
inner join termusage u
on (t.id = u.termid)
where c.id = 1 and u.date between '2011-03-01' and '2011-04-01' group by c.id,u.termid,u.cost_type
This returns a maximum of 8 rows (as there are only 8 separate cost_types). The query runs OK when there are not many rows in the termusage table to calculate over (less than 1 million), but takes forever when the number of rows is large. How can I reduce the select time?
Data is added to the termusage table once a month from CSV files using LOAD DATA method - so it doesn't need to be quite so tuned for inserts.
EDIT: EXPLAIN output for the main query:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,SIMPLE,c,const,PRIMARY,PRIMARY,8,const,1,"Using index; Using temporary; Using filesort"
1,SIMPLE,u,ALL,"termid_idx,date_idx",NULL,NULL,NULL,29656469,"Using where"
1,SIMPLE,t,eq_ref,"PRIMARY,customer_idx",PRIMARY,8,wlnew.u.termid,1,"Using where"
Looks like you're asking two questions - correct?
The most likely reason the first query is taking so long is because it's IO-bound. It takes a long time to transfer 14 million records from disk and down the wire to your MySQL work bench.
Have you tried putting the second query through "explain"? Yes, you only get back 8 rows - but the SUM operation may be summing millions of records.
I'm assuming the "customer" and "terminal" tables are appropriately indexed? As you're joining on the primary key on termusage, that should be really quick...
You could try removing the where clause that restricts by date and instead putting an IF statement in the select, so that if the date is within these boundaries the value is returned, and otherwise zero is returned. The SUM will then of course only add up values which lie in this range, as all others will be zero.
It sounds a bit nonsensical to fetch more rows than you need, but we recently observed on an Oracle DB that this made quite a huge improvement. Of course it depends on many other factors, but it might be worth a try.
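Applied to the query from the question, the idea would look roughly like this (an untested sketch; the join output still has to be scanned in full):
SELECT c.id, u.termid, u.cost_type,
       SUM(IF(u.date BETWEEN '2011-03-01' AND '2011-04-01', 1, 0)) AS count,
       SUM(IF(u.date BETWEEN '2011-03-01' AND '2011-04-01', u.cost, 0)) AS cost
FROM customer c
INNER JOIN terminal t ON (c.id = t.customer)
INNER JOIN termusage u ON (t.id = u.termid)
WHERE c.id = 1
GROUP BY c.id, u.termid, u.cost_type;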
You may also think about breaking down the table into years or months. So you have a termusage_2010, termusage_2011, ... or something like this.
Not a very nice solution, but seeing that your table is quite large, it might be useful on a smaller server.
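A sketch of the per-year variant (note that CREATE TABLE ... LIKE copies indexes but not foreign keys, so those would have to be re-added):
CREATE TABLE termusage_2011 LIKE termusage;
INSERT INTO termusage_2011
SELECT * FROM termusage
WHERE `date` >= '2011-01-01' AND `date` < '2012-01-01';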
I am running SQL queries on a MySQL table that has 110M+ unique records for a whole day.
Problem: whenever I run any query with a WHERE clause, it takes at least 30-40 minutes. Since I want to generate most of the data on the next day, I need access to the whole table.
Could you please guide me on how to optimize or restructure this deployment?
Site description:
mysql Ver 14.12 Distrib 5.0.24, for pc-linux-gnu (i686) using readline 5.0
4 GB RAM,
Dual Core dual CPU 3GHz
RHEL 3
my.cnf contents :
[mysqld]
datadir=/data/mysql/data/
socket=/tmp/mysql.sock
sort_buffer_size = 2000000
table_cache = 1024
key_buffer = 128M
myisam_sort_buffer_size = 64M
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1
[mysql.server]
user=mysql
basedir=/data/mysql/data/
[mysqld_safe]
err-log=/data/mysql/data/mysqld.log
pid-file=/data/mysql/data/mysqld.pid
DB table details:
CREATE TABLE `RAW_LOG_20100504` (
`DT` date default NULL,
`GATEWAY` varchar(15) default NULL,
`USER` bigint(12) default NULL,
`CACHE` varchar(12) default NULL,
`TIMESTAMP` varchar(30) default NULL,
`URL` varchar(60) default NULL,
`VERSION` varchar(6) default NULL,
`PROTOCOL` varchar(6) default NULL,
`WEB_STATUS` int(5) default NULL,
`BYTES_RETURNED` int(10) default NULL,
`RTT` int(5) default NULL,
`UA` varchar(100) default NULL,
`REQ_SIZE` int(6) default NULL,
`CONTENT_TYPE` varchar(50) default NULL,
`CUST_TYPE` int(1) default NULL,
`DEL_STATUS_DEVICE` int(1) default NULL,
`IP` varchar(16) default NULL,
`CP_FLAG` int(1) default NULL,
`USER_LOCATE` bigint(15) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=200000000;
Thanks in advance!
I would encourage you to learn how to use EXPLAIN to analyze the database's plan for query optimization. Also see Baron Schwartz' presentation EXPLAIN Demystified (link to PDF of his slides is on that page).
Learn how to create indexes -- this is not the same thing as a primary key or an auto-increment pseudokey. See the presentation More Mastering the Art of Indexing by Yoshinori Matsunobu.
Your table could use an index on CP_FLAG and WEB_STATUS.
CREATE INDEX CW ON RAW_LOG_20100504 (CP_FLAG, WEB_STATUS);
This helps to look up the subset of rows based on your cp_flag condition.
Then you still run into MySQL's unfortunate inefficiency with GROUP BY queries. It copies an interim result set into a temporary file on disk and sorts it there. Disk I/O tends to kill performance.
You can raise your sort_buffer_size configuration parameter until it's large enough that MySQL can sort the result set in memory instead of on disk. But that might not work.
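For example, just for the session that runs the report (an illustrative number; it has to fit comfortably in free RAM, since it is allocated per sorting connection):
SET SESSION sort_buffer_size = 268435456;  -- 256MB, for this connection only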
You might have to resort to precalculating the COUNT() you need, and update this statistic periodically.
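A sketch of that approach; web_status_daily is a hypothetical summary table that a nightly job would refresh:
CREATE TABLE web_status_daily (
  log_date   DATE NOT NULL,
  web_status INT NOT NULL,
  cnt        BIGINT NOT NULL,
  PRIMARY KEY (log_date, web_status)
) ENGINE=MyISAM;

INSERT INTO web_status_daily
SELECT '2010-05-04', WEB_STATUS, COUNT(*)
FROM RAW_LOG_20100504
WHERE CP_FLAG > 0 AND WEB_STATUS IS NOT NULL   -- NULLs can't go in the PK
GROUP BY WEB_STATUS;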
The comment from @Marcus gave me another idea. You're grouping by web status, and the set of distinct values of web status is a fairly short list and they don't change. So you could run a separate query for each distinct value and generate the results you need much faster than by using a GROUP BY query that creates a temp table to do the sorting. Or you could run a subquery for each status value and UNION them together:
(SELECT COUNT(*), WEB_STATUS FROM RAW_LOG_20100504 WHERE CP_FLAG > 0 AND WEB_STATUS = 200)
UNION
(SELECT COUNT(*), WEB_STATUS FROM RAW_LOG_20100504 WHERE CP_FLAG > 0 AND WEB_STATUS = 404)
UNION
(SELECT COUNT(*), WEB_STATUS FROM RAW_LOG_20100504 WHERE CP_FLAG > 0 AND WEB_STATUS = 304)
UNION
...etc...
ORDER BY 1 DESC;
Because your covering index includes CP_FLAG and WEB_STATUS, these queries never need to read the actual rows in the table. They only read entries in the index, which they can access much faster because (a) they're in a sorted tree, and (b) they may be cached in memory if you allocate enough to your key_buffer_size.
The EXPLAIN report I tried (with 1M rows of test data) shows that this uses indexes well, and does not create a temp table:
+------+--------------+------------------+------+--------------------------+
| id | select_type | table | key | Extra |
+------+--------------+------------------+------+--------------------------+
| 1 | PRIMARY | RAW_LOG_20100504 | CW | Using where; Using index |
| 2 | UNION | RAW_LOG_20100504 | CW | Using where; Using index |
| 3 | UNION | RAW_LOG_20100504 | CW | Using where; Using index |
| NULL | UNION RESULT | <union1,2,3> | NULL | Using filesort |
+------+--------------+------------------+------+--------------------------+
The Using filesort for the last line just means it has to sort without the benefit of an index. But sorting the three rows produced by the subqueries is trivial and MySQL does it in memory.
When designing optimal database solutions, there are rarely simple answers. A lot depends on how you use the data and what kind of queries are of higher priority to make fast. If there were a single, simple answer that worked in all circumstances, the software would just enable that design by default and you wouldn't have to do anything.
You really need to read a lot of manuals, books and blogs to understand how to take most advantage of all the features available to you.
Yes, I would still recommend using indexes. Clearly it was not working before, when you were querying 100 million rows without the benefit of an index.
You have to understand that you must design indexes that benefit the specific query you want to run. I have no way of knowing if the index you just described in your comment is appropriate, because you haven't shown the other query you're trying to speed up.
Indexing is a complex topic. If you define the index on the wrong columns, or if you get the columns in the wrong order, it may not be usable by a given query. I've been supporting SQL developers since 1994, and I've never found a single, concise rule to explain how to design indexes.
You seem like you need a mentor, because you're at a stage where you need a lot of questions answered. Is there someone where you work that you could ask to help you?
Add an index to any field that is in your where clause. Primary keys need to be unique; unique indexes need to be unique but uniqueness is not a prerequisite for an index.
Badly defined or non-existent indexes are one of the primary reasons for poor performance, and fixing these can often lead to phenomenal improvements.
Quick info:
http://www.databasejournal.com/features/mysql/article.php/1382791/Optimizing-MySQL-Queries-and-Indexes.htm
http://www.tizag.com/mysqlTutorial/mysql-index.php