I'm learning about MySQL performance with a pet project consisting of two MyISAM tables (~2 million rows and ~600k rows). A range query using BETWEEN on two indexed INT(10) columns, LIMITed to one returned result, takes about 160ms (including an INNER JOIN). I figure my configuration isn't optimised and am looking for advice on how to diagnose it, or perhaps a "common configuration".
I created a gist containing both tables, the query and the contents of my.cnf.
I created the B-tree index after inserting all the data, which was imported from a CSV file from MaxMind's open database. I tried two separate indexes, and now a combined index, with no difference in performance.
I'm running this locally on a MacBook Pro clocking 2.6GHz (i5) with 8GB of 1600MHz RAM. MySQL is installed using the downloadable binary from MySQL's download page (unable to supply a third link because my rep is too low). It's a default installation with no major additions to the my.cnf config file, included in the gist (located under the /usr/local/mysql-5.6.xxx/ directory on my system).
My concern is that I'm seeing ~160ms, which indicates to me that I'm missing something. I've considered compressing the table, but I have a feeling I'm missing other configuration. Also, myisampack wasn't in my PATH (I think), so I'm considering other optimisations before I explore that further.
Any advice is appreciated!
$ mysql --version
/usr/local/mysql-5.6.23-osx10.8-x86_64/bin/mysql Ver 14.14 Distrib 5.6.23, for osx10.8 (x86_64) using EditLine wrapper
Tables
CREATE TABLE `blocks` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`begin_range` int(10) unsigned NOT NULL,
`end_range` int(10) unsigned NOT NULL,
`_location_id` int(11) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `begin_range` (`begin_range`,`end_range`)
) ENGINE=MyISAM AUTO_INCREMENT=2008839 DEFAULT CHARSET=ascii;
CREATE TABLE `locations` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`country` varchar(2) NOT NULL DEFAULT '',
`region` varchar(255) DEFAULT NULL,
`city` varchar(255) DEFAULT NULL,
`postalcode` varchar(255) DEFAULT NULL,
`latitude` float NOT NULL,
`longitude` float NOT NULL,
`metro_code` int(11) DEFAULT NULL,
`area_code` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=641607 DEFAULT CHARSET=utf8;
Query
SELECT locations.latitude, locations.longitude
FROM blocks
INNER JOIN locations ON blocks._location_id = locations.id
WHERE INET_ATON('139.130.4.5') BETWEEN begin_range AND end_range
LIMIT 0, 1;
Edit: Updated the gist with EXPLAIN on the SELECT, also posted here for convenience.
EXPLAIN SELECT locations.latitude, locations.longitude FROM blocks INNER JOIN locations ON blocks._location_id = locations.id WHERE INET_ATON('94.137.106.123') BETWEEN begin_range AND end_range LIMIT 0, 1;
+----+-------------+-----------+--------+---------------+-------------+---------+---------------------------+---------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+--------+---------------+-------------+---------+---------------------------+---------+------------------------------------+
| 1 | SIMPLE | blocks | range | begin_range | begin_range | 4 | NULL | 1095345 | Using index condition; Using where |
| 1 | SIMPLE | locations | eq_ref | PRIMARY | PRIMARY | 4 | geoip.blocks._location_id | 1 | NULL |
+----+-------------+-----------+--------+---------------+-------------+---------+---------------------------+---------+------------------------------------+
2 rows in set (0.00 sec)
Edit 2: Included the data in the question for convenience.
The problem with the normal approach (which your code exemplifies) is that it ends up hitting 1,095,345 rows. I have an approach that can do that query in one disk hit, even when the cache is cold.
Excerpts from http://mysql.rjweb.org/doc.php/ipranges :
The Situation
Your data includes a large set of non-overlapping 'ranges'. These could be IP addresses, datetimes (show times for a single station), zipcodes, etc.
You have pairs of start and end values; one 'item' belongs to each such 'range'. So, instinctively, you create a table with start and end of the range, plus info about the item. Your queries involve a WHERE clause that compares for being between the start and end values.
The Problem
Once you get a large set of items, performance degrades. You play with the indexes, but find nothing that works well. The indexes fail to help because the database does not understand that the ranges are non-overlapping.
The Solution
I will present a solution that enforces the fact that items cannot have overlapping ranges. The solution builds a table to take advantage of that, then uses Stored Routines to get around the clumsiness imposed by it.
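A minimal sketch of the idea (my table and column names, not the article's): because the ranges are non-overlapping and contiguous, you only need to store the start of each range, and the lookup becomes a single index probe instead of a scan over a million rows.

CREATE TABLE ip2location (
    ip_start INT UNSIGNED NOT NULL,      -- first address of the range, in INET_ATON form
    location_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (ip_start)
) ENGINE=InnoDB;

-- One row, one index probe, even with a cold cache:
SELECT location_id
FROM ip2location
WHERE ip_start <= INET_ATON('139.130.4.5')
ORDER BY ip_start DESC
LIMIT 1;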
Related
We are trying to duplicate existing records in a table: make 10 records out of each one. The original table contains 75,000 records, and once the statements are done it will contain about 750,000 (10 times as many). The statements sometimes finish after 10 minutes, but many times they never return; hours later we receive a timeout. This happens about 1 out of 3 times. We are using a test database that nobody else is working on, so there is no concurrent access to the table. I don't see any way to optimise the SQL, since to me the EXPLAIN plan looks fine.
The database is mysql 5.5 hosted on AWS RDS db.m3.x-large. The CPU load goes up to 50% during the statements.
Question: What could cause this intermittent behaviour? How do I resolve it?
This is the SQL to create a temporary table, generate roughly 9 new records per existing record of ct_revenue_detail in the temporary table, and then copy the data from the temporary table back to ct_revenue_detail:
---------------------------------------------------------------------------------------------------------
-- CREATE TEMPORARY TABLE AND COPY ROLL-UP RECORDS INTO TABLE
---------------------------------------------------------------------------------------------------------
CREATE TEMPORARY TABLE ct_revenue_detail_tmp
SELECT r.month,
r.period,
a.participant_eid,
r.employee_name,
r.employee_cc,
r.assignments_cc,
r.lob_name,
r.amount,
r.gp_run_rate,
r.unique_id,
r.product_code,
r.smart_product_name,
r.product_name,
r.assignment_type,
r.commission_pcent,
r.registered_name,
r.segment,
'Y' as account_allocation,
r.role_code,
r.product_eligibility,
r.revenue_core,
r.revenue_ict,
r.primary_account_manager_id,
r.primary_account_manager_name
FROM ct_revenue_detail r
JOIN ct_account_allocation_revenue a
ON a.period = r.period AND a.unique_id = r.unique_id
WHERE a.period = 3 AND lower(a.rollup_revenue) = 'y';
This is the second query. It copies the records from the temporary table back into the ct_revenue_detail table:
INSERT INTO ct_revenue_detail(month,
period,
participant_eid,
employee_name,
employee_cc,
assignments_cc,
lob_name,
amount,
gp_run_rate,
unique_id,
product_code,
smart_product_name,
product_name,
assignment_type,
commission_pcent,
registered_name,
segment,
account_allocation,
role_code,
product_eligibility,
revenue_core,
revenue_ict,
primary_account_manager_id,
primary_account_manager_name)
SELECT month,
period,
participant_eid,
employee_name,
employee_cc,
assignments_cc,
lob_name,
amount,
gp_run_rate,
unique_id,
product_code,
smart_product_name,
product_name,
assignment_type,
commission_pcent,
registered_name,
segment,
account_allocation,
role_code,
product_eligibility,
revenue_core,
revenue_ict,
primary_account_manager_id,
primary_account_manager_name
FROM ct_revenue_detail_tmp;
This is the EXPLAIN PLAN of the SELECT:
+----+-------------+-------+------+------------------------+--------------+---------+------------------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+------------------------+--------------+---------+------------------------------------+-------+-------------+
| 1 | SIMPLE | a | ref | ct_period,ct_unique_id | ct_period | 4 | const | 38828 | Using where |
| 1 | SIMPLE | r | ref | ct_period,ct_unique_id | ct_unique_id | 5 | optusbusiness_20160802.a.unique_id | 133 | Using where |
+----+-------------+-------+------+------------------------+--------------+---------+------------------------------------+-------+-------------+
This is the definition of ct_revenue_detail:
ct_revenue_detail | CREATE TABLE `ct_revenue_detail` (
`participant_eid` varchar(255) DEFAULT NULL,
`lob_name` varchar(255) DEFAULT NULL,
`amount` decimal(32,16) DEFAULT NULL,
`employee_name` varchar(255) DEFAULT NULL,
`period` int(11) NOT NULL DEFAULT '0',
`pk_id` int(11) NOT NULL AUTO_INCREMENT,
`gp_run_rate` decimal(32,16) DEFAULT NULL,
`month` int(11) DEFAULT NULL,
`assignments_cc` int(11) DEFAULT NULL,
`employee_cc` int(11) DEFAULT NULL,
`unique_id` int(11) DEFAULT NULL,
`product_code` varchar(50) DEFAULT NULL,
`smart_product_name` varchar(255) DEFAULT NULL,
`product_name` varchar(255) DEFAULT NULL,
`assignment_type` varchar(100) DEFAULT NULL,
`commission_pcent` decimal(32,16) DEFAULT NULL,
`registered_name` varchar(255) DEFAULT NULL,
`segment` varchar(100) DEFAULT NULL,
`account_allocation` varchar(25) DEFAULT NULL,
`role_code` varchar(25) DEFAULT NULL,
`product_eligibility` varchar(25) DEFAULT NULL,
`rollup` varchar(10) DEFAULT NULL,
`revised_amount` decimal(32,16) DEFAULT NULL,
`original_amount` decimal(32,16) DEFAULT NULL,
`comment` varchar(255) DEFAULT NULL,
`amount_revised_flag` varchar(255) DEFAULT NULL,
`exclude_segment` varchar(10) DEFAULT NULL,
`revenue_type` varchar(50) DEFAULT NULL,
`revenue_core` decimal(32,16) DEFAULT NULL,
`revenue_ict` decimal(32,16) DEFAULT NULL,
`primary_account_manager_id` varchar(100) DEFAULT NULL,
`primary_account_manager_name` varchar(100) DEFAULT NULL,
PRIMARY KEY (`pk_id`,`period`),
KEY `ct_participant_eid` (`participant_eid`),
KEY `ct_period` (`period`),
KEY `ct_employee_name` (`employee_name`),
KEY `ct_month` (`month`),
KEY `ct_segment` (`segment`),
KEY `ct_unique_id` (`unique_id`)
) ENGINE=InnoDB AUTO_INCREMENT=15338782 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY HASH (period)
PARTITIONS 120 */ |
Edit 29.9: The intermittent behaviour was caused by the omission of a DELETE statement: the original table was not cleared before the records were automatically duplicated. The first time everything is fine: we started with 75,000 records and ended up with 750,000 records.
Because the DELETE statement was missed, the next time we already had 750,000 records, and the script would make 7.5M records out of them. That would still work, but the subsequent run, trying to turn 7.5M into 75M records, would fail. Hence 1 in 3 failures.
We would then try all the scripts manually, and of course then we would clear the table properly, and all would go well. The reason we didn't see this beforehand is that our application does not output anything while running the SQL.
The real delay is with your second query, inserting from the temporary table back into the original table. There are several issues here.
Sheer amount of data
Looking at your table, there are several varchar(255) columns; a conservative estimate would put the average length of your rows at 2KB. That's roughly 1.5GB being copied from one table to another, and being moved to different partitions! Partitioning makes reads more efficient, but for inserts the engine has to figure out which partition each row belongs to, so it's actually writing to lots of different files instead of sequentially to one file. For spinning disks, this is slow.
Rebuilding the indexes
One of the biggest costs of inserts is rebuilding the indexes. In your case you have many of them.
KEY `ct_participant_eid` (`participant_eid`),
KEY `ct_period` (`period`),
KEY `ct_employee_name` (`employee_name`),
KEY `ct_month` (`month`),
KEY `ct_segment` (`segment`),
KEY `ct_unique_id` (`unique_id`)
And some of these indexes, such as the one on employee_name, are on varchar(255) columns. That means pretty hefty indexes.
Solution part 1 - Normalize
Your database isn't normalized. Here is a classic example:
primary_account_manager_id varchar(100) DEFAULT NULL,
primary_account_manager_name varchar(100) DEFAULT NULL,
You should really have a table called account_manager, and these two fields should be in it. primary_account_manager_id should probably be an integer field; only the id should remain in your ct_revenue_detail table.
Similarly, you really shouldn't have employee_name, registered_name, etc. in this table. They should be in separate tables, linked to ct_revenue_detail by foreign keys.
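A hedged sketch of that normalization (table and column names are illustrative, not from your schema):

CREATE TABLE account_manager (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- ct_revenue_detail would then keep only the integer id:
--   primary_account_manager_id INT UNSIGNED,
--   FOREIGN KEY (primary_account_manager_id) REFERENCES account_manager(id)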
Solution part 2 - Rethink indexes.
Do you need so many? MySQL generally uses only one index per table in a query anyway, so some of these indexes are probably never used. Is this one really needed:
KEY `ct_unique_id` (`unique_id`)
You already have a primary key; why do you even need another unique column?
Indexes for the SELECT: For
SELECT ...
FROM ct_revenue_detail r
JOIN ct_account_allocation_revenue a
ON a.period = r.period AND a.unique_id = r.unique_id
WHERE a.period = 3 AND lower(a.rollup_revenue) = 'y';
a needs INDEX(period, rollup_revenue), in either order. However, you also need to declare rollup_revenue with a ..._ci collation and avoid wrapping the column in a function. That is, change lower(a.rollup_revenue) = 'y' to a.rollup_revenue = 'y'.
r needs INDEX(period, unique_id), in either order. But, as e4c5 mentioned, if unique_id really is "unique" in this table, then take advantage of that.
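Expressed as DDL, those two composite indexes would be roughly (the index names are made up):

ALTER TABLE ct_account_allocation_revenue
    ADD INDEX idx_period_rollup (period, rollup_revenue);

ALTER TABLE ct_revenue_detail
    ADD INDEX idx_period_unique_id (period, unique_id);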
Bulkiness is a problem when shoveling data around; the points below shrink the rows considerably (a sketch applying them follows below).
decimal(32,16) takes 16 bytes and gives you precision and range that are probably unnecessary. Consider FLOAT (4 bytes, ~7 significant digits) or DOUBLE (8 bytes, ~16 significant digits); both have adequate range.
month int(11) takes 4 bytes. If it only ever holds the values 1..12, use TINYINT UNSIGNED (1 byte).
DEFAULT NULL -- I suspect most columns will never be NULL; if so, say NOT NULL for them.
amount_revised_flag varchar(255) -- if that is really a "flag", such as "yes"/"no", then use an ENUM and save lots of space.
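Here is the promised sketch applying those points (assumptions: month really is 1..12, the amounts tolerate floating-point precision, and the flag is effectively yes/no; the ALTER rewrites all 15M rows, so it will take a while):

ALTER TABLE ct_revenue_detail
    MODIFY `month` TINYINT UNSIGNED NULL,              -- 1 byte instead of 4
    MODIFY amount DOUBLE NULL,                         -- 8 bytes instead of 16
    MODIFY gp_run_rate DOUBLE NULL,
    MODIFY amount_revised_flag ENUM('no','yes') NULL;  -- 1 byte instead of up to 255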
It is uncommon to have both an id and a name in the same table (see primary_account_manager*); that is usually relegated to a "normalization table".
"Normalize" (already mentioned by @e4c5).
HASH partitioning
Hash partitioning is virtually useless. Unless you can justify it (preferably with a benchmark), I recommend removing partitioning. More discussion.
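Removing it is a single statement, though it rewrites the whole table (a sketch):

ALTER TABLE ct_revenue_detail REMOVE PARTITIONING;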
Adding or removing partitioning usually involves changing the indexes. Please show us the main queries so we can help you build suitable indexes (especially composite indexes) for the queries.
I'm trying to create a new table by joining four existing ones. My database is static, so making one large preprocessed table will simplify programming, and save lots of time in future queries. My query works fine when limited with a WHERE, but seems to either hang, or go too slowly to notice any progress.
Here's the working query. The result only takes a few seconds.
SELECT group.group_id, MIN(application.date), person.person_name, pers_appln.sequence
FROM group
JOIN application ON group.appln_id=application.appln_id
JOIN pers_appln ON pers_appln.appln_id=application.appln_id
JOIN person ON person.person_id=pers_appln.person_id
WHERE group_id="24601"
GROUP BY group.group_id, pers_appln.sequence
;
If I simply remove the WHERE line, it will run for days with nothing to show. Adding CREATE TABLE newtable AS at the beginning does the same thing: it never moves beyond 0% progress.
The group, application, and person tables all use the MyISAM engine, while pers_appln uses InnoDB. The columns are all indexed. The table sizes range from about 40 million to 150 million rows. I know that's rather large, but I wouldn't have thought it would pose this much of a problem. The computer currently has 4GB of RAM.
Any ideas how to make this work?
Here's the SHOW CREATE TABLE info. There are no views or virtual tables:
CREATE TABLE `group` (
`APPLN_ID` int(10) unsigned NOT NULL,
`GROUP_ID` int(10) unsigned NOT NULL,
KEY `idx_appln` (`APPLN_ID`),
KEY `idx_group` (`GROUP_ID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
CREATE TABLE `application` (
`APPLN_ID` int(10) unsigned NOT NULL,
`APPLN_AUTH` char(2) NOT NULL DEFAULT '',
`APPLN_NR` varchar(20) NOT NULL DEFAULT '',
`APPLN_KIND` char(2) DEFAULT '',
`DATE` date DEFAULT NULL,
`IPR_TYPE` char(2) DEFAULT '',
PRIMARY KEY (`APPLN_ID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
CREATE TABLE `person` (
`PERSON_ID` int(10) unsigned NOT NULL,
`PERSON_CTRY_CODE` char(2) NOT NULL,
`PERSON_NAME` varchar(300) DEFAULT NULL,
`PERSON_ADDRESS` varchar(500) DEFAULT NULL,
KEY `idx_person` (`PERSON_ID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 MAX_ROWS=30000000 AVG_ROW_LENGTH=100
CREATE TABLE `pers_appln` (
`PERSON_ID` int(10) unsigned NOT NULL,
`APPLN_ID` int(10) unsigned NOT NULL,
`SEQUENCE` smallint(4) unsigned DEFAULT NULL,
`PLACE` smallint(4) unsigned DEFAULT NULL,
KEY `idx_pers_appln` (`APPLN_ID`),
KEY `idx_person` (`PERSON_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (appln_id)
PARTITIONS 20 */
Here's the EXPLAIN of my query:
+----+-------------+-------------+--------+----------------------------+-----------------+---------+--------------------------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+--------+----------------------------+-----------------+---------+--------------------------+----------+---------------------------------+
| 1 | SIMPLE | person | ALL | idx_person | NULL | NULL | NULL | 47827690 | Using temporary; Using filesort |
| 1 | SIMPLE | pers_appln | ref | idx_application,idx_person | idx_person | 4 | mydb.person.PERSON_ID | 1 | |
| 1 | SIMPLE | application | eq_ref | PRIMARY | PRIMARY | 4 | mydb.pers_appln.APPLN_ID | 1 | |
| 1 | SIMPLE | group | ref | idx_application | idx_application | 4 | mydb.pers_appln.APPLN_ID | 1 | |
+----+-------------+-------------+--------+----------------------------+-----------------+---------+--------------------------+----------+---------------------------------+
Verify that key_buffer_size is about 200M and innodb_buffer_pool_size is about 1200M. Perhaps they could be bigger, but make sure you are not swapping.
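A quick way to verify what the server is actually using (both values are reported in bytes):

SHOW GLOBAL VARIABLES LIKE 'key_buffer_size';
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';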
group should have PRIMARY KEY(appln_id, group_id) and INDEX(group_id, appln_id) instead of the two KEYs it has.
pers_appln should have INDEX(person_id, appln_id) and INDEX(appln_id, person_id) instead of the two keys it has. If possible, one of those should be PRIMARY KEY, but watch out for the PARTITIONing.
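As DDL, those changes might look roughly like this (a sketch; adding a PRIMARY KEY rebuilds each table, and a PRIMARY KEY on the partitioned pers_appln would have to include the partitioning column appln_id, which is why plain composite indexes are shown for it):

ALTER TABLE `group`
    DROP INDEX idx_appln,
    DROP INDEX idx_group,
    ADD PRIMARY KEY (appln_id, group_id),
    ADD INDEX idx_group_appln (group_id, appln_id);

ALTER TABLE pers_appln
    DROP INDEX idx_pers_appln,
    DROP INDEX idx_person,
    ADD INDEX idx_person_appln (person_id, appln_id),
    ADD INDEX idx_appln_person (appln_id, person_id);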
A minor improvement would be to change those CHAR(2) fields to be CHARACTER SET ascii -- assuming you don't really need utf8. That would shrink the field from 6 bytes to 2 bytes per row.
The PARTITIONing is probably not helping at all. (No, I can't say that removing the PARTITIONing will speed it up much.)
If these suggestions do not help enough, please provide the output from EXPLAIN SELECT ...
Edit
Converting to InnoDB and specifying PRIMARY KEYs for all tables will help. This is because InnoDB "clusters" the PRIMARY KEY with the data. What you have now is a lot of bouncing between a MyISAM index and its data -- literally hundreds of millions of times. Assuming not everything can be cached in your small 4GB, that means a lot of disk I/O. I would not be surprised if the non-WHERE version would take a week to run. Even with InnoDB, there will be I/O, but some of it will be avoided because:
1. reaching into a table with the PK gets the data without another disk hit.
2. the extra indexes I proposed will avoid hitting the data, again avoiding an extra disk hit.
(Millions of references * "an extra disk hit" = days of time.)
If you switch all of your tables to InnoDB, you should lower key_buffer_size to 20M and raise innodb_buffer_pool_size to 1500M. (These are approximate; do not raise them so high that there is any swapping.)
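key_buffer_size is dynamic, so that part can be changed at runtime; innodb_buffer_pool_size is not dynamic in MySQL 5.x before 5.7, so it goes in my.cnf (a sketch; tune the values to your RAM):

SET GLOBAL key_buffer_size = 20 * 1024 * 1024;  -- 20M, takes effect immediately
-- In my.cnf, then restart mysqld:
--   innodb_buffer_pool_size = 1500M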
Please show us the CREATE TABLEs with InnoDB -- I want to make sure each table has a PRIMARY KEY and which column(s) that is. The PRIMARY KEY makes a big difference in this particular situation.
For person, the MyISAM version has just a KEY(person_id). If you did not change the keys in the conversion, InnoDB will invent a PRIMARY KEY. When the JOIN to that table occurs, InnoDB will (1) drill down the BTree for that key to find the invented PK value, then (2) drill down the PK+data BTree to find the row. If, instead, person_id could be the PK, that JOIN would run twice as fast, possibly even faster, depending on how big the table is and how much it needs to jump around in the index / data. That is, the two BTree lookups add to the pressure on the cache (buffer_pool).
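If person_id really is unique there, the change is a single statement (a sketch; it rebuilds the ~48M-row table, so expect it to take a while):

ALTER TABLE person
    ADD PRIMARY KEY (person_id),
    DROP INDEX idx_person;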
How big is each table? What was the final value for innodb_buffer_pool_size? Once you have changed everything from MyISAM to InnoDB, set key_buffer_size to 40M or less, and set innodb_buffer_pool_size to about 70% of available RAM. If the Data + Index sizes for all the tables are less than the buffer_pool, then (once cache is primed) the query won't have to do any I/O. This is easily a 10x speedup.
pers_appln is a many-to-many relationship? Then, probably
PRIMARY KEY(appln_id, person_id),
INDEX(person_id, appln_id) -- if you need to go the other direction, too.
I found the solution: switching to an SSD. My table creation time went from an estimated 45 days to 16 hours. Previously, the database spent all its time with hard drive I/O, barely even using 5% of the CPU or RAM.
Thanks everyone.
I have noticed that I am having problems writing SQL queries, because of a problem with mysqld during peak hours. It causes my website to load 3-5 times slower than usual. So I tried siege -d5 -c150 http://mydomain.com/ and looked at top, and my mysqld takes over 700% of the CPU! I've also noticed Copying to tmp table in the MySQL status, and queries piling up in some kind of queue.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25877 mysql 20 0 1076m 227m 8268 S 749.0 2.8 224:02.21 mysqld
This is my query
SELECT COUNT(downloaded.id) AS downloaded_count
, downloaded.file_name
,uploaded.*
FROM `downloaded` JOIN uploaded
ON downloaded.file_name = uploaded.file_name
WHERE downloaded.completed = '1'
AND uploaded.active = '1'
AND uploaded.nsfw = '0'
AND downloaded.datetime > DATE_SUB(NOW(), INTERVAL 7 DAY)
GROUP BY downloaded.file_name
ORDER BY downloaded_count DESC LIMIT 30;
Showing rows 0 - 29 ( 30 total, Query took 0.1639 sec) //is this that much? shouldn't it be 0.01s instead?
UPDATED: (removed ORDER BY)
Showing rows 0 - 29 ( 30 total, Query took 0.0064 sec)
Why does ORDER BY make it 20x slower?
EXPLAIN
+----+-------------+------------+------+---------------+-----------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+-----------+---------+--------------------------+------+----------------------------------------------+
| 1 | SIMPLE | uploaded | ALL | file_name_up | NULL | NULL | NULL | 3139 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | downloaded | ref | file_name | file_name | 767 | piqik.uploaded.file_name | 8 | Using where |
+----+-------------+------------+------+---------------+-----------+---------+--------------------------+------+----------------------------------------------+
table: uploaded (Total 720.5 KiB)
CREATE TABLE IF NOT EXISTS `uploaded` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`sid` int(1) NOT NULL,
`file_name` varchar(255) NOT NULL,
`file_size` varchar(255) NOT NULL,
`file_ext` varchar(255) NOT NULL,
`file_name_keyword` varchar(255) NOT NULL,
`access_key` varchar(40) NOT NULL,
`upload_datetime` datetime NOT NULL,
`last_download` datetime NOT NULL,
`file_password` varchar(255) NOT NULL DEFAULT '',
`nsfw` int(1) NOT NULL,
`votes` int(11) NOT NULL,
`downloads` int(11) NOT NULL,
`video_thumbnail` int(1) NOT NULL DEFAULT '0',
`video_duration` varchar(255) NOT NULL DEFAULT '',
`video_resolution` varchar(11) NOT NULL,
`video_additional` varchar(255) NOT NULL DEFAULT '',
`active` int(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`),
FULLTEXT KEY `file_name_keyword` (`file_name_keyword`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=3328 ;
table: downloaded (Total 5,152.0 KiB)
CREATE TABLE IF NOT EXISTS `downloaded` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`file_name` varchar(255) NOT NULL,
`completed` int(1) NOT NULL,
`client_ip_addr` varchar(40) NOT NULL,
`client_access_key` varchar(40) NOT NULL,
`datetime` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=31475 ;
(not sure why I've chosen InnoDB here)
Please note that I am (still) not using indexes (which, as I read, are very important!) because of a lack of knowledge; I am not sure how to add them correctly.
So the question is: how do I improve this query to keep the website from loading slowly? I have only a "few" records and cannot believe I am having such major problems; people here deal with millions of records and their projects work. How do webhosting companies prevent this problem? (I am hosting only my own webpages, with over 150 concurrent clients.)
Additional info:
Mysql: 5.5.33
Nginx 1.2.1, php5-fpm
Debian 7.1 Wheezy
2x L5420 @ 2.50GHz
8GB RAM
A few observations:
You may not have actively chosen InnoDB as a storage engine: it will be the default engine for your version of MySQL. It's probably the right choice for your context, though, as it offers row-level locking instead of table-level locking (amongst other things) which you likely want.
Don't quote your integers in comparisons (e.g. uploaded.active = '1'). You'll end up with a slower string comparison instead of an integer comparison.
The comparison downloaded.datetime > DATE_SUB(NOW(), INTERVAL 7 DAY) with a derived value is going to be slower than comparison with a normal column value.
Regarding the last point, you could replace this with a user-defined variable declared before the query:
SET @one_week_ago := DATE_SUB(NOW(), INTERVAL 7 DAY);
and then within the query compare to that pre-computed value:
...
downloaded.datetime > @one_week_ago
...
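Put together, the reworked query might look like this (a sketch; only the pre-computed variable and the unquoted integers differ from the original):

SET @one_week_ago := DATE_SUB(NOW(), INTERVAL 7 DAY);

SELECT COUNT(downloaded.id) AS downloaded_count,
       downloaded.file_name,
       uploaded.*
FROM downloaded
JOIN uploaded ON downloaded.file_name = uploaded.file_name
WHERE downloaded.completed = 1
  AND uploaded.active = 1
  AND uploaded.nsfw = 0
  AND downloaded.datetime > @one_week_ago
GROUP BY downloaded.file_name
ORDER BY downloaded_count DESC
LIMIT 30;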
More importantly, though, you'll definitely want to have an index on any key that you're joining on.
In this case, you can add them by:
CREATE INDEX idx_file_name ON uploaded(file_name);
CREATE INDEX idx_file_name ON downloaded(file_name);
If you don't have indices, you're going to end up with multiple full table scans, which is slow.
There is a cost to adding an index: it takes up space, and it also means writes to the table are slower because the index has to be updated to include them. If this is a query that is running as part of the operation of your website, though, you definitely need the indices.
I have this table, that contains around 80,000,000 rows.
CREATE TABLE `mytable` (
`date` date NOT NULL,
`parameters` mediumint(8) unsigned NOT NULL,
`num` tinyint(3) unsigned NOT NULL,
`val1` int(11) NOT NULL,
`val2` int(10) NOT NULL,
`active` tinyint(3) unsigned NOT NULL,
`ref` int(10) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`ref`) USING BTREE,
KEY `parameters` (`parameters`)
) ENGINE=MyISAM AUTO_INCREMENT=79092001 DEFAULT CHARSET=latin1
it's organised around two main columns: "parameters" and "date".
there are around 67,000 possible values for "parameters"
for each "parameters" there are around 1200 rows, each with a different date.
so for each date, there are 67,000 rows.
1200 * 67,000 = 80,400,000.
table size appears as 1.5GB, index size 1.4GB.
now, I want to query the table to retrieve all rows for one value of "parameters"
(actually I want to do it for each parameter, but this is a good start):
SELECT val1 FROM mytable WHERE parameters=1;
the first run gives me results in 8 seconds
subsequent runs for different but close values of parameters (2, 3, 4...) are instantaneous
a run for a "far away" value (parameters=1000) gives me results in 8 seconds again.
I did tests running the same query without the index and got results in 20 seconds, so I guess the index is kicking in, as shown by EXPLAIN, but it is not giving a drastic jump in performance:
+----+-------------+----------+------+---------------+------------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------------+---------+-------+------+-------+
| 1 | SIMPLE | mytable | ref | parameters | parameters | 3 | const | 1097 | |
+----+-------------+----------+------+---------------+------------+---------+-------+------+-------+
but I'm still baffled by the time taken for such an easy request (no join, directly on the index).
the server is a two-year-old machine with 2 quad-core 2.6GHz CPUs, running Ubuntu, with 4GB of RAM.
I've raised the key_buffer parameter to 1G and restarted mysql, but noticed no change whatsoever.
should I consider this normal? or is there something I'm doing wrong? I get the feeling that with the right config the request should be almost immediate.
Try using a covering index, i.e. create an index that includes both of the columns you need. It won't need a second disk I/O to fetch the values from the main table, since the data is right there in the index.
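For this query that means an index on (parameters, val1); one way is to extend the existing parameters key (a sketch; the new index name is made up, and rebuilding it on 80M rows will take a while):

ALTER TABLE mytable
    DROP INDEX parameters,
    ADD INDEX parameters_val1 (parameters, val1);

-- EXPLAIN should now show "Using index" in the Extra column:
SELECT val1 FROM mytable WHERE parameters = 1;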
I've got a table that refuses to use index, and it always uses filesort.
The table is:
CREATE TABLE `article` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`Category_ID` int(11) DEFAULT NULL,
`Subcategory` int(11) DEFAULT NULL,
`CTimestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`Publish` tinyint(4) DEFAULT NULL,
`Administrator_ID` int(11) DEFAULT NULL,
`Position` tinyint(4) DEFAULT '0',
PRIMARY KEY (`ID`),
KEY `Subcategory` (`Subcategory`,`Position`,`CTimestamp`,`Publish`),
KEY `Category_ID` (`Category_ID`,`CTimestamp`,`Publish`),
KEY `Position` (`Position`,`Category_ID`,`Publish`),
KEY `CTimestamp` (`CTimestamp`),
CONSTRAINT `article_ibfk_1` FOREIGN KEY (`Category_ID`) REFERENCES `category` (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=94290 DEFAULT CHARSET=utf8
The query is:
SELECT * FROM article ORDER BY `CTimestamp`;
The explain is:
+----+-------------+---------+------+---------------+------+---------+------+-------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+-------+----------------+
| 1 | SIMPLE | article | ALL | NULL | NULL | NULL | NULL | 63568 | Using filesort |
+----+-------------+---------+------+---------------+------+---------+------+-------+----------------+
When I remove the ORDER BY, everything works properly. All the other indices (Subcategory, Position, etc.) work fine in other queries. Unfortunately, the timestamp index refuses to be used, even with my simple SELECT query. I'm sure I'm missing something important here.
How can I make MySQL use the timestamp index?
Thank you.
In this case, MySQL is not using your index for sorting, and it is a GOOD thing.
Why? Your table contains just 64k rows, and the average row width is about 26 bytes (if I added the column sizes right), so the total table size on disk should be around 2MB.
It is very cheap to read just 2MB of data from disk into memory (probably in just 1-2 disk operations or seeks) and then simply perform the filesort in memory (probably a variation of quicksort).
If MySQL did the retrieval in index order as you wish, it would have to perform 64,000 disk seek operations, one record after another! That would have been very, very slow.
Indexes are good when you can use them to quickly jump to a known location in a huge file and read just a small amount of data, as in a WHERE clause. But in this case it is not a good idea -- and MySQL is not stupid!
If your table were very big (larger than RAM), then MySQL would certainly start using your index -- and that would also be a good thing.
Well, you can always hint the index. Change your query to
SELECT * FROM article use index (CTimestamp);
This forces MySQL to use the index for the query. The EXPLAIN:
1, 'SIMPLE', 'article', 'ALL', '', '', '', '', 1, 100.00, ''
No filesort to be seen, and as the index used is CTimestamp, the result should be ordered accordingly.
Alternatively, you can keep your order by clause, but force the index usage:
SELECT * FROM article force index (CTimestamp) order by CTimestamp;
The problem is still strange, though. Have you considered posting it to the official MySQL help forums?
Edit: You seem to be in good company.
Edit: Forcing the index seems to work out well.