Large MySQL table with very slow select

I have a large table in MySQL (running within MAMP); it has 28 million rows and is 3.1GB in size. Here is its structure:
CREATE TABLE `termusage` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`termid` bigint(20) DEFAULT NULL,
`date` datetime DEFAULT NULL,
`dest` varchar(255) DEFAULT NULL,
`cost_type` tinyint(4) DEFAULT NULL,
`cost` decimal(10,3) DEFAULT NULL,
`gprsup` bigint(20) DEFAULT NULL,
`gprsdown` bigint(20) DEFAULT NULL,
`duration` time DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `termid_idx` (`termid`),
KEY `date_idx` (`date`),
KEY `cost_type_idx` (`cost_type`),
CONSTRAINT `termusage_cost_type_cost_type_cost_code` FOREIGN KEY (`cost_type`) REFERENCES `cost_type` (`cost_code`),
CONSTRAINT `termusage_termid_terminal_id` FOREIGN KEY (`termid`) REFERENCES `terminal` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=28680315 DEFAULT CHARSET=latin1
Here is the output from SHOW TABLE STATUS:
Name: termusage
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 29656469
Avg_row_length: 87
Data_length: 2605711360
Max_data_length: 0
Index_length: 2156920832
Data_free: 545259520
Auto_increment: 28680315
Create_time: 2011-08-16 15:16:08
Update_time: NULL
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
I'm trying to run the following select statement:
select u.id from termusage u
where u.date between '2010-11-01' and '2010-12-01'
It takes 35 minutes to return the result (approx. 14 million rows) - this is using MySQL Workbench.
I have the following MySQL config setup:
Variable_name Value
bulk_insert_buffer_size 8388608
innodb_buffer_pool_instances 1
innodb_buffer_pool_size 3221225472
innodb_change_buffering all
innodb_log_buffer_size 8388608
join_buffer_size 131072
key_buffer_size 8388608
myisam_sort_buffer_size 8388608
net_buffer_length 16384
preload_buffer_size 32768
read_buffer_size 131072
read_rnd_buffer_size 262144
sort_buffer_size 2097152
sql_buffer_result OFF
Eventually I'm trying to run a larger query that joins a couple of tables and groups some data, all driven by one variable, the customer id:
select c.id,u.termid,u.cost_type,count(*) as count,sum(u.cost) as cost,(sum(u.gprsup) + sum(u.gprsdown)) as gprsuse,sum(time_to_sec(u.duration)) as duration
from customer c
inner join terminal t
on (c.id = t.customer)
inner join termusage u
on (t.id = u.termid)
where c.id = 1 and u.date between '2011-03-01' and '2011-04-01' group by c.id,u.termid,u.cost_type
This returns a maximum of 8 rows (as there are only 8 separate cost_types). The query runs OK when there are not many rows in the termusage table to calculate (less than 1 million), but takes forever when the number of rows is large. How can I reduce the select time?
Data is added to the termusage table once a month from CSV files using the LOAD DATA method, so it doesn't need to be particularly tuned for inserts.
EDIT: EXPLAIN output for the main query:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,SIMPLE,c,const,PRIMARY,PRIMARY,8,const,1,"Using index; Using temporary; Using filesort"
1,SIMPLE,u,ALL,"termid_idx,date_idx",NULL,NULL,NULL,29656469,"Using where"
1,SIMPLE,t,eq_ref,"PRIMARY,customer_idx",PRIMARY,8,wlnew.u.termid,1,"Using where"

Looks like you're asking two questions - correct?
The most likely reason the first query is taking so long is that it's IO-bound. It takes a long time to transfer 14 million records from disk and down the wire to your MySQL Workbench.
Have you tried putting the second query through EXPLAIN? Yes, you only get back 8 rows, but the SUM operation may be summing millions of records.
I'm assuming the "customer" and "terminal" tables are appropriately indexed? As you're joining on the primary key on termusage, that should be really quick...

You could try removing the WHERE clause that restricts by date and instead put an IF expression in the SELECT, so that if the date is within these boundaries the value is returned, otherwise zero is returned. The SUM will then of course only add up values that lie in this range, as all others will be zero.
It sounds a bit nonsensical to fetch more rows than you need, but we recently observed on an Oracle DB that this made quite a big improvement. Of course it will depend on many other factors, but it might be worth a try.
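A minimal sketch of that idea applied to the larger join query above (same tables and date window as in the question; untested, so treat it as illustrative only):

-- sketch: sum only the rows inside the date window instead of filtering them out
select c.id, u.termid, u.cost_type,
       sum(if(u.date between '2011-03-01' and '2011-04-01', 1, 0)) as count,
       sum(if(u.date between '2011-03-01' and '2011-04-01', u.cost, 0)) as cost,
       (sum(if(u.date between '2011-03-01' and '2011-04-01', u.gprsup, 0))
        + sum(if(u.date between '2011-03-01' and '2011-04-01', u.gprsdown, 0))) as gprsuse,
       sum(if(u.date between '2011-03-01' and '2011-04-01', time_to_sec(u.duration), 0)) as duration
from customer c
inner join terminal t on (c.id = t.customer)
inner join termusage u on (t.id = u.termid)
where c.id = 1
group by c.id, u.termid, u.cost_type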

You may also think about breaking the table down by year or month, so you have a termusage_2010, termusage_2011, ... or something like this.
It's not a very nice solution, but seeing that your table is quite large it might be useful on a smaller server.
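A rough sketch of that manual split (the yearly table name is hypothetical, and your application queries would then have to target the right table):

-- sketch: move one year's rows into their own table
-- note: CREATE TABLE ... LIKE copies the indexes but not the foreign keys
CREATE TABLE termusage_2010 LIKE termusage;
INSERT INTO termusage_2010
  SELECT * FROM termusage
  WHERE `date` >= '2010-01-01' AND `date` < '2011-01-01';
DELETE FROM termusage
  WHERE `date` >= '2010-01-01' AND `date` < '2011-01-01';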

Related

MySQL GROUP BY on large tables

I have a table with over 75 million records. I want to run a GROUP BY to summarize these records.
The table structure is:
CREATE TABLE `output_medicos_full` (
`name` varchar(100) NOT NULL DEFAULT '',
`term` varchar(50) NOT NULL DEFAULT '',
`hash` varchar(40) NOT NULL DEFAULT '',
`url` varchar(2000) DEFAULT NULL,
PRIMARY KEY (`name`,`term`,`hash`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I want to execute the query below, but it is taking very long on a dedicated MySQL 5.5 server with 4GB RAM:
INSERT INTO report
SELECT
`hash`
,CASE UPPER(SUBSTRING_INDEX(url, ':', 1))
WHEN 'HTTP' THEN 1
WHEN 'HTTPS' THEN 2
WHEN 'FTP' THEN 3
WHEN 'FTPS' THEN 4
ELSE 0 end
,url
FROM output_medicos_full
GROUP BY `hash`;
On the report table there is a unique index on the hash column.
Any help to speed it up?
Thanks
The main cost here is all the I/O. The entire table needs to be read.
innodb_buffer_pool_size = 2G is dangerously high for 4GB of RAM. If swapping occurs, performance will suffer terribly.
Since the hash is a SHA1, it is extremely likely to be unique across a mere 75M urls. So that GROUP BY will yield 75M rows. This is probably not what you wanted. Once you rewrite the query, we can discuss optimizations.
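If the intent really is one output row per hash (that is an assumption on my part), a deterministic rewrite could look something like the sketch below; adjust the aggregates to whatever you actually want per hash:

-- sketch: one row per hash, using the minimum url for that hash
INSERT INTO report
SELECT `hash`
     , CASE UPPER(SUBSTRING_INDEX(MIN(url), ':', 1))
         WHEN 'HTTP'  THEN 1
         WHEN 'HTTPS' THEN 2
         WHEN 'FTP'   THEN 3
         WHEN 'FTPS'  THEN 4
         ELSE 0
       END
     , MIN(url)
FROM output_medicos_full
GROUP BY `hash`;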

Why is MyISAM slower than InnoDB

There are several Q&A for "Why is InnoDB (much) slower than MyISAM", but I could not find any topic for the opposite.
So I had a table defined as InnoDB in which I stored file contents in a blob field. Because MyISAM is normally supposed to be used for that, I switched that table over. Here is its structure:
CREATE TABLE `liv_fx_files_files` (
`fid` int(11) NOT NULL AUTO_INCREMENT,
`filedata` longblob NOT NULL,
`filetype` varchar(255) NOT NULL,
`filename` varchar(255) NOT NULL,
`filesize` int(11) NOT NULL,
`context` varchar(1) NOT NULL DEFAULT '',
`saveuser` varchar(32) NOT NULL,
`savetime` int(11) NOT NULL,
`_state` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`fid`),
KEY `_state` (`_state`)
) ENGINE=MyISAM AUTO_INCREMENT=4550 DEFAULT CHARSET=utf8;
There are 4549 records stored in it so far (with filedata ranging from 0 to 48M); the sum of all files is about 6G.
So whenever I need to know the current total file size, I issue the query:
SELECT SUM(filesize) FROM liv_fx_files_files;
The problem is that since I switched from InnoDB to MyISAM this simple query takes really long (about 30 sec and more), whereas on InnoDB it completed in under one second.
But aggregations are not the only queries which are very slow; it's almost every query.
I guess I could fix it by adapting the config (which is currently optimized for InnoDB-only use), but I don't know which settings to adjust. Does anyone have a hint for me, please?
current mysql server config (SHOW VARIABLES as csv)
Example of another query fired at both table types (both contain exactly the same data and have the same definition). All other tested queries behave the same, i.e. run much longer against the MyISAM table than against the InnoDB one!
SELECT sql_no_cache `fxfilefile`.`filename` AS `filename` FROM `myisamtable`|`innodbtable` AS `fxfilefile` WHERE `fxfilefile`.`filename` LIKE '%foo%';
Executive Summary: Use InnoDB, and change the my.cnf settings accordingly.
Details:
"MyISAM is faster" -- This is an old wives' tale. Today, InnoDB is faster in most situations.
Assuming you have at least 4GB of RAM...
If all-MyISAM, key_buffer_size should be about 20% of RAM; innodb_buffer_pool_size should be 0.
If all-InnoDB, key_buffer_size should be, say, only 20MB; innodb_buffer_pool_size should be about 70% of RAM (sketched below).
If a mixture, do something in between. More discussion.
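Translating the all-InnoDB guideline into a my.cnf fragment (illustrative numbers only, assuming a dedicated server with roughly 8GB of RAM):

[mysqld]
key_buffer_size         = 20M
innodb_buffer_pool_size = 5600M   # roughly 70% of RAM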
Let's look at how things are handled differently by the two Engines.
MyISAM puts the entire BLOB 'inline' with the other columns.
InnoDB puts most or all of each blob in other blocks.
Conclusion:
A table scan in a MyISAM table spends a lot of time stepping over cow paddies; InnoDB is much faster if you don't touch the BLOB.
This makes InnoDB a clear winner for SELECT SUM(x) FROM tbl; when there is no index on x. With INDEX(x), either engine will be fast.
Because of the BLOB being inline, MyISAM has fragmentation issues if you update records in the table; InnoDB has much less fragmentation. This impacts all operations, making InnoDB the winner again.
The order of the columns in the CREATE TABLE has no impact on performance in either engine.
Because the BLOB dominates the size of each row, the tweaks to the other columns will have very little impact on performance.
If you decide to go with MyISAM, I would recommend a 'parallel' table ('vertical partitioning'). Put the BLOB and the id in a separate table. This would help MyISAM come closer to InnoDB's model and performance, but would add complexity to your code.
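A minimal sketch of that parallel-table layout (the table names liv_fx_files_meta and liv_fx_files_blob are made up for illustration; the column definitions are taken from the question):

-- metadata only: table scans and SUM(filesize) never touch the blobs
CREATE TABLE `liv_fx_files_meta` (
  `fid` int(11) NOT NULL AUTO_INCREMENT,
  `filetype` varchar(255) NOT NULL,
  `filename` varchar(255) NOT NULL,
  `filesize` int(11) NOT NULL,
  `context` varchar(1) NOT NULL DEFAULT '',
  `saveuser` varchar(32) NOT NULL,
  `savetime` int(11) NOT NULL,
  `_state` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`fid`),
  KEY `_state` (`_state`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

-- blobs live in their own table, fetched by primary key only when needed
CREATE TABLE `liv_fx_files_blob` (
  `fid` int(11) NOT NULL,
  `filedata` longblob NOT NULL,
  PRIMARY KEY (`fid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;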
For "point queries" (looking up a single row via an index), there won't be much difference in performance between the engines.
Your my.cnf seems antique; set-variable has not been necessary in a long time.
Try editing your MySQL config file, usually /etc/mysql/my.cnf, and use the "huge" preset.
# The MySQL server
[mysqld]
port = 3306
socket = /var/run/mysqld/mysqld.sock
skip-locking
set-variable = key_buffer=384M
set-variable = max_allowed_packet=1M
set-variable = table_cache=512
set-variable = sort_buffer=2M
set-variable = record_buffer=2M
set-variable = thread_cache=8
# Try number of CPU's*2 for thread_concurrency
set-variable = thread_concurrency=8
set-variable = myisam_sort_buffer_size=64M
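Note that set-variable is obsolete syntax (as pointed out above); in a current my.cnf the same settings would simply be written as direct assignments, for example:

[mysqld]
key_buffer_size         = 384M
max_allowed_packet      = 1M
sort_buffer_size        = 2M
myisam_sort_buffer_size = 64M
thread_cache_size       = 8

(table_cache and record_buffer correspond to table_open_cache and read_buffer_size in current versions.)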
Certainly 30 seconds to read 4500 records is very slow. Assuming there is plenty of room for I/O caching, the first thing I would try is to change the order of the fields; if these are written to the table in the order they are declared, the DBMS would need to seek to the end of each record before reading the size value. (I'd also recommend capping the size of those varchar(255) columns, and that varchar(1) NOT NULL should be CHAR.)
CREATE TABLE `liv_fx_files_files2` (
`fid` int(11) NOT NULL AUTO_INCREMENT,
`filesize` int(11) NOT NULL,
`context` char(1) NOT NULL DEFAULT '',
`saveuser` varchar(32) NOT NULL,
`savetime` int(11) NOT NULL,
`_state` int(11) NOT NULL DEFAULT '0',
`filetype` varchar(255) NOT NULL,
`filename` varchar(255) NOT NULL,
`filedata` longblob NOT NULL,
PRIMARY KEY (`fid`),
KEY `_state` (`_state`)
) ENGINE=MyISAM AUTO_INCREMENT=4550 DEFAULT CHARSET=utf8;
INSERT INTO liv_fx_files_files2
(fid, filesize, context, saveuser, savetime, _state, filetype, filename, filedata)
SELECT fid, filesize, context, saveuser, savetime, _state, filetype, filename, filedata
FROM liv_fx_files_files;
But ideally I'd split the data and metadata into separate tables.

simple update query taking very long to execute in MySQL

I was checking the MySQL slow query log and found an entry like the one below:
# Time: 131108 4:16:34
# Query_time: 14.726425 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 1
SET timestamp=1383884194;
UPDATE `Artist` SET ImageFilename = NULL, Title = 'Elton John', PopularityRating = 657, UniqueID = NULL, Description = NULL, IsFeatured = 0, FeaturedText = '', MetaDescription = '', MetaTitle = NULL, _Temporary_LastUpdOn = '2013-11-08 04:15:58 ', _Temporary_Flag = 0, _Deleted = 0, _DeletedOn = NULL, Priority = 0 WHERE ID = 3449748;
As you can see, it took a staggering 14.72 sec to perform this query, even though it is a simple update with just a WHERE on the primary key. I've tried re-executing the query, but now it executes in 0.095 sec, which is much more reasonable.
Any ideas how I can debug why at that specific time it took so long?
Edit 1: query_cache% variables
mysql> SHOW variables where variable_name like 'query_cache%';
+------------------------------+-----------+
| Variable_name | Value |
+------------------------------+-----------+
| query_cache_limit | 1048576 |
| query_cache_min_res_unit | 4096 |
| query_cache_size | 210763776 |
| query_cache_type | ON |
| query_cache_wlock_invalidate | OFF |
+------------------------------+-----------+
Edit 2: Artist table info
CREATE TABLE `artist` (
`ID` bigint(20) NOT NULL,
`ImageFilename` mediumtext,
`Title` varchar(1000) DEFAULT NULL,
`PopularityRating` int(11) DEFAULT '0',
`UniqueID` mediumtext,
`Description` mediumtext,
`IsFeatured` tinyint(1) DEFAULT '0',
`FeaturedText` mediumtext,
`_Temporary_LastUpdOn` datetime DEFAULT '0001-01-01 00:00:00',
`_Temporary_Flag` tinyint(1) DEFAULT '0',
`_Deleted` tinyint(1) DEFAULT '0',
`_DeletedOn` datetime DEFAULT NULL,
`Priority` int(11) DEFAULT '0',
`MetaDescription` varchar(2000) DEFAULT NULL,
`MetaTitle` mediumtext,
PRIMARY KEY (`ID`),
KEY `_Temporary_Flag` (`_Temporary_Flag`),
KEY `_Deleted` (`_Deleted`),
KEY `Priority` (`Priority`),
KEY `PopularityRating` (`PopularityRating`),
KEY `Title` (`Title`(255)),
KEY `IsFeatured` (`IsFeatured`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Given the output you have provided, my suggestion here would be to reduce your query cache size. It is of course only my best guess that this is what caused the update time to stretch to almost 15 seconds, because the query itself is optimal, using a WHERE on the PRIMARY KEY.
Since you haven't been able to reproduce the problem, it's hard to be certain.
I was reading the cache documentation again to get some info.
When tables are modified, any relevant entries in the query cache are flushed.
This could be relevant to the update you ran: it may have had to flush cached data.
Another part of the documentation:
Be cautious about sizing the query cache excessively large, which
increases the overhead required to maintain the cache, possibly beyond
the benefit of enabling it. Sizes in tens of megabytes are usually
beneficial. Sizes in the hundreds of megabytes might not be.
Either way, since you have the query cache enabled, I think that's a good starting point.
To set a new query cache size while in production:
SET GLOBAL query_cache_size = 1000000;
MySQL will automatically align the size to the nearest 1024-byte block.
Read this documentation well; it's very helpful for understanding. The query cache can be both your best and your worst setting.
http://dev.mysql.com/doc/refman/5.1/en/query-cache.html
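If you do experiment with the size, it can help to watch the standard query cache counters before and after the change:

-- check how the query cache is actually behaving
SHOW GLOBAL STATUS LIKE 'Qcache%';
-- e.g. Qcache_hits, Qcache_inserts, Qcache_lowmem_prunes, Qcache_free_memory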
There's a problem with your table. You created several indexes on it, and they include fields you update in the SQL, so MySQL has to maintain those indexes every time.
I think you have not tuned the MySQL server variables. It is important to tune server variables to increase performance; it is recommended to have a look at the key_buffer_size and table_cache variables.
The key_buffer_size variable controls the amount of memory available for the MySQL index buffer. The higher this value, the more memory available for indexes and better the performance.
The table_cache variable controls the amount of memory available for the table cache, and thus the total number of tables MySQL can hold open at any given time. For busy servers with many databases and tables, this value should be increased so that MySQL can serve all requests reliably.
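A minimal my.cnf sketch of that kind of change (the values are placeholders to size for your own hardware, not recommendations):

[mysqld]
key_buffer_size = 256M   # MyISAM index buffer
table_cache     = 512    # called table_open_cache in newer versions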
In case someone missed the comment above:
Maybe the table was locked that time.
Since you could not reproduce the problem, this was likely the case.

SQL performance on multiple id matching and a Join statement

Consider this query:
SELECT DISTINCT (linkindex_tags.link_id)
, links_sorted.link_title
, links_sorted.link_url
FROM linkindex_tags
INNER JOIN links_sorted ON links_sorted.link_id = linkindex_tags.link_id
ORDER BY
(
IF (word_id = 400, 1,0)+
IF (word_id = 177, 1,0)+
IF (word_id = 114, 1,0)+
IF (word_id = 9, 1,0)+
IF (word_id = 270, 1,0)+
IF (word_id = 715, 1,0)+
IF (word_id = 279, 1,0)+
IF (word_id = 1, 1,0)+
IF (word_id = 1748, 1,0)
) DESC
LIMIT 0,15;
So it is looking for matches to a series of word_ids and ordering by the score of those matches (e.g. if a link matches 5 word_ids then its score is 5).
The linkindex_tags table is currently 552,196 rows (33 MB) but will expand to many millions.
The links_sorted table is currently 823,600 rows (558MB - obviously more data per row) but will also expand.
The linkindex_tags table is likely to be around 8-12 times larger than links_sorted.
Execution Time : 7.069 sec on a local i3 core windows 7 machine.
My server is CentOS 64-bit, 8GB RAM, Intel Xeon 3470 (Quad Core) - so that will help slightly, I guess, as I can assign a decent RAM allocation.
It is running slowly and I was wondering if my approach is all wrong. Here are the slow bits from the profile breakdown:
Copying to tmp table - (time) 3.88124 - (%) 55.08438
Copying to tmp table on disk - (time) 2.683123 - (%) 8.08010
converting HEAP to MyISAM - (time) 0.37656 - (%) 5.34432
Here's the EXPLAIN:
id - 1
select_type - SIMPLE
table - linkindex_tags
type - index
possible_keys - link_id,link_id_2
key - link_id
key_len - 8
ref - \N
rows - 552196
Extra - Using index; Using temporary; Using filesort
2nd row
id - 1
select_type - SIMPLE
table - links_sorted
type - eq_ref
possible_keys - link_id
key - link_id
key_len - 4
ref - flinksdb.linkindex_tags.link_id
rows - 1
Extra -
And finally the 2 table schemas:
CREATE TABLE IF NOT EXISTS `linkindex_tags` (
`linkindex_tag_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`link_id` int(10) unsigned NOT NULL,
`word_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`linkindex_tag_id`),
UNIQUE KEY `link_id` (`link_id`,`word_id`),
KEY `link_id_2` (`link_id`),
KEY `word_id` (`word_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=0 ;
CREATE TABLE IF NOT EXISTS `links_sorted` (
`link_sorted_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`site_id` int(10) unsigned NOT NULL,
`link_id` int(10) unsigned NOT NULL,
`link_title` char(255) NOT NULL,
`link_duration` char(20) NOT NULL,
`link_url` char(255) NOT NULL,
`active` tinyint(4) NOT NULL,
PRIMARY KEY (`link_sorted_id`),
UNIQUE KEY `link_id` (`link_id`),
KEY `link_title` (`link_title`,`link_url`,`active`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=0 ;
I have to stick with INT as the values may exceed the MEDIUMINT range.
Without the join, just getting the ids, the query is fast now that I've upped some MySQL settings.
Don't know too much about MySQL settings and their effects so if you need me to change a few settings and run some tests by all means fire away!
Oh and I played with the mysql.ini settings so they're like this - just guessing and toying really!
key_buffer = 512M
max_allowed_packet = 1M
table_cache = 512M
sort_buffer_size = 512M
net_buffer_length = 8K
read_buffer_size = 512M
read_rnd_buffer_size = 512K
How can I speed up this query?
A few comments:
DISTINCT
SELECT DISTINCT works on all the fields selected, no matter how many parentheses you use; use a GROUP BY clause instead if you only want 1 field to be distinct.
Note that this will make the results of your query indeterminate!
Keep the distinct, or aggregate the other fields in a GROUP_CONCAT if you want to prevent that.
ORDER BY
A field can only have one value at a time; adding different IFs together when only one of them can match per row is a waste of time. Use an IN instead.
A boolean is 1 for true and 0 for false; you don't need an extra IF to assert that.
WHERE
If you have a lot of rows, consider adding a where that can reduce the number of rows under consideration, without altering the outcome.
?
Is the series: 400,177,114,9,270,715,279,1,1748 same sort of magical construct like the 4-8-15-16-23-42 in Lost?
SELECT lt.link_id
, GROUP_CONCAT(ls.link_title) as link_titles
, GROUP_CONCAT(ls.link_url) as link_urls
FROM linkindex_tags lt
INNER JOIN links_sorted ls ON ls.link_id = lt.link_id
WHERE lt.word_id <= 1748
GROUP BY lt.link_id
ORDER BY
  SUM(lt.word_id IN (400,177,114,9,270,715,279,1,1748)) DESC
LIMIT 15 OFFSET 0;

How long should it take to build an index using ALTER TABLE in MySQL?

This might be a bit like asking how long a piece of string is, but the stats are:
Intel dual core 4GB RAM
Table with 8million rows, ~ 20 columns, mostly varchars with an auto_increment primary id
Query is: ALTER TABLE my_table ADD INDEX my_index (my_column);
my_column is varchar(200)
Storage is MyISAM
Order of magnitude, should it be 1 minute, 10 minutes, 100 minutes?
Thanks
Edit: OK, it took 2 hours 37 minutes, compared to 0 hours 33 mins on a lesser-spec machine with an essentially identical setup. I've no idea why it took so much longer. The only possibility is that the prod machine's HD is 85% full, with 100GB free. That should be enough, but I guess it depends on how that free space is distributed.
If you are just adding the single index, it should take about 10 minutes. However, it will take 100 minutes or more if you don't have that index file in memory.
Your 200 varchar with 8 million rows will take a maximum of 1.6GB, but with all of the indexing overhead it will take about 2-3 GB. But it will take less if most of the rows are less than 200 characters. (You might want to do a select sum(length(my_column)) to see how much space is required.)
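That check, with the table and column names from the question, is simply:

-- rough estimate of the data the new index will have to hold
SELECT SUM(LENGTH(my_column)) AS total_bytes FROM my_table;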
You want to edit your /etc/mysql/my.cnf file. Play with these settings;
myisam_sort_buffer_size = 100M
sort_buffer_size = 100M
Good luck.
On my test MusicBrainz database, table track builds a PRIMARY KEY and three secondary indexes in 25 minutes:
CREATE TABLE `track` (
`id` int(11) NOT NULL,
`artist` int(11) NOT NULL,
`name` varchar(255) NOT NULL,
`gid` char(36) NOT NULL,
`length` int(11) DEFAULT '0',
`year` int(11) DEFAULT '0',
`modpending` int(11) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `gid` (`gid`),
KEY `artist` (`artist`),
KEY `name` (`name`)
) DEFAULT CHARSET=utf8
The table has 9001870 records.
Machine is Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz with 2GB RAM, Fedora Core 12, MySQL 5.1.42.
myisam_sort_buffer_size is 256M.
Additionally, if you ever need to build multiple indexes, it's best to create them all in one call instead of individually. The reason: it basically appears to rewrite all the index pages to include your new index along with whatever else it had. I found this out in the past with a 2+ GB table on which I needed to build about 15 indexes. Building them individually kept growing in time between each index, whereas doing them all at once took little more than about 3 individual index builds, since it processed every record once and wrote everything out in one pass instead of having to keep rebuilding pages.
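For example, combining the additions into one ALTER TABLE (the index and column names here are made up for illustration):

-- one table rebuild instead of three separate ones
ALTER TABLE my_table
  ADD INDEX idx_a (col_a),
  ADD INDEX idx_b (col_b),
  ADD INDEX idx_c (col_c);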