InnoDB Unique Constraints Slowing down inserts - mysql

Hey guys, so I have been wrestling with a problem with my InnoDB database. The database I am designing is being built to house all domains listed under .com and .net. I read these from a file and then insert them into the database each week. As you can guess, there will be a lot of records: I have calculated close to 106 million .com and an estimated 14 million .net domains. In order to prevent duplicate records for domains, I put a unique constraint across the domain name column and the TLDid column. Whenever I do an update each week, the inserts take 5-6 days. On the initial build with no data I got regular insert speeds until the inserts reached 25 million, and then it really started slowing.
I changed innodb_buffer_pool_size=6000M without much change. I was able to do inserts up to 45 million before it started to slow, at around the 3 hour mark.
I have read a lot of performance articles and changed more settings:
innodb_thread_concurrency=18
innodb_lock_wait_timeout = 50
innodb_file_per_table = 1
innodb_read_io_threads=3000 (default is 4, maximum 64)
innodb_write_io_threads=7000 (default is 4, maximum 64)
innodb_io_capacity=10000
innodb_flush_log_at_trx_commit = 2
I am still getting slow inserts.
Here is what the table looks like:
-- Dumping structure for table domains.domains
CREATE TABLE IF NOT EXISTS `domains` (
`DomainID` bigint(19) unsigned NOT NULL AUTO_INCREMENT,
`DomainName` varchar(100) DEFAULT NULL,
`TLDid` int(5) unsigned DEFAULT '1',
`FirstSeen` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`LastUpdated` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`DomainID`),
UNIQUE KEY `UNIQUE DOMAIN INDEX` (`TLDid`,`DomainName`),
KEY `TIMESTAMP INDEX` (`LastUpdated`,`FirstSeen`),
KEY `TLD INDEX` (`TLDid`),
KEY `DOMAIN NAME INDEX` (`DomainName`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
TLDid is either 1 or 2 and represents the extension of the domain: for example, "Test.com" will be stored as DomainName: Test, TLDid: 1, and "Test.net" will be stored as DomainName: Test, TLDid: 2.
My question is: how can I optimize this table of 130 million plus records, with a composite unique constraint that needs to be checked before every insert, so that the weekly update of new and current records doesn't slow down to the point of taking 14 days?
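For context, each week's load boils down to a batched upsert along these lines (an illustrative sketch only, not the actual loader code; the domain values are made up):
INSERT INTO `domains` (`DomainName`, `TLDid`, `LastUpdated`)
VALUES
('example-one', 1, NOW()),
('example-two', 2, NOW())
-- rows that collide with the (TLDid, DomainName) unique key are treated as
-- existing domains and just get their LastUpdated timestamp refreshed
ON DUPLICATE KEY UPDATE `LastUpdated` = NOW();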
Thanks Guys

Related

MySQL seems to be very slow for updates

MySQL seems to be very slow for updates.
A simple update statement is taking more time than MS SQL for the same update call.
Ex:
UPDATE ValuesTbl SET value1 = #value1,
value2 = #value2
WHERE co_id = #co_id
AND sel_date = #sel_date
I have changed some config settings as below
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=10G
innodb_log_file_size=2G
log-bin="foo-bin"
skip-log-bin
This is the create table query
CREATE TABLE `valuestbl` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`sel_date` datetime NOT NULL,
`co_id` int(11) NOT NULL,
`value1` decimal(10,2) NOT NULL,
`value2` decimal(10,2) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=21621889 DEFAULT CHARSET=latin1;
MySQL version: 8.0 on Windows
The update query takes longer when compared to MS SQL. Is there anything else I need to do to make it faster?
There are no indexes; the ValuesTbl table has a PK that isn't used for anything. The id column is a primary key from another table, sel_date is a date field, and there are 2 decimal columns.
If there are no indexes on ValuesTbl then the update has to scan the entire table which will be slow if the table is large. No amount of server tuning will fix this.
A simple update statement is taking more time than MS SQL for the same update call.
The MS SQL server probably has an index on either co_id or sel_date. Or it has fewer rows in the table.
You need to add indexes, like the index of a book, so the database doesn't have to search the whole table. At minimum, an index on co_id will vastly help performance. If there are many rows with different sel_date per co_id, a compound index on (co_id, sel_date) would help further.
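For example, something along these lines (a minimal sketch; the index name is arbitrary):
CREATE INDEX idx_co_id_sel_date ON valuestbl (co_id, sel_date);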
See Use The Index, Luke for an extensive tutorial on indexes.

MySQL Count Distinct - Very Slow

I have a very big MySQL InnoDB table with following structure:
CREATE TABLE `whois_records` (
`record_id` int(10) unsigned NOT NULL,
`domain_name` varchar(100) NOT NULL,
`tld_id` smallint(5) unsigned DEFAULT NULL,
`create_date` date DEFAULT NULL,
`update_date` date DEFAULT NULL,
`expiry_date` date DEFAULT NULL,
`query_time` datetime NOT NULL,
PRIMARY KEY (`record_id`),
UNIQUE KEY `domain_time` (`domain_name`,`query_time`),
KEY `tld_id` (`tld_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
This table currently has 10 Million rows.
It stores frequently updated details of domain names.
So there can be multiple records for the same domain name in the table.
TLD ID is the numeric value of the type of domain extension.
The problem is when I try to count the total number of domain names of a particular TLD.
I have tried the following 3 SQL queries:
SELECT COUNT(DISTINCT(domain_name)) FROM `whois_records` WHERE tld_id=159
SELECT COUNT(*) FROM `whois_records` WHERE tld_id=159 GROUP BY domain_name
SELECT COUNT(*) FROM ( SELECT 1 FROM `whois_records` WHERE tld_id=159 GROUP BY domain_name) q
All 3 are very slow, taking between 5 and 10 minutes, and they also use up a lot of CPU to complete. There is an INDEX defined on the TLD ID column, so these queries might be doing a FULL INDEX SCAN, but they are still very slow. TLD ID 159 is for ".com", which has the most domains by far, so a search for 159 is the slowest. For a non-popular TLD with fewer than 100 domains, the same query takes around 0.10 seconds. TLD ID 159 has around 6 million records, which is 60% of the entire table of 10 million rows.
Is there any way to optimize the calculation?
As the table grows, the current queries will take longer, so can anyone please help me with a future-proof solution to this problem? Is any alteration of the table required? Please help, thank you :)
Extend the index to contain domain_name as well:
INDEX `tld_id` (`tld_id`, `domain_name`)
This should make MySQL use only the index and not table data to compute the result. If the combination of both values is unique, instead add a new unique index:
UNIQUE INDEX `new_index` (`tld_id`, `domain_name`)
I doubt you can push it a lot further than that. If it is still not fast enough, think about caching the counters.
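In ALTER TABLE form, the extension could look something like this (a sketch; the index name is arbitrary, and the old single-column index is dropped because the composite one covers those lookups):
ALTER TABLE `whois_records`
DROP INDEX `tld_id`,
ADD INDEX `tld_id_domain_name` (`tld_id`, `domain_name`);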

Count the number of rows between unix time stamps for each ID

I'm trying to populate some data for a table. The query is being run on a table that contains ~50 million records. The query I'm currently using is below. It counts the number of rows that match the template id and are BETWEEN two unix timestamps:
SELECT COUNT(*) as count FROM `s_log`
WHERE `time_sent` BETWEEN '1346904000' AND '1346993271'
AND `template` = '1'
While the query above does work, performance is rather slow when looping through each template, which at times can number in the hundreds. The timestamps are stored as int and are properly indexed. Just to test things out, I tried running the query below, omitting the time_sent restriction:
SELECT COUNT(*) as count FROM `s_log`
WHERE `template` = '1'
As expected, it runs very fast, but is obviously not restricting count results inside the correct time frame. How can I obtain a count for a specific template AND restrict that count BETWEEN two unix timestamps?
EXPLAIN:
1 | SIMPLE | s_log | ref | time_sent,template | template | 4 | const | 71925 | Using where
SHOW CREATE TABLE s_log:
CREATE TABLE `s_log` (
`id` int(255) NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL,
`time_sent` int(25) NOT NULL,
`template` int(55) NOT NULL,
`key` varchar(255) NOT NULL,
`node_id` int(55) NOT NULL,
`status` varchar(55) NOT NULL,
PRIMARY KEY (`id`),
KEY `email` (`email`),
KEY `time_sent` (`time_sent`),
KEY `template` (`template`),
KEY `node_id` (`node_id`),
KEY `key` (`key`),
KEY `status` (`status`),
KEY `timestamp` (`timestamp`)
) ENGINE=MyISAM AUTO_INCREMENT=2078966 DEFAULT CHARSET=latin1
The best index you can have in this case is a composite one on template + time_sent:
CREATE INDEX template_time_sent ON s_log (template, time_sent)
PS: Also, as long as all the columns in your query are integers, DON'T enclose their values in quotes (in some cases it could lead to issues, at least with older MySQL versions).
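Once that composite index exists, you can confirm it is chosen with EXPLAIN (a sketch; the exact row estimate will differ from the 71925 shown above):
EXPLAIN SELECT COUNT(*) AS count FROM `s_log`
WHERE `template` = 1
AND `time_sent` BETWEEN 1346904000 AND 1346993271;
-- the key column should now show template_time_sent instead of template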
First, you have to create an index that has both of your columns together (not separately). Also check your table type; I think it would work great if your table is InnoDB.
And lastly, use your WHERE clause in this fashion:
WHERE `template` = '1' AND `time_sent` BETWEEN '1346904000' AND '1346993271'
What this does is first check whether template is 1; if it is, it then checks the second condition, otherwise it skips the row. This should give you a performance edge.
If you have to call the query for each template maybe it would be faster to get all the information with one query call by using GROUP BY:
SELECT template, COUNT(*) as count FROM `s_log`
WHERE `time_sent` BETWEEN 1346904000 AND 1346993271
GROUP BY template;
It's just a guess that this would be faster, and you would also have to redesign your code a bit.
You could also try to use InnoDB instead of MyISAM. InnoDB uses a clustered index which maybe performs better on large tables. From the MySQL site:
Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record. (For example, MyISAM uses one file for data rows and another for index records.)
There are some questions on Stackoverflow which discuss the performance between InnoDB and MyISAM:
Should I use MyISAM or InnoDB Tables for my MySQL Database?
Migrating from MyISAM to InnoDB
MyISAM versus InnoDB

Slow MySQL InnoDB Inserts and Updates

I am using Magento and having a lot of slowness on the site. There is very, very light load on the server. I have verified that CPU, disk I/O, and memory usage are light, less than 30% of available at all times. APC caching is enabled. I am using New Relic to monitor the server, and the issue is very clearly inserts/updates.
I have isolated the slowness to all insert and update statements. SELECT is fast. Very simple inserts/updates into tables take 2-3 seconds whether run from my application or the mysql command line.
Example:
UPDATE `index_process` SET `status` = 'working', `started_at` = '2012-02-10 19:08:31' WHERE (process_id='8');
This table has 9 rows, a primary key, and 1 index on it.
The slowness occurs with all inserts/updates. I have run mysqltuner and everything looks good. Also, I changed innodb_flush_log_at_trx_commit to 2.
The activity on this server is very light; it's a dv box with 1 GB RAM. I have Magento installs that run 100x better with 5x the load on a similar setup.
I started logging all queries over 2 seconds and it seems to be all inserts and full text searches.
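(For reference, that logging is MySQL's slow query log, enabled with my.cnf settings roughly like the ones below; the log file path here is made up.)
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time = 2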
Anyone have suggestions?
Here is table structure:
CREATE TABLE IF NOT EXISTS `index_process` (
`process_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`indexer_code` varchar(32) NOT NULL,
`status` enum('pending','working','require_reindex') NOT NULL DEFAULT 'pending',
`started_at` datetime DEFAULT NULL,
`ended_at` datetime DEFAULT NULL,
`mode` enum('real_time','manual') NOT NULL DEFAULT 'real_time',
PRIMARY KEY (`process_id`),
UNIQUE KEY `IDX_CODE` (`indexer_code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=10 ;
First: in (process_id='8'), '8' is a char/varchar literal, not an int, so MySQL has to convert the value first.
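A sketch of the same statement with the literal left as an integer, so no conversion is needed (values copied from the example above):
UPDATE `index_process` SET `status` = 'working', `started_at` = '2012-02-10 19:08:31' WHERE (process_id=8);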
On my system, I had long times (greater than one second) to update users.last_active_time.
The reason was that I had a few long-running queries that joined against the users table. This blocked the table for reads: a deadlock caused by SELECT.
I rewrote the query from a JOIN to sub-queries and the problem was gone.

How long should it take to build an index using ALTER TABLE in MySQL?

This might be a bit like asking how long a piece of string is, but the stats are:
Intel dual core 4GB RAM
Table with 8million rows, ~ 20 columns, mostly varchars with an auto_increment primary id
Query is: ALTER TABLE my_table ADD INDEX my_index (my_column);
my_column is varchar(200)
Storage is MyISAM
Order of magnitude, should it be 1 minute, 10 minutes, 100 minutes?
Thanks
Edit: OK, it took 2 hours 37 minutes, compared to 0 hours 33 mins on a lesser-spec machine with an essentially identical setup. I've no idea why it took so much longer. The only possibility is that the prod machine's HD is 85% full, with 100GB free. That should be enough, but I guess it depends on how that free space is distributed.
If you are just adding the single index, it should take about 10 minutes. However, it will take 100 minutes or more if you don't have that index file in memory.
Your varchar(200) column with 8 million rows will take a maximum of 1.6GB, but with all of the indexing overhead it will take about 2-3 GB. It will take less if most of the rows are shorter than 200 characters. (You might want to do a select sum(length(my_column)) to see how much space is required.)
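That check is just a one-liner (my_table and my_column are the placeholder names already used above):
SELECT SUM(LENGTH(my_column)) AS total_bytes FROM my_table;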
You want to edit your /etc/mysql/my.cnf file. Play with these settings:
myisam_sort_buffer_size = 100M
sort_buffer_size = 100M
Good luck.
On my test MusicBrainz database, table track builds a PRIMARY KEY and three secondary indexes in 25 minutes:
CREATE TABLE `track` (
`id` int(11) NOT NULL,
`artist` int(11) NOT NULL,
`name` varchar(255) NOT NULL,
`gid` char(36) NOT NULL,
`length` int(11) DEFAULT '0',
`year` int(11) DEFAULT '0',
`modpending` int(11) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `gid` (`gid`),
KEY `artist` (`artist`),
KEY `name` (`name`)
) DEFAULT CHARSET=utf8
The table has 9001870 records.
Machine is Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz with 2GB RAM, Fedora Core 12, MySQL 5.1.42.
myisam_sort_buffer_size is 256M.
Additionally, if you ever need to build multiple indexes, it's best to create all the indexes in one call instead of individually. The reason: MySQL basically appears to rewrite all the index pages to include your new index along with whatever else it had. I found this out in the past with a 2+ GB table on which I needed to build about 15 indexes. Building them individually kept taking incrementally longer with every index. Building them all at once took only a little more time than about 3 individual index builds, since it built everything per record and wrote it all at once instead of having to keep rebuilding pages. An example is sketched below.
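For illustration, building several indexes in one statement looks like this (table, column, and index names are made up):
ALTER TABLE big_table
ADD INDEX idx_col_a (col_a),
ADD INDEX idx_col_b (col_b),
ADD INDEX idx_col_a_b (col_a, col_b);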