I have a MySQL query
SELECT * FROM table WHERE INET_ATON("10.0.0.1") BETWEEN INET_ATON(s_ip) AND INET_ATON(e_ip);
Here "10.0.0.1" comes dynamically when a user visits the website and s_ip is the starting ip address column which would probably have "10.0.0.0" as starting ip address range and e_ip is the ending IP address.
Now, the problem is I have almost ~350K records which do only one thing when this query is executed and that is to get me the country code of the visitor.
When this query is executed MySQL peaks CPU consumption at 1100% and multiply that by 1000 requests/minute and my server just cannot handle it.
My server runs CentOS 7 with 100 GB of RAM and 24 cores clocked at 3.0 GHz, but the performance is still a nightmare to handle.
I was thinking of outsourcing this functionality to a third-party service, but first I want to make sure nothing can be done on my side to fix this issue.
(From Comments)
CREATE TABLE `ip` (
ip_ip varbinary(16) NOT NULL,
ip_last_request_time timestamp(3) NULL DEFAULT NULL,
ip_min_timeSpan_get smallint(5) unsigned NOT NULL,
ip_min_timeSpan_post smallint(5) unsigned NOT NULL,
ip_violationsCount_get smallint(5) unsigned NOT NULL,
ip_violationsCount_post smallint(5) unsigned NOT NULL,
ip_maxViolations_get smallint(5) unsigned NOT NULL,
ip_maxViolations_post smallint(5) unsigned NOT NULL,
ip_bannedAt timestamp(3) NULL DEFAULT NULL,
ip_banSeconds mediumint(8) unsigned NOT NULL DEFAULT '300',
ip_isCapatchaResolved tinyint(1) NOT NULL DEFAULT '0',
ip_isManualBanned tinyint(1) NOT NULL DEFAULT '0',
ip_city varchar(45) DEFAULT '',
ip_region varchar(45) DEFAULT '',
ip_regionCode varchar(5) DEFAULT '',
ip_regionName varchar(45) DEFAULT '',
ip_countryCode varchar(3) DEFAULT '',
ip_countryName varchar(45) DEFAULT '',
ip_continentCode varchar(3) DEFAULT '',
ip_continentName varchar(45) DEFAULT '',
ip_timezone varchar(45) DEFAULT '',
ip_currencyCode varchar(4) DEFAULT '',
ip_currencySymbol_UTF8 varchar(5) DEFAULT '',
PRIMARY KEY (ip_ip),
KEY countryCode_index (ip_countryCode)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
CREATE TABLE `country` ( co_id char(2) COLLATE utf8mb4_unicode_ci NOT NULL,
co_re_id smallint(6) DEFAULT NULL,
co_flag_id char(4) COLLATE utf8mb4_unicode_ci NOT NULL,
co_english_name varchar(40) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (co_id),
KEY fk_country_region1_idx (co_re_id),
CONSTRAINT fk_country_region1 FOREIGN KEY (co_re_id)
REFERENCES region (re_id) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Currently you're doing a full table scan for every query. There are a couple of things you can try.
Store INET_ATON(s_ip) in the table so it's not computed during the query. Same for e_ip.
Add an index that has these two new columns, and the country code.
Change the query to select only the country code, and use the two new columns.
Use EXPLAIN to make sure the DB uses the index for the query.
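Roughly, those steps might look like this (a sketch only; I'm assuming a range table named geo_range with s_ip, e_ip and country_code columns, since the actual range table wasn't shown):
-- Precompute the numeric forms once, instead of on every query:
ALTER TABLE geo_range
  ADD COLUMN s_ip_num INT UNSIGNED NOT NULL DEFAULT 0,
  ADD COLUMN e_ip_num INT UNSIGNED NOT NULL DEFAULT 0;
UPDATE geo_range SET s_ip_num = INET_ATON(s_ip), e_ip_num = INET_ATON(e_ip);
-- Index the two numeric columns plus the country code:
ALTER TABLE geo_range ADD INDEX idx_range_country (s_ip_num, e_ip_num, country_code);
-- Select only the country code, using the new columns:
SELECT country_code FROM geo_range
WHERE INET_ATON('10.0.0.1') BETWEEN s_ip_num AND e_ip_num;
-- And verify the index is used:
EXPLAIN SELECT country_code FROM geo_range
WHERE INET_ATON('10.0.0.1') BETWEEN s_ip_num AND e_ip_num;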
The optimizer does not know that you have a set of non-overlapping ranges it could exploit, so you have to work harder to optimize the queries.
Instead of doing table scans, the code described here will do typical queries 'instantly'.
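As a minimal sketch of that approach (assuming the ranges are non-overlapping and a numeric start column s_ip_num is indexed; names are illustrative):
SELECT country_code
FROM geo_range
WHERE s_ip_num <= INET_ATON('10.0.0.1')
ORDER BY s_ip_num DESC
LIMIT 1;
The index is descended to exactly one candidate row instead of scanning every range; verifying that the address is also <= e_ip_num catches addresses that fall into a gap between ranges.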
To put it bluntly, you cannot optimize the query without restructuring the data. I'm speaking also to all who have provided Answers and Comments.
(critique of schema)
ip is awfully bulky. Suggest moving city and all the fields after it to another table in order to 'normalize' that data.
It is 'wrong' to have both a ..code and ..name in the same table (except for the normalization table).
Several fields can (and should) be ascii, not utf8mb4. Example: countryCode.
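One possible shape for that normalization, as a sketch with invented names:
CREATE TABLE location (
  loc_id smallint unsigned NOT NULL AUTO_INCREMENT,
  loc_city varchar(45) DEFAULT '',
  loc_countryCode char(2) CHARACTER SET ascii DEFAULT '',  -- ascii, not utf8mb4
  loc_timezone varchar(45) CHARACTER SET ascii DEFAULT '',
  PRIMARY KEY (loc_id)
) ENGINE=InnoDB;
-- `ip` then carries one small id instead of a dozen strings:
ALTER TABLE `ip` ADD COLUMN ip_loc_id smallint unsigned NOT NULL;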
On another topic... How will you handle AOL IP addresses? As I understand it, these are shared among its customers. That is, a "violator" will move around, tainting all of the AOL IPs.
10., 11., 172.16., 192.168. all come from behind a NAT, and cannot be associated with a given country, nor a given computer.
Related
My host has been sending me messages over the last few months saying that my site is using way too many MySQL minutes. They also occasionally send logs showing which queries use up the most time. Some of the queries are kind of long and complicated, so I understand why they would be an issue. But a few have me scratching my head. The one I want to focus on is this:
UPDATE parentmessages SET views=views+1 WHERE parentid='11308'
The number is just an example, it could be any parentid. The parentmessages table has parentid as the primary key, so I would think it would be indexed and easily found. There are about 11,000 records in the table, which is not really that many. Here are the numbers my host gave me for how long this query took over 6 instances yesterday:
Taking 0.126455, 1.472929, 1.638743, 3.040538, 7.130041, 112.498037 seconds to complete
The 112 could be a random glitch, I suppose, but why would it sometimes take 3 or 7 seconds?! My best guess is that it's because I have a lot of indexes on the table, but I don't know enough about MySQL to know whether that matters. And why would it sometimes be 1/10th of a second and sometimes many seconds?
Here is the show create table:
CREATE TABLE `parentmessages` (
`parentid` int(7) NOT NULL AUTO_INCREMENT,
`active` tinyint(1) NOT NULL,
`level` int(2) NOT NULL,
`type` varchar(10) NOT NULL,
`hidden` tinyint(1) DEFAULT NULL,
`sticky` tinyint(1) NOT NULL,
`poll` tinyint(1) NOT NULL,
`topic` varchar(120) DEFAULT NULL,
`message` varchar(30000) NOT NULL,
`views` int(6) NOT NULL,
`replies` int(5) NOT NULL,
`userid` int(7) NOT NULL,
`datetimecalc` int(11) NOT NULL,
`lastreplycalc` int(11) NOT NULL,
`lastreplyuser` int(7) NOT NULL,
`editedcalc` int(11) DEFAULT NULL,
`editeduser` int(7) DEFAULT NULL,
`realediteduser` int(7) DEFAULT NULL,
`altint` int(7) DEFAULT NULL,
`imageurl` varchar(125) DEFAULT NULL,
`locked` tinyint(1) NOT NULL,
`tempid` int(12) NOT NULL,
PRIMARY KEY (`parentid`),
KEY `useridindex` (`userid`),
KEY `datetimecalcindex` (`datetimecalc`),
KEY `activeindex` (`active`),
KEY `lastreplycalcindex` (`lastreplycalc`),
KEY `levelindex` (`level`),
KEY `stickyindex` (`sticky`)
) ENGINE=MyISAM AUTO_INCREMENT=11716 DEFAULT CHARSET=latin1
One reason could be that another slow query is locking the table, and your UPDATE is just waiting for the other query to finish.
Don't use MyISAM. I forget who said it, maybe PeterZ, but "using MyISAM means you don't care about your data". The easiest way to check for table locking is to look at the processlist. Dumps, inserts, updates, etc. will all lock the table. MyISAM is all but deprecated in 5.6, for good reason.
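Both are quick to check:
SHOW FULL PROCESSLIST;  -- run while an UPDATE is slow, to see what is holding the table lock
-- Converting gives row-level locking; it rebuilds the table, so take a backup first:
ALTER TABLE parentmessages ENGINE=InnoDB;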
I have a legacy Access front end connected to a MySQL database. The legacy app has a lot of dangerous macros assigned to onclose triggers. I also have a web application under development running on the same database. There are a couple of modules in the web app that are in production use. My testing is being done on a separate development machine with a separate, dedicated development version of the database.
A new module I'm installing into my web app comes with its own set of tables. It will happily exist in the same database but wants its own copy of the data in its own tables. I hesitate to extensively modify the new tables or the code base for that module.
There are a total of 6 tables that hold similar data for different objects in the legacy database. I am only working on the 2 most important of those tables now. The below represents only a very small subset of the columns in these 2 tables.
CREATE TABLE IF NOT EXISTS `agent` (
`age_id` int(11) NOT NULL AUTO_INCREMENT,
`age_agent_email_address` varchar(255) DEFAULT NULL,
`age_welcome_email_sent_y_or_n` varchar(255) DEFAULT 'No',
`age_status` varchar(255) DEFAULT 'Active',
PRIMARY KEY (`age_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC AUTO_INCREMENT=1854 ;
And
CREATE TABLE IF NOT EXISTS `prospecting_contacts` (
`psp_prospect_id` varchar(255) NOT NULL DEFAULT '',
`psp_prospecting_status` varchar(255) DEFAULT 'Active',
`psp_prospect_email_address` varchar(255) DEFAULT NULL,
`psp_remove_from_email_marketing` varchar(255) DEFAULT 'No',
`psp_id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`psp_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC AUTO_INCREMENT=2050793 ;
There are several related tables that came with the new module. I believe only one of them needs to be updated.
CREATE TABLE IF NOT EXISTS `phplist_user_user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(255) CHARACTER SET latin1 NOT NULL,
`confirmed` tinyint(4) DEFAULT '0',
`blacklisted` tinyint(4) DEFAULT '0',
`bouncecount` int(11) DEFAULT '0',
`entered` datetime DEFAULT NULL,
`modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`uniqid` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`htmlemail` tinyint(4) DEFAULT '0',
`subscribepage` int(11) DEFAULT NULL,
`rssfrequency` varchar(100) CHARACTER SET latin1 DEFAULT NULL,
`password` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`passwordchanged` date DEFAULT NULL,
`disabled` tinyint(4) DEFAULT '0',
`extradata` text CHARACTER SET latin1,
`foreignkey` varchar(100) CHARACTER SET latin1 DEFAULT NULL,
`optedin` tinyint(4) DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`),
KEY `foreignkey` (`foreignkey`),
KEY `idx_phplist_user_user_uniqid` (`uniqid`),
KEY `emailidx` (`email`),
KEY `enteredindex` (`entered`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=11 ;
The phplist_user_user table would include data that is the result of this query:
SELECT `age_agent_email_address` AS `email` FROM `agent`
WHERE `age_status` = 'Active'
UNION DISTINCT
SELECT `psp_prospect_email_address` FROM `prospecting_contacts`
WHERE `psp_prospecting_status` = 'Active'
The legacy Access application updates the agent and prospecting_contacts tables. The new module updates the phplist_user_user table. I believe I could copy the information back and forth using TRIGGERs, but I'm looking for a way that doesn't duplicate data.
I had thought of CREATE VIEW, but the MySQL manual says that unions and joins break a view's updatability. http://dev.mysql.com/doc/refman/5.1/en/view-updatability.html
So, is there a way to update these 3 tables without duplicating data? Or should I just duplicate the email addresses and use TRIGGERs on INSERT and UPDATE?
You might be able to do something clever with foreign keys, though they are more attuned to keeping tables consistent than to preventing duplicates. http://dev.mysql.com/doc/refman/5.1/en/innodb-foreign-key-constraints.html
It may seem counter-intuitive, but another solution would be to maintain a lookup table that indicates where a specific value can be found. You could join it against all three of the (sub)tables to prevent duplicates.
A trigger would work too.
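If you do go the trigger route, a minimal sketch of one of the paths (INSERTs on agent only; you would need matching UPDATE triggers, and a pair for prospecting_contacts):
DELIMITER //
CREATE TRIGGER agent_after_insert AFTER INSERT ON agent
FOR EACH ROW
BEGIN
  -- INSERT IGNORE relies on the UNIQUE KEY on phplist_user_user.email
  IF NEW.age_status = 'Active' AND NEW.age_agent_email_address IS NOT NULL THEN
    INSERT IGNORE INTO phplist_user_user (email, entered)
    VALUES (NEW.age_agent_email_address, NOW());
  END IF;
END//
DELIMITER ;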
This question expects a generic answer to the broad problem of index creation on a MySQL database.
Let's take this table example :
CREATE TABLE IF NOT EXISTS `article` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`published` tinyint(1) NOT NULL DEFAULT '0',
`author_id` int(11) unsigned NOT NULL,
`modificator_id` int(11) unsigned DEFAULT NULL,
`category_id` int(11) unsigned DEFAULT NULL,
`title` varchar(200) COLLATE utf8_unicode_ci NOT NULL,
`headline` text COLLATE utf8_unicode_ci NOT NULL,
`content` text COLLATE utf8_unicode_ci NOT NULL,
`url_alias` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`priority` mediumint(11) unsigned NOT NULL DEFAULT '50',
`publication_date` datetime NOT NULL,
`creation_date` datetime NOT NULL,
`modification_date` datetime NOT NULL,
PRIMARY KEY (`id`)
);
Over such a sample, there is a wide range of queries that could be performed on different criteria:
category_id
published
publication_date
e.g.:
SELECT id FROM article WHERE NOT published AND category_id = '2' ORDER BY publication_date;
On many tables you can see a wide range of state fields (like published here), date fields, or reference fields (like author_id or category_id). What strategy should be picked to create indexes?
This can be broken down into the following points:
Should you make an index on every field that can be used in a query (either as a WHERE argument or in ORDER BY), even if this can lead to a lot of indexes per table? (See the example index sketch after these questions.)
Should you also make an index on fields that have only a small set of values, like booleans or enums? This only reduces the scope of the scan by a factor of n (assuming n values, each used equally often).
I've read that MySQL prior to 5.0 used only one index per request. How does the system pick it? By choosing the most restrictive one?
How is an OR statement processed?
How much will this slow down inserts?
Does InnoDB vs. MyISAM change anything about this problem?
I know the EXPLAIN statement can be used to find out whether a request is optimized, but a bit of concrete theory would really be more constructive than a purely empirical approach!
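For instance, for the sample query above, is a composite index like this the right idea (the WHERE columns first, then the ORDER BY column)?
ALTER TABLE article ADD INDEX idx_pub_cat_date (published, category_id, publication_date);
-- I suspect writing published = 0 instead of NOT published also matters,
-- so the optimizer can actually range-scan the index:
SELECT id FROM article WHERE published = 0 AND category_id = 2 ORDER BY publication_date;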
I'm trying to dedup a table where I know there are 'close' (but not exact) rows that need to be removed.
I have a single table with 22 fields, and uniqueness can be established by comparing 5 of those fields. Of the remaining 17 fields (including the unique key), there are 3 fields that cause each row to be unique, meaning the usual dedup methods will not work.
I was looking at the multi-table delete method outlined here: http://blog.krisgielen.be/archives/111 but I can't make sense of the final line of code (AND M1.cd*100+M1.track > M2.cd*100+M2.track), as I am unsure what the cd*100 part achieves.
Can anyone assist me with this? I suspect I could do better by exporting the whole thing to Python, doing something with it, and re-importing it, but then (1) I'm stuck on how to dedup the strings anyway, and (2) I had to break the records into chunks to import them into MySQL because it was timing out after 300 seconds, so it was a whole debacle to get the data into MySQL in the first place (I am very much a novice at both MySQL and Python).
The table is a dump of some 40 log files from some testing. The test set for each log is some 20,000 files. The repeating values are either the test conditions, the file name/parameters or the results of the tests.
SHOW CREATE TABLE:
CREATE TABLE `t1` (
`DROID_V` int(1) DEFAULT NULL,
`Sig_V` varchar(7) DEFAULT NULL,
`SPEED` varchar(4) DEFAULT NULL,
`ID` varchar(7) DEFAULT NULL,
`PARENT_ID` varchar(10) DEFAULT NULL,
`URI` varchar(10) DEFAULT NULL,
`FILE_PATH` varchar(68) DEFAULT NULL,
`NAME` varchar(17) DEFAULT NULL,
`METHOD` varchar(10) DEFAULT NULL,
`STATUS` varchar(14) DEFAULT NULL,
`SIZE` int(10) DEFAULT NULL,
`TYPE` varchar(10) DEFAULT NULL,
`EXT` varchar(4) DEFAULT NULL,
`LAST_MODIFIED` varchar(10) DEFAULT NULL,
`EXTENSION_MISMATCH` varchar(32) DEFAULT NULL,
`MD5_HASH` varchar(10) DEFAULT NULL,
`FORMAT_COUNT` varchar(10) DEFAULT NULL,
`PUID` varchar(15) DEFAULT NULL,
`MIME_TYPE` varchar(24) DEFAULT NULL,
`FORMAT_NAME` varchar(10) DEFAULT NULL,
`FORMAT_VERSION` varchar(10) DEFAULT NULL,
`INDEX` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`INDEX`)
) ENGINE=MyISAM AUTO_INCREMENT=960831 DEFAULT CHARSET=utf8
The only unique field is the primary key, `INDEX`.
Unique records can be established by looking at DROID_V, Sig_V, SPEED, NAME and PUID.
Of the ~900,000 rows, I have about 10,000 dups, each with anywhere from a single duplicate up to 6 repetitions of the record.
Row examples: As Is
5;"v37";"slow";"10266";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/7";"image/tiff";"Tagged Ima";"3";"191977"
5;"v37";"slow";"10268";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/8";"image/tiff";"Tagged Ima";"4";"191978"
5;"v37";"slow";"10269";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/9";"image/tiff";"Tagged Ima";"5";"191979"
5;"v37";"slow";"10270";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/10";"image/tiff";"Tagged Ima";"6";"191980"
5;"v37";"slow";"12766";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/7";"image/tiff";"Tagged Ima";"3";"193977"
5;"v37";"slow";"12768";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/8";"image/tiff";"Tagged Ima";"4";"193978"
5;"v37";"slow";"12769";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/9";"image/tiff";"Tagged Ima";"5";"193979"
5;"v37";"slow";"12770";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/10";"image/tiff";"Tagged Ima";"6";"193980"
Row Example: As It should be
5;"v37";"slow";"10266";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/7";"image/tiff";"Tagged Ima";"3";"191977"
5;"v37";"slow";"10268";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/8";"image/tiff";"Tagged Ima";"4";"191978"
5;"v37";"slow";"10269";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/9";"image/tiff";"Tagged Ima";"5";"191979"
5;"v37";"slow";"10270";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/10";"image/tiff";"Tagged Ima";"6";"191980"
Please note, you can see from the index column at the end that I have cut out some other rows - I have only identified a very small set of repeating rows. Please let me know if you need any more 'noise' from the rest of the DB.
Thanks.
I figured out a fix. Using the count function, I had a COUNT(*) that just returned everything in the table; by using COUNT(DISTINCT NAME) I am able to weed out the dup rows that fit the dup criteria (as set out by the field selection in the WHERE clause).
Example:
SELECT `PUID`, `DROID_V`, `SIG_V`, `SPEED`, COUNT(DISTINCT NAME) AS Hit
FROM sourcelist, main_small
WHERE sourcelist.SourcePUID = 'MyVariableHere'
  AND main_small.NAME = sourcelist.SourceFileName
GROUP BY `PUID`, `DROID_V`, `SIG_V`, `SPEED`
ORDER BY `DROID_V` ASC, `SIG_V` ASC, `SPEED`;
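For the removal itself, a multi-table delete in the spirit of the linked post might look like this (a sketch only: <=> is the NULL-safe equality operator, and the copy with the lowest `INDEX` survives):
DELETE b FROM `t1` AS a
JOIN `t1` AS b
  ON  a.DROID_V <=> b.DROID_V
  AND a.Sig_V   <=> b.Sig_V
  AND a.SPEED   <=> b.SPEED
  AND a.NAME    <=> b.NAME
  AND a.PUID    <=> b.PUID
  AND a.`INDEX` < b.`INDEX`;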
The following query is using temporary and filesort. I'd like to avoid that if possible.
SELECT lib_name, description, count(seq_id), floor(avg(size))
FROM libraries l JOIN sequence s ON (l.lib_id=s.lib_id)
WHERE s.is_contig=0 and foreign_seqs=0 GROUP BY lib_name;
The EXPLAIN says:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,SIMPLE,s,ref,libseq,contigs,contigs,4,const,28447,Using temporary; Using filesort
1,SIMPLE,l,eq_ref,PRIMARY,PRIMARY,4,s.lib_id,1,Using where
The tables look like this:
libraries
CREATE TABLE `libraries` (
`lib_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`lib_name` varchar(30) NOT NULL,
`method_id` int(10) unsigned DEFAULT NULL,
`lib_efficiency` decimal(4,2) unsigned DEFAULT NULL,
`insert_avg` decimal(5,2) DEFAULT NULL,
`insert_high` decimal(5,2) DEFAULT NULL,
`insert_low` decimal(5,2) DEFAULT NULL,
`amtvector` decimal(4,2) unsigned DEFAULT NULL,
`description` text,
`foreign_seqs` tinyint(1) NOT NULL DEFAULT '0' COMMENT '1 means the sequences in this library are not ours',
PRIMARY KEY (`lib_id`),
UNIQUE KEY `lib_name` (`lib_name`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=latin1;
sequence
CREATE TABLE `sequence` (
`seq_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`seq_name` varchar(40) NOT NULL DEFAULT '',
`lib_id` int(10) unsigned DEFAULT NULL,
`size` int(10) unsigned DEFAULT NULL,
`add_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`sequencing_date` date DEFAULT '0000-00-00',
`comment` text DEFAULT NULL,
`is_contig` int(10) unsigned NOT NULL DEFAULT '0',
`fasta_seq` longtext,
`primer` varchar(15) DEFAULT NULL,
`gc_count` int(10) DEFAULT NULL,
PRIMARY KEY (`seq_id`),
UNIQUE KEY `seq_name` (`seq_name`),
UNIQUE KEY `libseq` (`lib_id`,`seq_id`),
KEY `primer` (`primer`),
KEY `sgitnoc` (`seq_name`,`is_contig`),
KEY `contigs` (`is_contig`,`seq_name`) USING BTREE,
CONSTRAINT `FK_sequence_1` FOREIGN KEY (`lib_id`) REFERENCES `libraries` (`lib_id`)
) ENGINE=InnoDB AUTO_INCREMENT=61508 DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
Are there any changes I can make to speed up this query? If not, when (for a web application) is it worth putting the results of a query like the one above into a MEMORY table?
First strategy: make it faster for MySQL to locate the records you want summarized.
You've already got an index on sequence.is_contig. You might try indexing on libraries.foreign_seqs. I don't know if that will help, but it's worth a try.
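For example (the second index is only a guess at a covering index for the columns this query touches; check with EXPLAIN whether either one actually gets used):
ALTER TABLE libraries ADD INDEX idx_foreign_seqs (foreign_seqs);
ALTER TABLE sequence ADD INDEX idx_contig_lib_size (is_contig, lib_id, size);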
Second strategy: see if you can get your sort to run in memory, rather than in a file. Try making the sort_buffer_size parameter bigger. This will consume RAM on your server, but that's what RAM is for.
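You can experiment per session before touching the server-wide default:
SET SESSION sort_buffer_size = 4 * 1024 * 1024;  -- 4 MB for this connection only
-- re-run the query and its EXPLAIN; if it helps, raise the value in my.cnf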
Third strategy: if your application runs this query a lot but updates the underlying data only a little, take your own suggestion and create a summary table, perhaps using an EVENT to remake it once every few minutes. If you're going to follow that strategy, start by creating a view with this table in it and have your app retrieve information from the view. Then get the summary table working, drop the view, and give the summary table the same name as the view. That way your data-model work and your application-design work can proceed independently of each other.
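A sketch of the event half (table and event names are invented; the scheduler must be enabled with event_scheduler=ON):
CREATE TABLE lib_summary (
  lib_name varchar(30) NOT NULL PRIMARY KEY,
  description text,
  seq_count int unsigned,
  avg_size int unsigned
) ENGINE=InnoDB;
CREATE EVENT refresh_lib_summary
ON SCHEDULE EVERY 5 MINUTE
DO
  REPLACE INTO lib_summary
    SELECT lib_name, description, COUNT(seq_id), FLOOR(AVG(size))
    FROM libraries l JOIN sequence s ON (l.lib_id = s.lib_id)
    WHERE s.is_contig = 0 AND foreign_seqs = 0
    GROUP BY lib_name;
-- REPLACE relies on the PRIMARY KEY on lib_name; libraries deleted upstream
-- would need separate cleanup.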
Final suggestion: if this is truly slowly-changing summary data, switch to MyISAM; it's a little faster for this kind of data wrangling.