I've got a query which seems to be impossible to optimise further (with regards to execution time). It's a plain simple query, indexes are in place, I've tried to configure InnoDB settings...but nothing really seems to help.
Tables
The query is a JOIN between the three tables trk, auf and paf.
trk : temporary table holding id's representing tracks.
auf : table representing audio files associated with the tracks.
paf : table holding the id's of published audio files. Acts as a "filter".
// 'trk' table
CREATE TEMPORARY TABLE auf_713340 (
`id` char(36),
PRIMARY KEY (id)
) ENGINE=MEMORY);
// 'auf' table
CREATE TABLE `file` (
`id` char(36) NOT NULL,
`track_id` char(36) NOT NULL,
`type` varchar(3) DEFAULT NULL,
`quality` int(1) DEFAULT '0',
`size` int(20) DEFAULT '0',
`duration` float DEFAULT '0',
`bitrate` int(6) DEFAULT '0',
`samplerate` int(5) DEFAULT '0',
`tagwritten` datetime DEFAULT NULL,
`tagwriteattempts` int(3) NOT NULL DEFAULT '0',
`audiodataread` datetime DEFAULT NULL,
`audiodatareadattempts` int(3) NOT NULL DEFAULT '0',
`converted` datetime DEFAULT NULL,
`convertattempts` int(3) NOT NULL DEFAULT '0',
`waveformgenerated` datetime DEFAULT NULL,
`waveformgenerationattempts` int(3) NOT NULL DEFAULT '0',
`flag` int(1) NOT NULL DEFAULT '0',
`status` int(1) NOT NULL DEFAULT '0',
`updated` datetime NOT NULL DEFAULT '2000-01-01 00:00:00',
PRIMARY KEY (`id`),
KEY `FK_file_track` (`track_id`),
KEY `file_size` (`size`),
KEY `file_type` (`type`),
KEY `file_quality` (`quality`),
CONSTRAINT `file_ibfk_1` FOREIGN KEY (`track_id`) REFERENCES `track` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
// 'paf' table
CREATE TABLE `publishedfile` (
`file_id` varchar(36) NOT NULL,
`data` varchar(255) DEFAULT NULL,
`file_updated` datetime NOT NULL,
PRIMARY KEY (`file_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
The query usually takes between 1500 ms and 2500 ms to execute with somewhere between 50 and 100 ids in the trk table.The auf table holds about 1.1 million rows, and the paf table holds about 900.000 rows.
The MySQL server runs on a 4GB Rackspace Cloud Server instance.
The Query
SELECT auf.*
FROM auf_713340 trk
INNER JOIN file auf
ON auf.track_id = trk.id
INNER JOIN publishedfile paf
ON auf.id = paf.file_id
The Query w/EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE trk ALL NULL NULL NULL NULL 60
1 SIMPLE auf ref PRIMARY,FK_file_track FK_file_track 108 func 1 Using where
1 SIMPLE paf eq_ref PRIMARY PRIMARY 110 trackerdatabase_development.auf.id 1 Using where; Using index
The InnoDB configuration
[mysqld]
# The size of memory used to cache table data and indexes. The larger
# this value is, the less I/O is needed to access data in tables.
# Default value is 8MB. Recommendations point towards 70% - 80% of
# available system memory.
innodb_buffer_pool_size=2850M
# Recommendations point towards using O_DIRECT to avoid double buffering.
# innodb_flush_method=O_DIRECT
# Recommendations point towards using 256M.
# #see http://www.mysqlperformanceblog.com/2006/07/03/choosing-proper-innodb_log_file_size/
innodb_log_file_size=256M
# The size in bytes of the buffer that InnoDB uses to write to the log files
# on disk. Recommendations point towards using 4MB.
innodb_log_buffer_size=4M
# The size of the buffer used for MyISAM index blocks.
key_buffer_size=128M
Now, the question is; what can I do to get the query to perform better? After all, the tables in question are not that big and indexes are in place..?
In auf table make id field as int(11) and make it auto increment. all int field length which are >11 , edit them into 11.
Thanks
Ripa Saha
Try this:
SELECT auf.*
FROM file auf
WHERE EXISTS
( SELECT *
FROM auf_713340 trk
WHERE auf.track_id = trk.id
)
AND EXISTS
( SELECT *
FROM publishedfile paf
WHERE auf.id = paf.file_id
) ;
I would also test and compare efficiency with the temporary table defined with InnoDB engine or with the (Primary) index as a BTREE index. Memory tables have HASH indices by default, not Btree if I remember correctly.
Related
I have an Aurora MySQL cluster and when running queries against the reader I see a degradation in performance over time. A reboot of the reader results in query performance that matches the writer. But after going a week without a reboot queries take 25x as long to run.
The replication lag for the reader instance is 20ms and none of the monitoring metrics are showing issues. The highest I have seen the CPU is 40%. I tried a suggestion to set block_nested_loop to off but that had no effect.
The reader does not get much activity so load should not be an issue. We do need to run a complex query against it that returns a lot of data which is used for analytics. I have found that queries that return a small number of records that are retrieved by an index do NOT have the performance problem. But a similar query that returns the same small number of records and requires a table scan does have the performance problem.
The rate of degradation seems consistent, so it seems like a resource issue related to replication, but I have not had any luck finding anything online documenting the issue.
Any help would be much appreciated.
Update: Additional details
Query execution plans
-- Fast query
explain select cpv.SHORT_TEXT_VALUE, c.UIDPK, c.GUID, c.SHARED_ID, cpv.*
from TCUSTOMERPROFILEVALUE cpv
inner join TCUSTOMER c on cpv.CUSTOMER_UID = c.UIDPK
where LOCALIZED_ATTRIBUTE_KEY = 'CP_EMAIL' and cpv.SHORT_TEXT_VALUE = 'some-email#gmail.com';
-- Slow query, using function to prevent use of index for email match
explain select cpv.SHORT_TEXT_VALUE, c.UIDPK, c.GUID, c.SHARED_ID, cpv.*
from TCUSTOMERPROFILEVALUE cpv
inner join TCUSTOMER c on cpv.CUSTOMER_UID = c.UIDPK
where LOCALIZED_ATTRIBUTE_KEY = 'CP_EMAIL' and LOWER(cpv.SHORT_TEXT_VALUE) = 'some-email#gmail.com';
Table definitions
CREATE TABLE `TCUSTOMERPROFILEVALUE` (
`UIDPK` bigint(20) NOT NULL,
`ATTRIBUTE_UID` bigint(20) NOT NULL,
`ATTRIBUTE_TYPE` int(11) NOT NULL,
`LOCALIZED_ATTRIBUTE_KEY` varchar(255) NOT NULL,
`SHORT_TEXT_VALUE` varchar(255) DEFAULT NULL,
`LONG_TEXT_VALUE` mediumtext,
`INTEGER_VALUE` int(11) DEFAULT NULL,
`DECIMAL_VALUE` decimal(19,2) DEFAULT NULL,
`BOOLEAN_VALUE` int(11) DEFAULT '0',
`DATE_VALUE` datetime DEFAULT NULL,
`CUSTOMER_UID` bigint(20) DEFAULT NULL,
`LAST_MODIFIED_DATE` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`CREATION_DATE` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`UIDPK`),
KEY `I_CPV_ATTR_UID` (`ATTRIBUTE_UID`),
KEY `I_CPV_CUID_ATTKEY` (`CUSTOMER_UID`,`LOCALIZED_ATTRIBUTE_KEY`),
KEY `I_CPV_STV_ATTVALUE` (`SHORT_TEXT_VALUE`),
KEY `I_CPV_ATTKEY_SHORTTEXT` (`LOCALIZED_ATTRIBUTE_KEY`,`SHORT_TEXT_VALUE`),
CONSTRAINT `FK_PROFILE_CUSTOMER` FOREIGN KEY (`CUSTOMER_UID`) REFERENCES `TCUSTOMER` (`UIDPK`) ON DELETE CASCADE,
CONSTRAINT `TCUSTOMERPROFILEVALUE_FK_1` FOREIGN KEY (`ATTRIBUTE_UID`) REFERENCES `TATTRIBUTE` (`UIDPK`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='values associated with customer profiles.'
CREATE TABLE `TCUSTOMER` (
`UIDPK` bigint(20) NOT NULL,
`PREF_BILL_ADDRESS_UID` bigint(20) DEFAULT NULL,
`PREF_SHIP_ADDRESS_UID` bigint(20) DEFAULT NULL,
`CREATION_DATE` datetime NOT NULL,
`LAST_EDIT_DATE` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`GUID` varchar(64) NOT NULL,
`STATUS` int(11) NOT NULL,
`AUTHENTICATION_UID` bigint(20) DEFAULT NULL,
`STORECODE` varchar(64) DEFAULT NULL,
`IS_FIRST_TIME_BUYER` tinyint(4) DEFAULT '1',
`CUSTOMER_TYPE` varchar(64) NOT NULL,
`SHARED_ID` varchar(255) NOT NULL,
`PARENT_CUSTOMER_GUID` varchar(64) DEFAULT NULL,
`DTYPE` varchar(40) DEFAULT 'ExtCustomerImpl',
`LAST_SESSION_DATE` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`UIDPK`),
UNIQUE KEY `TCUSTOMER_UNIQUE` (`GUID`),
UNIQUE KEY `TCUSTOMER_SHARED_ID_TYPE_UNIQ` (`SHARED_ID`,`CUSTOMER_TYPE`),
UNIQUE KEY `I_CUST_AUTH_UID` (`AUTHENTICATION_UID`),
UNIQUE KEY `SHARED_ID` (`SHARED_ID`,`STORECODE`),
KEY `I_CUST_CR_DATE` (`CREATION_DATE`),
KEY `I_CUST_STORE_CODE` (`STORECODE`),
KEY `I_TYPE_LAST_EDIT` (`CUSTOMER_TYPE`,`LAST_EDIT_DATE`),
KEY `I_CUSTOMER_SHAREDID` (`SHARED_ID`),
KEY `I_CUSTOMER_PARENT` (`PARENT_CUSTOMER_GUID`),
CONSTRAINT `CUSTOMER_STORECODE_FK` FOREIGN KEY (`STORECODE`) REFERENCES `TSTORE` (`STORECODE`),
CONSTRAINT `TCUSTOMER_PARENT_GUID_FK` FOREIGN KEY (`PARENT_CUSTOMER_GUID`) REFERENCES `TCUSTOMER` (`GUID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='customer account information.'
Indexes
Well, I can't explain the shift in performance unless the Optimizer is randomly shifting between different query plans.
I do see that you are using the notorious Entity-Attribute-Value schema design. And doing it in a rather bulky and complex way -- with multiple columns for different datatypes.
I do see a few things that can probably help performance in general as the dataset grows. (I assume it will grow.)
The primary key, UIDPK of the attribute table TCUSTOMERPROFILEVALUE probably has no use. This will probably be better: PRIMARY KEY(CUSTOMER_UID, ATTRIBUTE_UID). Or maybe that should be LOCALIZED_ATTRIBUTE_KEY??? Why are there two columns for the attribute?
When changing the PK, this KEY I_CPV_ATTKEY_SHORTTEXT (LOCALIZED_ATTRIBUTE_KEY,SHORT_TEXT_VALUE) would implicitly have CUSTOMER_UID added on the end, thereby benefiting your JOIN.
BIGINT is usually overkill; consider using a smaller datatype.
Do you have another attribute table - TATTRIBUTE?
Having 5 UNIQUE keys for a table slows down inserts. Perhaps you can have fewer?
INDEX(SHARED_ID) is redundant since there are other keys starting with that column.
Have your tried removing the LOWER(xxxx) from the SLOW QUERY?
If this corrects the problem, and your results are the same, you were just wasting time with the LOWER(xxx) manipulation.
I'm trying to denormalize a few MySQL tables I have into a new table that I can use to speed up some complex queries with lots of business logic. The problem that I'm having is that there are 2.3 million records I need to add to the new table and to do that I need to pull data from several tables and do a few conversions too. Here's my query (with names changed)
INSERT INTO database_name.log_set_logs
(offload_date, vehicle, jurisdiction, baselog_path, path,
baselog_index_guid, new_location, log_set_name, index_guid)
(
select STR_TO_DATE(logset_logs.offload_date, '%Y.%m.%d') as offload_date,
logset_logs.vehicle, jurisdiction, baselog_path, path,
baselog_trees.baselog_index_guid, new_location, logset_logs.log_set_name,
logset_logs.index_guid
from
(
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 7), '/', -1) as offload_date,
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle,
SUBSTRING_INDEX(path, '/', 9) as baselog_path, index_guid,
path, log_set_name
FROM database_name.baselog_and_amendment_guid_to_path_mappings
) logset_logs
left join database_name.log_trees baselog_trees
ON baselog_trees.original_location = logset_logs.baselog_path
left join database_name.baselog_offload_location location
ON location.baselog_index_guid = baselog_trees.baselog_index_guid);
The query itself works because I was able to run it using a filter on log_set_name however that filter's condition will only work for less than 1% of the total records because one of the values for log_set_name has 2.2 million records in it which is the majority of the records. So there is nothing else I can use to break this query up into smaller chunks from what I can see. The problem is that the query is taking too long to run on the rest of the 2.2 million records and it ends up timing out after a few hours and then the transaction is rolled back and nothing is added to the new table for the 2.2 million records; only the 0.1 million records were able to be processed and that was because I could add a filter that said where log_set_name != 'value with the 2.2 million records'.
Is there a way to make this query more performant? Am I trying to do too many joins at once and perhaps I should populate the row's columns in their own individual queries? Or is there some way I can page this type of query so that MySQL executes it in batches? I already got rid of all my indexes on the log_set_logs table because I read that those will slow down inserts. I also jacked my RDS instance up to a db.r4.4xlarge write node. I am also using MySQL Workbench so I increased all of it's timeout values to their maximums giving them all nines. All three of these steps helped and were necessary in order for me to get the 1% of the records into the new table but it still wasn't enough to get the 2.2 million records without timing out. Appreciate any insights as I'm not adept to this type of bulk insert from a select.
'CREATE TABLE `log_set_logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`purged` tinyint(1) NOT NULL DEFAUL,
`baselog_path` text,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`new_location` text,
`offload_date` date NOT NULL,
`jurisdiction` varchar(20) DEFAULT NULL,
`vehicle` varchar(20) DEFAULT NULL,
`index_guid` varchar(36) NOT NULL,
`path` text NOT NULL,
`log_set_name` varchar(60) NOT NULL,
`protected_by_retention_condition_1` tinyint(1) NOT NULL DEFAULT ''1'',
`protected_by_retention_condition_2` tinyint(1) NOT NULL DEFAULT ''1'',
`protected_by_retention_condition_3` tinyint(1) NOT NULL DEFAULT ''1'',
`protected_by_retention_condition_4` tinyint(1) NOT NULL DEFAULT ''1'',
`general_comments_about_this_log` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1736707 DEFAULT CHARSET=latin1'
'CREATE TABLE `baselog_and_amendment_guid_to_path_mappings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`path` text NOT NULL,
`index_guid` varchar(36) NOT NULL,
`log_set_name` varchar(60) NOT NULL,
PRIMARY KEY (`id`),
KEY `log_set_name_index` (`log_set_name`),
KEY `path_index` (`path`(42))
) ENGINE=InnoDB AUTO_INCREMENT=2387821 DEFAULT CHARSET=latin1'
...
'CREATE TABLE `baselog_offload_location` (
`baselog_index_guid` varchar(36) NOT NULL,
`jurisdiction` varchar(20) NOT NULL,
KEY `baselog_index` (`baselog_index_guid`),
KEY `jurisdiction` (`jurisdiction`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1'
'CREATE TABLE `log_trees` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`original_location` text NOT NULL, -- This is what I have to join everything on and since it's text I cannot index it and the largest value is above 255 characters so I cannot change it to a vachar then index it either.
`new_location` text,
`distcp_returncode` int(11) DEFAULT NULL,
`distcp_job_id` text,
`distcp_stdout` text,
`distcp_stderr` text,
`validation_attempt` int(11) NOT NULL DEFAULT ''0'',
`validation_result` tinyint(1) NOT NULL DEFAULT ''0'',
`archived` tinyint(1) NOT NULL DEFAULT ''0'',
`archived_at` timestamp NULL DEFAULT NULL,
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`dir_exists` tinyint(1) NOT NULL DEFAULT ''0'',
`random_guid` tinyint(1) NOT NULL DEFAULT ''0'',
`offload_date` date NOT NULL,
`vehicle` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `baselog_index_guid` (`baselog_index_guid`)
) ENGINE=InnoDB AUTO_INCREMENT=1028617 DEFAULT CHARSET=latin1'
baselog_offload_location has not PRIMARY KEY; what's up?
GUIDs/UUIDs can be terribly inefficient. A partial solution is to convert them to BINARY(16) to shrink them. More details here: http://localhost/rjweb/mysql/doc.php/uuid ; (MySQL 8.0 has similar functions.)
It would probably be more efficient if you have a separate (optionally redundant) column for vehicle rather than needing to do
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle
Why JOIN baselog_offload_location? Three seems to be no reference to columns in that table. If there, be sure to qualify them so we know what is where. Preferably use short aliases.
The lack of an index on baselog_index_guid may be critical to performance.
Please provide EXPLAIN SELECT ... for the SELECT in your INSERT and for the original (slow) query.
SELECT MAX(LENGTH(original_location)) FROM .. -- to see if it really is too big to index. What version of MySQL are you using? The limit increased recently.
For the above item, we can talk about having a 'hash'.
"paging the query". I call it "chunking". See http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks . That talks about deleting, but it can be adapted to INSERT .. SELECT since you want to "chunk" the select. If you go with chunking, Javier's comment becomes moot. Your code would be chunking the selects, hence batching the inserts:
Loop:
INSERT .. SELECT .. -- of up to 1000 rows (see link)
End loop
I have 3 tables. The first one is called map_life, the second one is called scripts and the third one is called npc_data.
I'm running the following query to get all the properties from map_life while also getting the script column from scripts and the storage_cost column from npc_data if the ids match.
SELECT life.*
, script.script
, npc.storage_cost
FROM map_life life
LEFT
JOIN scripts script
ON script.objectid = life.lifeid
AND script.script_type = 'npc'
LEFT
JOIN npc_data npc
ON npc.npcid = life.lifeid
As you can see, map_life id is lifeid, while scripts id is objectid and npc_data id is npcid.
This query is taking about 5 seconds to execute, and I have no idea why. Here's the CREATE statements for all those 3 tables, maybe I'm missing something?
CREATE TABLE `mcdb83`.`map_life` (
`id` bigint(21) unsigned NOT NULL AUTO_INCREMENT,
`mapid` int(11) NOT NULL,
`life_type` enum('npc','mob','reactor') NOT NULL,
`lifeid` int(11) NOT NULL,
`life_name` varchar(50) DEFAULT NULL COMMENT 'For reactors, specifies a handle so scripts may interact with them; for NPC/mob, this field is useless',
`x_pos` smallint(6) NOT NULL DEFAULT '0',
`y_pos` smallint(6) NOT NULL DEFAULT '0',
`foothold` smallint(6) NOT NULL DEFAULT '0',
`min_click_pos` smallint(6) NOT NULL DEFAULT '0',
`max_click_pos` smallint(6) NOT NULL DEFAULT '0',
`respawn_time` int(11) NOT NULL DEFAULT '0',
`flags` set('faces_left') NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `lifetype` (`mapid`,`life_type`)
) ENGINE=InnoDB AUTO_INCREMENT=32122 DEFAULT CHARSET=latin1;
CREATE TABLE `mcdb83`.`scripts` (
`script_type` enum('npc','reactor','quest','item','map_enter','map_first_enter') NOT NULL,
`helper` tinyint(3) NOT NULL DEFAULT '-1' COMMENT 'Represents the quest state for quests, and the index of the script for NPCs (NPCs may have multiple scripts).',
`objectid` int(11) NOT NULL DEFAULT '0',
`script` varchar(30) NOT NULL DEFAULT '',
PRIMARY KEY (`script_type`,`helper`,`objectid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Lists all the scripts that belong to NPCs/reactors/etc. ';
CREATE TABLE `mcdb83`.`npc_data` (
`npcid` int(11) NOT NULL,
`storage_cost` int(11) NOT NULL DEFAULT '0',
`flags` set('maple_tv','is_guild_rank') NOT NULL DEFAULT '',
PRIMARY KEY (`npcid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
For this query:
SELECT l.*, s.script, npc.storage_cost
FROM map_life l LEFT JOIN
scripts s
ON s.objectid = l.lifeid AND
s.script_type = 'npc' LEFT JOIN
npc_data npc
ON npc.npcid = l.lifeid;
You want indexes on: scripts(object_id, script_type, script) and npc_data(npcid, storage_cost). The order of the columns in these indexes is important.
map_life.lifeid does not have any indexes defined, therefore the joins will result in full table scans. Define an index on map_life.lifeid field.
In scripts table the primary key is defined on the following fields in that order: script_type, helper, objectid. The join is done on objectid and there is a constant filter criterion on script_type. Because the order of the fields in the index is wrong, MySQL cannot use the primary key for this query. For this query the order of the fields in the index should b: objectid, script_type, helper.
The above will significantly speed up the joins. You may further increase the speed of the query if your indexes actually cover all fields that are in the query because in this case MySQL does not even have to touch the tables.
Consider adding an index with the following fields and order to the scripts table: object_id, script_type, script and npcid, storage_cost index to npc_data table. However, these indexes may slow down insert / update / delete statements, so do some performance testing before adding these indexes to production environment.
i have this table :
CREATE TABLE `messenger_contacts` (
`number` varchar(15) NOT NULL,
`has_telegram` tinyint(1) NOT NULL DEFAULT '0',
`geo_state` int(11) NOT NULL DEFAULT '0',
`geo_city` int(11) NOT NULL DEFAULT '0',
`geo_postal` int(11) NOT NULL DEFAULT '0',
`operator` tinyint(1) NOT NULL DEFAULT '0',
`type` tinyint(1) NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `messenger_contacts`
ADD PRIMARY KEY (`number`),
ADD KEY `geo_city` (`geo_city`),
ADD KEY `geo_postal` (`geo_postal`),
ADD KEY `type` (`type`),
ADD KEY `type1` (`operator`),
ADD KEY `has_telegram` (`has_telegram`),
ADD KEY `geo_state` (`geo_state`);
with about 11 million records.
A simple count select on this table takes about 30 to 60 seconds to complete witch seems very high.
select count(number) from messenger_contacts where geo_state=1
I am not a Database pro so beside setting indexes i don't know what else i can do to make the query faster?
UPDATE:
OK , i made some changes to column type and size:
CREATE TABLE IF NOT EXISTS `messenger_contacts` (
`number` bigint(13) unsigned NOT NULL,
`has_telegram` tinyint(1) NOT NULL DEFAULT '0' ,
`geo_state` int(2) NOT NULL DEFAULT '0',
`geo_city` int(4) NOT NULL DEFAULT '0',
`geo_postal` int(10) NOT NULL DEFAULT '0',
`operator` tinyint(1) NOT NULL DEFAULT '0' ,
`type` tinyint(1) NOT NULL DEFAULT '0' ,
PRIMARY KEY (`number`),
KEY `has_telegram` (`has_telegram`,`geo_state`),
KEY `geo_city` (`geo_city`),
KEY `geo_postal` (`geo_postal`),
KEY `type` (`type`),
KEY `type1` (`operator`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Now the query only takes 4 to 5 seconds with * and number
Tanks every one for your help, even the guy that gave me -1. this would be good enough for now considering that my server is a low end hardware and i will be caching the select count results.
Maybe
select count(geo_state) from messenger_contacts where geo_state=1
as it will give the same result but will not use number column from the clustered index?
If this does not help, I would try to change number column into INT type, which should reduce the index size, or try to increase amount of memory MySQL could use for caching indexes.
You did not change the datatypes. INT(11) == INT(2) == INT(100) -- each is a 4-byte signed integer. You probably want 1-byte unsigned TINYINT UNSIGNED or 2-byte SMALLINT UNSIGNED.
It is a waste to index "flags", which I assume type and has_telegram are. The optimizer will never use them because it will less efficient than simply doing a table scan.
The standard coding pattern is:
select count(*)
from messenger_contacts
where geo_state=1
unless you need to not count NULLs, which is what COUNT(geo_state) implies.
Once you have the index on geo_state (or an index starting with geo_state), the query will scan the index (which is a separate BTree structure) starting with the first occurrence of geo_state=1 until the last, counting as it goes. That is, it will touch 1.1 millions index entries. So, a few seconds is to be expected. Counting a 'rare' geo_state will run much faster.
The reason for 30-60 seconds versus 4-5 seconds is very likely to be caching. The former had to read stuff from disk; the latter did not. Run the query twice.
Using the geo_state index will be faster for that query than using the PRIMARY KEY unless there are caching differences.
INDEX(number,geo_state) is virtually useless for any of the SELECTs mentioned -- geo_state should be first. This is an example of a "covering" index for the select count(number)... case.
More on building indexes.
I've read heaps of posts here on stackoverflow, blog posts, tutorials and more, but I still fail to resolve a rather nasty performance issue with my MySQL db. Keep in mind that I'm a novice when it comes to large MySQL databases.
I have a table with approx. 11.000.000 rows (will increase to say 20.000.000 or more). Here's the layout:
CREATE TABLE `myTable` (
`intcol1` int(11) DEFAULT NULL,
`charcol1` char(25) DEFAULT NULL,
`intcol2` int(11) DEFAULT NULL,
`charcol2` char(50) DEFAULT NULL,
`charcol3` char(50) DEFAULT NULL,
`charcol4` char(50) DEFAULT NULL,
`intcol3` int(11) DEFAULT NULL,
`charcol5` char(50) DEFAULT NULL,
`intcol4` int(20) DEFAULT NULL,
`intcol5` int(20) DEFAULT NULL,
`intcol6` int(20) DEFAULT NULL,
`intcol7` int(11) DEFAULT NULL,
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
FULLTEXT KEY `idx` (`charcol2`,`charcol3`)
) ENGINE=MyISAM AUTO_INCREMENT=11665231 DEFAULT CHARSET=latin1;
A select statement like
SELECT * from myTable where charchol2='bogus' AND charcol3='bogus2';
takes 25 seconds or so to execute. That's too slow, and will be even slower as the table grows.
The table will not have any inserts or updates at all (so to speak), and will be primarily used for outputting searches on the char-columns.
I've tried to make indexing work (playing around with FULLTEXT, as you can see), but it seems that I'm missing something. Any takes on how to speed up the performance?
Please note: Im currently running MySQL on my Macbook Air (1.7 GHz i5, 4GB RAM). If this is the only answer to my performance issues, I'll move the database to something appropriate ;-)
EDIT: Explain table
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE myTable ALL NULL NULL NULL NULL 11596725 Using where
You don't need to create FULLTEXT indexes for such requests, where equality operator is used. Just create an index on every char field, that will be used in WHERE condition, and remove the fulltext index:
DROP INDEX idx;
ALTER TABLE myTable ADD INDEX charchol_idx (charchol2, charchol3);