MySQL - ordering by column with many different values in big table

Probably through poor database design, the following really simple query is taking ~1.5 minutes to run.
SELECT s.title, t.name AS team_name
FROM stories AS s
JOIN teams AS t ON s.team_id = t.id
WHERE s.pubdate >= "1970-01-01 00:00"
ORDER BY s.hits /* <-- here's the problem */
LIMIT 3 OFFSET 0
The problem is the stories table is fairly big, with ~1.5m rows, and there's a ton of unique values for hits (this column logs the hits to each story.)
Take out the order clause and it resolves almost instantly.
Question: what can I do to optimise for queries like this? Presumably I shouldn't apply an index to hits since no direct look-ups take place on that column.
[UPDATE]
SHOW CREATE TABLE for all tables concerned:
CREATE TABLE stories (
`id` varchar(11) NOT NULL,
`link` text NOT NULL,
`title` varchar(255) CHARACTER SET utf8 NOT NULL,
`description` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`pubdate` datetime NOT NULL,
`source_id` varchar(11) NOT NULL,
`team_id` varchar(11) NOT NULL,
`hits` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `Unique combo (title + date)` (`title`,`pubdate`),
KEY `team (FK)` (`team_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
CREATE TABLE teams (
`id` varchar(11) NOT NULL,
`is_live` enum('y') DEFAULT NULL,
`name` varchar(50) NOT NULL,
`short_name` varchar(12) DEFAULT NULL,
`server` varchar(11) DEFAULT NULL,
`url_token` varchar(255) NOT NULL,
`league` varchar(11) NOT NULL,
`away_game_id` varchar(255) DEFAULT NULL,
`digest_list_id` varchar(25) DEFAULT NULL,
`twitter_handle` varchar(255) DEFAULT NULL,
`no_official_news` enum('y') DEFAULT NULL,
`alt_names` varchar(255) DEFAULT NULL,
`no_use_nickname` enum('y') DEFAULT NULL,
`official_hashtag` varchar(30) DEFAULT NULL,
`merge_news_and_fans` enum('y') DEFAULT NULL,
`colour_1` varchar(6) NOT NULL,
`colour_2` varchar(6) DEFAULT NULL,
`colour_3` varchar(6) DEFAULT NULL,
`link_colour_modifier` float DEFAULT NULL,
`alt_link_colour_modifier` float DEFAULT NULL,
`title_shade` enum('dark','light') NOT NULL,
`shirt_style` enum('vert_stripes','horiz_stripes','vert_stripes_thin','horiz_stripes_thin','vert_split','horiz_split') DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `URL token` (`url_token`),
KEY `league (FK)` (`league`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Consider removing the filter on pubdate if the user does not need it. It confuses the optimizer.
INDEX(hits, pubdate, title)
will probably help the query the most. It is nearly "covering" for stories; adding team_id, which the JOIN reads, would make it fully covering.
The reason why removing ORDER BY runs fast: Without it, it gives you any 3 rows. With it, and without a useful index, it needs to sort the 1.5M rows to discover the 3 with the least number of hits.
Perhaps you wanted ORDER BY s.hits DESC? -- to get those with the most hits.
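As a rough sketch of the suggestion above: the index name idx_hits_pubdate_title is made up for illustration, and whether the optimizer actually reads the index in hits order has to be confirmed with EXPLAIN on the real data.

```sql
-- Index suggested in the answer; the name is hypothetical.
-- team_id could be appended as a fourth column so the JOIN column
-- is also available from the index.
ALTER TABLE stories
  ADD INDEX idx_hits_pubdate_title (hits, pubdate, title);

-- Same query as in the question, but returning the most-hit stories first.
-- Ideally EXPLAIN shows the new index as the key for s and
-- no "Using filesort" in the Extra column.
EXPLAIN
SELECT s.title, t.name AS team_name
FROM stories AS s
JOIN teams AS t ON s.team_id = t.id
WHERE s.pubdate >= '1970-01-01 00:00'
ORDER BY s.hits DESC
LIMIT 3 OFFSET 0;
```

With an index that starts with hits, MySQL can walk the index from one end and stop after three matching rows instead of sorting all ~1.5M rows.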

Related

Long running Mysql Query on Indexes and sort by clause

I have a very long-running MySQL query. The query simply joins two very large tables:
bizevents - Nearly 34 Million rows
bizevents_actions - Nearly 17 million rows
Here is the query:
select
bizevent0_.id as id1_37_,
bizevent0_.json as json2_37_,
bizevent0_.account_id as account_3_37_,
bizevent0_.createdBy as createdB4_37_,
bizevent0_.createdOn as createdO5_37_,
bizevent0_.description as descript6_37_,
bizevent0_.iconCss as iconCss7_37_,
bizevent0_.modifiedBy as modified8_37_,
bizevent0_.modifiedOn as modified9_37_,
bizevent0_.name as name10_37_,
bizevent0_.version as version11_37_,
bizevent0_.fired as fired12_37_,
bizevent0_.preCreateFired as preCrea13_37_,
bizevent0_.entityRefClazz as entityR14_37_,
bizevent0_.entityRefIdAsStr as entityR15_37_,
bizevent0_.entityRefIdType as entityR16_37_,
bizevent0_.entityRefName as entityR17_37_,
bizevent0_.entityRefType as entityR18_37_,
bizevent0_.entityRefVersion as entityR19_37_
from
BizEvent bizevent0_
left outer join BizEvent_actions actions1_ on
bizevent0_.id = actions1_.BizEvent_id
where
bizevent0_.createdOn >= '1969-12-31 19:00:01.0'
and (actions1_.action <> 'SoftLock'
and actions1_.targetRefClazz = 'com.biznuvo.core.orm.domain.org.EmployeeGroup'
and actions1_.targetRefIdAsStr = '1'
or actions1_.action <> 'SoftLock'
and actions1_.objectRefClazz = 'com.biznuvo.core.orm.domain.org.EmployeeGroup'
and actions1_.objectRefIdAsStr = '1')
order by
bizevent0_.createdOn;
Below are the table definitions. As you can see, I have defined indexes on these two tables for all the search columns plus the sort column, but my queries still run for a very long time. I would appreciate any further ideas, including on indexing.
-- bizevent definition
CREATE TABLE `bizevent` (
`id` bigint(20) NOT NULL,
`json` longtext,
`account_id` int(11) DEFAULT NULL,
`createdBy` varchar(50) NOT NULL,
`createdon` datetime(3) DEFAULT NULL,
`description` varchar(255) DEFAULT NULL,
`iconCss` varchar(50) DEFAULT NULL,
`modifiedBy` varchar(50) NOT NULL,
`modifiedon` datetime(3) DEFAULT NULL,
`name` varchar(255) NOT NULL,
`version` int(11) NOT NULL,
`fired` bit(1) NOT NULL,
`preCreateFired` bit(1) NOT NULL,
`entityRefClazz` varchar(255) DEFAULT NULL,
`entityRefIdAsStr` varchar(255) DEFAULT NULL,
`entityRefIdType` varchar(25) DEFAULT NULL,
`entityRefName` varchar(255) DEFAULT NULL,
`entityRefType` varchar(50) DEFAULT NULL,
`entityRefVersion` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `IDXk9kxuuprilygwfwddr67xt1pw` (`createdon`),
KEY `IDXsf3ufmeg5t9ok7qkypppuey7y` (`entityRefIdAsStr`),
KEY `IDX5bxv4g72wxmjqshb770lvjcto` (`entityRefClazz`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- bizevent_actions definition
CREATE TABLE `bizevent_actions` (
`BizEvent_id` bigint(20) NOT NULL,
`action` varchar(255) DEFAULT NULL,
`objectBizType` varchar(255) DEFAULT NULL,
`objectName` varchar(255) DEFAULT NULL,
`objectRefClazz` varchar(255) DEFAULT NULL,
`objectRefIdAsStr` varchar(255) DEFAULT NULL,
`objectRefIdType` int(11) DEFAULT NULL,
`objectRefVersion` int(11) DEFAULT NULL,
`targetBizType` varchar(255) DEFAULT NULL,
`targetName` varchar(255) DEFAULT NULL,
`targetRefClazz` varchar(255) DEFAULT NULL,
`targetRefIdAsStr` varchar(255) DEFAULT NULL,
`targetRefIdType` int(11) DEFAULT NULL,
`targetRefVersion` int(11) DEFAULT NULL,
`embedJson` longtext,
`actions_ORDER` int(11) NOT NULL,
PRIMARY KEY (`BizEvent_id`,`actions_ORDER`),
KEY `IDXa21hhagjogn3lar1bn5obl48gll` (`action`),
KEY `IDX7agsatk8u8qvtj37vhotja0ce77` (`targetRefClazz`),
KEY `IDXa7tktl678kqu3tk8mmkt1mo8lbo` (`targetRefIdAsStr`),
KEY `IDXa22eevu7m820jeb2uekkt42pqeu` (`objectRefClazz`),
KEY `IDXa33ba772tpkl9ig8ptkfhk18ig6` (`objectRefIdAsStr`),
CONSTRAINT `FKr9qjs61id11n48tdn1cdp3wot` FOREIGN KEY (`BizEvent_id`) REFERENCES `bizevent` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
By the way, we are using MySQL 5.7.33 on Amazon RDS, with 16 GB RAM and 4 vCPUs.
I also ran EXPLAIN EXTENDED on the query and below is what it shows. Appreciate any help.
Initially the searched columns of bizevent_actions didn't have indexes defined. I have since defined indexes for them and retried the query, but to no avail.
One technique that worked for me in a similar situation was abandoning the idea of a JOIN completely and switching to queries by PK. In more detail: find out which table in the join would give fewer rows on average if you queried only that table with its related filter; get the primary keys from that table, and then query the other one using WHERE pk IN ().
In your case one example would be:
SELECT
bizevent0_.id as id1_37_,
bizevent0_.json as json2_37_,
bizevent0_.account_id as account_3_37_,
...
FROM BizEvent bizevent0_
WHERE
bizevent0_.createdOn >= '1969-12-31 19:00:01.0'
AND bizevent0_.id IN (
SELECT BizEvent_id
FROM BizEvent_actions actions1_
WHERE
actions1_.action <> 'SoftLock'
and actions1_.targetRefClazz = 'com.biznuvo.core.orm.domain.org.EmployeeGroup'
and actions1_.targetRefIdAsStr = '1'
or actions1_.action <> 'SoftLock'
and actions1_.objectRefClazz = 'com.biznuvo.core.orm.domain.org.EmployeeGroup'
and actions1_.objectRefIdAsStr = '1')
ORDER BY
bizevent0_.createdOn;
This assumes that you're not actually willing to select 33+ million rows from BizEvent though - your code with LEFT OUTER JOIN would have done exactly this.
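The IN (...) approach pays off only if the inner query on the actions table is itself cheap. One possible way to support it, offered as an untested sketch: a composite index per OR branch instead of the five single-column indexes. The index names are hypothetical, and table names follow the posted DDL.

```sql
-- Hypothetical composite indexes, one for each branch of the OR.
-- InnoDB secondary indexes also carry the primary key columns,
-- so BizEvent_id is available from the index without a row lookup.
ALTER TABLE bizevent_actions
  ADD INDEX idx_target_ref (targetRefClazz, targetRefIdAsStr, action),
  ADD INDEX idx_object_ref (objectRefClazz, objectRefIdAsStr, action);

-- Run the inner query on its own first to see how many ids it yields
-- and which indexes it uses.
EXPLAIN
SELECT BizEvent_id
FROM bizevent_actions
WHERE action <> 'SoftLock'
  AND targetRefClazz = 'com.biznuvo.core.orm.domain.org.EmployeeGroup'
  AND targetRefIdAsStr = '1'
   OR action <> 'SoftLock'
  AND objectRefClazz = 'com.biznuvo.core.orm.domain.org.EmployeeGroup'
  AND objectRefIdAsStr = '1';
```

If the inner query returns a small set of ids, the outer query becomes a handful of primary-key lookups on BizEvent followed by a small sort.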

Mysql JOIN query apparently slow

I have 2 tables. The first, called stazioni, stores live weather data from weather stations; the second, called archivio2, stores archived daily data. The two tables share the station ID (ID in stazioni, IDStazione in archivio2).
stazioni (1,743 rows)
CREATE TABLE `stazioni` (
`ID` int(10) NOT NULL,
`user` varchar(100) NOT NULL,
`nome` varchar(100) NOT NULL,
`email` varchar(50) NOT NULL,
`localita` varchar(100) NOT NULL,
`provincia` varchar(50) NOT NULL,
`regione` varchar(50) NOT NULL,
`altitudine` int(10) NOT NULL,
`stazione` varchar(100) NOT NULL,
`schermo` varchar(50) NOT NULL,
`installazione` varchar(50) NOT NULL,
`ubicazione` varchar(50) NOT NULL,
`immagine` varchar(100) NOT NULL,
`lat` double NOT NULL,
`longi` double NOT NULL,
`file` varchar(255) NOT NULL,
`url` varchar(255) NOT NULL,
`temperatura` decimal(10,1) DEFAULT NULL,
`umidita` decimal(10,1) DEFAULT NULL,
`pressione` decimal(10,1) DEFAULT NULL,
`vento` decimal(10,1) DEFAULT NULL,
`vento_direzione` decimal(10,1) DEFAULT NULL,
`raffica` decimal(10,1) DEFAULT NULL,
`pioggia` decimal(10,1) DEFAULT NULL,
`rate` decimal(10,1) DEFAULT NULL,
`minima` decimal(10,1) DEFAULT NULL,
`massima` decimal(10,1) DEFAULT NULL,
`orario` varchar(16) DEFAULT NULL,
`online` int(1) NOT NULL DEFAULT '0',
`tipo` int(1) NOT NULL DEFAULT '0',
`webcam` varchar(255) DEFAULT NULL,
`webcam2` varchar(255) DEFAULT NULL,
`condizioni` varchar(255) DEFAULT NULL,
`Data2` datetime DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
archivio2 (2,127,347 rows)
CREATE TABLE `archivio2` (
`ID` int(10) NOT NULL,
`IDStazione` int(4) NOT NULL DEFAULT '0',
`localita` varchar(100) NOT NULL,
`temp_media` decimal(10,1) DEFAULT NULL,
`temp_minima` decimal(10,1) DEFAULT NULL,
`temp_massima` decimal(10,1) DEFAULT NULL,
`pioggia` decimal(10,1) DEFAULT NULL,
`pressione` decimal(10,1) DEFAULT NULL,
`vento` decimal(10,1) DEFAULT NULL,
`raffica` decimal(10,1) DEFAULT NULL,
`records` int(10) DEFAULT NULL,
`Data2` datetime DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
The indexes that I set
-- Indexes for table `archivio2`
--
ALTER TABLE `archivio2`
ADD PRIMARY KEY (`ID`),
ADD KEY `IDStazione` (`IDStazione`),
ADD KEY `Data2` (`Data2`);
-- Indexes for table `stazioni`
--
ALTER TABLE `stazioni`
ADD PRIMARY KEY (`ID`),
ADD KEY `Tipo` (`Tipo`);
ALTER TABLE `stazioni` ADD FULLTEXT KEY `localita` (`localita`);
On a map, I pick a date from a calendar and search the archivio2 table for that date with this INNER JOIN query (shown with an example date):
SELECT *, c.pioggia AS rain, c.raffica AS raff, c.vento AS wind, c.pressione AS press
FROM stazioni as o
INNER JOIN archivio2 as c ON o.ID = c.IDStazione
WHERE c.Data2 LIKE '2019-01-01%'
Everything works, but the results take a long time to appear (4-5 seconds), even though the query execution time seems fine (about 0.5-1.0 s).
I ran the query in phpMyAdmin and got the same behaviour: execution is quick, but displaying the result is extremely slow.
EXPLAIN query result
id  select_type  table  type  possible_keys     key         key_len  ref                 rows  Extra
1   SIMPLE       o      ALL   PRIMARY,ID        NULL        NULL     NULL                1743  NULL
1   SIMPLE       c      ref   IDStazione,Data2  IDStazione  4        sccavzuq_rete.o.ID  1141  Using where
UPDATE: the query runs fine if I remove the index on IDStazione. But that way I lose the speed advantage on other queries. Why does only this query become slow when that field is indexed?
In your WHERE clause
WHERE c.Data2 LIKE '2019-01-01%'
the value of Data2 must be cast to a string. No index can be used for that condition.
Change it to
WHERE c.Data2 >= '2019-01-01' AND c.Data2 < '2019-01-01' + INTERVAL 1 DAY
This way the engine should be able to use the index on (Data2).
Now check the EXPLAIN result. I would expect the table order to be swapped and the key column to show Data2 (for c) and ID (for o).
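Put together with the query from the question, the check might look like the following. This is only the original statement with the range condition substituted; the resulting plan depends on the actual data.

```sql
-- Range condition on the DATETIME column instead of LIKE,
-- so that the index on Data2 can be considered.
EXPLAIN
SELECT *, c.pioggia AS rain, c.raffica AS raff, c.vento AS wind, c.pressione AS press
FROM stazioni AS o
INNER JOIN archivio2 AS c ON o.ID = c.IDStazione
WHERE c.Data2 >= '2019-01-01'
  AND c.Data2 <  '2019-01-01' + INTERVAL 1 DAY;
```

The half-open range (>= start, < next day) covers every time of day on 2019-01-01 without converting Data2 to a string.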
(Fixing the DATE is the main performance solution; here is a less critical issue.)
The tables are much bigger than necessary. Size impacts disk space and, to some extent, speed.
You have 1743 stations, yet the datatype is a 32-bit (4-byte) number (INT). SMALLINT UNSIGNED would allow for 64K stations and use only 2 bytes.
Does it get really, really hot there? Like 999999999.9 degrees? DECIMAL(10,1) takes 5 bytes; DECIMAL(4,1) takes only 3 and allows up to 999.9 degrees. DECIMAL(3,1) has a max of 99.9 and takes only 2 bytes.
What is "localita varchar(100)" doing in the big table? Seems like you could JOIN to the stations table when you need it? Removing that might cut the table size in half.
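A sketch of what the size reductions above could look like. It assumes the existing values fit the smaller types, so the ranges are checked first; pressure values around 1000.0 would need DECIMAL(5,1), so those columns are left alone here.

```sql
-- Check the actual value ranges before shrinking anything.
SELECT MAX(IDStazione), MIN(temp_minima), MAX(temp_massima)
FROM archivio2;

-- Shrink the station id and the temperature columns.
-- On a 2M-row table this rebuilds the table, so run it off-peak.
ALTER TABLE archivio2
  MODIFY IDStazione   SMALLINT UNSIGNED NOT NULL DEFAULT 0,
  MODIFY temp_media   DECIMAL(4,1) DEFAULT NULL,
  MODIFY temp_minima  DECIMAL(4,1) DEFAULT NULL,
  MODIFY temp_massima DECIMAL(4,1) DEFAULT NULL;

-- localita can be read from stazioni through the join when needed.
ALTER TABLE archivio2 DROP COLUMN localita;
```

If IDStazione is changed, giving stazioni.ID the same type keeps the join columns consistent. Dropping localita only makes sense if nothing else reads it directly from archivio2.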

How to optimize mysql query even it already used index

The query is simple:
select count(1) from ec_account a join ec_card b on a.id = b.AccountId
There are 2.5 million rows in each of ec_account and ec_card (both InnoDB).
Here is the execution plan:
[image: execution plan]
As you can see, an index already exists and is used, but the query still takes almost 60 seconds. Is there any way to optimize it other than changing the database? (MariaDB has no such choke point as far as I know.)
Here is the table DDL for ec_account:
CREATE TABLE `ec_account` (
`Id` varchar(64) NOT NULL,
`AccountType` varchar(32) NOT NULL,
`Name` varchar(32) NOT NULL,
`Status` tinyint(3) unsigned NOT NULL,
`IDCardType` varchar(32) DEFAULT NULL,
`IDCardNo` varchar(64) DEFAULT NULL,
`Password` varchar(256) DEFAULT NULL,
`PasswordHalt` varchar(128) DEFAULT NULL,
`Sex` varchar(8) DEFAULT NULL,
`BirthDay` datetime NOT NULL,
`Mobile` varchar(16) DEFAULT NULL,
`Address` varchar(64) DEFAULT NULL,
`Linkman` varchar(32) DEFAULT NULL,
`LinkmanRelation` varchar(16) DEFAULT NULL,
`LinkmanTel` varchar(16) DEFAULT NULL,
`Remark` varchar(128) DEFAULT NULL,
`Nationality` varchar(32) DEFAULT NULL,
`Nation` varchar(32) DEFAULT NULL,
`MaritalStatus` varchar(8) DEFAULT NULL,
`NativePlace` varchar(64) DEFAULT NULL,
`Occupation` varchar(32) DEFAULT NULL,
`BloodType` varchar(8) DEFAULT NULL,
`Education` varchar(8) DEFAULT NULL,
`LinkmanAddress` varchar(64) DEFAULT NULL,
`HomeAddress` varchar(128) DEFAULT NULL,
`Email` varchar(64) DEFAULT NULL,
`CompanyName` varchar(64) DEFAULT NULL,
`CompanyAddress` varchar(128) DEFAULT NULL,
`CompanyTel` varchar(16) DEFAULT NULL,
`Creator` char(36) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
`CreateTime` datetime NOT NULL,
`LastModifier` char(36) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
`LastModifyTime` datetime DEFAULT NULL,
`Avatar` longblob,
PRIMARY KEY (`Id`),
KEY `IX_Name` (`Name`) USING HASH,
KEY `Idx_IDCard_Account` (`IDCardType`,`IDCardNo`) USING HASH,
KEY `Idx_Mobile` (`Mobile`) USING HASH,
KEY `Idx_CreateTime` (`CreateTime`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
and ec_card:
CREATE TABLE `ec_card` (
`Id` char(36) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '',
`AccountId` varchar(64) NOT NULL,
`CardType` varchar(32) NOT NULL,
`CardNo` varchar(32) NOT NULL,
`Status` tinyint(3) unsigned NOT NULL,
`IsPasswordAuth` tinyint(1) NOT NULL,
PRIMARY KEY (`Id`),
UNIQUE KEY `Idx_Unique_AccountId_CardType` (`AccountId`,`CardType`) USING HASH,
UNIQUE KEY `Idx_Unique_CardType_CardNo` (`CardType`,`CardNo`) USING HASH,
KEY `Idx_Uniques_AccountId` (`AccountId`) USING BTREE,
CONSTRAINT `FK_ec_card_ec_account_AccountId` FOREIGN KEY (`AccountId`) REFERENCES `ec_account` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Not without fundamentally changing the query.
There are no conditions on your query! It reads all 2.5 million rows from ec_card, as well as every matching row from ec_account. Reading all this data from disk is the bottleneck (only the final count travels over the network); there is no way to change that without changing what the query does.
Here is a workaround for you. I think it would run much faster, and get the same result.
Calculate the total count of ec_account:
SELECT count(1) AS total_count FROM ec_account;
Count the records that exist in ec_account but have no match in ec_card:
SELECT count(1) AS missing_count
FROM ec_account a LEFT JOIN ec_card b on a.id = b.AccountId
WHERE b.AccountId IS NULL;
Matched count = total_count - missing_count
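The subtraction above can also be written as a single statement. Note that this counts accounts with at least one card, which equals the original join count only if no account has more than one card; the unique key on (AccountId, CardType) suggests an account may hold several. The second query is an alternative that relies on the NOT NULL foreign key in the posted DDL being enforced.

```sql
-- total accounts minus accounts without any card
SELECT
  (SELECT COUNT(*) FROM ec_account)
  -
  (SELECT COUNT(*)
   FROM ec_account a
   LEFT JOIN ec_card b ON a.Id = b.AccountId
   WHERE b.AccountId IS NULL) AS matched_count;

-- If every ec_card row points at exactly one existing ec_account row
-- (NOT NULL AccountId with an enforced foreign key), the original
-- join count should equal the number of cards.
SELECT COUNT(*) AS join_count FROM ec_card;
```

Comparing both results with the original query on a copy of the data shows which definition of "count" is actually wanted.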
The core problem here is that you join two large tables, which requires a lot of memory and apparently takes a long time to finish.
Try a correlated subquery instead. This might help:
select count(1) from ec_account a where exists (select * from ec_card b
where b.AccountId=a.id)
Besides indexing, the following strategies generally help:
- Denormalization
- Caching results
- Using a NoSQL database

Data Structure causing impossible joins

Tables:
nodes
data_texts
data_profiles
data_locations
data_date_times
data_media
data_products
data_metas
categories
tags
categories_nodes
tags_nodes
This question is a generalized question and is on the back of another question
Explanation:
Each of the "data" tables has a node_id that refers back to the id of the nodes table (hasMany/belongsTo association).
A "Node" can be anything - a TV Show, a Movie, a Person, an Article...etc (all generated via a CMS, so the user can control what type of "Nodes" they want).
When pulling data, I want to be able to query against certain fields. For example if they do a search, I want to be able to pull nodes that have data_texts.title LIKE '%george%' or order by the datetime field in data_locations.
The problem is, when I do a join on all seven data tables (or more), the query has to hit so many combined rows that it just times out (even with a nearly empty database.... total 200 rows across the entire database).
I realize I can determine IF I need a join depending on what I'm doing - but even with five or six joins (once the database gets to 10k+ records), it's going to be horribly slow, if it works at all. Per this question, the query I'm using just doing a join on these tables times out completely.
Each node can have multiple rows of each data type (for multi-language reasons among others).
I'm completely defeated - I'm at the point where I think I need to restructure the entire thing, but don't have the time for that. I've thought about combining everything into one table, but I'm not sure how.
nodes
CREATE TABLE `nodes` (
`id` CHAR(36) NOT NULL,
`name` VARCHAR(100) NOT NULL,
`slug` VARCHAR(100) NOT NULL,
`node_type_id` CHAR(36) NOT NULL,
`site_id` CHAR(36) NOT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `nodeTypeId` (`node_type_id`),
INDEX `slug` (`slug`),
INDEX `nodeId` (`id`)
)
COLLATE='latin1_swedish_ci'
ENGINE=MyISAM;
data_texts:
CREATE TABLE `data_texts` (
`id` CHAR(36) NOT NULL,
`title` VARCHAR(250) NULL DEFAULT NULL,
`subtitle` VARCHAR(500) NULL DEFAULT NULL,
`content` LONGTEXT NULL,
`byline` VARCHAR(250) NULL DEFAULT NULL,
`language_id` CHAR(36) NULL DEFAULT NULL,
`foreign_key` CHAR(36) NULL DEFAULT NULL,
`model` VARCHAR(40) NULL DEFAULT NULL,
`node_id` CHAR(36) NULL DEFAULT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `nodeId` (`node_id`),
INDEX `languageId_nodeId` (`language_id`, `node_id`),
INDEX `foreignKey_model` (`foreign_key`, `model`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
data_profiles
CREATE TABLE `data_profiles` (
`id` CHAR(36) NOT NULL,
`name` VARCHAR(80) NULL DEFAULT NULL,
`email_personal` VARCHAR(100) NULL DEFAULT NULL,
`email_business` VARCHAR(100) NULL DEFAULT NULL,
`email_other` VARCHAR(100) NULL DEFAULT NULL,
`title` VARCHAR(100) NULL DEFAULT NULL,
`description` LONGTEXT NULL,
`prefix` VARCHAR(40) NULL DEFAULT NULL,
`phone_home` VARCHAR(40) NULL DEFAULT NULL,
`phone_business` VARCHAR(40) NULL DEFAULT NULL,
`phone_mobile` VARCHAR(40) NULL DEFAULT NULL,
`phone_other` VARCHAR(40) NULL DEFAULT NULL,
`foreign_key` CHAR(36) NULL DEFAULT NULL,
`model` VARCHAR(40) NULL DEFAULT NULL,
`node_id` CHAR(36) NULL DEFAULT NULL,
`language_id` CHAR(36) NULL DEFAULT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
`user_id` CHAR(36) NULL DEFAULT NULL,
PRIMARY KEY (`id`),
INDEX `nodeId` (`node_id`),
INDEX `languageId_nodeId` (`node_id`, `language_id`),
INDEX `foreignKey_model` (`foreign_key`, `model`)
)
COLLATE='latin1_swedish_ci'
ENGINE=MyISAM;
categories
CREATE TABLE `categories` (
`id` CHAR(36) NOT NULL,
`name` VARCHAR(100) NOT NULL,
`node_type_id` CHAR(36) NOT NULL,
`site_id` CHAR(36) NOT NULL,
`slug` VARCHAR(100) NOT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `nodeTypeId` (`node_type_id`),
INDEX `slug` (`slug`)
)
COMMENT='Used to categorize nodes'
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
categories_nodes
CREATE TABLE `categories_nodes` (
`id` CHAR(36) NOT NULL,
`category_id` CHAR(36) NOT NULL,
`node_id` CHAR(36) NOT NULL,
PRIMARY KEY (`id`),
INDEX `categoryId_nodeId` (`category_id`, `node_id`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
node_tags
CREATE TABLE `node_tags` (
`id` CHAR(36) NOT NULL,
`name` VARCHAR(40) NOT NULL,
`site_id` CHAR(36) NOT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `siteId` (`site_id`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
nodes_node_tags
CREATE TABLE `nodes_node_tags` (
`id` CHAR(36) NOT NULL,
`node_id` CHAR(36) NOT NULL,
`node_tag_id` CHAR(36) NOT NULL,
PRIMARY KEY (`id`),
INDEX `node_id_node_tag_id` (`node_id`, `node_tag_id`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
MySQL:
SELECT `Node`.`id`, `Node`.`name`, `Node`.`slug`, `Node`.`node_type_id`, `Node`.`site_id`, `Node`.`created`, `Node`.`modified`
FROM `mysite`.`nodes` AS `Node`
LEFT JOIN `mysite`.`data_date_times` AS `DataDateTime` ON (`DataDateTime`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_locations` AS `DataLocation` ON (`DataLocation`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_media` AS `DataMedia` ON (`DataMedia`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_metas` AS `DataMeta` ON (`DataMeta`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_profiles` AS `DataProfile` ON (`DataProfile`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_products` AS `DataProduct` ON (`DataProduct`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_texts` AS `DataText` ON (`DataText`.`node_id` = `Node`.`id`)
WHERE 1=1
GROUP BY `Node`.`id`
Firstly, try InnoDB, not MyISAM.
Secondly, remove the group by, see how well it runs then, and how many rows are involved. Shouldn't be that many, but it's interesting.
You don't need the 'nodeId' index on node (as you already have it as a primary key). Again, shouldn't make any difference.
The where clause is irrelevant. You can remove it with no effect one way or another.
Thirdly, well, something is seriously broken.
Have a quick look on how to start profiling (e.g. http://dev.mysql.com/doc/refman/5.0/en/show-profile.html) , and run a profile command to see where all the time is going. Post it here if it doesn't immediately show that something is broken.
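A minimal profiling session might look like this. The query is a shortened version of the one in the question with only two of the joins, purely for illustration; SHOW PROFILE is deprecated in newer MySQL releases, though still available.

```sql
-- Enable per-session profiling.
SET profiling = 1;

-- Shortened version of the posted query.
SELECT `Node`.`id`, `Node`.`name`
FROM `mysite`.`nodes` AS `Node`
LEFT JOIN `mysite`.`data_texts` AS `DataText` ON (`DataText`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_profiles` AS `DataProfile` ON (`DataProfile`.`node_id` = `Node`.`id`)
GROUP BY `Node`.`id`;

-- List profiled statements with their Query_ID and total duration.
SHOW PROFILES;

-- Per-stage timing; use the Query_ID reported by SHOW PROFILES.
SHOW PROFILE FOR QUERY 1;

SET profiling = 0;
```

Adding the joins back one at a time and re-running the profile shows at which table the time starts to explode.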
I'm unfortunately not in a position where I can do any tests right now. I'll just throw out some ideas. I might be able to do some tests later.
Be suspicious of different collations.
Some of your ids are useless. For example, you should drop the column categories_nodes.id, and put a primary key constraint on {category_id, node_id} instead.
Be suspicious of any design that requires joining all the tables at run time. There are better ways.
Use innodb and foreign key constraints.
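The suggestions above could be sketched roughly as follows. This is untested against the real data: it assumes there are no duplicate (category_id, node_id) pairs and no orphaned node_id values, and the names nodeId_categoryId and fk_data_texts_node are hypothetical.

```sql
-- Replace the surrogate id with a composite primary key.
ALTER TABLE categories_nodes
  DROP PRIMARY KEY,
  DROP COLUMN id,
  DROP INDEX categoryId_nodeId,
  ADD PRIMARY KEY (category_id, node_id),
  ADD INDEX nodeId_categoryId (node_id, category_id);

-- The nodeId index duplicates the primary key on nodes.
ALTER TABLE nodes DROP INDEX nodeId;

-- Align character set and collation so joined id columns compare directly.
ALTER TABLE nodes CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

-- Switch engines, then add a foreign key from one data table.
ALTER TABLE nodes ENGINE=InnoDB;
ALTER TABLE data_texts ENGINE=InnoDB;
ALTER TABLE data_texts
  ADD CONSTRAINT fk_data_texts_node
  FOREIGN KEY (node_id) REFERENCES nodes (id);
```

The same engine and collation changes would apply to the other data_* tables; data_profiles, for example, is still latin1_swedish_ci in the posted DDL.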

mysql performance issue

I have a table named contacts which has nearly 1.2 million records. We use
the MyISAM engine, and whenever we query this table MySQL hangs, so we are now trying the InnoDB engine so that even if a query is slow, it will not hang for other users.
We would still like to make it fast with MyISAM. We tried many indexes on this table, but it still slows down and hangs the system.
What should be done to make it faster so that it does not hang the system?
This is the table:
CREATE TABLE `contacts` (
`id` varchar(36) NOT NULL,
`deleted` tinyint(1) NOT NULL default '0',
`date_entered` datetime NOT NULL default '0000-00-00 00:00:00',
`date_modified` datetime NOT NULL default '0000-00-00 00:00:00',
`modified_user_id` varchar(36) default NULL,
`assigned_user_id` varchar(36) default NULL,
`created_by` varchar(36) default NULL,
`team_id` varchar(36) default NULL,
`salutation` varchar(5) default NULL,
`first_name` varchar(100) default '',
`last_name` varchar(100) default '',
`username` varchar(25) default '',
`lead_source` varchar(100) default NULL,
`title` varchar(50) default NULL,
`department` varchar(100) default NULL,
`reports_to_id` varchar(36) default NULL,
`birthdate` date default NULL,
`do_not_call` char(3) default '0',
`phone_home` varchar(25) default NULL,
`phone_mobile` varchar(25) default NULL,
`phone_work` varchar(25) default '',
`phone_other` varchar(25) default NULL,
`phone_fax` varchar(25) default '',
`email1` varchar(100) default '',
`email2` varchar(100) default NULL,
`assistant` varchar(75) default NULL,
`assistant_phone` varchar(25) default NULL,
`email_opt_out` char(3) default 'off',
`primary_address_street` varchar(150) default NULL,
`primary_address_city` varchar(100) default NULL,
`primary_address_state` varchar(100) default NULL,
`primary_address_postalcode` varchar(20) default NULL,
`primary_address_country` varchar(100) default NULL,
`alt_address_street` varchar(150) default NULL,
`alt_address_city` varchar(100) default NULL,
`alt_address_state` varchar(100) default NULL,
`alt_address_postalcode` varchar(20) default NULL,
`alt_address_country` varchar(100) default NULL,
`description` text,
`portal_name` varchar(255) default NULL,
`portal_active` tinyint(1) NOT NULL default '0',
`portal_app` varchar(255) default NULL,
`salesforceid` varchar(36) default NULL,
`phone_direct` varchar(25) default NULL,
`invalid_email` tinyint(1) default '0',
`parent_is_lead` char(3) default 'no',
`advisory_board_member` varchar(25) default NULL,
`direct_marketing` varchar(25) default NULL,
`efx_id` varchar(36) default NULL,
`fax_opt_out` char(3) default 'off',
`ppc_keyword` varchar(50) default NULL,
`status` varchar(25) default NULL,
`web_form` varchar(50) default NULL,
`efx_export_date` datetime default NULL,
`bmtn` varchar(225) default '',
`employee_location` varchar(50) default NULL,
`pronunciation` varchar(250) default NULL,
`duplicate_of` varchar(36) default NULL,
`job_category` varchar(50) default NULL,
`last_ska_upload_key` varchar(50) default NULL,
`persid` varchar(36) default NULL,
`last_web_upload_key` varchar(50) default NULL,
`last_webinar_upload_key` varchar(50) default NULL,
`primary_address_latitude` float default NULL,
`primary_address_longitude` float default NULL,
`first_name_soundex` varchar(30) default NULL,
`last_name_soundex` varchar(30) default NULL,
`primary_address_street_soundex` varchar(30) default NULL,
`campaign_id` varchar(36) default NULL,
`portal_password` varchar(32) default NULL,
`pss_branch` varchar(40) default NULL,
`pss_id` int(12) default NULL,
`source_detail` varchar(100) default NULL,
`source` varchar(100) default NULL,
`pss_region` varchar(30) default NULL,
`source_added` datetime default NULL,
`terminated_user` char(3) default 'off',
`invite_opt_out` char(3) default 'off',
`newsletter_opt_out` char(3) default 'off',
`stream_opt_out` char(3) default 'off',
PRIMARY KEY (`id`),
KEY `idx_contacts_del_last` (`deleted`,`last_name`),
KEY `idx_cont_del_reports` (`deleted`,`reports_to_id`,`last_name`),
KEY `idx_contact_del_team` (`deleted`,`team_id`),
KEY `idx_contact_salesforceid` (`salesforceid`),
KEY `idx_contacts_username` (`username`),
KEY `idx_email_opt_out` (`email_opt_out`),
KEY `idx_primary_address_street` (`primary_address_street`),
KEY `idx_primary_address_city` (`primary_address_city`),
KEY `idx_primary_address_state` (`primary_address_state`),
KEY `idx_primary_address_postalcode` (`primary_address_postalcode`),
KEY `idx_primary_address_country` (`primary_address_country`),
KEY `idx_modified_user_id` (`modified_user_id`),
KEY `idx_assigned_user_id` (`assigned_user_id`),
KEY `idx_created_by` (`created_by`),
KEY `idx_team_id` (`team_id`),
KEY `idx_reports_to_id` (`reports_to_id`),
KEY `idx_contacts_efx_id` (`efx_id`),
KEY `idx_contacts_title1` (`title`,`deleted`),
KEY `idx_contacts_email1` (`email1`),
KEY `idx_contacts_email2` (`email2`),
KEY `idx_contacts_job_category` (`job_category`),
KEY `idx_contacts_first_name_sdx` (`first_name_soundex`),
KEY `idx_contacts_primary_street_sdx` (`primary_address_street_soundex`),
KEY `idx_contacts_last_name_sdx` (`last_name_soundex`),
KEY `idx_contacts_portal_name` (`portal_name`),
KEY `idx_contacts_portal_active` (`portal_active`),
KEY `idx_contacts_del_last_first` (`deleted`,`last_name`,`first_name`),
KEY `idx_contacts_del_first` (`deleted`,`first_name`),
KEY `idx_pss_id` (`pss_id`),
KEY `idx_phone_work_last_name_first_name_deleted` (`phone_work`,`last_name`,`first_name`,`deleted`),
KEY `idx_phone_work_last_name_first_name_deleted_sdx` (`phone_work`,`last_name_soundex`,`first_name_soundex`,`deleted`),
KEY `idx_email1_last_name_first_name_deleted` (`email1`,`last_name`,`first_name`,`deleted`),
KEY `idx_email1_last_name_first_name_deleted_sdx` (`email1`,`last_name_soundex`,`first_name_soundex`,`deleted`),
KEY `idx_phone_fax_last_name_first_name_deleted` (`phone_fax`,`last_name`,`first_name`,`deleted`),
KEY `idx_phone_fax_last_name_first_name_deleted_sdx` (`phone_fax`,`last_name_soundex`,`first_name_soundex`,`deleted`),
KEY `idx_phone_work_last_name_deleted` (`phone_work`,`last_name`,`deleted`),
KEY `idx_phone_work_last_name_deleted_sdx` (`phone_work`,`last_name_soundex`,`deleted`),
KEY `idx_email1_last_name_deleted` (`email1`,`last_name`,`deleted`),
KEY `idx_email1_last_name_deleted_sdx` (`email1`,`last_name_soundex`,`deleted`),
KEY `idx_phone_fax_last_name_deleted` (`phone_fax`,`last_name`,`deleted`),
KEY `idx_phone_fax_last_name_deleted_sdx` (`phone_fax`,`last_name_soundex`,`deleted`),
KEY `idx_email1_first_name_deleted` (`email1`,`first_name`,`deleted`),
KEY `idx_email1_first_name_deleted_sdx` (`email1`,`first_name_soundex`,`deleted`),
KEY `idx_phone_fax_first_name_deleted` (`phone_fax`,`first_name`,`deleted`),
KEY `idx_phone_fax_first_name_deleted_sdx` (`phone_fax`,`first_name_soundex`,`deleted`),
KEY `idx_email1_deleted` (`email1`,`deleted`),
KEY `idx_last_name_first_name_deleted_sdx` (`last_name_soundex`,`first_name_soundex`,`deleted`),
KEY `idx_phone_mobile_deleted` (`phone_mobile`,`deleted`,`id`),
KEY `idx_first_name_bmtn` (`first_name`,`bmtn`),
KEY `idx_first_name_bmtn_email1` (`first_name`,`bmtn`,`email1`),
KEY `idx_bmtn_email1` (`bmtn`,`email1`),
KEY `idx_deleted` (`deleted`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
SELECT acc.id, acc.name, con_reports_to.first_name, con_reports_to.last_name
from contacts
left join accounts_contacts a_c on a_c.contact_id = '9802f40d-78bb-8dd4-dfaa-43f1064ccd5e' and a_c.deleted=0
left join accounts acc on a_c.account_id = acc.id and acc.deleted=0
left join contacts con_reports_to on con_reports_to.id = contacts.reports_to_id
where contacts.id = '9802f40d-78bb-8dd4-dfaa-43f1064ccd5e'
I suspect the assertion "whenever we query this table mysql hangs down" is an overbid -- for example, with MyISAM, SELECT COUNT(*) FROM TheTable should be very fast, essentially "no matter what". Sure, some queries will be slow -- especially if the table is not indexed properly for the queries, or if MySQL's alleged optimizer is picking the wrong strategy (but you could give it hints).
Why don't you show us the CREATE TABLE (including indices), a couple of the queries that take too long, ideally a precise measure of how long they take, and the output of EXPLAIN SELECT (&c) for those couple queries -- I bet we could really be of some help then!
Edit: the CREATE TABLE essentially shows that the table is just too "broad" -- far too many columns -- to expect decent performance (even though no queries were shown). The schema needs a redesign, breaking up chunks of this huge monolithic table (e.g., the address-related information) into other auxiliary tables. Exactly how to best do it depends entirely on the queries that are most important to optimize, so, not knowing the queries in question, I'm not even going to attempt the task.
Edit again: so the query has been posted and uses other tables, accounts and accounts_contacts, as well as the hugely broad contacts one described; the query as posted (trying to make sense of it by formatting &c) is:
SELECT acc.id, acc.name, con_reports_to.first_name, con_reports_to.last_name
FROM contacts
LEFT JOIN accounts_contacts a_c
ON a_c.contact_id = '9802f40d-78bb-8dd4-dfaa-43f1064ccd5e' AND
a_c.deleted=0
LEFT JOIN accounts acc
ON a_c.account_id = acc.id AND
acc.deleted=0
LEFT JOIN contacts con_reports_to
ON con_reports_to.id = contacts.reports_to_id
WHERE contacts.id = '9802f40d-78bb-8dd4-dfaa-43f1064ccd5e'
Why the LEFT JOINs here instead of normal INNER joins? Is it possible in each case that there's no corresponding row on the right-hand-side table? For example, if there's no line in a_c with the given values for contact_id and deleted, then all the fields of a_c in the first LEFT JOIN will be NULL, so there can be no correspondence for acc either: is it important to emit NULL, NULL as the first two columns in this case? Moreover the JOIN conditions for a_c and acc make no reference at all to contacts, so this will be a cartesian product: every line selected from acc, if any, will pair up with every line selected from con_reports_to. So the a_c/acc query could be entirely separated from the one on contacts and con_reports_to, presumably lightening the query considerably (the two logically separate results could of course easily be put together again in the client).
What does EXPLAIN SELECT say for this complex query and what does it say for the two lighter-weight separate ones I'm suggesting? What indices are on the accounts and accounts_contacts tables?
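The two lighter-weight queries could look something like this. It is only a sketch using the columns visible in the posted query; unlike the original LEFT JOINs, the first one returns no row at all, rather than NULL, NULL, when the contact has no live account.

```sql
-- 1) Accounts linked to the contact (no reference to contacts needed).
EXPLAIN
SELECT acc.id, acc.name
FROM accounts_contacts a_c
INNER JOIN accounts acc
        ON acc.id = a_c.account_id
       AND acc.deleted = 0
WHERE a_c.contact_id = '9802f40d-78bb-8dd4-dfaa-43f1064ccd5e'
  AND a_c.deleted = 0;

-- 2) The person this contact reports to (primary-key lookups only).
EXPLAIN
SELECT con_reports_to.first_name, con_reports_to.last_name
FROM contacts
LEFT JOIN contacts con_reports_to
       ON con_reports_to.id = contacts.reports_to_id
WHERE contacts.id = '9802f40d-78bb-8dd4-dfaa-43f1064ccd5e';
```

Dropping the EXPLAIN keyword runs the queries; the client can then combine the two small result sets.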
Horizontal splitting? Though I guess 1.2 million records are not enough to justify horizontal splitting. Try to locate the bottleneck. The problem may also lie with your hardware, for example a hard disk that is almost full.