MySQL INNER JOIN is slow with index

This is my simple inner join:
SELECT
SUM(ASSNZ.assenzeDidattiche) AS TotaleAssenze,
SUM(ASSNZ.ore) AS totale_parziale,
FLOOR(((SUM(ASSNZ.assenzeDidattiche) / SUM(ASSNZ.ore)) * 100)) AS andamento,
MAX(ASSNZ.dataLezione) AS ultima_lezione,
ASSNZ.idServizio,
ASSNZ.idUtente
FROM
ciac_corsi_assenze AS ASSNZ
INNER JOIN
ciac_serviziAcquistati_ITA AS ACQ
ON ACQ.idContatto = ASSNZ.idUtente
AND ACQ.idServizio = ASSNZ.idServizio
AND ACQ.stato_allievo <> 'ritirato'
GROUP BY
ASSNZ.idServizio,
ASSNZ.idUtente
Table "ASSNZ" has 213886 rows, with indexes on "idUtente" and "idServizio".
Table "ACQ" has 8950 rows, with indexes on "idContatto" and "idServizio".
ASSNZ table:
CREATE TABLE `ciac_corsi_assenze` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`idUtente` int(11) DEFAULT NULL,
`idServizio` int(11) DEFAULT NULL,
`idCorso` int(11) DEFAULT NULL,
`idCalendario` int(11) DEFAULT NULL,
`modalita` varchar(255) DEFAULT NULL,
`ore` int(11) DEFAULT NULL,
`assenzeDidattiche` float DEFAULT NULL,
`assenzeAmministrative` float DEFAULT NULL,
`dataLezione` date DEFAULT NULL,
`ora_inizio` varchar(8) DEFAULT NULL,
`ora_fine` varchar(8) DEFAULT NULL,
`dataFineStage` date DEFAULT NULL,
`giustificata` varchar(128) DEFAULT NULL,
`motivazione` longtext,
`grouped` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idUtente` (`idUtente`) USING BTREE,
KEY `idServizio` (`idServizio`) USING BTREE,
KEY `dataLezione` (`dataLezione`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=574582 DEFAULT CHARSET=utf8;
ACQ table:
CREATE TABLE `ciac_serviziacquistati_ita` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`idServizio` int(11) NOT NULL,
`idContatto` int(11) NOT NULL,
`idAzienda` int(11) NOT NULL,
`idSede` int(11) NOT NULL,
`tipoPersona` int(11) NOT NULL,
`num_registro` int(11) NOT NULL,
`codice` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`dal` date NOT NULL,
`al` date NOT NULL,
`ore` int(11) NOT NULL,
`costoOrario` decimal(10,0) NOT NULL,
`annoFormativo` varchar(128) CHARACTER SET latin1 NOT NULL,
`stato_attuale` int(11) NOT NULL,
`datore_attuale` int(11) NOT NULL,
`stato_allievo` varchar(64) CHARACTER SET latin1 NOT NULL DEFAULT 'corsista',
`data_ritiro` date DEFAULT NULL,
`crediti_formativi` int(11) NOT NULL,
`note` longtext CHARACTER SET latin1 NOT NULL,
`valore_economico` decimal(10,2) NOT NULL,
`dataInserimento` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idServizio` (`idServizio`) USING BTREE,
KEY `idAzienda` (`idAzienda`) USING BTREE,
KEY `idContatto` (`idContatto`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=9542 DEFAULT CHARSET=utf8;
This is my EXPLAIN of the select.
Why is the query slow? It takes 1.5 to 2.0 seconds.
Is something wrong?
UPDATE
Following John Bollinger's answer, I added a new index to the table ciac_corsi_assenze:
PRIMARY KEY (`id`),
KEY `dataLezione` (`dataLezione`) USING BTREE,
KEY `test` (`idUtente`,`idServizio`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=574582 DEFAULT CHARSET=utf8;
I also added a new index to the table ciac_serviziAcquistati_ITA:
PRIMARY KEY (`id`),
KEY `idAzienda` (`idAzienda`) USING BTREE,
KEY `test2` (`idContatto`,`idServizio`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=9542 DEFAULT CHARSET=utf8;
New EXPLAIN:
But it is still slow :(

Your tables have separate indexes on various columns of interest, but MySQL will generally use at most one index per table to perform your query. This particular query would probably be sped up by table ciac_corsi_assenze having an index on (idUtente, idServizio) (such an index would supersede the existing one on (idUtente) alone). That should allow MySQL to avoid sorting the result rows to perform the grouping, and it will help more in performing the join than any of the existing indexes do.
The query would probably be sped up further by table ciac_serviziAcquistati_ITA having an index on (idContatto, idServizio), or even on (idContatto, idServizio, stato_allievo). Either of those would supersede the existing index on just (idContatto).

John went in the right direction. However, the order of columns in the composite index needs changing.
For the GROUP BY, this order is needed (on ASSNZ):
INDEX(idServizio, idUtente)
(and that should replace KEY(idServizio), but not KEY(idUtente))
Then ACQ needs, in this order:
INDEX(idContatto, idServizio, stato_allievo)
replacing only KEY(idContatto).
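The index changes described above could be applied roughly as follows. This is only a sketch against the original table definitions shown in the question: the index names idServizio_idUtente and idContatto_idServizio_stato are made up for illustration, and the table-name casing may need adjusting to match the server.

```sql
-- ASSNZ: composite index in the same column order as the GROUP BY.
-- It replaces KEY(idServizio); KEY(idUtente) is left in place.
ALTER TABLE ciac_corsi_assenze
  DROP INDEX idServizio,
  ADD INDEX idServizio_idUtente (idServizio, idUtente);

-- ACQ: composite index covering both join columns and the stato_allievo filter.
-- It replaces only KEY(idContatto).
ALTER TABLE ciac_serviziAcquistati_ITA
  DROP INDEX idContatto,
  ADD INDEX idContatto_idServizio_stato (idContatto, idServizio, stato_allievo);
```

Running EXPLAIN on the original SELECT before and after the change shows whether the optimizer actually picks the new indexes.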

Related

Optimize MySql Query?

I have the following query in an old database (MySQL 5.7.16) that takes almost 45 seconds to run.
The table tbl_flightno has some 5 million records, and tbl_airline around 12,000. The database seems to be close to its limits, and every now and then some orphan records are generated. I haven't found the culprit for that yet.
So I currently check for those orphans from time to time and fix them. I am wondering whether there is a better way to search for them.
SELECT COUNT(DISTINCT N.World, N.AirlineCode) AS 'Orphans', COUNT(FlightNoID) AS 'Flights'
FROM tbl_flightno N
LEFT JOIN tbl_airline A ON A.World = N.World AND A.AirlineCode = N.AirlineCode
WHERE A.Airline IS NULL
However, I'm not sure whether there is another or better way.
Yes, upgrading MySQL might help, and throwing more hardware at the problem would too, but that would create much more work.
Thanks in advance for any hints.
EDIT: Added the additional information below:
Here is the EXPLAIN for the query.
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | N | | index | | World_Airline | 81 | | 5217525 | 100 | Using index
1 | SIMPLE | A | | eq_ref | PRIMARY,VUnique,vWorld,vAirline,vReadOnly | PRIMARY | 81 | as.N.AirlineCode,as.N.World | 1 | 10 | Using where; Not exists
-- ----------------------------
-- Table structure for tbl_airline
-- ----------------------------
DROP TABLE IF EXISTS `tbl_airline`;
CREATE TABLE `tbl_airline` (
`AirlineCode` int(8) NOT NULL,
`World` varchar(25) NOT NULL,
`Airline` varchar(255) NOT NULL,
`Last_update` datetime DEFAULT NULL,
`Destinations` int(8) DEFAULT NULL,
`NoFlights` int(8) DEFAULT NULL,
`CityPairs` int(8) DEFAULT NULL,
`Headquarter` varchar(3) DEFAULT NULL,
`TZ` varchar(6) DEFAULT NULL,
`ReadOnly` int(1) NOT NULL DEFAULT '0',
`Code` varchar(10) DEFAULT NULL,
`Alliance` varchar(255) DEFAULT NULL,
`Stock` varchar(10) DEFAULT NULL,
`Country` varchar(255) DEFAULT NULL,
`LegalHome` varchar(255) DEFAULT NULL,
`Parent` varchar(255) DEFAULT NULL,
`Director` varchar(100) DEFAULT NULL,
`Founded` date DEFAULT NULL,
`Rating` varchar(5) DEFAULT NULL,
PRIMARY KEY (`AirlineCode`,`World`),
UNIQUE KEY `VUnique` (`World`,`AirlineCode`) USING BTREE,
KEY `vWorld` (`World`) USING BTREE,
KEY `vAirline` (`AirlineCode`) USING BTREE,
KEY `vReadOnly` (`World`,`ReadOnly`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
SET FOREIGN_KEY_CHECKS=1;
-- ----------------------------
-- Table structure for tbl_flightno
-- ----------------------------
DROP TABLE IF EXISTS `tbl_flightno`;
CREATE TABLE `tbl_flightno` (
`FlightNoID` bigint(8) unsigned NOT NULL AUTO_INCREMENT,
`FlightID` bigint(8) unsigned NOT NULL,
`World` varchar(25) NOT NULL,
`AirlineCode` int(8) NOT NULL,
`FlightNo` varchar(10) NOT NULL,
`Days` varchar(7) NOT NULL,
`TimeDep` time NOT NULL,
`TimeArr` time NOT NULL,
`ActType` varchar(3) NOT NULL,
`ActLink` varchar(6) NOT NULL,
`Operator` varchar(255) NOT NULL,
`Remarks` varchar(50) DEFAULT NULL,
`Validity` varchar(11) DEFAULT NULL,
`Distance` int(10) DEFAULT NULL,
`Duration` time DEFAULT NULL,
`Speed` int(10) DEFAULT NULL,
`Via` int(1) DEFAULT '0',
`AptFromC` varchar(3) DEFAULT NULL,
`AptDestC` varchar(3) DEFAULT NULL,
PRIMARY KEY (`FlightNoID`),
UNIQUE KEY `FlightNoID` (`FlightNoID`) USING BTREE,
KEY `World_Airline` (`World`,`AirlineCode`) USING BTREE,
KEY `DepTimes` (`TimeDep`,`FlightID`) USING BTREE,
KEY `FlightID` (`FlightID`) USING BTREE,
KEY `Distance` (`World`,`AirlineCode`,`Distance`) USING BTREE,
KEY `ActType` (`ActType`) USING BTREE,
KEY `Via` (`Via`) USING BTREE,
KEY `Remarks` (`World`,`Remarks`) USING BTREE,
KEY `ActLink` (`ActLink`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=25879501 DEFAULT CHARSET=utf8;
SET FOREIGN_KEY_CHECKS=1;
You can try the following query, which applies the orphan test as a NOT EXISTS condition so that both counts cover only the orphaned rows:
SELECT COUNT(DISTINCT N.World, N.AirlineCode) AS Orphans
, COUNT(*) AS Flights
FROM tbl_flightno N
WHERE NOT EXISTS (
SELECT 1 FROM tbl_airline A
WHERE A.World = N.World
AND A.AirlineCode = N.AirlineCode
);
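Your indexes are optimal and your query formulation is optimal. The problem is that it needs about 5 million index lookups: the EXPLAIN shows a full scan of the World_Airline index on tbl_flightno, with one eq_ref probe into tbl_airline for each row.
If it is I/O-bound, please provide the table sizes, the value of innodb_buffer_pool_size, and the amount of RAM. With these, I may have advice on how to cut back on I/O.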
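The figures requested above can be collected with something like the following. It is a minimal sketch that assumes the current default database is the one holding both tables; the sizes reported by information_schema are estimates for InnoDB.

```sql
-- Current InnoDB buffer pool size, in megabytes.
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;

-- Approximate data and index size of the two tables, in megabytes.
SELECT table_name,
       ROUND(data_length / 1024 / 1024)  AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name IN ('tbl_flightno', 'tbl_airline');
```

If the indexes involved in the query are larger than the buffer pool, repeated runs will keep going back to disk.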
[An aside] There are several redundant indexes, but this won't impact the speed of that SELECT:
PRIMARY KEY (`AirlineCode`,`World`),
UNIQUE KEY `VUnique` (`World`,`AirlineCode`) USING BTREE,
KEY `vWorld` (`World`) USING BTREE,
KEY `vAirline` (`AirlineCode`) USING BTREE,
KEY `vReadOnly` (`World`,`ReadOnly`) USING BTREE
-->
PRIMARY KEY (`AirlineCode`,`World`),
KEY (`World`,`AirlineCode`) USING BTREE,
KEY `vReadOnly` (`World`,`ReadOnly`) USING BTREE
In the other table, toss these two:
UNIQUE KEY `FlightNoID` (`FlightNoID`) USING BTREE,
KEY `World_Airline` (`World`,`AirlineCode`) USING BTREE,
"Rules":
In MySQL PRIMARY KEY is a UNIQUE key.
When you have INDEX(a,b), INDEX(a) is unnecessary.
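Applied to the two tables, the cleanup above might look like this. Treat it as a sketch: the index name World_AirlineCode is invented here, and on a 5-million-row table these statements are worth trying on a copy first.

```sql
-- tbl_airline: vWorld is a prefix of (World, AirlineCode) and vAirline is a
-- prefix of the primary key, so both are redundant. VUnique duplicates the
-- uniqueness already enforced by the primary key, so a plain index suffices.
ALTER TABLE tbl_airline
  DROP INDEX vWorld,
  DROP INDEX vAirline,
  DROP INDEX VUnique,
  ADD INDEX World_AirlineCode (World, AirlineCode);

-- tbl_flightno: FlightNoID repeats the primary key, and World_Airline is a
-- prefix of the Distance index (World, AirlineCode, Distance).
ALTER TABLE tbl_flightno
  DROP INDEX FlightNoID,
  DROP INDEX World_Airline;
```

After World_Airline is dropped, the orphan query would have to scan the slightly wider Distance index instead, so it is worth re-running EXPLAIN to confirm the plan.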

MySQL use separate indices for JOIN and GROUP BY

I am trying to execute the following query:
SELECT
a.sessionID AS `sessionID`,
firstSeen, birthday, gender,
isAnonymous, LanguageCode
FROM transactions AS trx
INNER JOIN actions AS a ON a.sessionID = trx.SessionID
WHERE a.ActionType = 'PURCHASE'
GROUP BY trx.TransactionNumber
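EXPLAIN provides the following output:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | trx | ALL | TransactionNumber,SessionID | NULL | NULL | NULL | 225036 | Using temporary; Using filesort
1 | SIMPLE | a | ref | sessionID | sessionID | 98 | infinitiExport.trx.SessionID | 1 | Using index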
The problem is that I am trying to use one field for the join and a different field for the GROUP BY.
How can I tell MySQL to use different indexes for the same table?
CREATE TABLE `transactions` (
`SessionID` varchar(32) NOT NULL DEFAULT '',
`date` datetime DEFAULT NULL,
`TransactionNumber` varchar(32) NOT NULL DEFAULT '',
`CustomerECommerceTrackID` int(11) DEFAULT NULL,
`SKU` varchar(45) DEFAULT NULL,
`AmountPaid` double DEFAULT NULL,
`Currency` varchar(10) DEFAULT NULL,
`Quantity` int(11) DEFAULT NULL,
`Name` tinytext NOT NULL,
`Category` varchar(45) NOT NULL DEFAULT '',
`customerInfoXML` text,
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
KEY `TransactionNumber` (`TransactionNumber`),
KEY `SessionID` (`SessionID`)
) ENGINE=InnoDB AUTO_INCREMENT=212007 DEFAULT CHARSET=utf8;
CREATE TABLE `actions` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`sessionActionDate` datetime DEFAULT NULL,
`actionURL` varchar(255) DEFAULT NULL,
`sessionID` varchar(32) NOT NULL DEFAULT '',
`ActionType` varchar(64) DEFAULT NULL,
`CustomerID` int(11) DEFAULT NULL,
`IPAddressID` int(11) DEFAULT NULL,
`CustomerDeviceID` int(11) DEFAULT NULL,
`customerInfoXML` text,
PRIMARY KEY (`id`),
KEY `ActionType` (`ActionType`),
KEY `CustomerDeviceID` (`CustomerDeviceID`),
KEY `sessionID` (`sessionID`)
) ENGINE=InnoDB AUTO_INCREMENT=15042833 DEFAULT CHARSET=utf8;
Thanks
EDIT 1: My indexes were broken. I had to add a (SessionID, TransactionNumber) index to the transactions table. However, when I now try to include the trx.customerInfoXML column, MySQL stops using the index.
EDIT 2: Another answer did not really solve my problem, because forcing indexes is not standard SQL syntax and is generally not a good idea.
For ORM users, such syntax is an unattainable luxury.
EDIT 3: I updated my indexes and that solved the problem; see EDIT 1.
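The fix mentioned in EDIT 1 could be written as below. The index name SessionID_TransactionNumber is hypothetical. A likely reason the plan changes once trx.customerInfoXML is selected is that the composite index no longer covers every column the query reads from transactions, so MySQL has to visit the table rows anyway; that explanation is an assumption, not something confirmed in the question.

```sql
-- Composite index: SessionID serves the join, TransactionNumber the GROUP BY.
ALTER TABLE transactions
  ADD INDEX SessionID_TransactionNumber (SessionID, TransactionNumber);

-- The single-column SessionID index is now a prefix of the new one
-- and can be dropped.
ALTER TABLE transactions
  DROP INDEX SessionID;

-- Confirm what is defined on the table.
SHOW INDEX FROM transactions;
```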

MySQL: Optimizing JOINs to find non-matching records

We have a host management system (let's call it CMDB) and a DNS system, each using different tables. The former syncs to the latter, but manual changes cause them to get out of sync. I would like to craft a query to find aliases in CMDB that do NOT have a matching entry in DNS (either there is no entry, or the name/IP is different).
Because of the large size of the tables, and the need for this query to run frequently, optimizing the query is very important.
Here's what the tables look like:
cmdb_record: id, ipaddr
cmdb_alias: record_id, host_alias
dns_entry: name, ipaddr
cmdb_alias.record_id is a foreign key from cmdb_record.id, so that one IP address can have multiple aliases.
So far, here's what I've come up with:
SELECT cmdb_alias.host_alias, cmdb_record.ipaddr
FROM cmdb_record
INNER JOIN cmdb_alias ON cmdb_alias.record_id = cmdb_record.id
LEFT JOIN dns_entry
ON dns_entry.ipaddr = cmdb_record.ipaddr
AND dns_entry.name = cmdb_alias.host_alias
WHERE dns_entry.ipaddr IS NULL OR dns_entry.name IS NULL
This seems to work, but takes a very long time to run. Is there a better way to do this? Thanks!
EDIT: As requested, here are the SHOW CREATE TABLEs. There are lots of extra fields that aren't particularly relevant, but included for completeness.
Create Table: CREATE TABLE `cmdb_record` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`ip_version` int(11) DEFAULT NULL,
`ipaddr` varchar(40) DEFAULT NULL,
`ipaddr_numeric` decimal(40,0) DEFAULT NULL,
`block_id` int(11) NOT NULL,
`record_commented` tinyint(1) NOT NULL,
`mod_time` datetime NOT NULL,
`deleted` tinyint(1) NOT NULL,
`deleted_date` datetime DEFAULT NULL,
`record_owner` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ipaddr` (`ipaddr`),
KEY `cmdb_record_fe30f0f7` (`ipaddr`),
KEY `cmdb_record_2b8b575` (`ipaddr_numeric`),
KEY `cmdb_record_45897ef2` (`block_id`),
CONSTRAINT `block_id_refs_id_ed6ed320` FOREIGN KEY (`block_id`) REFERENCES `cmdb_block` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=104427 DEFAULT CHARSET=latin1
Create Table: CREATE TABLE `cmdb_alias` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host_alias` varchar(255) COLLATE latin1_general_cs NOT NULL,
`record_id` int(11) NOT NULL,
`record_order` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `cmdb_alias_fcffc3bb` (`record_id`),
KEY `alias_lookup` (`host_alias`),
CONSTRAINT `record_id_refs_id_8169fc71` FOREIGN KEY (`record_id`) REFERENCES `cmdb_record` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=155433 DEFAULT CHARSET=latin1 COLLATE=latin1_general_cs
Create Table: CREATE TABLE `dns_entry` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`rec_grp_id` varchar(40) NOT NULL,
`parent_id` int(11) NOT NULL,
`domain_id` int(11) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`type` varchar(6) DEFAULT NULL,
`ipaddr` varchar(255) DEFAULT NULL,
`ttl` int(11) DEFAULT NULL,
`prio` int(11) DEFAULT NULL,
`status` varchar(20) NOT NULL,
`op` varchar(20) NOT NULL,
`mod_time` datetime NOT NULL,
`whodunit` varchar(50) NOT NULL,
`comments` longtext NOT NULL,
PRIMARY KEY (`id`),
KEY `dns_entry_a2431ea` (`domain_id`),
KEY `dns_entry_52094d6e` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=49437 DEFAULT CHARSET=utf8
If you don't have one already, create an index on dns_entry(ipaddr, name). This might be all you need to speed the query.
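A minimal sketch of that suggestion follows; the index name dns_entry_ipaddr_name is made up. The query is the one from the question, with the anti-join test written against the NOT NULL primary key dns_entry.id, which is equivalent here because a matched row can never have a NULL ipaddr or name.

```sql
-- Composite index matching both columns of the LEFT JOIN condition.
CREATE INDEX dns_entry_ipaddr_name ON dns_entry (ipaddr, name);

-- Aliases in CMDB with no matching (ipaddr, name) pair in DNS.
SELECT cmdb_alias.host_alias, cmdb_record.ipaddr
FROM cmdb_record
INNER JOIN cmdb_alias ON cmdb_alias.record_id = cmdb_record.id
LEFT JOIN dns_entry
       ON dns_entry.ipaddr = cmdb_record.ipaddr
      AND dns_entry.name = cmdb_alias.host_alias
WHERE dns_entry.id IS NULL;
```

With the index in place, EXPLAIN should show a ref lookup on dns_entry rather than a full scan for each alias.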

Why do i HAVE to optimize tables?

I have a pretty big table which contains about 3 million records.
When running a very simple query, joining this table on a few others (all with indexes and/or primary keys), the query will take about 25 seconds to complete!
The value of "Handler_read_next" is about 7 million!
The manual describes this counter as: "The number of requests to read the next row in key order. This value is incremented if you are querying an index column with a range constraint or if you are doing an index scan."
This problem only started once this table began to grow big.
Now if I do an "optimize tables" on this table, the query will run in about 0.02 seconds and "Handler_read_next" will have a value of about 1500.
How can the difference be so extreme, and do I really have to set up a scheduled job that optimizes this table once a week or so? Even so, I would like to know the reason behind this and why MySQL behaves like this. Sure, rows are deleted and updated quite often in this table, but should it get so badly fragmented in only one week that the query goes from 0.02 sec to 25 sec?
Edit: After request, here comes the query in question:
SELECT *
FROM budget_expenses
JOIN budget_categories
ON budget_categories.BudgetAreaId = budget_expenses.BudgetAreaId
AND budget_categories.BudgetCategoryId = budget_expenses.BudgetCategoryId
LEFT JOIN budget_types
ON budget_types.BudgetAreaId = budget_expenses.BudgetAreaId
AND budget_types.BudgetCategoryId = budget_expenses.BudgetCategoryId
AND budget_types.BudgetTypeId = budget_expenses.BudgetTypeId
WHERE budget_expenses.BudgetId = 1
AND budget_expenses.ExpenseDate >= '2012-11-25'
AND budget_expenses.ExpenseDate <= '2012-12-24'
AND budget_expenses.BudgetAreaId = 2
ORDER BY budget_expenses.ExpenseDate DESC,
budget_expenses.ExpenseTime IS NULL ASC,
budget_expenses.ExpenseTime DESC
(BudgetAreaId, BudgetCategoryId) is the primary key in budget_categories and (BudgetAreaId, BudgetCategoryId, BudgetTypeId) is the primary key in budget_types. In budget_expenses, each of these three columns has its own index, and ExpenseDate is indexed as well. This query returns about 20 rows.
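One possible explanation, not confirmed anywhere in this thread, is that the index statistics on budget_expenses drift as rows change, so the optimizer ends up choosing one of the less selective single-column indexes. The sketch below shows two things that could be tried under that assumption: refreshing the statistics without a full rebuild, and adding a composite index that matches the WHERE clause. The index name BudgetId_AreaId_ExpenseDate is invented for illustration.

```sql
-- Refresh index statistics; much cheaper than OPTIMIZE TABLE.
ANALYZE TABLE budget_expenses;

-- Equality columns first, the range column (ExpenseDate) last.
ALTER TABLE budget_expenses
  ADD INDEX BudgetId_AreaId_ExpenseDate (BudgetId, BudgetAreaId, ExpenseDate);
```

Comparing the EXPLAIN output for the slow and fast cases would show whether the chosen key really differs.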
Show create table:
CREATE TABLE `budget_areas` (
`BudgetAreaId` int(11) NOT NULL,
`Name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`BudgetAreaId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `budget_categories` (
`BudgetAreaId` int(11) NOT NULL,
`BudgetCategoryId` int(11) NOT NULL AUTO_INCREMENT,
`Name` varchar(255) DEFAULT NULL,
`SortOrder` int(11) DEFAULT NULL,
PRIMARY KEY (`BudgetAreaId`,`BudgetCategoryId`),
KEY `BudgetAreaId` (`BudgetAreaId`,`BudgetCategoryId`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
CREATE TABLE `budget_types` (
`BudgetAreaId` int(11) NOT NULL,
`BudgetCategoryId` int(11) NOT NULL,
`BudgetTypeId` int(11) NOT NULL,
`Name` varchar(255) DEFAULT NULL,
`SortId` int(11) DEFAULT NULL,
PRIMARY KEY (`BudgetAreaId`,`BudgetCategoryId`,`BudgetTypeId`),
KEY `BudgetAreaId` (`BudgetAreaId`,`BudgetCategoryId`,`BudgetTypeId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `budget_expenses` (
`ExpenseId` int(11) NOT NULL AUTO_INCREMENT,
`BudgetId` int(11) NOT NULL,
`TempId` int(11) DEFAULT NULL,
`BudgetAreaId` int(11) DEFAULT NULL,
`BudgetCategoryId` int(11) DEFAULT NULL,
`BudgetTypeId` int(11) DEFAULT NULL,
`Company` varchar(255) DEFAULT NULL,
`ImportCompany` varchar(255) DEFAULT NULL,
`Sum` double(50,2) DEFAULT NULL,
`ExpenseDate` date DEFAULT NULL,
`ExpenseTime` time DEFAULT NULL,
`Inserted` datetime DEFAULT NULL,
`Changed` datetime DEFAULT NULL,
`InsertType` int(1) DEFAULT NULL,
`AccountId` int(11) DEFAULT NULL,
`BankCardId` int(11) DEFAULT NULL,
PRIMARY KEY (`ExpenseId`),
KEY `BudgetId` (`BudgetId`),
KEY `AccountId` (`AccountId`),
KEY `Company` (`Company`) USING BTREE,
KEY `ExpenseDate` (`ExpenseDate`),
KEY `BudgetAreaId` (`BudgetAreaId`),
KEY `BudgetCategoryId` (`BudgetCategoryId`),
KEY `BudgetTypeId` (`BudgetTypeId`),
CONSTRAINT `budget_expenses_ibfk_1` FOREIGN KEY (`BudgetId`) REFERENCES `budgets` (`BudgetId`)
) ENGINE=InnoDB AUTO_INCREMENT=3604462 DEFAULT CHARSET=latin1
After copying and pasting this, I changed the budget_categories table from MyISAM to InnoDB.
Edit: The change from MyISAM to InnoDB didn't make any difference. The query is now very slow again, just 12 hours after I optimized the budget_expenses table!
Here is the explain for the query which now takes about 9 seconds:
http://jsfiddle.net/dmVPY/1/
Ahhh MyISAM....
Try changing the table type (aka 'storage engine') to InnoDB instead.
If you do this, make sure innodb_buffer_pool_size in your my.cnf is a sensible value - the default is too small.
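For reference, checking and changing the storage engine looks roughly like this. It is a sketch that assumes the budget tables live in the current default database.

```sql
-- Which engine does each budget table use right now?
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name IN ('budget_areas', 'budget_categories',
                     'budget_types', 'budget_expenses');

-- Convert the remaining MyISAM table.
ALTER TABLE budget_categories ENGINE=InnoDB;
```

A suitable innodb_buffer_pool_size depends on how much RAM the server can spare, so no specific value is suggested here.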

Data Structure causing impossible joins

Tables:
nodes
data_texts
data_profiles
data_locations
data_date_times
data_media
data_products
data_metas
categories
tags
categories_nodes
tags_nodes
This is a generalized question that follows on from another question of mine.
Explanation:
Each of the "data" tables has a node_id that refers back to the id of the nodes table (hasMany/belongsTo association).
A "Node" can be anything - a TV Show, a Movie, a Person, an Article...etc (all generated via a CMS, so the user can control what type of "Nodes" they want).
When pulling data, I want to be able to query against certain fields. For example, if they do a search, I want to be able to pull nodes that have data_texts.title LIKE '%george%', or order by the datetime field in data_locations.
The problem is, when I do a join on all seven data tables (or more), the query has to hit so many combined rows that it just times out (even with a nearly empty database.... total 200 rows across the entire database).
I realize I can determine IF I need a join depending on what I'm doing - but even with five or six joins (once the database gets to 10k+ records), it's going to be horribly slow, if it works at all. Per this question, the query I'm using just doing a join on these tables times out completely.
Each node can have multiple rows of each data type (for multi-language reasons among others).
I'm completely defeated. I'm at the point where I think I need to restructure the entire thing, but I don't have the time for that. I've thought about combining everything into one table, but I'm not sure how.
nodes
CREATE TABLE `nodes` (
`id` CHAR(36) NOT NULL,
`name` VARCHAR(100) NOT NULL,
`slug` VARCHAR(100) NOT NULL,
`node_type_id` CHAR(36) NOT NULL,
`site_id` CHAR(36) NOT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `nodeTypeId` (`node_type_id`),
INDEX `slug` (`slug`),
INDEX `nodeId` (`id`)
)
COLLATE='latin1_swedish_ci'
ENGINE=MyISAM;
data_texts:
CREATE TABLE `data_texts` (
`id` CHAR(36) NOT NULL,
`title` VARCHAR(250) NULL DEFAULT NULL,
`subtitle` VARCHAR(500) NULL DEFAULT NULL,
`content` LONGTEXT NULL,
`byline` VARCHAR(250) NULL DEFAULT NULL,
`language_id` CHAR(36) NULL DEFAULT NULL,
`foreign_key` CHAR(36) NULL DEFAULT NULL,
`model` VARCHAR(40) NULL DEFAULT NULL,
`node_id` CHAR(36) NULL DEFAULT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `nodeId` (`node_id`),
INDEX `languageId_nodeId` (`language_id`, `node_id`),
INDEX `foreignKey_model` (`foreign_key`, `model`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
data_profiles
CREATE TABLE `data_profiles` (
`id` CHAR(36) NOT NULL,
`name` VARCHAR(80) NULL DEFAULT NULL,
`email_personal` VARCHAR(100) NULL DEFAULT NULL,
`email_business` VARCHAR(100) NULL DEFAULT NULL,
`email_other` VARCHAR(100) NULL DEFAULT NULL,
`title` VARCHAR(100) NULL DEFAULT NULL,
`description` LONGTEXT NULL,
`prefix` VARCHAR(40) NULL DEFAULT NULL,
`phone_home` VARCHAR(40) NULL DEFAULT NULL,
`phone_business` VARCHAR(40) NULL DEFAULT NULL,
`phone_mobile` VARCHAR(40) NULL DEFAULT NULL,
`phone_other` VARCHAR(40) NULL DEFAULT NULL,
`foreign_key` CHAR(36) NULL DEFAULT NULL,
`model` VARCHAR(40) NULL DEFAULT NULL,
`node_id` CHAR(36) NULL DEFAULT NULL,
`language_id` CHAR(36) NULL DEFAULT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
`user_id` CHAR(36) NULL DEFAULT NULL,
PRIMARY KEY (`id`),
INDEX `nodeId` (`node_id`),
INDEX `languageId_nodeId` (`node_id`, `language_id`),
INDEX `foreignKey_model` (`foreign_key`, `model`)
)
COLLATE='latin1_swedish_ci'
ENGINE=MyISAM;
categories
CREATE TABLE `categories` (
`id` CHAR(36) NOT NULL,
`name` VARCHAR(100) NOT NULL,
`node_type_id` CHAR(36) NOT NULL,
`site_id` CHAR(36) NOT NULL,
`slug` VARCHAR(100) NOT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `nodeTypeId` (`node_type_id`),
INDEX `slug` (`slug`)
)
COMMENT='Used to categorize nodes'
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
categories_nodes
CREATE TABLE `categories_nodes` (
`id` CHAR(36) NOT NULL,
`category_id` CHAR(36) NOT NULL,
`node_id` CHAR(36) NOT NULL,
PRIMARY KEY (`id`),
INDEX `categoryId_nodeId` (`category_id`, `node_id`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
node_tags
CREATE TABLE `node_tags` (
`id` CHAR(36) NOT NULL,
`name` VARCHAR(40) NOT NULL,
`site_id` CHAR(36) NOT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `siteId` (`site_id`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
nodes_node_tags
CREATE TABLE `nodes_node_tags` (
`id` CHAR(36) NOT NULL,
`node_id` CHAR(36) NOT NULL,
`node_tag_id` CHAR(36) NOT NULL,
PRIMARY KEY (`id`),
INDEX `node_id_node_tag_id` (`node_id`, `node_tag_id`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
MySQL:
SELECT `Node`.`id`, `Node`.`name`, `Node`.`slug`, `Node`.`node_type_id`, `Node`.`site_id`, `Node`.`created`, `Node`.`modified`
FROM `mysite`.`nodes` AS `Node`
LEFT JOIN `mysite`.`data_date_times` AS `DataDateTime` ON (`DataDateTime`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_locations` AS `DataLocation` ON (`DataLocation`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_media` AS `DataMedia` ON (`DataMedia`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_metas` AS `DataMeta` ON (`DataMeta`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_profiles` AS `DataProfile` ON (`DataProfile`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_products` AS `DataProduct` ON (`DataProduct`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_texts` AS `DataText` ON (`DataText`.`node_id` = `Node`.`id`)
WHERE 1=1
GROUP BY `Node`.`id`
Firstly, try InnoDB, not MyISAM.
Secondly, remove the group by, see how well it runs then, and how many rows are involved. Shouldn't be that many, but it's interesting.
You don't need the 'nodeId' index on nodes (id is already the primary key). Again, dropping it shouldn't make any difference.
The where clause is irrelevant. You can remove it with no effect one way or another.
Thirdly, well, something is seriously broken.
Have a quick look on how to start profiling (e.g. http://dev.mysql.com/doc/refman/5.0/en/show-profile.html) , and run a profile command to see where all the time is going. Post it here if it doesn't immediately show that something is broken.
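The checks suggested above could be sketched as follows. The example uses a cut-down version of the query with a single LEFT JOIN so that it stays short; the remaining joins can be added back one at a time to see where the row count and the time explode.

```sql
-- The nodeId index duplicates the primary key on nodes.
ALTER TABLE nodes DROP INDEX nodeId;

-- How many rows does the join produce before GROUP BY collapses them?
SELECT COUNT(*)
FROM nodes AS Node
LEFT JOIN data_texts AS DataText ON DataText.node_id = Node.id;

-- Profile the grouped query for the current session.
SET profiling = 1;

SELECT Node.id
FROM nodes AS Node
LEFT JOIN data_texts AS DataText ON DataText.node_id = Node.id
GROUP BY Node.id;

SHOW PROFILES;              -- lists the profiled statements with their ids
SHOW PROFILE FOR QUERY 1;   -- use the id reported by SHOW PROFILES
```

Because each node can have several rows in every data table, joining all seven at once multiplies those row counts together, which is one thing the COUNT(*) check would reveal.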
I'm unfortunately not in a position where I can do any tests right now. I'll just throw out some ideas. I might be able to do some tests later.
Be suspicious of different collations.
Some of your ids are useless. For example, you should drop the column categories_nodes.id, and put a primary key constraint on {category_id, node_id} instead.
Be suspicious of any design that requires joining all the tables at run time. There are better ways.
Use InnoDB and foreign key constraints.
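As an illustration of these points for categories_nodes and nodes, something along these lines could work. It is a sketch under several assumptions: there are no duplicate (category_id, node_id) pairs, every node_id in categories_nodes exists in nodes, and the names nodeId_categoryId and fk_categories_nodes_node are invented for the example.

```sql
-- Replace the surrogate id with a natural composite primary key, and add
-- the reverse index for lookups that start from the node side.
ALTER TABLE categories_nodes
  DROP PRIMARY KEY,
  DROP COLUMN id,
  DROP INDEX categoryId_nodeId,
  ADD PRIMARY KEY (category_id, node_id),
  ADD INDEX nodeId_categoryId (node_id, category_id);

-- Bring nodes to the same character set and collation as the utf8 tables,
-- so that joins on node_id compare like with like.
ALTER TABLE nodes CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

-- Foreign keys require InnoDB on both sides.
ALTER TABLE nodes ENGINE=InnoDB;
ALTER TABLE categories_nodes ENGINE=InnoDB;

ALTER TABLE categories_nodes
  ADD CONSTRAINT fk_categories_nodes_node
  FOREIGN KEY (node_id) REFERENCES nodes (id);
```

The same pattern would apply to nodes_node_tags and to the node_id columns of the data tables.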