Make LEFT JOIN query more efficient - mysql

The following query with a LEFT JOIN tries to allocate far too much memory (~4GB), but the host allows only about 120MB for this process.
SELECT grades.grade, grades.evaluation_id, evaluations.evaluation_name, evaluations.value, evaluations.maximum FROM grades LEFT JOIN evaluations ON grades.evaluation_id = evaluations.evaluation_id WHERE grades.registrar_id = ?
Create table syntax for grades:
CREATE TABLE `grades` (
`grade_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`evaluation_id` int(10) unsigned DEFAULT NULL,
`registrar_id` int(10) unsigned DEFAULT NULL,
`grade` float unsigned DEFAULT NULL,
`entry_date` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`grade_id`),
KEY `registrarGrade_key` (`registrar_id`),
KEY `evaluationKey` (`evaluation_id`),
KEY `grades_id_index` (`grade_id`),
KEY `eval_id_index` (`evaluation_id`),
KEY `grade_index` (`grade`),
CONSTRAINT `evaluationKey` FOREIGN KEY (`evaluation_id`) REFERENCES `evaluations` (`evaluation_id`),
CONSTRAINT `registrarGrade_key` FOREIGN KEY (`registrar_id`) REFERENCES `registrar` (`reg_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1627 DEFAULT CHARSET=utf8;
evaluations table:
CREATE TABLE `evaluations` (
`evaluation_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`instance_id` int(11) unsigned DEFAULT NULL,
`evaluation_col` varchar(255) DEFAULT NULL,
`evaluation_name` longtext,
`evaluation_method` enum('class','email','online','lab') DEFAULT NULL,
`evaluation_deadline` date DEFAULT NULL,
`maximum` int(11) unsigned DEFAULT NULL,
`value` float DEFAULT NULL,
PRIMARY KEY (`evaluation_id`),
KEY `instanceID_key` (`instance_id`),
KEY `eval_name_index` (`evaluation_name`(3)),
KEY `eval_method_index` (`evaluation_method`),
KEY `eval_deadline_index` (`evaluation_deadline`),
KEY `maximum` (`maximum`),
KEY `value_index` (`value`),
KEY `eval_id_index` (`evaluation_id`),
CONSTRAINT `instanceID_key` FOREIGN KEY (`instance_id`) REFERENCES `course_instance` (`instance_id`)
) ENGINE=InnoDB AUTO_INCREMENT=72 DEFAULT CHARSET=utf8;
The PHP code to pull the data:
$sql = "SELECT grades.grade, grades.evaluation_id, evaluations.evaluation_name, evaluations.value, evaluations.maximum FROM grades LEFT JOIN evaluations ON grades.evaluation_id = evaluations.evaluation_id WHERE grades.registrar_id = ? AND YEAR(entry_date) = YEAR(CURDATE())";
$result = $mysqli->prepare($sql);
if($result === FALSE)
die($mysqli->error);
$result->bind_param('i',$reg_ids[$i]);
$result->execute();
$result->bind_result($grade, $eval_id, $evalname, $evalval, $max);
while($result->fetch()){
And the fatal error message
Is there a way to drastically reduce the memory load on this query?
Thanks!
Curiously, changing the MySQL query did not change the amount of memory that PHP attempted to allocate.

Please provide SHOW CREATE TABLE for both tables; I want to see if you have anything like these:
grades: INDEX(registrar_id)
evaluations: PRIMARY KEY(evaluation_id)
Edit
You now have redundant indexes in both tables, probably because of my suggestion. That is, you already had both of the indexes that would help with the query.
Since you have a LONGTEXT column and PHP is trying to allocate exactly 4GB, the maximum size of LONGTEXT, I guess that is the problem. I suggest you ALTER that column to be TEXT (64KB max) or MEDIUMTEXT (16MB max). I have not seen this behavior before in PHP, but then I rarely use anything bigger than TEXT.
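As a rough sketch of that suggestion, the statements below check the longest stored name, shrink the column, and drop the indexes that duplicate another index in the CREATE TABLE output above. It assumes that no existing evaluation_name is longer than the 65,535 bytes a TEXT column can hold (longer values would be truncated or rejected, depending on the SQL mode), so the check should be run first. The alias longest_name_bytes is only illustrative.

```sql
-- How long is the longest name actually stored (in bytes)?
SELECT MAX(LENGTH(evaluation_name)) AS longest_name_bytes
FROM evaluations;

-- Assumption: the result above is well under 65,535, so TEXT is enough.
ALTER TABLE evaluations MODIFY evaluation_name TEXT;

-- grades_id_index repeats the PRIMARY KEY on grade_id;
-- eval_id_index repeats evaluationKey, which still backs the foreign key.
ALTER TABLE grades
  DROP INDEX grades_id_index,
  DROP INDEX eval_id_index;

-- eval_id_index on evaluations repeats its PRIMARY KEY.
ALTER TABLE evaluations DROP INDEX eval_id_index;
```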

Related

MariaDB INNER JOIN with foreign keys is MUCH slower than without them

Please help me, I'm stuck with the strange behaviour of MariaDB server.
I have 3 tables.
CREATE TABLE `default_work` (
`add_date` datetime(6) NOT NULL,
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`keywords` varchar(255) DEFAULT NULL,
`short_text` longtext DEFAULT NULL,
`downloads` int(10) unsigned NOT NULL,
`published` tinyint(1) NOT NULL,
`subject_id` int(11) NOT NULL,
`work_type_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `default_work_subject_id_IDX` (`subject_id`) USING BTREE,
KEY `default_work_work_type_id_IDX` (`work_type_id`) USING BTREE,
CONSTRAINT `default_work_FK` FOREIGN KEY (`subject_id`) REFERENCES `default_subject` (`id`),
CONSTRAINT `default_work_FK_1` FOREIGN KEY (`work_type_id`) REFERENCES `default_worktype` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=210673 DEFAULT CHARSET=utf8
CREATE TABLE `default_subject` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`subject` varchar(255) NOT NULL,
`old_id` int(10) unsigned NOT NULL,
`subject_literal` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=43 DEFAULT CHARSET=utf8
CREATE TABLE `default_worktype` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`work_type` varchar(250) NOT NULL,
`description` longtext DEFAULT NULL,
`old_id` int(10) unsigned NOT NULL,
`work_type_literal` varchar(250) NOT NULL,
`title` varchar(255) NOT NULL,
`multiple` varchar(255) NOT NULL,
`keywords` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `default_worktype_old_id_a8b508fe_uniq` (`old_id`),
UNIQUE KEY `default_worktype_work_type_literal_1e609434_uniq` (`work_type_literal`)
) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=utf8
These tables were created by the Django ORM, but they seem to be OK.
The default_work table has about 200,000 records, default_subject has 42, and default_worktype has 12.
When I made a request in the Django admin with simple joins between those tables, the query took about 9 seconds.
Looking in the SQL log, I found the raw query:
SELECT `default_work`.`id`, `default_work`.`title`, `default_worktype`.`work_type`,`default_subject`.`subject`
FROM `default_work`
INNER JOIN `default_subject` ON (`default_work`.`subject_id` = `default_subject`.`id`)
INNER JOIN `default_worktype` ON (`default_work`.`work_type_id` = `default_worktype`.`id`)
ORDER BY `default_work`.`id` DESC LIMIT 100
EXPLAIN shows:
Explain result of the query with indexes
This is a bit confusing, because when I dropped all indexes on the default_work table except the primary key, the results were completely different. The query took about 3.4 msec, and EXPLAIN shows that all the primary keys are used correctly.
Explain result of the query without indexes
P.S. I tried to reproduce this situation on PostgreSQL and got 1.3 msec for the same query with indexes and foreign keys in place.
Looking at your EXPLAIN results, you can see that when the foreign keys are in place the system uses the foreign key index in the join instead of the primary key of the target table (row 2).
As there will be many records with the same value, this massively increases the number of records being evaluated.
I don't know why it chooses to do that. You may find that rewriting the SELECT statement in a different order changes which indexes it picks. The choice may also differ if, in the ON clause, you specify the target table first and then the source table (default_subject.id = default_work.subject_id).
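The rewrite described above could look like the first statement below. The second statement uses STRAIGHT_JOIN, which the answer does not mention; it is an additional option that makes the server read the tables in the order they are listed. Whether either variant actually changes the plan is an assumption that has to be checked with EXPLAIN on the real data. The aliases w, s and wt are only illustrative.

```sql
-- Variant 1: same query, with the lookup table named first in each ON clause.
EXPLAIN
SELECT w.id, w.title, wt.work_type, s.subject
FROM default_work AS w
INNER JOIN default_subject  AS s  ON (s.id  = w.subject_id)
INNER JOIN default_worktype AS wt ON (wt.id = w.work_type_id)
ORDER BY w.id DESC
LIMIT 100;

-- Variant 2: STRAIGHT_JOIN fixes the join order to the order written here,
-- so default_work is read first and the two small tables are looked up
-- from it.
EXPLAIN
SELECT STRAIGHT_JOIN w.id, w.title, wt.work_type, s.subject
FROM default_work AS w
INNER JOIN default_subject  AS s  ON (s.id  = w.subject_id)
INNER JOIN default_worktype AS wt ON (wt.id = w.work_type_id)
ORDER BY w.id DESC
LIMIT 100;
```

If the second plan shows default_work first with its PRIMARY key and eq_ref lookups on the other two tables, it matches the fast plan seen after dropping the secondary indexes.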

MySQL Percona crashes the table while trying to create a unique index on fields where one is a generated column

CREATE TABLE tasks (
`user_id` INT(10) UNSIGNED NOT NULL,
`name` VARCHAR(255) NOT NULL,
`code` VARCHAR(255) NOT NULL,
`params` JSON,
`hash` VARCHAR(32) GENERATED ALWAYS AS (MD5(`params`)),
`created_at` TIMESTAMP NULL DEFAULT NULL,
`updated_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
CONSTRAINT `tasks_user_id_users_id_fk`
FOREIGN KEY (`user_id`) REFERENCES `users`(`id`)
)ENGINE=INNODB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Then I try to add a unique index on user_id and hash. I tried to do it in the CREATE TABLE statement, but there was a problem with the foreign key, so I decided to do it separately.
ALTER TABLE tasks
ADD UNIQUE INDEX `tasks_user_id_hash_unique_idx` (`user_id`, `hash`);
Then I get a strange error, 1146:
Table '<dbname>.tasks' doesn't exist
Since then the errors have been strange: I cannot drop the table because it supposedly exists, and I cannot do anything else with it because it supposedly does not exist. After reloading MySQL the table is gone, but if I try to create it again I get:
Error Code: 1813
Tablespace '<dbname>.tasks' exists.
I found a workaround for the unique constraint: I simply generate an MD5 over params and user_id together and mark that column UNIQUE. But how can I get rid of the problem that still exists, and what is going on? It looks like a bug in Percona Server 5.7.14-7.
So the working way to create the table is:
CREATE TABLE tasks (
`user_id` INT(10) UNSIGNED NOT NULL,
`name` VARCHAR(255) NOT NULL,
`code` VARCHAR(255) NOT NULL,
`params` JSON,
`hash` VARCHAR(32) GENERATED ALWAYS AS (MD5(CONCAT(`params`,`user_id`))) UNIQUE,
`created_at` TIMESTAMP NULL DEFAULT NULL,
`updated_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
CONSTRAINT `tasks_user_id_users_id_fk`
FOREIGN KEY (`user_id`) REFERENCES `users`(`id`)
)ENGINE=INNODB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
But, again, the problem with the ghost table left over from the first attempts still remains!

SELECT ... INTO OUTFILE performance

I am trying to optimize the export process of a query.
I have the following tables (I omit some irrelevant fields):
CREATE TABLE _termsofuse (
ID int(11) NOT NULL AUTO_INCREMENT,
TTC_ART_ID int(11) DEFAULT NULL,
TTC_TYP_ID int(11) DEFAULT NULL,
TERM_OF_USE_NAME varchar(200) DEFAULT NULL,
TERM_OF_USE_VALUE varchar(200) DEFAULT NULL,
PRIMARY KEY (ID)
) ENGINE=InnoDB AUTO_INCREMENT=185905671 DEFAULT CHARSET=utf8;
CREATE TABLE vehicle (
ID mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
TTC_TYP_ID int(11) unsigned NOT NULL,
PRIMARY KEY (ID),
UNIQUE KEY TTC_TYP_ID_UNIQUE (TTC_TYP_ID)
) ENGINE=InnoDB AUTO_INCREMENT=44793 DEFAULT CHARSET=utf8;
CREATE TABLE part (
ID int(11) unsigned NOT NULL AUTO_INCREMENT,
TTC_ART_ID int(11) unsigned NOT NULL,
PRIMARY KEY (ID),
UNIQUE KEY TTC_ART_ID_UNIQUE (TTC_ART_ID)
) ENGINE=InnoDB AUTO_INCREMENT=3732260 DEFAULT CHARSET=utf8;
CREATE TABLE term_of_use_name (
ID smallint(5) unsigned NOT NULL AUTO_INCREMENT,
ID_Lang tinyint(3) unsigned NOT NULL,
Name varchar(200) NOT NULL,
PRIMARY KEY (ID, ID_Lang),
UNIQUE KEY Name_Lang_UNIQUE (Name, ID_Lang),
KEY fk_term_of_use_name_lang_id_lang_idx (ID_Lang),
CONSTRAINT fk_term_of_use_name_lang_id_lang FOREIGN KEY (ID_Lang)
REFERENCES lang (ID) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=732 DEFAULT CHARSET=utf8;
CREATE TABLE term_of_use_value (
ID mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
ID_Lang tinyint(3) unsigned NOT NULL,
Value varchar(200) NOT NULL,
PRIMARY KEY (ID,ID_Lang),
UNIQUE KEY Value_Lang_UNIQUE (Value,ID_Lang),
KEY fk_term_of_use_value_lang_id_lang_idx (ID_Lang),
CONSTRAINT fk_term_of_use_value_lang_id_lang FOREIGN KEY (ID_Lang)
REFERENCES lang (ID) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=887502 DEFAULT CHARSET=utf8;
Now I am trying to select some columns into a CSV file. Afterwards I will import the file into a database table, but I suspect that should not take too much time.
My Select statement is the following:
SELECT DISTINCT vehicle.ID, part.ID, term_of_use_name.ID, term_of_use_value.ID FROM _termsofuse
INNER JOIN vehicle ON vehicle.TTC_TYP_ID = _termsofuse.TTC_TYP_ID
INNER JOIN part ON part.TTC_ART_ID = _termsofuse.TTC_ART_ID
INNER JOIN term_of_use_name ON term_of_use_name.Name = _termsofuse.TERM_OF_USE_NAME AND term_of_use_name.ID_Lang = 2
INNER JOIN term_of_use_value ON term_of_use_value.Value = _termsofuse.TERM_OF_USE_VALUE AND term_of_use_value.ID_Lang = 2
INTO OUTFILE 'termsofuse.csv'
CHARACTER SET utf8
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n';
This query takes longer than 8 hours on my laptop (I have 4 GB of RAM).
I tried to see the explain of the SELECT part and it shows the following:
I do not understand where exactly the bottleneck is. I have exported a similar query (about 95 million records) in less than 1 hour. Also, breaking the results into multiple chunks using LIMIT does not seem to help much.
Please have a look, and let me know if you need any additional info.
Thank you in advance.
EDIT 15/01/2016
Results of Explain Select
Why have an ID when you have a perfectly good UNIQUE INT that could be the PK?
Seriously -- Having to reach through a secondary key slows things down. If each lookup slows it down by a factor of 2, that could add up.
How much RAM do you have? What is the value of innodb_buffer_pool_size? It should be about 70% of available RAM.
Let's see the EXPLAIN SELECT ...; there may be more clues there.
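For reference, the two pieces of information requested above can be gathered roughly like this. The alias buffer_pool_mb is only illustrative, and the EXPLAIN is simply the question's SELECT with the INTO OUTFILE part removed.

```sql
-- Current InnoDB buffer pool size, converted from bytes to MB.
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;

-- Execution plan for the SELECT part of the export.
EXPLAIN
SELECT DISTINCT vehicle.ID, part.ID, term_of_use_name.ID, term_of_use_value.ID
FROM _termsofuse
INNER JOIN vehicle ON vehicle.TTC_TYP_ID = _termsofuse.TTC_TYP_ID
INNER JOIN part ON part.TTC_ART_ID = _termsofuse.TTC_ART_ID
INNER JOIN term_of_use_name
        ON term_of_use_name.Name = _termsofuse.TERM_OF_USE_NAME
       AND term_of_use_name.ID_Lang = 2
INNER JOIN term_of_use_value
        ON term_of_use_value.Value = _termsofuse.TERM_OF_USE_VALUE
       AND term_of_use_value.ID_Lang = 2;
```

In the EXPLAIN output, the rows to look at first are those for _termsofuse (which has no index other than its primary key) and any row whose Extra column mentions a temporary table, since DISTINCT over a result of this size may need one.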

What is the cause for MySQL: Errorno 150

Can someone please explain the cause of the following error: 'Can't create table 'Activities' (errno: 150)'?
My understanding is that the data types and lengths have to be the same. Does it have anything to do with the auto increment?
Create Table `LinkMemberActivity` (
`LinkID` int(11) unsigned NOT NULL AUTO_INCREMENT,
`MID` int(11) unsigned NOT NULL,
`AID` int(11) unsigned NOT NULL,
PRIMARY KEY (`LinkID`),
FOREIGN KEY (`MID`) REFERENCES Members(`MID`)) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `Activities` (
`AID` int(11) unsigned NOT NULL AUTO_INCREMENT,
`Name` varchar(25) DEFAULT NULL,
`MaxCapacity` int(25) DEFAULT NULL,
`StartTime` time DEFAULT NULL,
`EndTime` time DEFAULT NULL,
PRIMARY KEY (`AID`),
FOREIGN KEY (`AID`) REFERENCES LinkMemberActivity(`AID`))
ENGINE=InnoDB DEFAULT CHARSET=latin1;
You are trying to make a primary key column a foreign key that depends on a column in another table. This is not only unusual but makes no sense in a data model, unless the column is part of a composite key. Common practice is for a foreign key column to depend on another table's primary key. I am not sure what reasons you have for designing your data model this way, but you can fix the problem by creating a NOT NULL AUTO_INCREMENT column named ID and making that column the primary key. Then remove AUTO_INCREMENT from AID.
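A different fix from the one described above, following the common practice it mentions, is to reverse the direction of the relationship so that the link table references the primary key of Activities. InnoDB also requires the referenced column to be indexed, and LinkMemberActivity.AID in the question has no index, which by itself is enough to produce errno 150. The sketch below assumes a Members table already exists with an indexed `MID` column of type int(11) unsigned; that table is not shown in the question.

```sql
-- Parent table first: AID is a plain auto-increment primary key.
CREATE TABLE `Activities` (
  `AID` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `Name` varchar(25) DEFAULT NULL,
  `MaxCapacity` int(11) DEFAULT NULL,
  `StartTime` time DEFAULT NULL,
  `EndTime` time DEFAULT NULL,
  PRIMARY KEY (`AID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- Link table second: both foreign keys point at primary keys.
-- Assumption: Members(MID) exists, is int(11) unsigned, and is indexed.
CREATE TABLE `LinkMemberActivity` (
  `LinkID` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `MID` int(11) unsigned NOT NULL,
  `AID` int(11) unsigned NOT NULL,
  PRIMARY KEY (`LinkID`),
  FOREIGN KEY (`MID`) REFERENCES `Members` (`MID`),
  FOREIGN KEY (`AID`) REFERENCES `Activities` (`AID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
```

The data types on both sides of each foreign key match exactly (int(11) unsigned), which is the other common cause of errno 150.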

SQL: Refactoring a multi-join query

I have a query that should be quite simple and yet it causes me a lot of headaches.
I have a simple ads system that requires filtering ads according to a few variables.
I need to limit the number of views/clicks per day and the total number of views/clicks for a given ad. Also each ad is linked to one or more slots in which the ad can appear. I have a table that saves the statistics that I need about each ad. Note that the statistics table changes very frequently.
These are the tables that I'm using:
CREATE TABLE `t_ads` (
`id` int(10) unsigned NOT NULL auto_increment,
`name` varchar(255) NOT NULL,
`content` text NOT NULL,
`is_active` tinyint(1) unsigned NOT NULL,
`start_date` date NOT NULL,
`end_date` date NOT NULL,
`max_views` int(10) unsigned NOT NULL,
`type` tinyint(3) unsigned NOT NULL default '0',
`refresh` smallint(5) unsigned NOT NULL default '0',
`max_clicks` int(10) unsigned NOT NULL,
`max_daily_clicks` int(10) unsigned NOT NULL,
`max_daily_views` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `t_ad_slots` (
`id` int(10) unsigned NOT NULL auto_increment ,
`name` varchar(255) NOT NULL,
`width` int(10) unsigned NOT NULL,
`height` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `t_ads_to_slots` (
`ad_id` int(10) unsigned NOT NULL,
`slot_id` int(10) unsigned NOT NULL,
`value` int(10) unsigned NOT NULL,
PRIMARY KEY (`ad_id`,`slot_id`),
KEY `slot_id` (`slot_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `t_ads_to_slots`
ADD CONSTRAINT `t_ads_to_slots_ibfk_1` FOREIGN KEY (`ad_id`) REFERENCES `t_ads` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
ADD CONSTRAINT `t_ads_to_slots_ibfk_2` FOREIGN KEY (`slot_id`) REFERENCES `t_ad_slots` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION;
CREATE TABLE `t_ad_stats` (
`ad_id` int(10) unsigned NOT NULL,
`slot_id` int(10) unsigned NOT NULL,
`date` date NOT NULL,
`views` int(10) unsigned NOT NULL,
`unique_views` int(10) unsigned NOT NULL,
`clicks` int(10) unsigned NOT NULL default '0',
PRIMARY KEY (`ad_id`,`slot_id`,`date`),
KEY `slot_id` (`slot_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `t_ad_stats`
ADD CONSTRAINT `t_ad_stats_ibfk_1` FOREIGN KEY (`ad_id`) REFERENCES `t_ads` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
ADD CONSTRAINT `t_ad_stats_ibfk_2` FOREIGN KEY (`slot_id`) REFERENCES `t_ad_slots` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION;
This is the query that I use to get ads for a given slot. (Note that in this example I hard-coded 20 as the slot id and 0,1,2 as the ad types; I get this data from a PHP script which invokes the query.)
SELECT `ads`.`content`, `slots`.`value`, `ads`.`id`, `ads`.`refresh`, `ads`.`type`,
SUM(`total_stats`.`views`) AS "total_views",
SUM(`total_stats`.`clicks`) AS "total_clicks"
FROM (`t_ads` AS `ads`,
`t_ads_to_slots` AS `slots`)
LEFT JOIN `t_ad_stats` AS `total_stats`
ON `total_stats`.`ad_id` = `ads`.`id`
LEFT JOIN `t_ad_stats` AS `daily_stats`
ON (`daily_stats`.`ad_id` = `ads`.`id`) AND
(`daily_stats`.`date` = CURDATE())
WHERE (`ads`.`id` = `slots`.`ad_id`) AND
(`ads`.`type` IN(0,1,2)) AND
(`slots`.`slot_id` = 20) AND
(`ads`.`is_active` = 1) AND
(`ads`.`end_date` >= NOW()) AND
(`ads`.`start_date` <= NOW()) AND
((`ads`.`max_views` = 0) OR
(`ads`.`max_views` > "total_views")) AND
((`ads`.`max_clicks` = 0) OR
(`ads`.`max_clicks` > "total_clicks")) AND
((`ads`.`max_daily_clicks` = 0) OR
(`ads`.`max_daily_clicks` > IFNULL(`daily_stats`.`clicks`,0))) AND
((`ads`.`max_daily_views` = 0) OR
(`ads`.`max_daily_views` > IFNULL(`daily_stats`.`views`,0)))
GROUP BY (`ads`.`id`)
I believe that this query is self-explanatory, even though it's quite long. Note that the MySQL version I'm using is 5.0.51a-community. It seems to me that the big issue here is the double join to the stats table (I did that so that I could get the data both from one specific record and from multiple records (sum)).
How would you implement this query in order to get better results? (Note that I can't change from InnoDB).
Hopefully everything is clear about my question, but if that is not the case, please ask and I will clarify.
Thanks in advance,
Kfir
Add indexes to the following columns:
t_ads.is_active
t_ads.start_date
t_ads.end_date
Change the order of the primary key on t_ad_stats to:
(`ad_id`,`date`,`slot_id`)
or add a composite index to t_ad_stats on
(`ad_id`, `date`)
Change from 0 meaning "no limit" to 2147483647 meaning no limit, so you can change things like:
((`ads`.`max_views` = 0) OR (`ads`.`max_views` > "total_views"))
to
(`ads`.`max_views` > "total_views")
You could greatly improve this if you kept running totals instead of having to calculate them each time.
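One possible way to avoid the double join to t_ad_stats is conditional aggregation: join the stats table once and compute both the overall and the per-day sums from it, then compare the limits in HAVING. This is only a sketch, not a drop-in replacement. It assumes the daily limits apply to today's totals summed over all slots (the original compares each daily row separately), and the aliases stats, daily_views and daily_clicks are illustrative. Note also that under MySQL's default SQL mode the double-quoted "total_views" in the original WHERE clause is a string literal, not a reference to the SUM alias, so that comparison does not test the summed value.

```sql
SELECT ads.content, slots.value, ads.id, ads.refresh, ads.type,
       ads.max_views, ads.max_clicks,
       ads.max_daily_views, ads.max_daily_clicks,
       -- Overall totals; IFNULL covers ads with no stats rows at all.
       IFNULL(SUM(stats.views), 0)  AS total_views,
       IFNULL(SUM(stats.clicks), 0) AS total_clicks,
       -- Today's totals, taken from the same joined rows.
       SUM(IF(stats.date = CURDATE(), stats.views, 0))  AS daily_views,
       SUM(IF(stats.date = CURDATE(), stats.clicks, 0)) AS daily_clicks
FROM t_ads AS ads
INNER JOIN t_ads_to_slots AS slots ON slots.ad_id = ads.id
LEFT JOIN t_ad_stats AS stats ON stats.ad_id = ads.id
WHERE ads.type IN (0, 1, 2)
  AND slots.slot_id = 20
  AND ads.is_active = 1
  AND ads.start_date <= NOW()
  AND ads.end_date >= NOW()
GROUP BY ads.id
-- Limits that depend on aggregates have to be checked after grouping.
HAVING (ads.max_views = 0 OR ads.max_views > total_views)
   AND (ads.max_clicks = 0 OR ads.max_clicks > total_clicks)
   AND (ads.max_daily_views = 0 OR ads.max_daily_views > daily_views)
   AND (ads.max_daily_clicks = 0 OR ads.max_daily_clicks > daily_clicks);
```

The limit columns are included in the select list so that HAVING can refer to them. On newer servers with ONLY_FULL_GROUP_BY enabled, the non-aggregated columns may need to be added to GROUP BY as well.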
Expanding on a comment above, I believe that the following columns should be indexed:
ads.id
ads.type
ads.start_date
ads.end_date
daily_stats.date
As well as these:
slots.slot_id
ads.is_active
And these as well:
ads.max_views
ads.max_clicks
ads.max_daily_clicks
ads.max_daily_views
daily_stats.clicks
daily_stats.views
Do note that adding indexes on these columns will speed up your SELECTs but slow down your INSERTs, since the indexes need updating as well. You don't have to apply all of this at once, though. You can do it incrementally and see how performance shakes out for selects as well as inserts. If you cannot find a good middle ground, then I would suggest denormalization.