SQL optimization: Finding first unwatched video using no subselect - mysql

My question: I suspect I have a performance-killing subquery, but I can't prove it. My first attempts using JOIN failed. Can somebody provide a better-performing solution, or confirm that this is acceptable as it is?
I have two tables: one that contains a to-do list (joblist) and one that tracks every user's progress (userprogress). A job may, but need not, be watching a video. (It's an e-learning site.)
When videos have been watched, they are automatically set to 'finished' in an enum field. Users may also skip videos manually (status = 'skipped').
Table structures are provided below.
To get the first video that a user has not watched at all (no record in userprogress) or has begun to watch (status = 'begun'), I am using this query.
I have set indexes on every field that is used for selection or ordering. However, I am unsure whether they are all needed.
The SELECT statement has two parts:
1. An inner subselect, where I fetch all seen or skipped videos.
2. The main statement, where I fetch the first video not among the ones found by (1).
There is a named parameter for PHP (:email), to avoid SQL-injection.
SELECT jl.where_to_do_it FROM joblist AS jl
INNER JOIN userprogress AS up
ON (jl.joblistID = up.joblistID)
WHERE jl.what_to_do = 'video'
AND jl.joblistID NOT IN
(
SELECT injl.joblistID
FROM joblist AS injl
INNER JOIN userprogress AS inup
ON (injl.joblistID = inup.joblistID)
WHERE
(inup.status = 'finished' OR inup.status = 'skipped')
AND
inup.email = :email
AND
injl.what_to_do = 'video'
)
ORDER BY jl.joborder ASC
LIMIT 0,1
This is the output from EXPLAIN, which I need some help understanding.
id | select_type        | table | type   | possible_keys                | key        | key_len | ref          | rows | Extra
1  | PRIMARY            | jl    | ref    | PRIMARY,what_to_do           | what_to_do | 602     | const        | 9    | Using where; Using filesort
1  | PRIMARY            | up    | ref    | joblistID                    | joblistID  | 3       | jl.joblistID | 1    | Using index
2  | DEPENDENT SUBQUERY | injl  | eq_ref | PRIMARY,what_to_do           | PRIMARY    | 3       | func         | 1    | Using where
2  | DEPENDENT SUBQUERY | inup  | eq_ref | nodup,email,joblistID,status | nodup      | 455     | const,func   | 1    | Using where
The create table commands:
CREATE TABLE IF NOT EXISTS `joblist` (
`joblistID` mediumint(10) unsigned NOT NULL AUTO_INCREMENT,
`what_to_do` varchar(200) COLLATE utf8_swedish_ci NOT NULL,
`where_to_do_it` varchar(100) COLLATE utf8_swedish_ci NOT NULL,
`joborder` mediumint(6) NOT NULL,
`track` enum('fast','slow','bonus') COLLATE utf8_swedish_ci NOT NULL DEFAULT 'slow',
`chapter` tinyint(11) unsigned NOT NULL COMMENT 'What book chapter it relates to',
PRIMARY KEY (`joblistID`),
KEY `nodupjobs` (`joborder`,`chapter`),
KEY `what_to_do` (`what_to_do`),
KEY `where_to_do_it` (`where_to_do_it`),
KEY `joborder` (`joborder`),
KEY `track` (`track`),
KEY `chapter` (`chapter`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci COMMENT='Suggested working order';
CREATE TABLE IF NOT EXISTS `userprogress` (
`upID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(150) COLLATE utf8_swedish_ci NOT NULL COMMENT 'user id',
`joblistID` mediumint(9) unsigned NOT NULL COMMENT 'foreign key',
`progressdata` varchar(300) COLLATE utf8_swedish_ci DEFAULT NULL COMMENT 'JSON object describing progress',
`percentage_complete` tinyint(3) unsigned DEFAULT NULL,
`status` enum('begun','skipped','finished') COLLATE utf8_swedish_ci DEFAULT 'begun',
`lastupdate` datetime NOT NULL,
`approved` datetime DEFAULT NULL,
PRIMARY KEY (`upID`),
UNIQUE KEY `nodup` (`email`,`joblistID`),
KEY `email` (`email`),
KEY `joblistID` (`joblistID`),
KEY `status` (`status`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci COMMENT='Keep track of what the user has done';

Yes, you are correct. IN and NOT IN with subqueries are particularly bad performers in MySQL. Here is a revised version using NOT EXISTS:
SELECT jl.where_to_do_it
FROM joblist jl INNER JOIN
userprogress up
ON (jl.joblistID = up.joblistID)
WHERE jl.what_to_do = 'video' and
not exists (
SELECT 1
FROM joblist injl INNER JOIN
userprogress inup
ON (injl.joblistID = inup.joblistID)
WHERE (inup.status = 'finished' OR inup.status = 'skipped') and
inup.email = :email and
injl.what_to_do = 'video' and
injl.joblistID = jl.joblistID
)
ORDER BY jl.joborder ASC
LIMIT 0,1
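A minimal, self-contained sketch of the NOT EXISTS anti-join, using Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL. SQLite's planner is different, so this only shows the query shape and its result, not MySQL performance; the sample rows and e-mail addresses are made up. Unlike the query above, it leaves out the outer join to userprogress and the inner join to joblist (an assumption: neither is needed to identify the video), so videos the user has never started also qualify, which is what the question asks for.

```python
import sqlite3

# Cut-down versions of the two tables from the question (sample data is made up).
SCHEMA = """
CREATE TABLE joblist (
    joblistID      INTEGER PRIMARY KEY,
    what_to_do     TEXT NOT NULL,
    where_to_do_it TEXT NOT NULL,
    joborder       INTEGER NOT NULL
);
CREATE TABLE userprogress (
    upID      INTEGER PRIMARY KEY,
    email     TEXT NOT NULL,
    joblistID INTEGER NOT NULL,
    status    TEXT DEFAULT 'begun',
    UNIQUE (email, joblistID)
);
INSERT INTO joblist VALUES
    (1, 'video', 'intro.mp4', 10),
    (2, 'video', 'ch1.mp4',   20),
    (3, 'quiz',  'quiz1',     25),
    (4, 'video', 'ch2.mp4',   30),
    (5, 'video', 'ch3.mp4',   40);
INSERT INTO userprogress (email, joblistID, status) VALUES
    ('anna@example.com', 1, 'finished'),
    ('anna@example.com', 2, 'skipped'),
    ('anna@example.com', 5, 'begun'),
    ('bob@example.com',  1, 'begun');
"""

# First video (by joborder) with no 'finished' or 'skipped' row for this user.
FIRST_UNWATCHED = """
SELECT jl.where_to_do_it
FROM joblist AS jl
WHERE jl.what_to_do = 'video'
  AND NOT EXISTS (
      SELECT 1
      FROM userprogress AS up
      WHERE up.joblistID = jl.joblistID
        AND up.email = :email
        AND up.status IN ('finished', 'skipped')
  )
ORDER BY jl.joborder ASC
LIMIT 1
"""

def first_unwatched(con, email):
    row = con.execute(FIRST_UNWATCHED, {"email": email}).fetchone()
    return row[0] if row else None

con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
print(first_unwatched(con, "anna@example.com"))  # ch2.mp4 (never started)
```

In MySQL, the existing nodup (email, joblistID) unique key matches the correlated lookup inside NOT EXISTS; running EXPLAIN on the real tables is the way to confirm it is used.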

Looks like you're running in circles. Your subquery looks for videos with status finished or skipped, and then the outer query looks for the ones that don't have that status. I would replace that with a condition like this:
SELECT jl.where_to_do_it FROM joblist AS jl
INNER JOIN userprogress AS up
ON (jl.joblistID = up.joblistID)
WHERE jl.what_to_do = 'video'
AND up.status <> 'finished' AND up.status <> 'skipped'
AND up.email = :email
ORDER BY jl.joborder ASC
LIMIT 0,1
Or maybe I misunderstand. In any case, the problem seems to be the NOT IN (I would never suggest using it). Instead, try moving the subquery out of the WHERE condition into a LEFT JOIN and adding the condition AND SQ.joblistID IS NULL, something like this:
SELECT jl.where_to_do_it FROM joblist AS jl
INNER JOIN userprogress AS up
ON (jl.joblistID = up.joblistID)
LEFT JOIN (
SELECT injl.joblistID
FROM joblist AS injl
INNER JOIN userprogress AS inup
ON (injl.joblistID = inup.joblistID)
WHERE
(inup.status = 'finished' OR inup.status = 'skipped')
AND
inup.email = :email
AND
injl.what_to_do = 'video'
) SQ ON jl.joblistID = SQ.joblistID
WHERE jl.what_to_do = 'video'
AND SQ.joblistID IS NULL
ORDER BY jl.joborder ASC
LIMIT 0,1
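But I think the first option will work, as long as every video the user should see already has a userprogress row for them: because of the INNER JOIN and the email filter, it never returns a video the user has not started at all.
Hope it helps.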
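To see the difference between the two suggestions, here is a small sketch using Python's sqlite3 module with an in-memory database standing in for MySQL; the rows are invented. It drops the join to userprogress from the outer query (an assumption: it is not needed to find the video) and shows that the LEFT JOIN ... IS NULL anti-join finds a video the user has never started, while the INNER JOIN condition from the first option cannot.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE joblist (joblistID INTEGER PRIMARY KEY, what_to_do TEXT,
                      where_to_do_it TEXT, joborder INTEGER);
CREATE TABLE userprogress (upID INTEGER PRIMARY KEY, email TEXT,
                           joblistID INTEGER, status TEXT DEFAULT 'begun');
INSERT INTO joblist VALUES (1, 'video', 'intro.mp4', 10), (2, 'video', 'ch1.mp4', 20),
                           (4, 'video', 'ch2.mp4', 30), (5, 'video', 'ch3.mp4', 40);
INSERT INTO userprogress (email, joblistID, status) VALUES
    ('anna@example.com', 1, 'finished'), ('anna@example.com', 2, 'skipped'),
    ('anna@example.com', 5, 'begun');
""")

# Anti-join: keep the videos for which the derived table sq found no match.
LEFT_JOIN_ANTI = """
SELECT jl.where_to_do_it
FROM joblist AS jl
LEFT JOIN (
    SELECT up.joblistID
    FROM userprogress AS up
    WHERE up.email = :email AND up.status IN ('finished', 'skipped')
) AS sq ON sq.joblistID = jl.joblistID
WHERE jl.what_to_do = 'video' AND sq.joblistID IS NULL
ORDER BY jl.joborder
LIMIT 1
"""

# The "first option": it can only see videos the user already has a row for.
INNER_JOIN_ONLY = """
SELECT jl.where_to_do_it
FROM joblist AS jl
JOIN userprogress AS up ON up.joblistID = jl.joblistID
WHERE jl.what_to_do = 'video' AND up.email = :email
  AND up.status <> 'finished' AND up.status <> 'skipped'
ORDER BY jl.joborder
LIMIT 1
"""

def first(sql, email):
    row = con.execute(sql, {"email": email}).fetchone()
    return row[0] if row else None

print(first(LEFT_JOIN_ANTI, "anna@example.com"))   # ch2.mp4
print(first(INNER_JOIN_ONLY, "anna@example.com"))  # ch3.mp4 (skips never-started ch2.mp4)
```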

Related

MySQL query - taking ORDER BY off making query 100 times faster

I have found a very slow query in my system. The MySQL slow log says the following:
# Time: 2018-07-08T18:47:02.273314Z
# User@Host: server[server] @ localhost []  Id: 1467
# Query_time: 97.251247 Lock_time: 0.000210 Rows_sent: 50 Rows_examined: 41646378
SET timestamp=1531075622;
SELECT n1.full_name AS sender_full_name, s1.email AS sender_email,
e.subject, e.body, e.attach, e.date, e.id, r.status,
n2.full_name AS receiver_full_name, s2.email AS receiver_email,
r.basket
FROM email_routing r
JOIN email e ON e.id = r.message_id
JOIN people_emails s1 ON s1.id = r.sender_email_id
JOIN people n1 ON n1.id = s1.people_id
JOIN people_emails s2 ON s2.id = r.receiver_email_id
JOIN people n2 ON n2.id = s2.people_id
WHERE r.sender_email_id = 21897 ORDER BY e.date desc LIMIT 0, 50;
The EXPLAIN output shows no full table scan, and the query uses indexes:
id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref                 | rows | filtered | Extra
1  | SIMPLE      | s1    | NULL       | const  | PRIMARY       | PRIMARY | 4       | const               | 1    | 100.00   | Using temporary; Using filesort
1  | SIMPLE      | n1    | NULL       | const  | PRIMARY,ppl   | PRIMARY | 4       | const               | 1    | 100.00   | NULL
1  | SIMPLE      | n2    | NULL       | index  | PRIMARY,ppl   | ppl     | 771     | NULL                | 1    | 100.00   | Using index
1  | SIMPLE      | s2    | NULL       | index  | PRIMARY       | s2      | 771     | NULL                | 3178 | 10.00    | Using where; Using index; Using join buffer (Block Nested Loop)
1  | SIMPLE      | r     | NULL       | ref    | bk1,bk2,msgid | bk1     | 4       | server.s2.id        | 440  | 6.60     | Using where; Using index
1  | SIMPLE      | e     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | server.r.message_id | 1    | 100.00   | NULL
Here is the SHOW CREATE TABLE output for the tables used:
CREATE TABLE `email_routing` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`message_id` int(11) NOT NULL,
`sender_email_id` int(11) NOT NULL,
`receiver_email_id` int(11) NOT NULL,
`basket` int(11) NOT NULL,
`status` int(11) NOT NULL,
`popup` int(11) NOT NULL DEFAULT '0',
`tm` int(11) NOT NULL DEFAULT '0',
KEY `id` (`id`),
KEY `bk1` (`receiver_email_id`,`status`,`sender_email_id`,`message_id`,`basket`),
KEY `bk2` (`sender_email_id`,`tm`),
KEY `msgid` (`message_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1055796 DEFAULT CHARSET=utf8
-
CREATE TABLE `email` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`subject` text NOT NULL,
`body` text NOT NULL,
`date` datetime NOT NULL,
`attach` text NOT NULL,
`attach_dir` varchar(255) CHARACTER SET cp1251 DEFAULT NULL,
`attach_subject` varchar(255) DEFAULT NULL,
`attach_content` longtext,
`sphinx_synced` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `Index_2` (`attach_dir`),
KEY `dt` (`date`)
) ENGINE=InnoDB AUTO_INCREMENT=898001 DEFAULT CHARSET=utf8
-
CREATE TABLE `people_emails` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`nick` varchar(255) NOT NULL,
`email` varchar(255) NOT NULL,
`key_name` varchar(255) NOT NULL,
`people_id` int(11) NOT NULL,
`status` int(11) NOT NULL DEFAULT '0',
`activity` int(11) NOT NULL,
`internal_user_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `s2` (`email`,`people_id`)
) ENGINE=InnoDB AUTO_INCREMENT=22146 DEFAULT CHARSET=utf8
-
CREATE TABLE `people` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fname` varchar(255) CHARACTER SET cp1251 NOT NULL,
`lname` varchar(255) CHARACTER SET cp1251 NOT NULL,
`patronymic` varchar(255) CHARACTER SET cp1251 NOT NULL,
`gender` tinyint(1) NOT NULL,
`full_name` varchar(255) NOT NULL DEFAULT ' ',
`category` int(11) NOT NULL,
`people_type_id` int(255) DEFAULT NULL,
`tags` varchar(255) CHARACTER SET cp1251 NOT NULL,
`job` varchar(255) CHARACTER SET cp1251 NOT NULL,
`post` varchar(255) CHARACTER SET cp1251 NOT NULL,
`profession` varchar(255) CHARACTER SET cp1251 DEFAULT NULL,
`zip` varchar(16) CHARACTER SET cp1251 NOT NULL,
`country` int(11) DEFAULT NULL,
`region` varchar(10) NOT NULL,
`city` varchar(255) CHARACTER SET cp1251 NOT NULL,
`address` varchar(255) CHARACTER SET cp1251 NOT NULL,
`address_date` date DEFAULT NULL,
`last_update_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `ppl` (`id`,`full_name`)
) ENGINE=InnoDB AUTO_INCREMENT=415040 DEFAULT CHARSET=utf8
Here is the SHOW TABLE STATUS output for those 4 tables:
Name          | Engine | Version | Row_format | Rows   | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment
email         | InnoDB | 10      | Dynamic    | 753748 | 12079          | 9104785408  | 0               | 61112320     | 4194304   | 898167
email_routing | InnoDB | 10      | Dynamic    | 900152 | 61             | 55132160    | 0               | 69419008     | 6291456   | 1056033
people        | InnoDB | 10      | Dynamic    | 9538   | 386            | 3686400     | 0               | 2785280      | 4194304   | 415040
people_emails | InnoDB | 10      | Dynamic    | 3178   | 752            | 2392064     | 0               | 98304        | 4194304   | 22146
MySQL version: 5.7.22 on Ubuntu 16.04
However, I have noticed one thing: if I take the ORDER BY out of the query but leave the LIMIT, the query runs almost instantly, taking no more than 0.2 seconds. So I started thinking about running the query without ORDER BY and doing the sorting in PHP, but that turned out to be too complicated, because using LIMIT without ORDER BY gives me the wrong range of rows to sort.
Is there anything else I could do to speed up or optimize that query?
As an alternative, I could do the sorting and paging in my PHP code. I add an additional column to the SELECT (..., UNIX_TIMESTAMP(e.date) AS ts) and then do:
<?php
...
$main_query = $server->query($query);
$emails_list = $main_query->fetch_all(MYSQLI_ASSOC);
// Newest first, to match ORDER BY e.date DESC
function cmp($a, $b) {
    return $b['ts'] <=> $a['ts'];
}
// usort() sorts $emails_list in place and only returns true/false
usort($emails_list, "cmp");
for ($i = $start; $i < $lenght; $i++)
{
    $single_email = $emails_list[$i];
    // Format the output
}
But when I do that I get
Fatal error: Allowed memory size of 134217728 bytes exhausted
at the line usort($emails_list, "cmp");
Warning: I'm not very familiar with MySQL; in fact, I'm mostly projecting MSSQL experience onto things I have (mostly) read about MySQL.
1) Potential workaround: is it safe to assume that email.id and email.date are always in the same order? From a functional point of view this seems logical, as emails get added to the table over time and thus have an ever-increasing auto-number. But maybe the initial load of the data was in a different or random order? Anyway, if it is safe, what happens if you ORDER BY e.id instead of ORDER BY e.date?
2) Does adding a composite index on email (id, date) (in that order!) help?
3) If all of that does not help, splitting the query into 2 parts might help out the optimizer. (You may need to fix the syntax for MySQL)
-- Locate what we want first
CREATE TEMPORARY TABLE results AS
SELECT r.message_id,
r.sender_email_id,
r.receiver_email_id,
r.status,
r.basket
FROM email_routing r
JOIN email e ON e.id = r.message_id
WHERE r.sender_email_id = 21897
ORDER BY e.date desc LIMIT 0, 50;
-- Again, having an index on email (id, date) seems like a good idea to me
-- (As a test you may want to add an index on results (message_id) here; it shouldn't take long, and
-- in MSSQL it would help build a better query plan, can't tell with MySQL)
-- return actual results
SELECT n1.full_name AS sender_full_name,
s1.email AS sender_email,
e.subject, e.body, e.attach, e.date, e.id, r.status,
n2.full_name AS receiver_full_name,
s2.email AS receiver_email,
r.basket
FROM results r
JOIN email e ON e.id = r.message_id
JOIN people_emails s1 ON s1.id = r.sender_email_id
JOIN people n1 ON n1.id = s1.people_id
JOIN people_emails s2 ON s2.id = r.receiver_email_id
JOIN people n2 ON n2.id = s2.people_id
ORDER BY e.date desc;
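The two-step idea can be tried out on toy data. This sketch uses Python's sqlite3 module with an in-memory SQLite database standing in for MySQL, keeps only the columns the query needs, and uses made-up rows whose ids are deliberately not in date order. The temporary table has to carry message_id, sender_email_id, receiver_email_id and status, since the second step joins on them. It only checks that both forms return the same page; it says nothing about MySQL timings.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, full_name TEXT);
CREATE TABLE people_emails (id INTEGER PRIMARY KEY, email TEXT, people_id INTEGER);
CREATE TABLE email (id INTEGER PRIMARY KEY, subject TEXT, date TEXT);
CREATE TABLE email_routing (message_id INTEGER, sender_email_id INTEGER,
                            receiver_email_id INTEGER, basket INTEGER, status INTEGER);
INSERT INTO people VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO people_emails VALUES (21897, 'alice@example.com', 1),
                                 (2, 'bob@example.com', 2),
                                 (3, 'carol@example.com', 3);
-- ids are deliberately not in date order
INSERT INTO email VALUES (1, 'm1', '2018-07-01 10:00'), (2, 'm2', '2018-07-03 09:00'),
                         (3, 'm3', '2018-07-02 12:00'), (4, 'm4', '2018-07-05 08:00'),
                         (5, 'm5', '2018-07-04 18:00');
INSERT INTO email_routing VALUES (1, 21897, 2, 0, 1), (2, 21897, 3, 0, 1),
                                 (3, 21897, 2, 0, 0), (4, 2, 21897, 0, 1),
                                 (5, 21897, 3, 1, 0);
""")

# Wide "details" part of the query; {source} is the base table or the temp table.
DETAILS = """
SELECT n1.full_name, s1.email, e.subject, e.date, e.id, r.status,
       n2.full_name, s2.email, r.basket
FROM {source} r
JOIN email e ON e.id = r.message_id
JOIN people_emails s1 ON s1.id = r.sender_email_id
JOIN people n1 ON n1.id = s1.people_id
JOIN people_emails s2 ON s2.id = r.receiver_email_id
JOIN people n2 ON n2.id = s2.people_id
"""

def single_query(sender, n):
    sql = DETAILS.format(source="email_routing") + \
        "WHERE r.sender_email_id = ? ORDER BY e.date DESC LIMIT ?"
    return con.execute(sql, (sender, n)).fetchall()

def two_step(sender, n):
    # Step 1: keep only the narrow routing rows for the requested page.
    con.execute("DROP TABLE IF EXISTS results")
    con.execute("CREATE TEMP TABLE results (message_id INTEGER, sender_email_id INTEGER, "
                "receiver_email_id INTEGER, basket INTEGER, status INTEGER)")
    con.execute("""
        INSERT INTO results
        SELECT r.message_id, r.sender_email_id, r.receiver_email_id, r.basket, r.status
        FROM email_routing r JOIN email e ON e.id = r.message_id
        WHERE r.sender_email_id = ?
        ORDER BY e.date DESC LIMIT ?""", (sender, n))
    # Step 2: fetch the wide columns for just those rows.
    return con.execute(DETAILS.format(source="results") + "ORDER BY e.date DESC").fetchall()

print([row[2] for row in two_step(21897, 3)])  # ['m5', 'm2', 'm3']
```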
If your data comes back that quickly without the ORDER BY, how about wrapping the query? The question is how many rows are actually going to be returned without the LIMIT. You might still get better performance by sorting afterwards, such as:
select PQ.*
from ( YourQueryWithoutOrderByAndLimit ) PQ
order by PQ.date desc
LIMIT 0, 50;
I suspect this is a case where the MySQL Join Optimizer overestimates the benefits of Block Nested Loop (BNL) join. You can try to turn off BNL by doing:
set optimizer_switch='block_nested_loop=off';
Hopefully this will provide a better join order. You could also try:
set optimizer_prune_level = 0;
to force the join optimizer to explore all possible join orders.
Another option is to use STRAIGHT_JOIN to force a particular join order. In this case, it seems the order as specified in the query text would be good. Hence, to force this particular join order you could write
SELECT STRAIGHT_JOIN ...
Note that whatever you do, you cannot expect the query to be as fast as without ORDER BY. As long as you need to find the latest emails from a particular sender, and there is no information about the sender in the email table, it is not possible to use an index to avoid sorting without going through all emails from all senders. Things would be different if you had the date in the email_routing table. Then an index on that table could be used to avoid sorting.
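The last point can be illustrated with SQLite's EXPLAIN QUERY PLAN as a rough stand-in for MySQL's EXPLAIN (via Python's sqlite3 module): with an index on (sender_email_id, date) on the routing table, the planner reads the matching rows already in date order and needs no separate sort step. SQLite's "USE TEMP B-TREE FOR ORDER BY" plays roughly the role of MySQL's "Using filesort". The msg_date column is hypothetical: it assumes the date is copied from email into email_routing.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE email_routing (
        id                INTEGER PRIMARY KEY,
        message_id        INTEGER NOT NULL,
        sender_email_id   INTEGER NOT NULL,
        receiver_email_id INTEGER NOT NULL,
        msg_date          TEXT NOT NULL  -- hypothetical copy of email.date
    )""")

QUERY = """
    SELECT message_id FROM email_routing
    WHERE sender_email_id = 21897
    ORDER BY msg_date DESC
    LIMIT 50"""

def plan():
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + QUERY)]

# Index on the filter column only: the matching rows still have to be sorted.
con.execute("CREATE INDEX by_sender ON email_routing (sender_email_id)")
plan_sender_only = plan()

# Index on (filter column, sort column): rows come out of the index already ordered.
con.execute("DROP INDEX by_sender")
con.execute("CREATE INDEX by_sender_date ON email_routing (sender_email_id, msg_date)")
plan_sender_date = plan()

print(plan_sender_only)
print(plan_sender_date)
```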
MySQL cannot use an index for the ORDER BY in your query because:
The query joins many tables, and the columns in the ORDER BY are not
all from the first nonconstant table that is used to retrieve rows.
(This is the first table in the EXPLAIN output that does not have a
const join type.)
MySQL Order By Optimization

Mysql query with multiple selects results in high CPU load

I'm trying to write a link exchange script and have run into a bit of trouble.
Each link can be visited by an IP address up to x times (the frequency column in the links table). Each visit costs a number of credits (the spending limit is given by the limit column in the links table).
I've got the following tables:
CREATE TABLE IF NOT EXISTS `contor` (
`key` varchar(25) NOT NULL,
`uniqueHandler` varchar(30) DEFAULT NULL,
`uniqueLink` varchar(30) DEFAULT NULL,
`uniqueUser` varchar(30) DEFAULT NULL,
`owner` varchar(50) NOT NULL,
`ip` varchar(15) DEFAULT NULL,
`credits` float NOT NULL,
`tstamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`key`),
KEY `uniqueLink` (`uniqueLink`),
KEY `uniqueHandler` (`uniqueHandler`),
KEY `uniqueUser` (`uniqueUser`),
KEY `owner` (`owner`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `links` (
`unique` varchar(30) NOT NULL DEFAULT '',
`url` varchar(1000) DEFAULT NULL,
`frequency` varchar(5) DEFAULT NULL,
`limit` float NOT NULL DEFAULT '0',
PRIMARY KEY (`unique`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I've got the following query:
$link = MYSQL_QUERY("
SELECT *
FROM `links`
WHERE (SELECT count(`key`) FROM contor WHERE ip = '$ip' AND contor.uniqueLink = links.unique) <= `frequency`
AND (SELECT sum(credits) as cost FROM contor WHERE contor.uniqueLink = links.unique) <= `limit`")
There are 20 rows in the table links.
The problem is that whenever the contor table has about 200k rows, the CPU load is huge.
After applying the solution provided by @Barmar (adding a composite index on (uniqueLink, ip) and dropping all other indexes except PRIMARY), EXPLAIN gives me this:
id | select_type | table      | type  | possible_keys | key        | key_len | ref  | rows   | Extra
1  | PRIMARY     | l          | ALL   | NULL          | NULL       | NULL    | NULL | 18     |
1  | PRIMARY     | <derived2> | ALL   | NULL          | NULL       | NULL    | NULL | 15     |
2  | DERIVED     | pop_contor | index | NULL          | contor_IX1 | 141     | NULL | 206122 |
Try using a join rather than a correlated subquery.
SELECT l.*
FROM links AS l
LEFT JOIN (
SELECT uniqueLink, SUM(ip = '$ip') AS ip_visits, SUM(credits) AS total_credits
FROM contor
GROUP BY uniqueLink
) AS c
ON c.uniqueLink = l.unique
WHERE COALESCE(c.ip_visits, 0) <= l.frequency
AND COALESCE(c.total_credits, 0) <= l.limit
If this doesn't help, try adding an index on contor.ip.
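A minimal sketch of the one-pass aggregation joined back to links, using Python's sqlite3 with an in-memory database in place of MySQL. Backtick quoting works in both and is needed because unique, limit and key are reserved words in MySQL. The sample links and visits are made up, and frequency is stored as an integer here rather than VARCHAR. The sketch assumes that a link nobody has visited yet should be offered, which is why the LEFT JOIN is paired with COALESCE in the WHERE clause; if the comparisons sit in the ON clause instead, a LEFT JOIN returns every link regardless.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE links (`unique` TEXT PRIMARY KEY, url TEXT,
                    frequency INTEGER, `limit` REAL NOT NULL DEFAULT 0);
CREATE TABLE contor (`key` TEXT PRIMARY KEY, uniqueLink TEXT,
                     ip TEXT, credits REAL NOT NULL);
INSERT INTO links VALUES ('a', 'http://a.example', 1, 1.0),
                         ('b', 'http://b.example', 5, 0.5),
                         ('c', 'http://c.example', 1, 10.0);
INSERT INTO contor VALUES ('k1', 'a', '1.2.3.4', 0.25),
                          ('k2', 'a', '1.2.3.4', 0.25),
                          ('k3', 'b', '5.6.7.8', 0.6);
""")

# One pass over contor: per-link visits from this IP and total credits spent.
ELIGIBLE = """
SELECT l.`unique`
FROM links AS l
LEFT JOIN (
    SELECT uniqueLink,
           SUM(ip = :ip) AS ip_visits,
           SUM(credits)  AS total_credits
    FROM contor
    GROUP BY uniqueLink
) AS c ON c.uniqueLink = l.`unique`
-- COALESCE treats a link with no contor rows yet as 0 visits / 0 credits
WHERE COALESCE(c.ip_visits, 0) <= l.frequency
  AND COALESCE(c.total_credits, 0) <= l.`limit`
ORDER BY l.`unique`
"""

def eligible_links(ip):
    return [row[0] for row in con.execute(ELIGIBLE, {"ip": ip})]

print(eligible_links("1.2.3.4"))  # ['c']
print(eligible_links("5.6.7.8"))  # ['a', 'c']
```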
The current query is of the form:
SELECT l.*
FROM `links` l
WHERE l.frequency >= ( SELECT COUNT(ck.key)
FROM contor ck
WHERE ck.uniqueLink = l.unique
AND ck.ip = '$ip'
)
AND l.limit >= ( SELECT SUM(sc.credits)
FROM contor sc
WHERE sc.uniqueLink = l.unique
)
Those correlated subqueries are going to eat your lunch. And your lunchbox too.
I'd suggest testing an inline view that performs both of the aggregations from contor in one pass, and then join the result from that to the links table.
Something like this:
SELECT l.*
FROM ( SELECT c.uniqueLink
, SUM(c.ip = '$ip' AND c.key IS NOT NULL) AS count_key
, SUM(c.credits) AS sum_credits
FROM `contor` c
GROUP
BY c.uniqueLink
) d
JOIN `links` l
ON l.unique = d.uniqueLink
AND l.frequency >= d.count_key
AND l.limit >= d.sum_credits
For optimal performance of the aggregation inline view query, provide a covering index that MySQL can use to optimize the GROUP BY (avoiding a Using filesort operation)
CREATE INDEX `contor_IX1` ON `contor` (`uniqueLink`, `credits`, `ip`) ;
Adding that index renders the uniqueLink index redundant, so also...
DROP INDEX `uniqueLink` ON `contor` ;
EDIT
Since the contor.key column is guaranteed to be non-NULL (it has a NOT NULL constraint), the AND c.key IS NOT NULL part of the query above is unneeded and can be removed. (I also removed the key column from the covering index definition above.)
SELECT l.*
FROM ( SELECT c.uniqueLink
, SUM(c.ip = '$ip') AS count_key
, SUM(c.credits) AS sum_credits
FROM `contor` c
GROUP
BY c.uniqueLink
) d
JOIN `links` l
ON l.unique = d.uniqueLink
AND l.frequency >= d.count_key
AND l.limit >= d.sum_credits
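To check that the rewrite returns the same links as the original correlated form, here is a sketch on toy data with Python's sqlite3 (an in-memory SQLite database standing in for MySQL; rows are invented and frequency is stored as an integer). Both forms leave out a link with no contor rows at all (link c below): in the correlated form SUM over no rows is NULL, and in the join form the link is simply missing from the derived table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE links (`unique` TEXT PRIMARY KEY, url TEXT,
                    frequency INTEGER, `limit` REAL NOT NULL DEFAULT 0);
CREATE TABLE contor (`key` TEXT PRIMARY KEY, uniqueLink TEXT,
                     ip TEXT, credits REAL NOT NULL);
-- covering index for the GROUP BY, as suggested above
CREATE INDEX contor_IX1 ON contor (uniqueLink, credits, ip);
INSERT INTO links VALUES ('a', 'http://a.example', 1, 1.0),
                         ('b', 'http://b.example', 5, 0.5),
                         ('c', 'http://c.example', 1, 10.0);
INSERT INTO contor VALUES ('k1', 'a', '1.2.3.4', 0.25),
                          ('k2', 'a', '1.2.3.4', 0.25),
                          ('k3', 'b', '5.6.7.8', 0.6);
""")

# Original shape: two correlated subqueries evaluated for each link row.
CORRELATED = """
SELECT l.`unique` FROM links l
WHERE l.frequency >= (SELECT COUNT(ck.`key`) FROM contor ck
                      WHERE ck.uniqueLink = l.`unique` AND ck.ip = :ip)
  AND l.`limit` >= (SELECT SUM(sc.credits) FROM contor sc
                    WHERE sc.uniqueLink = l.`unique`)
ORDER BY l.`unique`
"""

# Rewritten shape: aggregate contor once, then join the result to links.
DERIVED = """
SELECT l.`unique`
FROM (SELECT c.uniqueLink, SUM(c.ip = :ip) AS count_key,
             SUM(c.credits) AS sum_credits
      FROM contor c GROUP BY c.uniqueLink) d
JOIN links l ON l.`unique` = d.uniqueLink
            AND l.frequency >= d.count_key
            AND l.`limit` >= d.sum_credits
ORDER BY l.`unique`
"""

def run(sql, ip):
    return [row[0] for row in con.execute(sql, {"ip": ip})]

for ip in ("1.2.3.4", "5.6.7.8", "9.9.9.9"):
    print(ip, run(CORRELATED, ip), run(DERIVED, ip))
```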

Improve speed of MySQL query with 5 left joins

I'm working on a support ticketing system with relatively few tickets (~3,000). To get a summary grid of ticket information, there are five LEFT JOINs on a custom field table (j25_support_field_value) containing about 10,000 records. The query runs too long (~10 seconds), and with a WHERE clause it runs even longer (up to ~30 seconds or more).
Any suggestions for improving the query to reduce the time to run?
Four tables:
j25_support_tickets
CREATE TABLE `j25_support_tickets` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`category_id` int(11) NOT NULL DEFAULT '0',
`user_id` int(11) DEFAULT NULL,
`email` varchar(50) DEFAULT NULL,
`subject` varchar(255) DEFAULT NULL,
`message` text,
`modified_date` datetime DEFAULT NULL,
`priority_id` tinyint(3) unsigned DEFAULT NULL,
`status_id` tinyint(3) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=3868 DEFAULT CHARSET=utf8
j25_support_priorities
CREATE TABLE `j25_support_priorities` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=14 DEFAULT CHARSET=utf8
j25_support_statuses
CREATE TABLE `j25_support_statuses` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=7 DEFAULT CHARSET=utf8
j25_support_field_value (id, ticket_id, field_id, field_value)
CREATE TABLE `j25_support_field_value` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`ticket_id` int(11) DEFAULT NULL,
`field_id` int(11) DEFAULT NULL,
`field_value` tinytext,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=10889 DEFAULT CHARSET=utf8
I also ran this:
SELECT LENGTH(field_value) len FROM j25_support_field_value ORDER BY len DESC LIMIT 1
Note: the result is 38.
The query:
SELECT DISTINCT t.id as ID
, (select p.title from j25_support_priorities p where p.id = t.priority_id) as Priority
, (select s.title from j25_support_statuses s where s.id = t.status_id) as Status
, t.subject as Subject
, t.email as SubmittedByEmail
, type.field_value AS IssueType
, ver.field_value AS Version
, utype.field_value AS UserType
, cust.field_value AS Company
, refno.field_value AS RefNo
, t.modified_date as Modified
FROM j25_support_tickets AS t
LEFT JOIN j25_support_field_value AS type ON t.id = type.ticket_id AND type.field_id =1
LEFT JOIN j25_support_field_value AS ver ON t.id = ver.ticket_id AND ver.field_id =2
LEFT JOIN j25_support_field_value AS utype ON t.id = utype.ticket_id AND utype.field_id =3
LEFT JOIN j25_support_field_value AS cust ON t.id = cust.ticket_id AND cust.field_id =4
LEFT JOIN j25_support_field_value AS refno ON t.id = refno.ticket_id AND refno.field_id =5
ALTER TABLE j25_support_field_value
ADD INDEX (`ticket_id`,`field_id`,`field_value`(50))
This index lets each of the joins find the matching rows for a given ticket_id and field_id directly. (Strictly speaking it is not a fully covering index: field_value is only indexed by a 50-character prefix, so MySQL still has to read the row to return the value.) It should perform massively faster than without this index, since currently your query has to read every row in the table to find what matches each combination of ticket_id and field_id.
I would also suggest converting your tables to InnoDB engine, unless you have a very explicit reason for using MyISAM.
ALTER TABLE tablename ENGINE=InnoDB
As above, a better index would help. You could probably then simplify your query into something like this too, joining to the table only once and folding the field rows back into one row per ticket with MAX(CASE ...) and GROUP BY:
SELECT t.id as ID
, p.title as Priority
, s.title as Status
, t.subject as Subject
, t.email as SubmittedByEmail
, MAX(case when v.field_id=1 then v.field_value else null end) as IssueType
, MAX(case when v.field_id=2 then v.field_value else null end) as Version
, MAX(case when v.field_id=3 then v.field_value else null end) as UserType
, MAX(case when v.field_id=4 then v.field_value else null end) as Company
, MAX(case when v.field_id=5 then v.field_value else null end) as RefNo
, t.modified_date as Modified
FROM j25_support_tickets AS t
LEFT JOIN j25_support_field_value v ON t.id = v.ticket_id
LEFT JOIN j25_support_priorities p ON p.id = t.priority_id
LEFT JOIN j25_support_statuses s ON s.id = t.status_id
GROUP BY t.id, p.title, s.title, t.subject, t.email, t.modified_date;
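A single join to j25_support_field_value produces one row per stored field value, so the CASE expressions only give one row per ticket once they are wrapped in MAX(...) with a GROUP BY (conditional aggregation). Here is a sketch of that pivot using Python's sqlite3 with an in-memory database standing in for MySQL; the tickets and field values are invented, and only the columns the pivot needs are kept.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE j25_support_tickets (id INTEGER PRIMARY KEY, subject TEXT,
                                  priority_id INTEGER, status_id INTEGER);
CREATE TABLE j25_support_priorities (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE j25_support_statuses (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE j25_support_field_value (ticket_id INTEGER, field_id INTEGER,
                                      field_value TEXT);
CREATE INDEX ticket_field ON j25_support_field_value (ticket_id, field_id);
INSERT INTO j25_support_tickets VALUES (1, 'Login fails', 1, 1),
                                       (2, 'Crash', 2, 1),
                                       (3, 'Question', NULL, NULL);
INSERT INTO j25_support_priorities VALUES (1, 'High'), (2, 'Low');
INSERT INTO j25_support_statuses VALUES (1, 'Open');
INSERT INTO j25_support_field_value VALUES (1, 1, 'Bug'), (1, 2, '2.5'),
                                           (1, 4, 'Acme'), (2, 1, 'Bug'),
                                           (2, 5, 'R-17');
""")

# One join to the field table; MAX(CASE ...) picks each field's value per ticket.
PIVOT = """
SELECT t.id, p.title, s.title, t.subject,
       MAX(CASE WHEN v.field_id = 1 THEN v.field_value END) AS IssueType,
       MAX(CASE WHEN v.field_id = 2 THEN v.field_value END) AS Version,
       MAX(CASE WHEN v.field_id = 3 THEN v.field_value END) AS UserType,
       MAX(CASE WHEN v.field_id = 4 THEN v.field_value END) AS Company,
       MAX(CASE WHEN v.field_id = 5 THEN v.field_value END) AS RefNo
FROM j25_support_tickets AS t
LEFT JOIN j25_support_field_value AS v ON v.ticket_id = t.id
LEFT JOIN j25_support_priorities AS p ON p.id = t.priority_id
LEFT JOIN j25_support_statuses AS s ON s.id = t.status_id
GROUP BY t.id, p.title, s.title, t.subject
ORDER BY t.id
"""

rows = con.execute(PIVOT).fetchall()
for row in rows:
    print(row)
```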
You can do away with the subqueries for starters and just get them from another join. You can add an index to j25_support_field_value:
alter table j25_support_field_value add key(ticket_id, field_id);
I assume there is an index on id in j25_support_tickets. If not, and if the ids are unique, add a unique index: alter table j25_support_tickets add unique key(id); If they're not unique, remove the word unique from that statement.
In MySQL, a join usually requires an index on the field(s) you are joining on. If you follow that rule, this holds up and produces very reasonable results even with huge tables (100M+ rows); you will not go wrong.
Are the ids in j25_support_tickets unique? If they are, you can do away with the DISTINCT. If not, or if you are getting exact duplicates in each row, still do away with the DISTINCT and add a GROUP BY t.id to the end of this:
SELECT t.id as ID
, p.title as Priority
, s.title as Status
, t.subject as Subject
, t.email as SubmittedByEmail
, type.field_value AS IssueType
, ver.field_value AS Version
, utype.field_value AS UserType
, cust.field_value AS Company
, refno.field_value AS RefNo
, t.modified_date as Modified
FROM j25_support_tickets AS t
LEFT JOIN j25_support_field_value AS type ON t.id = type.ticket_id AND type.field_id =1
LEFT JOIN j25_support_field_value AS ver ON t.id = ver.ticket_id AND ver.field_id =2
LEFT JOIN j25_support_field_value AS utype ON t.id = utype.ticket_id AND utype.field_id =3
LEFT JOIN j25_support_field_value AS cust ON t.id = cust.ticket_id AND cust.field_id =4
LEFT JOIN j25_support_field_value AS refno ON t.id = refno.ticket_id AND refno.field_id =5
LEFT JOIN j25_support_priorities p ON p.id = t.priority_id
LEFT JOIN j25_support_statuses s ON s.id = t.status_id;
Switch to InnoDB.
After switching to InnoDB, make the PRIMARY KEY for j25_support_field_value be (ticket_id, field_id) (and get rid of id). (Tacking on field_value(50) will hurt, not help.)
A PRIMARY KEY is a UNIQUE KEY, so don't have both.
Use VARCHAR(255) instead of the nearly-equivalent TINYTEXT.
EAV schema sucks. My rant on EAV.
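A sketch of the suggested key layout, using SQLite's WITHOUT ROWID table via Python's sqlite3 (like InnoDB, it stores rows clustered by the primary key). The composite primary key turns each (ticket_id, field_id) lookup into a primary-key lookup and also rejects a second value for the same field of the same ticket. The helpers set_field and get_field are hypothetical, and the rows are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Composite primary key instead of a surrogate id; WITHOUT ROWID makes SQLite
# store the rows in primary-key order, similar to an InnoDB clustered index.
con.execute("""
    CREATE TABLE j25_support_field_value (
        ticket_id   INTEGER NOT NULL,
        field_id    INTEGER NOT NULL,
        field_value VARCHAR(255),
        PRIMARY KEY (ticket_id, field_id)
    ) WITHOUT ROWID
""")
con.executemany("INSERT INTO j25_support_field_value VALUES (?, ?, ?)",
                [(1, 1, "Bug"), (1, 2, "2.5"), (2, 1, "Feature")])

def set_field(ticket_id, field_id, value):
    """Hypothetical helper: insert a value; False if the pair already exists."""
    try:
        con.execute("INSERT INTO j25_support_field_value VALUES (?, ?, ?)",
                    (ticket_id, field_id, value))
        return True
    except sqlite3.IntegrityError:
        return False

def get_field(ticket_id, field_id):
    """Hypothetical helper: one primary-key lookup per (ticket, field) pair."""
    row = con.execute("SELECT field_value FROM j25_support_field_value "
                      "WHERE ticket_id = ? AND field_id = ?",
                      (ticket_id, field_id)).fetchone()
    return row[0] if row else None

print(set_field(1, 1, "Duplicate"))  # False: ticket 1 already has field 1
print(get_field(1, 1))               # Bug
```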

Multiple SQL joins difficult query

I have a weird schema but when it was designed it seemed like a good idea at the time. I have one master table, lesson_objects, that has foreign keys linking to the vocabulary, video and quizzes tables.
vocab table:
CREATE TABLE `se_vocab` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`vocab_word` VARCHAR(255) NOT NULL,
`vocab_audio` INT(10) NULL DEFAULT '0',
`vocab_image` INT(10) NULL DEFAULT '0',
PRIMARY KEY (`id`)
)
video table:
CREATE TABLE `se_video` (
`id` INT(10) NOT NULL AUTO_INCREMENT,
`video_name` VARCHAR(255) NOT NULL,
`video_description` MEDIUMTEXT NOT NULL,
`video_file_name` VARCHAR(50) NULL DEFAULT NULL,
`video_url` VARCHAR(255) NOT NULL,
PRIMARY KEY (`id`)
)
quizzes table:
CREATE TABLE `se_quizzes` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`quiz_name` VARCHAR(80) NOT NULL,
`quiz_description` TINYTEXT NULL,
PRIMARY KEY (`id`)
)
lesson objects (contains foreign keys of previous tables)
CREATE TABLE `se_lesson_org` (
`id` INT(10) NOT NULL AUTO_INCREMENT,
`lesson_id` INT(10) NOT NULL,
`section_object_type` ENUM('video','vocabulary','quiz') NOT NULL,
`section_object_id` INT(10) NOT NULL,
PRIMARY KEY (`id`)
)
I'm trying to create a query that returns all the records from lesson_objects but also includes the data in the columns for the type in that record (vocabulary, etc.)
However, my query returns no rows, while ideally it should return multiple rows, with every record containing SOME empty columns. E.g. if a record is a vocabulary item, the columns for quiz and video will be empty.
My attempts are very bad, but here is one for the sake of guidance:
SELECT
lo.id, lo.section_object_type, lo.section_object_id,
vo.id, vo.vocab_word, vo.vocab_image, vo.vocab_audio,
vi.id, vi.video_name, vi.video_url,
q.id, q.quiz_name
FROM se_lesson_org lo, se_vocab vo, se_video vi, se_quizzes q
WHERE lo.section_object_id = vo.id
OR lo.section_object_id = vi.id
OR lo.section_object_id = q.id
Any help / comments would be appreciated. Thanks.
Use LEFT JOIN to return all rows from se_lesson_org. Additionally, add a JOIN condition to match the specific section_object_type.
SELECT
lo.*,
vo.*,
vi.*,
q.*
FROM se_lesson_org lo
LEFT JOIN se_vocab vo
ON vo.id = lo.section_object_id
AND lo.section_object_type = 'vocabulary'
LEFT JOIN se_video vi
ON vi.id = lo.section_object_id
AND lo.section_object_type = 'video'
LEFT JOIN se_quizzes q
ON q.id = lo.section_object_id
AND lo.section_object_type = 'quiz'
Note: Avoid using old-style JOIN syntax. Read this article by Aaron Bertrand.
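A small sketch of why the section_object_type condition belongs in each ON clause, using Python's sqlite3 with an in-memory database in place of MySQL. The rows are invented so that id 1 exists in all three object tables; SQLite has no ENUM, so a CHECK constraint stands in for it. With the type check, each lesson row gets only its own object's columns; without it, every row picks up all three objects that share the id.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE se_vocab (id INTEGER PRIMARY KEY, vocab_word TEXT NOT NULL);
CREATE TABLE se_video (id INTEGER PRIMARY KEY, video_name TEXT NOT NULL,
                       video_url TEXT NOT NULL);
CREATE TABLE se_quizzes (id INTEGER PRIMARY KEY, quiz_name TEXT NOT NULL);
CREATE TABLE se_lesson_org (
    id INTEGER PRIMARY KEY,
    lesson_id INTEGER NOT NULL,
    -- SQLite has no ENUM; a CHECK constraint plays that role here
    section_object_type TEXT NOT NULL
        CHECK (section_object_type IN ('video', 'vocabulary', 'quiz')),
    section_object_id INTEGER NOT NULL
);
-- id 1 exists in all three object tables on purpose
INSERT INTO se_vocab VALUES (1, 'hello');
INSERT INTO se_video VALUES (1, 'Intro', 'http://example.com/intro');
INSERT INTO se_quizzes VALUES (1, 'Quiz 1');
INSERT INTO se_lesson_org VALUES (1, 10, 'vocabulary', 1), (2, 10, 'video', 1),
                                 (3, 10, 'quiz', 1);
""")

def lesson_rows(with_type_check=True):
    # The type condition in each ON clause stops a row from matching an object
    # in the wrong table that happens to share the same id.
    cond = "AND lo.section_object_type = '{}'" if with_type_check else ""
    sql = f"""
        SELECT lo.id, lo.section_object_type, vo.vocab_word, vi.video_name, q.quiz_name
        FROM se_lesson_org lo
        LEFT JOIN se_vocab vo ON vo.id = lo.section_object_id {cond.format('vocabulary')}
        LEFT JOIN se_video vi ON vi.id = lo.section_object_id {cond.format('video')}
        LEFT JOIN se_quizzes q ON q.id = lo.section_object_id {cond.format('quiz')}
        ORDER BY lo.id"""
    return con.execute(sql).fetchall()

for row in lesson_rows():
    print(row)
```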
Sounds like you are looking for a LEFT JOIN on each of these tables. That way, if the foreign key is valid you will see values, and if it isn't you will just get NULL for the related columns.
SELECT
lo.id, lo.section_object_type, lo.section_object_id,
vo.id, vo.vocab_word, vo.vocab_image, vo.vocab_audio,
vi.id, vi.video_name, vi.video_url,
q.id, q.quiz_name
FROM se_lesson_org lo
LEFT JOIN se_vocab vo ON vo.id = lo.section_object_id
AND lo.section_object_type = 'vocabulary'
LEFT JOIN se_video vi ON vi.id = lo.section_object_id
AND lo.section_object_type = 'video'
LEFT JOIN se_quizzes q ON q.id = lo.section_object_id
AND lo.section_object_type = 'quiz'
Notice also how this syntax makes it clear how each table is being connected into the rest of the query rather than having a whole mess of conditions in the WHERE clause at the end.

LINQ Performance Problem

I have a Joomla site that uses JomSocial. I'm working on a .NET web app that will eventually replace Joomla, since I prefer .NET over PHP. Right now I have a .NET mobile site that users are using.
LINQ to Entities has made development very speedy, but I'm now trying to fix performance issues. Sending messages to one another is the #1 activity, and there are over 40k messages sent so far. This is also where I have a performance issue. Below are the two tables JomSocial uses for storing messages. Below that is the LINQ code I'm currently using, which returns the results I want; it's just taking two seconds to do it.
I think you can probably figure out what the data looks like from the column names, but if not, I can create some sample data and post it here shortly, as I have to step out for a bit. I should mention that I'm using the Entity Framework with .NET 3.5 and MySQL with the MySQL .NET Connector.
Tables:
delimiter $$
CREATE TABLE `jos_community_msg` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`from` int(10) unsigned NOT NULL,
`parent` int(10) unsigned NOT NULL,
`deleted` tinyint(3) unsigned DEFAULT '0',
`from_name` varchar(45) NOT NULL,
`posted_on` datetime DEFAULT NULL,
`subject` tinytext NOT NULL,
`body` text NOT NULL,
PRIMARY KEY (`id`),
KEY `parent` (`parent`),
KEY `deleted` (`deleted`),
KEY `from` (`from`)
) ENGINE=MyISAM AUTO_INCREMENT=340 DEFAULT CHARSET=utf8$$
delimiter $$
CREATE TABLE `jos_community_msg_recepient` (
`msg_id` int(10) unsigned NOT NULL,
`msg_parent` int(10) unsigned NOT NULL DEFAULT '0',
`msg_from` int(10) unsigned NOT NULL,
`to` int(10) unsigned NOT NULL,
`bcc` tinyint(3) unsigned DEFAULT '0',
`is_read` tinyint(3) unsigned DEFAULT '0',
`deleted` tinyint(3) unsigned DEFAULT '0',
UNIQUE KEY `un` (`msg_id`,`to`),
KEY `msg_id` (`msg_id`),
KEY `to` (`to`),
KEY `idx_isread_to_deleted` (`is_read`,`to`,`deleted`),
KEY `from` (`msg_from`),
KEY `parent` (`msg_parent`),
KEY `deleted` (`deleted`),
KEY `to_deleted` (`deleted`,`to`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8$$
LINQ:
var messages = (
from b in context.jos_community_msg
join i in (
from i in context.jos_community_msg_recepient
join a in context.jos_community_msg on i.msg_id equals a.id
where i.to == userId && a.deleted == 0
group a by a.parent into g
select g.Max(p => p.id)) on b.id equals i
join a in context.jos_community_msg_recepient on i equals a.msg_id
orderby b.id descending
select new MessageHeaderItem()
{
IsDeleted = false,
IsRead = (a.is_read.Value == 0) ? false : true,
MessageId = b.parent,
Sent = b.posted_on.Value,
Subject = b.subject,
UserId = a.msg_from
});
total = messages.Count();
return messages.Skip(start).Take(max).ToList();
I've tried a bunch of variations, but nothing has made it any quicker. Having the sub select is not good for performance, but I'm not sure how else to get just the last message in the message chain from that table.
Update:
Here's the SQL being generated:
SELECT
`Limit1`.`C1`,
`Limit1`.`C2`,
`Limit1`.`C3`,
`Limit1`.`parent`,
`Limit1`.`posted_on`,
`Limit1`.`subject`,
`Limit1`.`msg_from`,
`Limit1`.`C4`,
`Limit1`.`C5`,
`Limit1`.`C6`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`parent`,
`Extent1`.`posted_on`,
`Extent1`.`subject`,
`Extent6`.`msg_from`,
1 AS `C1`,
cast(0 as decimal(0,0)) AS `C2`,
CASE WHEN (0 = (`Extent6`.`is_read`)) THEN (cast(0 as decimal(0,0))) ELSE (cast(1 as decimal(0,0))) END AS `C3`,
'Test' AS `C4`,
'' AS `C5`,
'' AS `C6`
FROM `jos_community_msg` AS `Extent1` INNER JOIN (SELECT
(SELECT
Max(`Extent5`.`id`) AS `A1`
FROM (SELECT
`jos_community_msg_recepient`.`bcc`,
`jos_community_msg_recepient`.`deleted`,
`jos_community_msg_recepient`.`is_read`,
`jos_community_msg_recepient`.`msg_from`,
`jos_community_msg_recepient`.`msg_id`,
`jos_community_msg_recepient`.`msg_parent`,
`jos_community_msg_recepient`.`to`
FROM `jos_community_msg_recepient` AS `jos_community_msg_recepient`) AS `Extent4` INNER JOIN `jos_community_msg` AS `Extent5` ON (`Extent4`.`msg_id` = `Extent5`.`id`) OR ((`Extent4`.`msg_id` IS NULL) AND (`Extent5`.`id` IS NULL))
WHERE ((`Extent4`.`to` = 62) AND (0 = (`Extent5`.`deleted`))) AND ((`Extent5`.`parent` = `Project2`.`parent`) OR ((`Extent5`.`parent` IS NULL) AND (`Project2`.`parent` IS NULL)))) AS `C1`
FROM (SELECT
62 AS `p__linq__5`,
`Distinct1`.`parent`
FROM (SELECT DISTINCT
`Extent3`.`parent`
FROM (SELECT
`jos_community_msg_recepient`.`bcc`,
`jos_community_msg_recepient`.`deleted`,
`jos_community_msg_recepient`.`is_read`,
`jos_community_msg_recepient`.`msg_from`,
`jos_community_msg_recepient`.`msg_id`,
`jos_community_msg_recepient`.`msg_parent`,
`jos_community_msg_recepient`.`to`
FROM `jos_community_msg_recepient` AS `jos_community_msg_recepient`) AS `Extent2` INNER JOIN `jos_community_msg` AS `Extent3` ON (`Extent2`.`msg_id` = `Extent3`.`id`) OR ((`Extent2`.`msg_id` IS NULL) AND (`Extent3`.`id` IS NULL))
WHERE (`Extent2`.`to` = 62) AND (0 = (`Extent3`.`deleted`))) AS `Distinct1`) AS `Project2`) AS `Project3` ON (`Extent1`.`id` = `Project3`.`C1`) OR ((`Extent1`.`id` IS NULL) AND (`Project3`.`C1` IS NULL)) INNER JOIN (SELECT
`jos_community_msg_recepient`.`bcc`,
`jos_community_msg_recepient`.`deleted`,
`jos_community_msg_recepient`.`is_read`,
`jos_community_msg_recepient`.`msg_from`,
`jos_community_msg_recepient`.`msg_id`,
`jos_community_msg_recepient`.`msg_parent`,
`jos_community_msg_recepient`.`to`
FROM `jos_community_msg_recepient` AS `jos_community_msg_recepient`) AS `Extent6` ON (`Project3`.`C1` = `Extent6`.`msg_id`) OR ((`Project3`.`C1` IS NULL) AND (`Extent6`.`msg_id` IS NULL))
ORDER BY
`id` DESC LIMIT 0,16) AS `Limit1`;
Here's the explain from MySQL:
1 PRIMARY <derived2> ALL 16
2 DERIVED <derived3> ALL 55 Using temporary; Using filesort
2 DERIVED Extent1 eq_ref PRIMARY PRIMARY 4 Project3.C1 1 Using where
2 DERIVED <derived9> ALL 333 Using where; Using join buffer
9 DERIVED jos_community_msg_recepient ALL 333
3 DERIVED <derived6> ALL 55
6 DERIVED <derived7> ALL 55
7 DERIVED <derived8> ALL 333 Using where; Using temporary
7 DERIVED Extent3 eq_ref PRIMARY,deleted PRIMARY 4 Extent2.msg_id 1 Using where; Distinct
8 DERIVED jos_community_msg_recepient ALL 333
4 DEPENDENT SUBQUERY Extent5 ref PRIMARY,parent,deleted parent 4 Project2.parent 2 Using where
4 DEPENDENT SUBQUERY <derived5> ALL 333 Using where; Using join buffer
5 DERIVED jos_community_msg_recepient ALL 333
Your LINQ query looks pretty good. You use a projection to a DTO (MessageHeaderItem), which allows LINQ to Entities to create a very optimal query. You should, however, check the actual SQL that is executed. Perhaps LINQ to Entities fires many queries under the covers. It is also possible that you need some index tuning. (SQL Profiler and the tuning wizard in SQL Server Management Studio only work with SQL Server; against MySQL, the general query log and EXPLAIN on the captured query serve the same purpose.)
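For reference, the "last message in each chain" part of the question is a groupwise-maximum problem, and a hand-written SQL version is much flatter than what Entity Framework generates. The sketch below runs it with Python's sqlite3 against an in-memory database standing in for MySQL, with only the columns it needs and invented rows (user 62 matches the id in the generated SQL). One deliberate difference from the LINQ query: the final join to jos_community_msg_recepient also filters on `to`, so a message with several recipients yields one row and is_read belongs to this user. Whether this is faster in MySQL would still need checking with EXPLAIN.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE jos_community_msg (id INTEGER PRIMARY KEY, parent INTEGER NOT NULL,
                                deleted INTEGER DEFAULT 0, subject TEXT NOT NULL);
CREATE TABLE jos_community_msg_recepient (msg_id INTEGER NOT NULL,
                                          msg_from INTEGER NOT NULL,
                                          `to` INTEGER NOT NULL,
                                          is_read INTEGER DEFAULT 0,
                                          UNIQUE (msg_id, `to`));
INSERT INTO jos_community_msg VALUES (1, 1, 0, 'hi'), (2, 2, 0, 'lunch?'),
                                     (3, 1, 0, 'Re: hi'), (4, 1, 1, 'Re: Re: hi'),
                                     (5, 2, 0, 'Re: lunch?');
INSERT INTO jos_community_msg_recepient VALUES (1, 70, 62, 1), (2, 71, 62, 0),
                                               (3, 70, 62, 0), (4, 70, 62, 0),
                                               (5, 62, 70, 0);
""")

# Groupwise maximum: newest non-deleted message per thread (parent) received by
# the user, then join back to fetch that message and this user's read flag.
LATEST_PER_THREAD = """
SELECT m.id, m.parent, m.subject, r.msg_from, r.is_read
FROM (
    SELECT a.parent, MAX(a.id) AS last_id
    FROM jos_community_msg_recepient AS i
    JOIN jos_community_msg AS a ON a.id = i.msg_id
    WHERE i.`to` = :user AND a.deleted = 0
    GROUP BY a.parent
) AS latest
JOIN jos_community_msg AS m ON m.id = latest.last_id
JOIN jos_community_msg_recepient AS r ON r.msg_id = m.id AND r.`to` = :user
ORDER BY m.id DESC
LIMIT :max OFFSET :start
"""

def inbox(user, start=0, max_rows=16):
    params = {"user": user, "start": start, "max": max_rows}
    return con.execute(LATEST_PER_THREAD, params).fetchall()

print(inbox(62))  # [(3, 1, 'Re: hi', 70, 0), (2, 2, 'lunch?', 71, 0)]
```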