Want to learn to improve slow MySQL query

I have a MySQL query that selects all product IDs matching certain filters. The query
works, but I want to learn how to improve it. Alternatives are welcome, with an explanation.
SELECT kkx_products.id from kkx_products WHERE display = 'yes' AND id in
(SELECT product_id FROM `kkx_filters_products` WHERE `filter_id` in
(SELECT id FROM `kkx_filters` WHERE kkx_filters.urlname = "comics" OR kkx_filters.urlname = "comicsgraphicnovels")
group by product_id having count(*) = 2)
ORDER BY kkx_products.id desc LIMIT 0, 24
I've included the structure of the tables being used in the query.
EXPLAIN kkx_filters;
Field Type Null Key Default Extra
id int(11) unsigned NO PRI NULL auto_increment
name varchar(50) NO
filtergroup_id int(11) YES MUL NULL
urlname varchar(50) NO MUL NULL
date_modified timestamp NO CURRENT_TIMESTAMP
orderid float(11,2) NO NULL
EXPLAIN kkx_filters_products;
Field Type Null Key Default Extra
filter_id int(11) NO PRI 0
product_id int(11) NO PRI 0
EXPLAIN kkx_products;
Field Type Null Key Default Extra
id int(11) NO PRI NULL auto_increment
title varchar(255) NO
urlname varchar(50) NO MUL
description longtext NO NULL
price float(11,2) NO NULL
orderid float(11,2) NO NULL
imageurl varchar(255) NO
date_created datetime NO NULL
date_modified timestamp NO CURRENT_TIMESTAMP
created_by varchar(11) NO NULL
modified_by varchar(11) NO NULL
productnumber varchar(32) NO
instock enum('yes','no') NO yes
display enum('yes','no') NO yes

Instead of nesting IN subqueries in your WHERE clause, try a correlated EXISTS subquery:
http://dev.mysql.com/doc/refman/5.0/en/exists-and-not-exists-subqueries.html
You will be able to see the difference in your EXPLAIN plan. In older MySQL versions (the manual linked above is for 5.0), an IN subquery is typically executed as a dependent subquery: once for each and every row of the outer query, and every row of that inner result set triggers its own execution of the innermost subquery too.
You can see how each level of nesting multiplies the cost. A correlated EXISTS can stop at the first matching row and can use an index on the linking column.
Example of the use of EXISTS:
Consider tbl1 has columns id and data. tbl2 has columns id, parentid, and data.
SELECT a.*
FROM tbl1 a
WHERE 1 = 1
AND EXISTS (
SELECT NULL
FROM tbl2 b
WHERE b.parentid = a.id
AND b.data = 'SOME CONDITIONAL DATA TO CONSTRAIN ON'
)
1) We can assume 1 = 1 stands in for some condition that is true for every record.
2) It doesn't really matter what we select in the EXISTS subquery; NULL is fine.
3) The important part is b.parentid = a.id: it links the EXISTS subquery to each row of the outer result set.
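To make the idea concrete for the tables in the question, here is a minimal sketch that runs the original IN / GROUP BY / HAVING query next to a version with one correlated EXISTS per required filter and checks that both return the same product ids. It uses Python's built-in sqlite3 as a stand-in for MySQL, the sample rows are invented, and it assumes urlname is unique in kkx_filters (the question does not say so). It shows that the two forms agree, not which one is faster on a given MySQL version.

```python
import sqlite3

# Tiny in-memory stand-in for the three tables (only the columns the query touches).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kkx_products (id INTEGER PRIMARY KEY, display TEXT NOT NULL);
CREATE TABLE kkx_filters (id INTEGER PRIMARY KEY, urlname TEXT NOT NULL UNIQUE);
CREATE TABLE kkx_filters_products (
    filter_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    PRIMARY KEY (filter_id, product_id)
);
INSERT INTO kkx_products VALUES (1, 'yes'), (2, 'yes'), (3, 'no'), (4, 'yes');
INSERT INTO kkx_filters VALUES (10, 'comics'), (11, 'comicsgraphicnovels'), (12, 'manga');
INSERT INTO kkx_filters_products VALUES
    (10, 1), (11, 1),   -- product 1: both filters
    (10, 2), (12, 2),   -- product 2: only one of the two
    (10, 3), (11, 3),   -- product 3: both filters, but display = 'no'
    (10, 4), (11, 4);   -- product 4: both filters
""")

# The query from the question: nested IN subqueries plus HAVING COUNT(*) = 2.
ORIGINAL = """
SELECT p.id FROM kkx_products p
WHERE p.display = 'yes' AND p.id IN (
    SELECT product_id FROM kkx_filters_products
    WHERE filter_id IN (
        SELECT id FROM kkx_filters
        WHERE urlname = 'comics' OR urlname = 'comicsgraphicnovels')
    GROUP BY product_id HAVING COUNT(*) = 2)
ORDER BY p.id DESC LIMIT 24
"""

# One correlated EXISTS per required filter; each is linked to the outer row
# through fp.product_id = p.id.
WITH_EXISTS = """
SELECT p.id FROM kkx_products p
WHERE p.display = 'yes'
  AND EXISTS (SELECT NULL FROM kkx_filters_products fp
              JOIN kkx_filters f ON f.id = fp.filter_id
              WHERE fp.product_id = p.id AND f.urlname = 'comics')
  AND EXISTS (SELECT NULL FROM kkx_filters_products fp
              JOIN kkx_filters f ON f.id = fp.filter_id
              WHERE fp.product_id = p.id AND f.urlname = 'comicsgraphicnovels')
ORDER BY p.id DESC LIMIT 24
"""

original_ids = [row[0] for row in conn.execute(ORIGINAL)]
exists_ids = [row[0] for row in conn.execute(WITH_EXISTS)]
print(original_ids, exists_ids)
```

Run EXPLAIN on both forms against the real MySQL tables to see which plan the server actually picks.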

Related

Select all movies that was not watched by the user and limit the result to 20

I need to select all movies that a user has not watched yet.
My SQL query to grab the last 20 movies looks like this:
SELECT movies.* FROM movies, hdd WHERE hdd.id=movies.hdd_id and hdd.status='1' and movies.skip!='1' order by id desc limit 20
The movie table looks like this:
id int(11) auto_increment
hdd_id int(20)
tmdb_id int(20) NULL
imdb_id text NULL
file_path text
ftp_path text NULL
file_name text
resolution text NULL
timestamp int(11) NULL
skip int(2)
credits int(2)
title varchar(255) NULL
original_title varchar(255) NULL
adult int(2) NULL
categ text NULL
collection text NULL
companies text NULL
language text NULL
lang text NULL
rating text NULL
mpaa text NULL
tagline text NULL
overview text NULL
budget text NULL
homepage text NULL
popularity text NULL
runtime varchar(255) NULL
revenue varchar(255) NULL
release_date date NULL
vote_average varchar(255) NULL
vote_count varchar(255) NULL
movie_poster_path varchar(255) NULL
movie_poster varchar(255) NULL
movie_backdrop_path varchar(255) NULL
movie_backdrop varchar(255) NULL
This selects the movies only if the HDD status is online and the crawler did not skip it.
Now the problem is that the watch log is in a separate table:
The table looks like this:
id int(11) auto_increment
type varchar(255)
ref int(9) NULL
membre int(9) NULL
counter int(9) NULL
duration varchar(255)
currentTime varchar(255)
Membre is the user id and ref is the movie tmdb_id
This is what I tried so far:
SELECT movies.* FROM movies, hdd, watch WHERE hdd.id=movies.hdd_id and hdd.status='1' and movies.skip!='1' and (watch.membre='$_SESSION[id]' and watch.ref=movies.tmdb_id) order by id desc limit 20
But of course this is not working. I think the output is backwards. Instead of returning the unwatched stuff, it's returning the watched movies.
You need to filter the already-watched movies out of the whole list. A LEFT JOIN can do that: put the user condition in the ON clause and keep only the rows that found no match. Try this:
SELECT movies.* FROM movies left join watch on watch.ref=movies.tmdb_id and watch.membre = '$_SESSION[id]', hdd
WHERE hdd.id=movies.hdd_id and hdd.status='1' and movies.skip!='1' and
watch.id IS NULL
order by movies.id desc limit 20
What you need is to LEFT JOIN watch and then check that there is no matching entry in watch. I think this will work:
SELECT movies.*
FROM movies
JOIN hdd ON hdd.id = movies.hdd_id AND hdd.status = '1' AND movies.skip != 1
LEFT JOIN watch ON watch.ref = movies.tmdb_id AND watch.membre='$_SESSION[id]'
WHERE watch.id IS NULL
ORDER BY id DESC
LIMIT 20
Alternatively to keep it in the same style without JOINs (if that is your preference for whatever reason), you can also do this:
SELECT movies.*
FROM movies, hdd
WHERE hdd.id = movies.hdd_id and hdd.status = '1' and movies.skip != '1' and
NOT EXISTS(SELECT 1 FROM watch WHERE watch.membre = '$_SESSION[id]' and watch.ref = movies.tmdb_id)
ORDER BY id desc
LIMIT 20
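The two working patterns (LEFT JOIN ... IS NULL and NOT EXISTS) can be checked side by side. The sketch below is a hedged illustration, not the asker's real schema: it uses Python's sqlite3 in place of MySQL, keeps only the columns the query needs, and invents a handful of rows. The user id is bound as a parameter (user_id is a made-up variable standing in for $_SESSION[id]) rather than pasted into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hdd (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE movies (id INTEGER PRIMARY KEY, hdd_id INTEGER, tmdb_id INTEGER,
                     skip INTEGER, title TEXT);
CREATE TABLE watch (id INTEGER PRIMARY KEY, ref INTEGER, membre INTEGER);
INSERT INTO hdd VALUES (1, '1'), (2, '0');          -- hdd 2 is offline
INSERT INTO movies VALUES
    (1, 1, 101, 0, 'A'),   -- watched by user 7
    (2, 1, 102, 0, 'B'),   -- watched only by user 8
    (3, 1, 103, 1, 'C'),   -- skipped by the crawler
    (4, 2, 104, 0, 'D'),   -- sits on the offline hdd
    (5, 1, 105, 0, 'E'),   -- watched twice by user 7
    (6, 1, 106, 0, 'F');   -- never watched
INSERT INTO watch (ref, membre) VALUES (101, 7), (102, 8), (105, 7), (105, 7);
""")

user_id = 7  # hypothetical stand-in for $_SESSION[id]

# Anti-join: the user filter lives in the ON clause, and only movies for which
# no watch row was found (watch.id IS NULL) survive the WHERE clause.
LEFT_JOIN = """
SELECT movies.id FROM movies
JOIN hdd ON hdd.id = movies.hdd_id
LEFT JOIN watch ON watch.ref = movies.tmdb_id AND watch.membre = ?
WHERE hdd.status = '1' AND movies.skip != 1 AND watch.id IS NULL
ORDER BY movies.id DESC LIMIT 20
"""

# Same result expressed with a correlated NOT EXISTS.
NOT_EXISTS = """
SELECT movies.id FROM movies
JOIN hdd ON hdd.id = movies.hdd_id
WHERE hdd.status = '1' AND movies.skip != 1
  AND NOT EXISTS (SELECT 1 FROM watch
                  WHERE watch.membre = ? AND watch.ref = movies.tmdb_id)
ORDER BY movies.id DESC LIMIT 20
"""

left_join_ids = [row[0] for row in conn.execute(LEFT_JOIN, (user_id,))]
not_exists_ids = [row[0] for row in conn.execute(NOT_EXISTS, (user_id,))]
print(left_join_ids, not_exists_ids)
```

Note that movie 5 has two watch rows for the same user; neither form returns it, and neither form duplicates the unwatched movies.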

Any way to optimize this MySQL query? (Resource intense)

My app needs to run this query pretty often; it fetches a list of user data for the app to display. The problem is that the subqueries on user_quiz are resource heavy, and calculating the rankings is very CPU intensive too.
Benchmark: ~.5 second each run
When it will be run:
When the user wants to see their ranking
When the user wants to see other people's rankings
When getting a list of the user's friends
Half a second is a really long time considering how often this query will run. Is there anything I could do to optimize it?
Table for user:
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`firstname` varchar(100) DEFAULT NULL,
`lastname` varchar(100) DEFAULT NULL,
`password` varchar(20) NOT NULL,
`email` varchar(300) NOT NULL,
`verified` tinyint(10) DEFAULT NULL,
`avatar` varchar(300) DEFAULT NULL,
`points_total` int(11) unsigned NOT NULL DEFAULT '0',
`points_today` int(11) unsigned NOT NULL DEFAULT '0',
`number_correctanswer` int(11) unsigned NOT NULL DEFAULT '0',
`number_watchedvideo` int(11) unsigned NOT NULL DEFAULT '0',
`create_time` datetime NOT NULL,
`type` tinyint(1) unsigned NOT NULL DEFAULT '1',
`number_win` int(11) unsigned NOT NULL DEFAULT '0',
`number_lost` int(11) unsigned NOT NULL DEFAULT '0',
`number_tie` int(11) unsigned NOT NULL DEFAULT '0',
`level` int(1) unsigned NOT NULL DEFAULT '0',
`islogined` tinyint(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=230 DEFAULT CHARSET=utf8;
Table for user_quiz:
CREATE TABLE `user_quiz` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`question_id` int(11) NOT NULL,
`is_answercorrect` int(11) unsigned NOT NULL DEFAULT '0',
`question_answer_datetime` datetime NOT NULL,
`score` int(1) DEFAULT NULL,
`quarter` int(1) DEFAULT NULL,
`game_type` int(1) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=9816 DEFAULT CHARSET=utf8;
Table for user_starter:
CREATE TABLE `user_starter` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`result` int(1) DEFAULT NULL,
`created_date` date DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=456 DEFAULT CHARSET=utf8mb4;
My indexes:
Table: user
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
user 0 PRIMARY 1 id A 32 BTREE
Table: user_quiz
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
user_quiz 0 PRIMARY 1 id A 9462 BTREE
user_quiz 1 user_id 1 user_id A 270 BTREE
Table: user_starter
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
user_starter 0 PRIMARY 1 id A 454 BTREE
user_starter 1 user_id 1 user_id A 227 YES BTREE
Query:
SET @curRank = 0;
SET @lastPlayerPoints = 0;
SELECT
sub.*,
@curRank := IF(@lastPlayerPoints!=points_week, @curRank + 1, @curRank) AS rank,
@lastPlayerPoints := points_week AS db_PPW
FROM (
SELECT u.id,u.firstname,u.lastname,u.email,u.avatar,u.type,u.points_total,u.number_win,u.number_lost,u.number_tie,u.verified,
COALESCE(SUM(uq.score),0) as points_week,
COALESCE(us.number_lost,0) as number_week_lost,
COALESCE(us.number_win,0) as number_week_win,
(select MAX(question_answer_datetime) from user_quiz WHERE user_id = u.id and game_type = 1) as lastFrdFight,
(select MAX(question_answer_datetime) from user_quiz WHERE user_id = u.id and game_type = 2) as lastBotFight
FROM `user` u
LEFT JOIN (SELECT user_id,
count(case when result=1 then 1 else null end) as number_win,
count(case when result=-1 then 1 else null end) as number_lost
from user_starter where created_date BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 05:10:27' ) us ON u.id = us.user_id
LEFT JOIN (SELECT * FROM user_quiz WHERE question_answer_datetime BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 00:00:00') uq on u.id = uq.user_id
GROUP BY u.id ORDER BY points_week DESC, u.lastname ASC, u.firstname ASC
) as sub
EXPLAIN:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL 3027 100
2 DERIVED u ALL PRIMARY 32 100 Using temporary; Using filesort
2 DERIVED <derived5> ALL 1 100 Using where; Using join buffer (Block Nested Loop)
2 DERIVED <derived6> ref <auto_key0> <auto_key0> 4 fancard.u.id 94 100
6 DERIVED user_quiz ALL 9461 100 Using where
5 DERIVED user_starter ALL 454 100 Using where
4 DEPENDENT SUBQUERY user_quiz ref user_id user_id 4 func 35 100 Using where
3 DEPENDENT SUBQUERY user_quiz ref user_id user_id 4 func 35 100 Using where
Example output and expected output:
Bench mark: around .5 second
The following index should make the subquery to user_quiz ultra fast.
ALTER TABLE user_quiz
ADD INDEX (`user_id`,`game_type`,`question_answer_datetime`)
Please provide SHOW CREATE TABLE tablename statements for all tables, as that will help with additional optimizations.
Update #1
Alright, I've had some time to look things over, and fortunately there appears to be a lot of relatively low-hanging fruit in terms of optimization.
Here are all the indexes to add:
ALTER TABLE user_quiz
ADD INDEX `userGametypeAnswerDatetimes` (`user_id`,`game_type`,`question_answer_datetime`)
ALTER TABLE user_quiz
ADD INDEX `userAnswerScores` (`user_id`,`question_answer_datetime`,`score`)
ALTER TABLE user_starter
ADD INDEX `userResultDates` (`user_id`,`result`,`created_date`)
Note that the names (such as userGametypeAnswerDatetimes) are optional, and you can name them to whatever makes the most sense to you. But, in general, it's good to put specific names on your custom indexes (simply for organization purposes.)
Now, here is your query, rewritten so that it should work well with those new indexes:
SET @curRank = 0;
SET @lastPlayerPoints = 0;
SELECT
sub.*,
@curRank := IF(@lastPlayerPoints!=points_week, @curRank + 1, @curRank) AS rank,
@lastPlayerPoints := points_week AS db_PPW
FROM (
SELECT u.id,
u.firstname,
u.lastname,
u.email,
u.avatar,
u.type,
u.points_total,
u.number_win,
u.number_lost,
u.number_tie,
u.verified,
COALESCE(user_scores.score,0) as points_week,
COALESCE(user_losses.number_lost,0) as number_week_lost,
COALESCE(user_wins.number_win,0) as number_week_win,
(
select MAX(question_answer_datetime)
from user_quiz
WHERE user_id = u.id and game_type = 1
) as lastFrdFight,
(
select MAX(question_answer_datetime)
from user_quiz
WHERE user_id = u.id
and game_type = 2
) as lastBotFight
FROM `user` u
LEFT OUTER JOIN (
SELECT user_id,
COUNT(*) AS number_win
from user_starter
WHERE created_date BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 05:10:27'
AND result = 1
GROUP BY user_id
) user_wins
ON user_wins.user_id = u.id
LEFT OUTER JOIN (
SELECT user_id,
COUNT(*) AS number_lost
from user_starter
WHERE created_date BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 05:10:27'
AND result = -1
GROUP BY user_id
) user_losses
ON user_losses.user_id = u.id
LEFT OUTER JOIN (
SELECT user_id,
SUM(score) AS score
FROM user_quiz
WHERE question_answer_datetime
BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 00:00:00'
GROUP BY user_id
) user_scores
ON u.id = user_scores.user_id
ORDER BY points_week DESC, u.lastname ASC, u.firstname ASC
) as sub
Note: This is not necessarily the best result. It depends a LOT on your data set, as to whether this is necessarily the best, and sometimes you need to do a bit of trial and error.
A hint as to what you can use trial and error on is the difference between how we query lastFrdFight and lastBotFight versus how we query points_week, number_week_lost, and number_week_win. All of these could either be done in the SELECT clause (like the first two are in my query) or by joining to a subquery result (like the last three are in my query).
Mix and match to see what works best. In general, I've found joining to a subquery to be fastest when you have a large number of rows in the outer query (in this case, the user table). This is because it only needs to compute the results once, and can then match them up user by user. Other times, it can be better to have the subquery in the SELECT clause: each execution runs MUCH faster, since there are more constants (the user_id is already known), but it has to run once for each row. So it's a trade-off, which is why you sometimes need trial and error.
Why do the indexes work?
So, you may be wondering why I made the indexes as I did. If you are familiar with phone books (in this age of smartphones, that's no longer a valid assumption I can make) then we can use that as an analogy:
If you had a composite index phonebookIndex (lastname,firstname,email) on your user table (just an example here! you don't actually need to add that index!) you would have a result similar to what a phone book provides. (Using email instead of phone number.)
Each index is an internal copy of the data in the overall table. With this phonebookIndex there would internally be stored a list of all users with their lastname, then their first name, and then their email, and each of these would be ordered, just like a phone book.
Why is that useful? Consider when you know someone's first and last name. You can quickly flip to where their last name is, then quickly go through that list of everyone with their last name, finding the first name you want, so obtaining the email.
Indexes work in exactly the same way, in terms of how the database looks at them.
Consider the userGametypeAnswerDatetimes index I defined above, and how we query that index in the lastFrdFight SELECT subquery.
(
select MAX(question_answer_datetime)
from user_quiz
WHERE user_id = u.id and game_type = 1
) as lastFrdFight
Notice how we have both the user_id (from the outer query) and the game_type as constants. That is exactly like our example earlier, with having the first and last name, and wanting to look up an email/phone number. In this case, we are looking for the MAX of the 3rd value in the index. Still easy: All the values are ordered, so if this index was sitting in front of us, we could just flip to the specific user_id, then look at the section with all game_type=1 and then just pick the last value to find the maximum. Very very fast. Same for the database. It can find this value extremely fast, which is why you saw an 80%+ reduction in your overall query time.
So, that's how indexes work, and why I chose these indexes as I did.
Be aware, that the more indexes you have, the more you'll see slowdowns when doing inserts and updates. But, if you are reading a lot more from your tables than you are writing, this is usually a more than acceptable trade off.
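The phone-book lookup described above can be observed directly. The following is a small sketch under stated assumptions: it uses Python's sqlite3 rather than MySQL (SQLite's planner and its EXPLAIN QUERY PLAN output are not MySQL's), it reuses the index name userGametypeAnswerDatetimes from this answer, and the rows are invented. It runs the lastFrdFight-style lookup for one user and prints the plan so you can see the composite index being chosen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_quiz (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    game_type INTEGER,
    question_answer_datetime TEXT NOT NULL,
    score INTEGER
);
-- Composite index: user first, then game type, then the datetime we want the MAX of.
CREATE INDEX userGametypeAnswerDatetimes
    ON user_quiz (user_id, game_type, question_answer_datetime);
INSERT INTO user_quiz (user_id, game_type, question_answer_datetime, score) VALUES
    (1, 1, '2016-01-12 10:00:00', 3),
    (1, 1, '2016-03-01 09:30:00', 5),
    (1, 2, '2016-04-20 18:00:00', 2),
    (2, 1, '2016-02-02 12:00:00', 4);
""")

query = """
SELECT MAX(question_answer_datetime)
FROM user_quiz
WHERE user_id = ? AND game_type = ?
"""

# Latest game_type = 1 answer for user 1: both leading index columns are constants.
last_friend_fight = conn.execute(query, (1, 1)).fetchone()[0]

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (1, 1)))

print(last_friend_fight)
print(plan)
```

On MySQL, the equivalent check is to run EXPLAIN on the real query and look at the key column for the user_quiz subqueries.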
So, give these changes a shot, and let me know how it performs. Please provide the new EXPLAIN plan if you want further optimization help. Also, this should give you quite a few tools for trial and error, to see what works and what doesn't. All my changes are fairly independent of each other, so you can swap them in and out with your original query pieces to see how each one works.

MySQL: If select in table X is empty, do select in table Y

in one query I would like to select information from table X.
however if table X doesn't return any information I would like to retrieve data from table Y.
Apart from each other the queries would look like this:
SELECT * FROM tableY WHERE user_id=1
SELECT * FROM tableX WHERE id=1
I tried the following to combine this, but it doesn't seem to work
SELECT * FROM tableY WHERE user_id=
IF (EXISTS (SELECT * FROM tableX WHERE id=1), 1, 0)
and of course the other way around
SELECT * FROM tableX WHERE id=
IF (EXISTS (SELECT * FROM tableY WHERE user_id=1), 1, 0)
Both versions will only execute the first query, but not the second.
So I am kinda stuck here.
I also tried this, but as the tables do not have the same columns it shouldn't work, and indeed it doesn't:
SELECT *
FROM orbib.billing_address
WHERE user_id=1
UNION ALL
SELECT *
FROM orbib.users
WHERE id=1
AND NOT EXISTS
(SELECT *
FROM orbib.billing_address
WHERE user_id=1
)
Also tried doing this with a procedure as explained here:
However, this didn't help either. Besides, it looks like the procedure is saved, causing the user id to always be 1, and of course the user id varies.
Maybe anybody has an idea how to create a query which does do what I want?
EDITS:
Here are table descriptions:
tableX:
Field Type Null Key Default Extra
id int(11) NO PRI NULL auto_increment
username varchar(30) NO UNI NULL
firstname varchar(45) YES NULL
lastname varchar(45) YES NULL
street varchar(45) YES NULL
street_nr varchar(10) YES NULL
zipcode varchar(10) YES NULL
city varchar(45) YES NULL
password varchar(255) NO NULL
salt varchar(255) NO UNI NULL
email varchar(255) NO NULL
create_time datetime NO CURRENT_TIMESTAMP
company varchar(45) YES NULL
branche varchar(45) YES NULL
tableY:
Field Type Null Key Default Extra
id int(11) NO PRI NULL auto_increment
user_id int(11) NO NULL
company varchar(45) YES NULL
contact_name varchar(100) YES NULL
street varchar(45) YES NULL
street_nr varchar(10) YES NULL
zipcode varchar(45) YES NULL
city varchar(45) YES NULL
terms_ok tinyint(1) YES NULL
billing_ok tinyint(1) YES NULL
So, following the idea from @kickstart, I tried to do this:
SELECT
IFNULL(tableY.company, tableX.company) company,
IFNULL(tableY.contact_name, tableX.lastname) contact,
IFNULL(tableY.street, tableX.street) street,
IFNULL(tableY.street_nr, tableX.street_nr) street_nr,
IFNULL(tableY.zipcode, tableX.zipcode) zipcode,
IFNULL(tableY.city, tableX.city) city
FROM (SELECT * FROM tableX) x
LEFT OUTER JOIN tableY ON tableY.user_id=1
LEFT OUTER JOIN tableX ON tableX.id=1
This gave me the error: 1248 Every derived table must have its own alias.
But I found the cause: I had forgotten the x alias after the FROM (SELECT ...).
After changing this it worked, although it returned two rows, so I need to change this a bit.
Thanks @kickstarter
Making the major assumption that this should return a single row, one option is a subquery that generates a single row, with the other 2 tables LEFT OUTER JOINed to that row.
Then you can use a load of IF expressions to decide which table's values to return.
Efficiency is not likely to be its strong point!
SELECT IF(tableY.user_id IS NULL, tableX.id, tableY.user_id) AS id,
IF(tableY.user_id IS NULL, tableX.field2, tableY.other_field2) AS field2,
etc
FROM (SELECT 1 AS dummy) a
LEFT OUTER JOIN tableY ON tableY.user_id = 1
LEFT OUTER JOIN tableX ON tableX.id = 1
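Here is a runnable sketch of that dummy-row pattern, using Python's sqlite3 in place of MySQL (so CASE replaces MySQL's IF()). The helper name billing_contact, the trimmed-down columns and the sample rows are all illustrative assumptions; it also assumes at most one tableY row per user, otherwise more than one row would come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableX (id INTEGER PRIMARY KEY, company TEXT, lastname TEXT, city TEXT);
CREATE TABLE tableY (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,
                     company TEXT, contact_name TEXT, city TEXT);
INSERT INTO tableX VALUES (1, 'Acme', 'Smith', 'Utrecht'), (2, 'Globex', 'Jones', 'Leiden');
INSERT INTO tableY (user_id, company, contact_name, city)
    VALUES (1, 'Acme Billing', 'J. Smith', 'Amsterdam');
""")


def billing_contact(conn, user_id):
    """Return tableY's values when a tableY row exists, else fall back to tableX."""
    return conn.execute(
        """
        SELECT CASE WHEN y.user_id IS NULL THEN x.company  ELSE y.company      END AS company,
               CASE WHEN y.user_id IS NULL THEN x.lastname ELSE y.contact_name END AS contact,
               CASE WHEN y.user_id IS NULL THEN x.city     ELSE y.city         END AS city
        FROM (SELECT 1 AS dummy) a              -- guarantees exactly one base row
        LEFT OUTER JOIN tableY y ON y.user_id = ?
        LEFT OUTER JOIN tableX x ON x.id = ?
        """,
        (user_id, user_id),
    ).fetchone()


print(billing_contact(conn, 1))  # user with a tableY row
print(billing_contact(conn, 2))  # user with only a tableX row
print(billing_contact(conn, 3))  # user in neither table
```

Because the whole row is taken from one table or the other, a NULL column in tableY is returned as NULL rather than being filled in from tableX, which is the difference from the per-column IFNULL attempt in the question.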

MySQL Sum not correct and join

I'm building a holiday system and one of the features is being able to buy extra holiday which you can do at several points of the year, so I'm wanting to see the total number of days holiday, how much has been booked and how much has been bought by each user.
I'm doing a query
SELECT hr_user.name AS username,
hr_user.user_id,
SUM(working_days) AS daysbooked,
sum(hr_buyback.days) AS daysbought
FROM hr_leave
inner join hr_user on hr_user.user_id = hr_leave.user_id
left outer join hr_buyback on hr_buyback.user_id = hr_user.user_id
where active = 'y'
and hr_leave.start_date between '2012-01-01' and '2012-12-31'
and (hr_leave.status = 'approved' OR hr_leave.status = 'pending')
GROUP BY hr_user.name, hr_user.user_id
Now this is bringing back results in the daysbought column waaaay higher than what I was expecting, which is odd because when I get rid of the sum and just have hr_buyback.days it shows all the individual values I'd expect (except I'd much rather they were summed)
Secondly, in MySQL can you do what you can in MSSQL which is
left outer join hr_buyback on (select hr_buyback.user_id where buy_sell = 'buy') = hr_leave.user_id
?
Relevant table definitions (I assume this is what you mean?):
hr_buyback
buyback_id int(11) NO PRI auto_increment
user_id int(11) NO
days int(11) NO
buy_sell varchar(10) NO
status varchar(10) NO pending
year int(11) NO
hr_user
user_id int(11) NO PRI auto_increment
name varchar(40) NO
email varchar(40) NO UNI
level int(5) YES
manager_id int(11) NO
team_id int(11) YES
active varchar(2) NO y
holidays_day int(11) NO
start_date timestamp NO CURRENT_TIMESTAMP
password varchar(60) NO
division_id int(11) YES
day_change int(5) NO 0
priv_hours varchar(2) NO n
po_level int(2) YES 0
po_signoff int(10) YES
hr_leave
leave_id int(11) NO PRI auto_increment
user_id int(11) NO
start_date date NO
end_date date NO
day_type varchar(10) NO
status varchar(20) NO pending
working_days varchar(5) NO
leave_type int(11) NO
cancel int(11) NO 0
date timestamp NO CURRENT_TIMESTAMP
The problem is probably that you will get one copy of each row from hr_buyback for each matching row in hr_leave.
I assume that it is possible to have more than one hr_buyback row per user, and that it is possible to have a hr_buyback row without a hr_leave row. If so, you'll probably want something like this:
SELECT hr_user.name AS username,
hr_user.user_id,
SUM(working_days) AS daysbooked,
(SELECT SUM(days)
FROM hr_buyback
WHERE hr_buyback.user_id = hr_user.user_id) AS daysbought
FROM hr_user
left join hr_leave on hr_user.user_id = hr_leave.user_id
where active = 'y'
and hr_leave.start_date between '2012-01-01' and '2012-12-31'
and (hr_leave.status = 'approved' OR hr_leave.status = 'pending')
GROUP BY hr_user.name, hr_user.user_id
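The row multiplication is easy to reproduce on a few invented rows. This sketch uses Python's sqlite3 as a stand-in for MySQL, keeps only the columns involved, stores working_days as an integer, and leaves out the date, status and active filters for brevity; all of that is simplification on my part, not the asker's schema. User 1 has three leave rows and two buyback rows, so the plain join produces 3 x 2 = 6 rows and both sums are inflated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hr_user (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE hr_leave (leave_id INTEGER PRIMARY KEY, user_id INTEGER, working_days INTEGER);
CREATE TABLE hr_buyback (buyback_id INTEGER PRIMARY KEY, user_id INTEGER, days INTEGER);
INSERT INTO hr_user VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO hr_leave (user_id, working_days) VALUES (1, 2), (1, 1), (1, 3), (2, 4);
INSERT INTO hr_buyback (user_id, days) VALUES (1, 5), (1, 2);
""")

# Joining both child tables at once pairs every leave row with every buyback row.
NAIVE = """
SELECT u.user_id, SUM(l.working_days) AS daysbooked, SUM(b.days) AS daysbought
FROM hr_leave l
JOIN hr_user u ON u.user_id = l.user_id
LEFT JOIN hr_buyback b ON b.user_id = u.user_id
GROUP BY u.user_id
ORDER BY u.user_id
"""

# Summing buyback days in a correlated subquery keeps hr_buyback out of the join,
# so the leave rows are no longer duplicated.
FIXED = """
SELECT u.user_id,
       SUM(l.working_days) AS daysbooked,
       (SELECT SUM(b.days) FROM hr_buyback b WHERE b.user_id = u.user_id) AS daysbought
FROM hr_user u
LEFT JOIN hr_leave l ON l.user_id = u.user_id
GROUP BY u.user_id
ORDER BY u.user_id
"""

naive = conn.execute(NAIVE).fetchall()
fixed = conn.execute(FIXED).fetchall()
print(naive)
print(fixed)
```

When the date and status filters on hr_leave are added back, consider putting them in the ON clause of the LEFT JOIN rather than in WHERE if users without any matching leave should still be listed; that is an assumption about the desired output.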

Performance of MySQL Query

I have inherited some code, the original author is not contactable and I would be extremely grateful for any assistance as my own MySQL knowledge is not great.
I have the following query that is taking around 4 seconds to execute, there is only around 20,000 rows of data in all the tables combined so I suspect the query could be made more efficient, perhaps by splitting it into more than one query, here it is:
SELECT SQL_CALC_FOUND_ROWS ci.id AS id, ci.customer AS customer, ci.installer AS installer, ci.install_date AS install_date, ci.registration AS registration, ci.wf_obj AS wf_obj, ci.link_serial AS link_serial, ci.sim_serial AS sim_serial, sc.call_status AS call_status
FROM ap_servicedesk.corporate_installs AS ci
LEFT JOIN service_calls AS sc ON ci.wf_obj = sc.wf_obj
WHERE ci.acc_id = 3
GROUP BY ci.id
ORDER BY link_serial
asc
LIMIT 40, 20
Can anyone spot any way to make this more efficient, thanks.
(Some values are set as variables but running the above query in PHPMyAdmin takes ~4secs)
The id column is the primary index.
More Info as requested:
corporate_installs table:
Field Type Null Key Default Extra
id int(11) NO PRI NULL auto_increment
customer varchar(800) NO NULL
acc_id varchar(11) NO NULL
installer varchar(50) NO NULL
install_date varchar(50) NO NULL
address_name varchar(30) NO NULL
address_street varchar(40) NO NULL
address_city varchar(30) NO NULL
address_region varchar(30) NO NULL
address_post_code varchar(10) NO NULL
latitude varchar(15) NO NULL
longitude varchar(15) NO NULL
registration varchar(50) NO NULL
driver_name varchar(50) NO NULL
vehicle_type varchar(50) NO NULL
make varchar(50) NO NULL
model varchar(50) NO NULL
vin varchar(50) NO NULL
wf_obj varchar(50) NO NULL
link_serial varchar(50) NO NULL
sim_serial varchar(50) NO NULL
tti_inv_no varchar(50) NO NULL
pro_serial varchar(50) NO NULL
eco_serial varchar(50) NO NULL
eco_bluetooth varchar(50) NO NULL
warranty_expiry varchar(50) NO NULL
project_no varchar(50) NO NULL
status varchar(15) NO NULL
service_calls table:
Field Type Null Key Default Extra
id int(11) NO PRI NULL auto_increment
acc_id int(15) NO NULL
ciid int(11) NO NULL
installer_job_no varchar(50) NO NULL
installer_inv_no varchar(50) NO NULL
engineer varchar(50) NO NULL
request_date varchar(50) NO NULL
completion_date varchar(50) NO NULL
call_status varchar(50) NO NULL
registration varchar(50) NO NULL
wf_obj varchar(50) NO NULL
driver_name varchar(50) NO NULL
driver_phone varchar(50) NO NULL
team_leader_name varchar(50) NO NULL
team_leader_phone varchar(50) NO NULL
servicing_address varchar(150) NO NULL
region varchar(50) NO NULL
post_code varchar(50) NO NULL
latitude varchar(50) NO NULL
longitude varchar(50) NO NULL
incident_no varchar(50) NO NULL
service_type varchar(20) NO NULL
fault_description varchar(50) NO NULL
requested_action varchar(50) NO NULL
requested_replacemt varchar(100) NO NULL
fault_detected varchar(50) NO NULL
action_taken varchar(50) NO NULL
parts_used varchar(50) NO NULL
new_link_serial varchar(50) NO NULL
new_sim_serial varchar(50) NO NULL
(Apologies for the formatting, I did the best I could)
Let me know if you need more info thanks.
Further info (I did the query again with EXPLAIN):
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ci ALL acc_id NULL NULL NULL 7227 Using where; Using temporary; Using filesort
1 SIMPLE sc ALL NULL NULL NULL NULL 410
Add indices on the two wf_obj columns and on the link_serial column (you may need an index on acc_id, too).
Then try this version:
SELECT ...
FROM
( SELECT *
FROM ap_servicedesk.corporate_installs
WHERE acc_id = 3
ORDER BY link_serial ASC
LIMIT 60
) AS ci
LEFT JOIN service_calls AS sc
ON sc.PK = -- PK stands for the PRIMARY KEY of the table
( SELECT PK
FROM service_calls AS scm
WHERE ci.wf_obj = scm.wf_obj
ORDER BY scm.SomeColumn -- whatever suits you
LIMIT 1
)
ORDER BY ci.link_serial ASC
LIMIT 20 OFFSET 40
The ORDER BY scm.SomeColumn is needed not for performance but to get consistent results. Your query as it is, is joining a row from the first table to all related rows of the second table. But the final GROUP BY aggregates all these rows (of the second table), so your SELECT ... sc.call_status picks a more or less random call_status from one of these rows.
The first place I'd look on this would have to be indexes.
There is a GROUP BY on ci.id, which is the PK, and that is fine. However, you are ordering by link_serial (source table unspecified) and filtering on ci.acc_id.
If you add an extra key on the corporate_installs table for the acc_id field, that alone should help performance, as it will be usable for the WHERE clause.
Looking further, you have ci.wf_obj = sc.wf_obj within the join. Joining on VARCHAR columns that have no index is slow, and since you are not using sc as part of the selection criteria, a subquery may be your friend. Consider the following:
SELECT
serviceCallData.*,
sc.call_status AS call_status
FROM (
SELECT
-- SQL_CALC_FOUND_ROWS is omitted here: MySQL only accepts it in the outermost SELECT,
-- so fetch the total with a separate COUNT(*) query instead
ci.id AS id,
ci.customer AS customer,
ci.installer AS installer,
ci.install_date AS install_date,
ci.registration AS registration,
ci.wf_obj AS wf_obj,
ci.link_serial AS link_serial,
ci.sim_serial AS sim_serial
FROM ap_servicedesk.corporate_installs AS ci
WHERE ci.acc_id = 3
GROUP BY ci.id
ORDER BY ci.link_serial ASC
LIMIT 40, 20
) AS serviceCallData
LEFT JOIN service_calls AS sc ON serviceCallData.wf_obj = sc.wf_obj
ORDER BY serviceCallData.link_serial ASC
In addition to this, change that (acc_id) key to be (acc_id, link_serial), as then it will be usable in the sort. Also add a key on (wf_obj) to service_calls.
This will select the 20 rows from the corporate_installs table first and only then join them onto the service_calls table using the inefficient VARCHAR join.
I hope this is of help
I think the SQL_CALC_FOUND_ROWS option used with a join and a GROUP BY could be degrading the performance (look here for some tests, info on SQL_CALC_FOUND_ROWS here). It seems in fact that indexes are not used in that case.
Try replacing your query with two separate queries, the one with the LIMIT followed by a COUNT().
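A minimal sketch of that two-query approach, combined with the "page first, join afterwards" idea from the answers above. It runs on Python's sqlite3 rather than MySQL, uses a cut-down schema with acc_id stored as an integer, and the function name fetch_page plus the choice of "latest service call by id" are illustrative assumptions, not part of the original code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE corporate_installs (id INTEGER PRIMARY KEY, acc_id INTEGER,
                                 link_serial TEXT, wf_obj TEXT);
CREATE TABLE service_calls (id INTEGER PRIMARY KEY, wf_obj TEXT, call_status TEXT);
-- Indexes suggested in the answers: one for filter + sort, one for the join column.
CREATE INDEX ci_acc_serial ON corporate_installs (acc_id, link_serial);
CREATE INDEX sc_wf_obj ON service_calls (wf_obj);
INSERT INTO corporate_installs VALUES
    (1, 3, 'L001', 'W1'), (2, 3, 'L002', 'W2'), (3, 3, 'L003', 'W3'),
    (4, 3, 'L004', 'W4'), (5, 3, 'L005', 'W5'), (6, 4, 'L000', 'W6');
INSERT INTO service_calls VALUES (1, 'W2', 'open'), (2, 'W2', 'closed'), (3, 'W4', 'open');
""")


def fetch_page(conn, acc_id, page_size, offset):
    """Return (total matching installs, one page of rows with a call status)."""
    # Query 1: a plain COUNT(*) replaces SQL_CALC_FOUND_ROWS / FOUND_ROWS().
    total = conn.execute(
        "SELECT COUNT(*) FROM corporate_installs WHERE acc_id = ?", (acc_id,)
    ).fetchone()[0]

    # Query 2: cut the page out of corporate_installs first, then look up a single
    # service call per install (the one with the highest id) so the result is
    # deterministic and no GROUP BY is needed.
    rows = conn.execute(
        """
        SELECT ci.id, ci.link_serial,
               (SELECT sc.call_status FROM service_calls sc
                WHERE sc.wf_obj = ci.wf_obj
                ORDER BY sc.id DESC LIMIT 1) AS call_status
        FROM (SELECT id, link_serial, wf_obj
              FROM corporate_installs
              WHERE acc_id = ?
              ORDER BY link_serial ASC
              LIMIT ? OFFSET ?) AS ci
        ORDER BY ci.link_serial ASC
        """,
        (acc_id, page_size, offset),
    ).fetchall()
    return total, rows


print(fetch_page(conn, 3, 2, 0))
print(fetch_page(conn, 3, 2, 2))
```

With the real data, the page in the question would be fetch_page(conn, 3, 20, 40).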