Very slow SQL query with only two left joins - mysql

I am joining three tables, 'customer', 'customer_address' and 'country', using LEFT JOIN because a customer may have one address or none.
At the moment I have 13k+ customers and the query takes about 40 seconds. I tried an INNER JOIN, but then I don't get the customers with no address.
All columns in the 'ON' clauses are indexed, but it doesn't make much of a difference.
Here is my query:
SELECT DISTINCT *,
    CASE
        WHEN customer_address.customerid IS NULL THEN customer.customerid
        ELSE customer_address.customerid
    END AS customerid,
    CASE
        WHEN address1 = '' THEN 'NA'
        ELSE address1
    END AS address1
FROM customer
LEFT JOIN customer_address ON customer.customerid = customer_address.customerid
LEFT JOIN country ON country.id = customer_address.country
WHERE deleted = '0'
ORDER BY customer.customerid DESC
LIMIT 0, 10
Any help would be appreciated
EDIT:
Here is 'explain' for the three tables:
customer
Field Type Null Key Default Extra
customerid int(12) NO PRI NULL auto_increment
forename varchar(128) YES NULL
surname varchar(128) YES NULL
company varchar(64) YES NULL
tel varchar(32) YES NULL
tel2 varchar(32) YES NULL
fax varchar(32) YES NULL
mob varchar(32) YES NULL
email varchar(255) YES NULL
date_reg date YES NULL
last_update datetime YES NULL
deleted int NO
customer_address
Field Type Null Key Default Extra
addressid varchar(12) NO PRI
customerid varchar(12) YES MUL NULL
address1 varchar(128) YES NULL
address2 varchar(128) YES NULL
town varchar(128) YES NULL
county varchar(128) YES MUL NULL
postcode varchar(12) YES NULL
country int(12) YES NULL
address_date datetime YES NULL
isprimary int NO not
country
Field Type Null Key Default Extra
id int(12) NO PRI 0
country varchar(255) YES NULL
At the moment there are no rows with deleted != '0'.
EDIT 2:
Query Explain:
id  select_type  table             partitions  type    possible_keys  key      key_len  ref                               rows   filtered  Extra
1   SIMPLE       customer          NULL        ALL     deleted        NULL     NULL     NULL                              13082  99.98     Using where; Using temporary; Using filesort
1   SIMPLE       customer_address  NULL        ALL     NULL           NULL     NULL     NULL                              9983   100.00    Using where; Using join buffer (Block Nested Loop)
1   SIMPLE       country           NULL        eq_ref  PRIMARY,id     PRIMARY  4        db_name.customer_address.country  1      100.00    NULL
EDIT 3:
id  select_type  table             partitions  type    possible_keys  key         key_len  ref                               rows  filtered  Extra
1   SIMPLE       customer          NULL        index   NULL           customerid  4        NULL                              1     10.00     Using where; Using temporary
1   SIMPLE       customer_address  NULL        ALL     NULL           NULL        NULL     NULL                              9983  100.00    Using where
1   SIMPLE       country           NULL        eq_ref  PRIMARY,id     PRIMARY     4        db_name.customer_address.country  1     100.00    NULL

Well, you have two ALL-type table accesses that do not use any index. One of them even has the dreaded filesort, which is a very expensive operation.
Add an index on the customer_address.customerid field. It will be used to match records from the customer_address table to the main customer table.
List the columns you want the query to return; do not use *. For example, I do not see why you need to return customerid from both the customer and the address tables.
Get rid of the first CASE expression. The customer.customerid field will always be populated.
Add an index hint after the customer table to nudge MySQL toward using the customerid index for sorting:
...
FROM customer FORCE INDEX (index_name_forcustomerid_field)
...
You may want to consider increasing the join_buffer_size server variable; however, adding the index in the first place should help a lot.
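Putting the suggestions above together, the result might look roughly like the sketch below. The index name idx_customer_address_customerid and the particular column list are assumptions made for illustration. DISTINCT is left out on the assumption, taken from the question, that a customer has at most one address. Note also that the posted table listings show customer.customerid as int but customer_address.customerid as varchar(12); a type mismatch like that can keep MySQL from using an index for the join, so aligning the two column types may be worth checking first.

```sql
-- Hypothetical index name; skip this if customer_address.customerid
-- already has a usable index.
CREATE INDEX idx_customer_address_customerid
    ON customer_address (customerid);

-- Explicit column list (illustrative), no first CASE, and an index hint
-- on the primary key so the ORDER BY ... LIMIT can walk the index.
SELECT c.customerid,
       c.forename,
       c.surname,
       CASE WHEN ca.address1 = '' THEN 'NA' ELSE ca.address1 END AS address1,
       ca.town,
       ca.postcode,
       co.country
FROM customer AS c FORCE INDEX (PRIMARY)
LEFT JOIN customer_address AS ca ON ca.customerid = c.customerid
LEFT JOIN country AS co ON co.id = ca.country
WHERE c.deleted = '0'
ORDER BY c.customerid DESC
LIMIT 0, 10;
```

Running EXPLAIN on this version should show whether customer_address is still read with type ALL.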

You can try this. You do not need the first CASE expression, as you are never going to receive customerid as NULL. I have also removed the ORDER BY clause from the outer query, as I am assuming this will improve query performance (customerid is the primary key, which determines how records are physically arranged in the database; the default order is ascending).
SELECT DISTINCT *, C.customerid AS customerid,
    CASE
        WHEN customer_address.address1 = '' THEN 'NA'
        ELSE customer_address.address1
    END AS address1
FROM (
    SELECT *
    FROM customer
    WHERE deleted = '0'
    ORDER BY customerid DESC
) AS C
LEFT JOIN customer_address ON C.customerid = customer_address.customerid
LEFT JOIN country ON country.id = customer_address.country
LIMIT 0, 10

Related

MYSQL query to retrieve comments and replies

I am having trouble writing a MySQL query. I am aware I can do it via iteration, but I believe it can be achieved in a single query, and I am hoping someone can point me in the right direction.
I have a comments table and a users table.
I am adding "reply to comment" functionality by modifying the comments table to contain the id of the comment replied to.
data structure:
`comments`
(`i` int(5) NOT NULL AUTO_INCREMENT
,`id` int(5) NOT NULL
,`deleted` tinyint NULL
,`articleId` int(5) NOT NULL
,`userId` VARCHAR(200) NOT NULL
,`comment` VARCHAR(2000) NOT NULL
,`replyTo` int(5) NULL
,`knownFrom` datetime NOT NULL
,`knownTo` datetime NOT NULL
)
`users`
(`i` int(5) NOT NULL AUTO_INCREMENT
,`id` VARCHAR(200) NOT NULL
,`deleted` tinyint NULL
,`name` VARCHAR(200) NOT NULL
,`email` VARCHAR(200) NOT NULL
,`knownFrom` datetime NOT NULL
,`knownTo` datetime NOT NULL
)
Currently I am retrieving comments using the below query:
SELECT
c.id, c.articleId, c.userId, c.comment, c.replyTo, u.name, c.knownFrom
FROM `comments` c
INNER JOIN `users` u ON c.userId=u.id
WHERE
c.`articleId` = :articleId
AND c.`deleted` IS NULL
AND c.`knownTo` = '9999-12-31 23:59:59'
AND u.`knownTo` = '9999-12-31 23:59:59'
How do I retrieve data in this format?
c.id  c.articleId  c.userId  c.comment  c.replyTo  u.name  c.knownFrom
1     1            1         comment    null       user1   date
2     1            2         reply      1          user2   date
3     1            3         reply      1          user4   date
4     1            4         comment    null       user3   date
5     1            5         comment    null       user5   date
Structure breakdown:
All comments are retrieved for an article.
Comments which are not replies are retrieved first, with the 'replyTo' column as null.
Replies follow the comment with the corresponding ID.
Is this possible in a single query or using sub selects?
Is this better than iteration over comments to retrieve replies?
Many thanks in advance:-) !!
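One possible single-query approach to the ordering described above is to sort on the parent comment's id. This is only a sketch, and it assumes replies are a single level deep (a reply always points at a top-level comment, never at another reply). It reuses the existing query unchanged and only adds an ORDER BY.

```sql
SELECT c.id, c.articleId, c.userId, c.comment, c.replyTo, u.name, c.knownFrom
FROM `comments` AS c
INNER JOIN `users` AS u ON c.userId = u.id
WHERE c.articleId = :articleId
  AND c.deleted IS NULL
  AND c.knownTo = '9999-12-31 23:59:59'
  AND u.knownTo = '9999-12-31 23:59:59'
ORDER BY COALESCE(c.replyTo, c.id),  -- group each reply with its parent comment
         c.replyTo IS NOT NULL,      -- 0 for the parent, 1 for replies: parent first
         c.id;                       -- replies in id order within a thread
```

Deeper nesting (replies to replies) would need a different technique, such as a recursive query where the MySQL version supports it.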

MySQL: If select in table X is empty, do select in table Y

In one query I would like to select information from table X.
However, if table X doesn't return any information, I would like to retrieve data from table Y.
Run separately, the queries would look like this:
SELECT * FROM tableY WHERE user_id=1
SELECT * FROM tableX WHERE id=1
I tried the following to combine this, but it doesn't seem to work
SELECT * FROM tableY WHERE user_id=
IF (EXISTS (SELECT * FROM tableX WHERE id=1), 1, 0)
and of course the other way around
SELECT * FROM tableX WHERE id=
IF (EXISTS (SELECT * FROM tableY WHERE user_id=1), 1, 0)
Both versions will only execute the first query, but not the second.
So I am kind of stuck here.
I also tried this, but as the tables do not have the same columns this shouldn't work, and indeed it doesn't:
SELECT *
FROM orbib.billing_address
WHERE user_id=1
UNION ALL
SELECT *
FROM orbib.users
WHERE id=1
AND NOT EXISTS
(SELECT *
FROM orbib.billing_address
WHERE user_id=1
)
Also tried doing this with a procedure as explained here:
However, this didn't help either. Besides that, it looks like the procedure is saved, causing the user id to always be 1, whereas it of course varies.
Does anybody have an idea how to write a query that does what I want?
EDITS:
Here are table descriptions:
tableX:
Field Type Null Key Default Extra
id int(11) NO PRI NULL auto_increment
username varchar(30) NO UNI NULL
firstname varchar(45) YES NULL
lastname varchar(45) YES NULL
street varchar(45) YES NULL
street_nr varchar(10) YES NULL
zipcode varchar(10) YES NULL
city varchar(45) YES NULL
password varchar(255) NO NULL
salt varchar(255) NO UNI NULL
email varchar(255) NO NULL
create_time datetime NO CURRENT_TIMESTAMP
company varchar(45) YES NULL
branche varchar(45) YES NULL
tableY:
Field Type Null Key Default Extra
id int(11) NO PRI NULL auto_increment
user_id int(11) NO NULL
company varchar(45) YES NULL
contact_name varchar(100) YES NULL
street varchar(45) YES NULL
street_nr varchar(10) YES NULL
zipcode varchar(45) YES NULL
city varchar(45) YES NULL
terms_ok tinyint(1) YES NULL
billing_ok tinyint(1) YES NULL
So, following the idea from @kickstart, I tried this:
SELECT
IFNULL(tableY.company, tableX.company) company,
IFNULL(tableY.contact_name, tableX.lastname) contact,
IFNULL(tableY.street, tableX.street) street,
IFNULL(tableY.street_nr, tableX.street_nr) street_nr,
IFNULL(tableY.zipcode, tableX.zipcode) zipcode,
IFNULL(tableY.city, tableX.city) city
FROM (SELECT * FROM tableX) x
LEFT OUTER JOIN tableY ON tableY.user_id=1
LEFT OUTER JOIN tableX ON tableX.id=1
This gave me error 1248: Every derived table must have its own alias.
I found the cause, though: I had forgotten the x alias after the FROM (SELECT) subquery.
After changing this it worked, but it returned two rows, so I need to change it a bit.
Thanks, @kickstarter
Making a major assumption that this is to return a single row, then possibly have a sub query to generate a single row and then LEFT OUTER JOIN the other 2 tables to that row.
Then you can use a load of IF statements to decide which tables values to return.
Efficiency is not likely to be its strong point!
SELECT IF(tableY.user_id IS NULL, tableX.id, tableY.user_id) AS id,
       IF(tableY.user_id IS NULL, tableX.field2, tableY.other_field2) AS field2,
       etc
FROM (SELECT 1 AS dummy) a
LEFT OUTER JOIN tableY ON tableY.user_id = 1
LEFT OUTER JOIN tableX ON tableX.id = 1
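Applied to the columns listed in the question, the pattern above might look like the following sketch. The aliases a, x and y are illustrative, and the literal 1 stands in for the user id, which would normally be a bound parameter. It assumes tableY holds at most one row per user_id; with several matching rows it returns several rows.

```sql
SELECT IF(y.user_id IS NULL, x.company,   y.company)      AS company,
       IF(y.user_id IS NULL, x.lastname,  y.contact_name) AS contact,
       IF(y.user_id IS NULL, x.street,    y.street)       AS street,
       IF(y.user_id IS NULL, x.street_nr, y.street_nr)    AS street_nr,
       IF(y.user_id IS NULL, x.zipcode,   y.zipcode)      AS zipcode,
       IF(y.user_id IS NULL, x.city,      y.city)         AS city
FROM (SELECT 1 AS dummy) AS a               -- exactly one driving row
LEFT OUTER JOIN tableY AS y ON y.user_id = 1
LEFT OUTER JOIN tableX AS x ON x.id = 1;
```

Because the driving table is a one-row derived table rather than all of tableX, the join cannot multiply rows the way FROM (SELECT * FROM tableX) x does. Testing y.user_id instead of using IFNULL per column means that when a tableY row exists, all values come from it, even where an individual column is NULL. If neither table has a matching row, the query still returns one row of NULLs.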

Where am I going wrong in using a Join in the mysql query - Explain result posted too

I have this query which takes about 3.5 seconds just to fetch 2 records. However there are over 100k rows in testimonials, 13k in users, 850 in courses, 2 in exams.
SELECT t.*, u.name, f.feedback
FROM testmonials t
INNER JOIN user u ON u.id = t.userid
INNER JOIN courses co ON co.id = t.courseid
LEFT JOIN exam ex ON ex.id = t.exam_id
WHERE t.status = 4
AND t.verfication_required = 'Y'
AND t.verfication_completed = 'N'
ORDER BY t.submissiondate DESC
EXPLAIN result:
id  select_type  table  type    possible_keys      key      key_len  ref            rows  Extra
1   SIMPLE       co     ALL     PRIMARY            NULL     NULL     NULL           850   Using temporary; Using filesort
1   SIMPLE       t      ref     CID,nuk_tran_user  CID      4        kms.co.id      8     Using where
1   SIMPLE       u      eq_ref  PRIMARY            PRIMARY  4        kms.t.userid   1     Using where
1   SIMPLE       ex     eq_ref  PRIMARY            PRIMARY  3        kms.t.eval_id  1
If I remove the courses table join then the query returns the result pretty quick. I can't figure out why this query has to select all the courses rows i.e. 850?
Any ideas what I am doing wrong?
Edit:
I have an index on courseid, userid in testimonials table and these are primary keys of their respective tables.
EDIT 2
I have just removed the courseid index from the testimonials table (just to test) and interestingly the query returned result in 0.22 seconds!!!?? Everything else the same as above just removed only this index.
id  select_type  table  type    possible_keys  key      key_len  ref             rows    Extra
1   SIMPLE       t      ALL     nuk_tran_user  NULL     NULL     NULL            130696  Using where; Using filesort
1   SIMPLE       u      eq_ref  PRIMARY        PRIMARY  4        kms.t.userid    1       Using where
1   SIMPLE       co     eq_ref  PRIMARY        PRIMARY  4        kms.t.courseid  1
1   SIMPLE       ex     eq_ref  PRIMARY        PRIMARY  3        kms.t.exam_id   1
EDIT 3
CREATE TABLE IF NOT EXISTS `courses` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`description` text NOT NULL,
`duration` varchar(100) NOT NULL DEFAULT '',
`objectives` text NOT NULL,
`updated_at` datetime DEFAULT NULL,
`updated_by` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=851 ;
Testimonials
CREATE TABLE IF NOT EXISTS `testimonials` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`feedback` text NOT NULL,
`userid` int(10) unsigned NOT NULL DEFAULT '0',
`courseid` int(10) unsigned NOT NULL DEFAULT '0',
`eventid` int(10) unsigned NOT NULL DEFAULT '0',
`emr_date` datetime DEFAULT NULL,
`exam_required` enum('Y','N') NOT NULL DEFAULT 'N',
`exam_id` smallint(5) unsigned NOT NULL DEFAULT '0',
`emr_completed` enum('Y','N') NOT NULL DEFAULT 'N',
PRIMARY KEY (`id`),
KEY `event` (`eventid`),
KEY `nuk_tran_user` (`userid`),
KEY `emr_date` (`emr_date`),
KEY `courseid` (`courseid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=134691 ;
.. this is the latest Explain query result now ...
id  select_type  table  type    possible_keys           key      key_len  ref             rows    Extra
1   SIMPLE       t      ALL     nuk_tran_user,courseid  NULL     NULL     NULL            130696  Using where; Using filesort
1   SIMPLE       u      eq_ref  PRIMARY                 PRIMARY  4        kms.t.userid    1       Using where
1   SIMPLE       co     eq_ref  PRIMARY                 PRIMARY  4        kms.t.courseid  1
1   SIMPLE       ex     eq_ref  PRIMARY                 PRIMARY  3        kms.t.exam_id   1
An ORDER BY with no corresponding index that it can use is a known cause of delays, even though this does not specifically explain your issue with the courses table.
Your original query looks MOSTLY ok, but you reference "f.feedback" and there is no "f" alias in the query. You also refer to "verification_required" and "verification_completed", which I don't see in the table structures, though I DO find "exam_required" and "emr_completed".
I would, however, change one thing. In the testimonials table, instead of relying on individual column indexes, I would add one more index with multiple columns, to take advantage of both your multiple-criteria WHERE clause AND the ORDER BY:
create table ...
KEY StatVerifySubmit ( status, verification_required, verification_completed, submissionDate )
However, since your query appears to refer to columns not listed in your table structure, the index might instead be:
KEY StatVerifySubmit ( status, exam_required, emr_completed, emr_Date)
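On an existing table, the same composite index could be added with ALTER TABLE rather than inside CREATE TABLE. The sketch below uses the second column list from above; it assumes a status column exists on testimonials even though it is not shown in the posted structure, so adjust the names to the real schema.

```sql
-- Equality-filtered columns come first and the sort column last, so the
-- index can serve both the WHERE clause and the ORDER BY.
-- `status` is assumed to exist; it is not in the posted CREATE TABLE.
ALTER TABLE testimonials
    ADD INDEX StatVerifySubmit (status, exam_required, emr_completed, emr_date);
```

For the index to help with sorting, the query would have to filter on exactly the leading columns with equality and order by the last one.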
Could you try the following query instead of the original?
SELECT t.*, u.name, f.feedback
FROM testmonials t
INNER JOIN user u ON u.id = t.userid
LEFT JOIN exam ex ON ex.id = t.exam_id
WHERE t.status = 4
AND t.verfication_required = 'Y'
AND t.verfication_completed = 'N'
AND t.courseid in ( SELECT co.id FROM courses co)
ORDER BY t.submissiondate DESC
Do you need to select columns from the courses table?

Performance of MySQL Query

I have inherited some code; the original author cannot be contacted, and I would be extremely grateful for any assistance, as my own MySQL knowledge is not great.
I have the following query, which takes around 4 seconds to execute. There are only around 20,000 rows of data in all the tables combined, so I suspect the query could be made more efficient, perhaps by splitting it into more than one query. Here it is:
SELECT SQL_CALC_FOUND_ROWS
    ci.id AS id,
    ci.customer AS customer,
    ci.installer AS installer,
    ci.install_date AS install_date,
    ci.registration AS registration,
    ci.wf_obj AS wf_obj,
    ci.link_serial AS link_serial,
    ci.sim_serial AS sim_serial,
    sc.call_status AS call_status
FROM ap_servicedesk.corporate_installs AS ci
LEFT JOIN service_calls AS sc ON ci.wf_obj = sc.wf_obj
WHERE ci.acc_id = 3
GROUP BY ci.id
ORDER BY link_serial ASC
LIMIT 40, 20
Can anyone spot any way to make this more efficient, thanks.
(Some values are set as variables but running the above query in PHPMyAdmin takes ~4secs)
The id column is the primary index.
More Info as requested:
corporate_installs table:
Field Type Null Key Default Extra
id int(11) NO PRI NULL auto_increment
customer varchar(800) NO NULL
acc_id varchar(11) NO NULL
installer varchar(50) NO NULL
install_date varchar(50) NO NULL
address_name varchar(30) NO NULL
address_street varchar(40) NO NULL
address_city varchar(30) NO NULL
address_region varchar(30) NO NULL
address_post_code varchar(10) NO NULL
latitude varchar(15) NO NULL
longitude varchar(15) NO NULL
registration varchar(50) NO NULL
driver_name varchar(50) NO NULL
vehicle_type varchar(50) NO NULL
make varchar(50) NO NULL
model varchar(50) NO NULL
vin varchar(50) NO NULL
wf_obj varchar(50) NO NULL
link_serial varchar(50) NO NULL
sim_serial varchar(50) NO NULL
tti_inv_no varchar(50) NO NULL
pro_serial varchar(50) NO NULL
eco_serial varchar(50) NO NULL
eco_bluetooth varchar(50) NO NULL
warranty_expiry varchar(50) NO NULL
project_no varchar(50) NO NULL
status varchar(15) NO NULL
service_calls table:
Field Type Null Key Default Extra
id int(11) NO PRI NULL auto_increment
acc_id int(15) NO NULL
ciid int(11) NO NULL
installer_job_no varchar(50) NO NULL
installer_inv_no varchar(50) NO NULL
engineer varchar(50) NO NULL
request_date varchar(50) NO NULL
completion_date varchar(50) NO NULL
call_status varchar(50) NO NULL
registration varchar(50) NO NULL
wf_obj varchar(50) NO NULL
driver_name varchar(50) NO NULL
driver_phone varchar(50) NO NULL
team_leader_name varchar(50) NO NULL
team_leader_phone varchar(50) NO NULL
servicing_address varchar(150) NO NULL
region varchar(50) NO NULL
post_code varchar(50) NO NULL
latitude varchar(50) NO NULL
longitude varchar(50) NO NULL
incident_no varchar(50) NO NULL
service_type varchar(20) NO NULL
fault_description varchar(50) NO NULL
requested_action varchar(50) NO NULL
requested_replacemt varchar(100) NO NULL
fault_detected varchar(50) NO NULL
action_taken varchar(50) NO NULL
parts_used varchar(50) NO NULL
new_link_serial varchar(50) NO NULL
new_sim_serial varchar(50) NO NULL
(Apologies for the formatting, I did the best I could)
Let me know if you need more info thanks.
Further info (I did the query again with EXPLAIN):
id  select_type  table  type  possible_keys  key   key_len  ref   rows  Extra
1   SIMPLE       ci     ALL   acc_id         NULL  NULL     NULL  7227  Using where; Using temporary; Using filesort
1   SIMPLE       sc     ALL   NULL           NULL  NULL     NULL  410
Add indices on the two wf_obj columns, the link_serial column (you may also need an index on the acc_id, too).
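The index suggestions above could be written as the statements below. The index names are hypothetical. The EXPLAIN output already lists acc_id under possible_keys, so an index on that column appears to exist and is not repeated here.

```sql
-- Join columns on both sides of ci.wf_obj = sc.wf_obj
CREATE INDEX idx_ci_wf_obj ON ap_servicedesk.corporate_installs (wf_obj);
CREATE INDEX idx_sc_wf_obj ON service_calls (wf_obj);

-- Sort column used by ORDER BY link_serial
CREATE INDEX idx_ci_link_serial ON ap_servicedesk.corporate_installs (link_serial);
```

One more thing that may be worth checking: corporate_installs.acc_id is declared as varchar(11), while the query compares it to the number 3. MySQL generally cannot use an index on a string column for a comparison against a number, so writing the condition as acc_id = '3' (or storing the column as an integer) may be needed before the existing index is picked up.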
Then try this version:
SELECT ...
FROM
    ( SELECT *
      FROM ap_servicedesk.corporate_installs
      WHERE acc_id = 3
      ORDER BY link_serial ASC
      LIMIT 60
    ) AS ci
LEFT JOIN service_calls AS sc
    ON sc.PK =                      -- PK: the PRIMARY KEY of the table
       ( SELECT PK
         FROM service_calls AS scm
         WHERE ci.wf_obj = scm.wf_obj
         ORDER BY scm.SomeColumn    -- whatever suits you
         LIMIT 1
       )
ORDER BY ci.link_serial ASC
LIMIT 20 OFFSET 40
The ORDER BY scm.SomeColumn is needed not for performance but to get consistent results. Your query as it is, is joining a row from the first table to all related rows of the second table. But the final GROUP BY aggregates all these rows (of the second table), so your SELECT ... sc.call_status picks a more or less random call_status from one of these rows.
The first place I'd look would be indexes.
There is a GROUP BY on ci.id, which is the PK, and that is fine; however, you are ordering by link_serial (source table unspecified) and filtering on ci.acc_id.
If you add an extra key on the corporate_installs table for the acc_id field, that alone should improve performance, as it will be usable for the WHERE clause.
Looking further, you have ci.wf_obj = sc.wf_obj in the join. Joining on a VARCHAR will be SLOW, and you are not actually using it as part of the selection criteria, so a SUBQUERY may be your friend. Consider the following:
SELECT
    serviceCallData.*,
    sc.call_status AS call_status
FROM (
    -- SQL_CALC_FOUND_ROWS is a SELECT modifier, not a column, so it cannot
    -- be aliased here; it is dropped from this subquery.
    SELECT
        ci.id AS id,
        ci.customer AS customer,
        ci.installer AS installer,
        ci.install_date AS install_date,
        ci.registration AS registration,
        ci.wf_obj AS wf_obj,
        ci.link_serial AS link_serial,
        ci.sim_serial AS sim_serial
    FROM ap_servicedesk.corporate_installs AS ci
    WHERE ci.acc_id = 3
    GROUP BY ci.id
    ORDER BY ci.link_serial ASC
    LIMIT 40, 20
) AS serviceCallData
LEFT JOIN service_calls AS sc ON serviceCallData.wf_obj = sc.wf_obj
In addition to this, change the (acc_id) key to (acc_id, link_serial), as it will then be usable for the sort. Also add a key on (wf_obj) to service_calls.
This will select the 20 rows from the corporate_installs table and only then join them to the service_calls table using the inefficient VARCHAR join.
I hope this is of help.
I think the SQL_CALC_FOUND_ROWS option, used together with a join and a GROUP BY, could be degrading performance. It seems, in fact, that indexes are not used in that case.
Try replacing your query with two separate queries, the one with the LIMIT followed by a COUNT().
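A sketch of that split, using the original query as the starting point: the first statement returns the page without SQL_CALC_FOUND_ROWS, and the second returns the total. The alias total_rows is illustrative. Because the original groups by ci.id (the primary key) and the LEFT JOIN keeps every corporate_installs row, the total should equal a plain count over corporate_installs, so the join can be left out of the count.

```sql
-- 1) The page of results
SELECT ci.id, ci.customer, ci.installer, ci.install_date, ci.registration,
       ci.wf_obj, ci.link_serial, ci.sim_serial, sc.call_status
FROM ap_servicedesk.corporate_installs AS ci
LEFT JOIN service_calls AS sc ON ci.wf_obj = sc.wf_obj
WHERE ci.acc_id = 3
GROUP BY ci.id
ORDER BY ci.link_serial ASC
LIMIT 40, 20;

-- 2) The total number of matching rows, for pagination
SELECT COUNT(*) AS total_rows
FROM ap_servicedesk.corporate_installs
WHERE acc_id = 3;
```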

Want to learn to improve slow mysql query

I have a MySQL query to select all product id's with certain filters applied to the products. This query
works but I want to learn to improve this query. Alternatives for this query are welcome with explanation.
SELECT kkx_products.id
FROM kkx_products
WHERE display = 'yes'
  AND id IN (
      SELECT product_id
      FROM `kkx_filters_products`
      WHERE `filter_id` IN (
          SELECT id
          FROM `kkx_filters`
          WHERE kkx_filters.urlname = "comics"
             OR kkx_filters.urlname = "comicsgraphicnovels"
      )
      GROUP BY product_id
      HAVING COUNT(*) = 2
  )
ORDER BY kkx_products.id DESC
LIMIT 0, 24
I've included the structure of the tables being used in the query.
EXPLAIN kkx_filters;
Field Type Null Key Default Extra
id int(11) unsigned NO PRI NULL auto_increment
name varchar(50) NO
filtergroup_id int(11) YES MUL NULL
urlname varchar(50) NO MUL NULL
date_modified timestamp NO CURRENT_TIMESTAMP
orderid float(11,2) NO NULL
EXPLAIN kkx_filters_products;
Field Type Null Key Default Extra
filter_id int(11) NO PRI 0
product_id int(11) NO PRI 0
EXPLAIN kkx_products;
Field Type Null Key Default Extra
id int(11) NO PRI NULL auto_increment
title varchar(255) NO
urlname varchar(50) NO MUL
description longtext NO NULL
price float(11,2) NO NULL
orderid float(11,2) NO NULL
imageurl varchar(255) NO
date_created datetime NO NULL
date_modified timestamp NO CURRENT_TIMESTAMP
created_by varchar(11) NO NULL
modified_by varchar(11) NO NULL
productnumber varchar(32) NO
instock enum('yes','no') NO yes
display enum('yes','no') NO yes
Instead of using inline queries in your criteria statements, try using the EXISTS block...
http://dev.mysql.com/doc/refman/5.0/en/exists-and-not-exists-subqueries.html
You will be able to see the difference in your explain plan. Before, you had a query executing for each and every record in your result set, and every result in that inline view's result set had its own query executing too.
You can see how nested inline views can create an exponential increase in cost. EXISTS doesn't work that way.
Example of the use of EXISTS:
Consider tbl1 has columns id and data. tbl2 has columns id, parentid, and data.
SELECT a.*
FROM tbl1 a
WHERE 1 = 1
AND EXISTS (
SELECT NULL
FROM tbl2 b
WHERE b.parentid = a.id
AND b.data = 'SOME CONDITIONAL DATA TO CONSTRAIN ON'
)
1) We can assume the 1 = 1 is some condition that evaluates to true for every record.
2) It doesn't really matter what we select in the EXISTS subquery; NULL is fine.
3) It is important to look at b.parentid = a.id; this links our EXISTS subquery to the outer result set.
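Applied to the tables from the question, the EXISTS pattern might look like the sketch below, with one EXISTS per required filter. The aliases p, fp and f are illustrative. It assumes each urlname identifies exactly one row in kkx_filters, which is what makes it equivalent to the original HAVING COUNT(*) = 2 check.

```sql
SELECT p.id
FROM kkx_products AS p
WHERE p.display = 'yes'
  -- product must be linked to the "comics" filter ...
  AND EXISTS (
      SELECT NULL
      FROM kkx_filters_products AS fp
      INNER JOIN kkx_filters AS f ON f.id = fp.filter_id
      WHERE fp.product_id = p.id
        AND f.urlname = 'comics'
  )
  -- ... and to the "comicsgraphicnovels" filter as well
  AND EXISTS (
      SELECT NULL
      FROM kkx_filters_products AS fp
      INNER JOIN kkx_filters AS f ON f.id = fp.filter_id
      WHERE fp.product_id = p.id
        AND f.urlname = 'comicsgraphicnovels'
  )
ORDER BY p.id DESC
LIMIT 0, 24;
```

Comparing the EXPLAIN output of this version against the original should show whether it is actually cheaper on the real data.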