Improving the MySQL Query - mysql

I have the following query, which filters the rows with replyAutoId=0 and then fetches the most recent record for each propertyId. The query currently takes 0.23225 sec to fetch just 5,435 of 21,369 rows, and I want to improve this. Is there a better way of writing this query? Any suggestions?
SELECT pc1.* FROM (SELECT * FROM propertyComment WHERE replyAutoId=0) as pc1
LEFT JOIN propertyComment as pc2
ON pc1.propertyId= pc2.propertyId AND pc1.updatedDate < pc2.updatedDate
WHERE pc2.propertyId IS NULL
The SHOW CREATE TABLE propertyComment Output:
CREATE TABLE `propertyComment` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`propertyId` int(11) NOT NULL,
`agentId` int(11) NOT NULL,
`comment` longtext COLLATE utf8_unicode_ci NOT NULL,
`replyAutoId` int(11) NOT NULL,
`updatedDate` datetime NOT NULL,
`contactDate` date NOT NULL,
`status` enum('Y','N') COLLATE utf8_unicode_ci NOT NULL DEFAULT 'N',
`clientStatusId` int(11) NOT NULL,
`adminsId` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `propertyId` (`propertyId`),
KEY `agentId` (`agentId`),
KEY `status` (`status`),
KEY `adminsId` (`adminsId`),
KEY `replyAutoId` (`replyAutoId`)
) ENGINE=MyISAM AUTO_INCREMENT=21404 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Try to get rid of the nested query.
The following query should give the same result as your original query:
SELECT pc1.*
FROM propertyComment AS pc1
LEFT JOIN propertyComment AS pc2
ON pc1.propertyID = pc2.propertyId AND pc1.updatedDate < pc2.updatedDate
WHERE pc1.replyAutoId = 0 AND pc2.propertyID IS NULL
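To check the equivalence claim on a small scale, here is a minimal sketch that runs both forms against the same rows. It uses Python's sqlite3 with an in-memory database as a stand-in for MySQL (an assumption, so it says nothing about timing), a trimmed-down propertyComment table, and made-up sample rows.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; only the columns the query touches are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE propertyComment (
    id INTEGER PRIMARY KEY,
    propertyId INTEGER NOT NULL,
    replyAutoId INTEGER NOT NULL,
    updatedDate TEXT NOT NULL
);
-- Hypothetical sample rows
INSERT INTO propertyComment (id, propertyId, replyAutoId, updatedDate) VALUES
    (1, 10, 0, '2015-01-01 10:00:00'),
    (2, 10, 0, '2015-01-02 10:00:00'),
    (3, 20, 0, '2015-01-01 09:00:00'),
    (4, 20, 5, '2015-01-03 09:00:00'),
    (5, 30, 0, '2015-01-04 08:00:00');
""")

# Original form: filter in a derived table, then anti-join.
nested_sql = """
SELECT pc1.id
FROM (SELECT * FROM propertyComment WHERE replyAutoId = 0) AS pc1
LEFT JOIN propertyComment AS pc2
  ON pc1.propertyId = pc2.propertyId AND pc1.updatedDate < pc2.updatedDate
WHERE pc2.propertyId IS NULL
ORDER BY pc1.id
"""

# Flattened form: the filter on the left-hand table moves into WHERE.
flat_sql = """
SELECT pc1.id
FROM propertyComment AS pc1
LEFT JOIN propertyComment AS pc2
  ON pc1.propertyId = pc2.propertyId AND pc1.updatedDate < pc2.updatedDate
WHERE pc1.replyAutoId = 0 AND pc2.propertyId IS NULL
ORDER BY pc1.id
"""

nested_ids = [row[0] for row in conn.execute(nested_sql)]
flat_ids = [row[0] for row in conn.execute(flat_sql)]
print(nested_ids, flat_ids)
```

Property 20 drops out of both results because its newest row has replyAutoId=5, which is the behaviour of the original query as well.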

SELECT pc1.* FROM (SELECT * FROM propertyComment WHERE replyAutoId=0) as pc1
LEFT JOIN (SELECT propertyID, updatedDate from propertyComment order by 1,2) as pc2
ON pc1.propertyId= pc2.propertyId AND pc1.updatedDate < pc2.updatedDate
WHERE pc2.propertyId IS NULL
Also, do you have any indexes that help this join?
If you have one on the primary key, you're not joining on it, so why include it?
Why not select only the columns you need from table B (pc2)? That limits the number of columns read from table B. Since you're pulling everything from table A (pc1) where replyAutoId = 0, it wouldn't make much sense to limit the columns there. This should speed it up a little.

Related

Strange index behavior mysql

I usually pride myself on being a database pro, but I can't really wrap my head around this behavior. I hope someone can explain how this works.
I have two MySQL tables, orders:
CREATE TABLE `orders` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`status` tinyint(4) NOT NULL,
`total` decimal(7,2) NOT NULL,
`date_created` datetime NOT NULL,
`date_updated` datetime NOT NULL,
`voucher_code` varchar(127) DEFAULT NULL,
`voucher_id` int(11) unsigned DEFAULT NULL,
`user_id` int(11) unsigned DEFAULT NULL,
`billing_address_id` int(11) unsigned NOT NULL,
`shipping_address_id` int(11) unsigned NOT NULL,
`reference_id` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `reference_id` (`reference_id`),
KEY `address_id` (`billing_address_id`)
) ENGINE=InnoDB AUTO_INCREMENT=168067 DEFAULT CHARSET=latin1;
and addresses:
CREATE TABLE `addresses` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`title` tinyint(4) DEFAULT NULL,
`first_name` varchar(255) NOT NULL,
`last_name` varchar(255) NOT NULL,
`street` varchar(255) NOT NULL,
`street2` varchar(255) DEFAULT NULL,
`company_name` varchar(255) DEFAULT NULL,
`city` varchar(45) NOT NULL,
`postcode` varchar(45) DEFAULT NULL,
`region` varchar(45) DEFAULT NULL,
`country` varchar(45) NOT NULL,
`phone` varchar(45) DEFAULT NULL,
`user_id` int(11) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `fk_addresses_users1_idx` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=95277 DEFAULT CHARSET=latin1;
Now as you can see I have created an index inside the orders table for the billing_address_id called address_id that should match with the address id.
This is the query I am trying to run:
SELECT
o.id, a.first_name, a.last_name, o.total, o.date_created
FROM
orders o USE INDEX FOR JOIN (PRIMARY) JOIN
addresses a ON a.id = o.billing_address_id
ORDER BY id DESC
LIMIT 0, 50
If I run the query without any index specification, it will pick up and use the address_id index, which I would expect to be the fastest way to match the two tables.
Strangely enough, with the 'address_id' index the query runs in 2 seconds.
If I use the normal 'PRIMARY' index, which works on the order id, it takes 0.000 seconds.
This is bugging me out. I thought I was supposed to create indexes to expedite the joining process between tables.
If I run EXPLAIN on the two queries I get:
EXPLAIN EXTENDED
SELECT o.id, a.first_name, a.last_name, o.total, o.date_created
FROM orders o
JOIN addresses a ON a.id = o.billing_address_id
ORDER BY id DESC
LIMIT 0, 50
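id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | a | ALL | PRIMARY | NULL | NULL | NULL | 95234 | 100.00 | Using temporary; Using filesort
1 | SIMPLE | o | ref | address_id | address_id | 4 | my_basket.a.id | 1 | 100.00 |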
With the index:
EXPLAIN EXTENDED
SELECT o.id, a.first_name, a.last_name, o.total, o.date_created
FROM orders o USE INDEX FOR JOIN (PRIMARY)
JOIN addresses a ON a.id = o.billing_address_id
ORDER BY id DESC
LIMIT 0, 50
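id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | o | index | NULL | PRIMARY | 4 | NULL | 50 | 332632.00 |
1 | SIMPLE | a | eq_ref | PRIMARY | PRIMARY | 4 | my_basket.o.billing_address_id | 1 | 100.00 |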
Thank you for finding the time to answer this question.
For ORDER BY ... LIMIT queries it is often beneficial to use a query execution plan that avoids sorting. This is not necessarily because the sorting is expensive, but because it makes it possible to stop the query execution once the requested number of rows (here 50) has been found.
In your case, if one starts with table a, the full join result will have to be generated before selecting the "top" 50 rows. If you start with scanning table o using the PRIMARY index, the join result will be sorted on o.id, and the join execution can stop once 50 rows have been found.
The cost model used to select between the two approaches has been improved since MySQL 5.6. I suggest you try out MySQL 5.7 to see if the MySQL optimizer is now able to select the better plan.
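The difference between the two plans can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not of MySQL's executor: the list `orders` (already in id order) stands in for the PRIMARY index, the dict `addresses` stands in for a primary-key lookup, and the data sizes and names are made up.

```python
# Toy data: 10,000 orders, each pointing at one of 1,000 addresses (hypothetical).
orders = [{"id": i, "billing_address_id": i % 1000 + 1} for i in range(1, 10001)]
addresses = {i: f"name{i}" for i in range(1, 1001)}


def start_from_addresses(limit):
    """Join driven by addresses: build the whole result, sort it, then cut."""
    orders_by_address = {}
    for o in orders:
        orders_by_address.setdefault(o["billing_address_id"], []).append(o)
    joined = [
        (o["id"], name)
        for addr_id, name in addresses.items()
        for o in orders_by_address.get(addr_id, [])
    ]
    joined.sort(reverse=True)  # plays the role of "Using temporary; Using filesort"
    return joined[:limit], len(joined)


def start_from_orders(limit):
    """Join driven by orders in descending id order: stop after `limit` rows."""
    result, examined = [], 0
    for o in reversed(orders):  # stands in for a backward scan of PRIMARY
        examined += 1
        name = addresses.get(o["billing_address_id"])
        if name is not None:  # inner join: keep only orders with a matching address
            result.append((o["id"], name))
            if len(result) == limit:
                break
    return result, examined


slow_rows, slow_joined = start_from_addresses(50)
fast_rows, fast_examined = start_from_orders(50)
print(slow_joined, fast_examined, slow_rows == fast_rows)
```

Both functions return the same 50 rows, but the first one materializes every joined row before it can pick them, while the second touches only as many orders as it needs.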
I'm surprised that the two queries even compile -- ORDER BY id is ambiguous since each table has a different id.
When doing a JOIN, always qualify all columns.
Meanwhile, remove the USE INDEX.

How to use GROUP BY in MySQL subquery

I'm using phpMyAdmin for submitting queries. When using GROUP BY in subquery the whole application just hangs without errors until I restart the browser.
I have three tables: files stores information about uploaded files, file_category defines the available categories for files and file_category_r stores relations between files and categories.
I want to count how many files each category has, but some files can have multiple entries in the files table, so I need to group them by files.filename.
I tried two different approaches, both resulting in a hang:
SELECT
fc.*,
(SELECT COUNT(*) FROM file_category_r
WHERE file_category_r.category_id = fc.id
AND file_category_r.file_id IN
(SELECT f2.id FROM
(SELECT * FROM files f3 GROUP BY f3.filename) f2
WHERE f2.mandant_id = 1)
) as file_count
FROM file_category fc ORDER BY name ASC
or
SELECT
fc.*,
(SELECT COUNT(*) FROM file_category_r
WHERE file_category_r.category_id = fc.id
AND file_category_r.file_id IN
(SELECT id FROM files WHERE mandant_id = 1 GROUP BY filename)
) as file_count
FROM file_category fc ORDER BY name ASC
I don't see a problem with my queries; running the subquery alone works OK. Even removing the GROUP BY returns a result, but the result is wrong because it counts duplicate values.
Here is the table schema:
CREATE TABLE IF NOT EXISTS `files` (
`id` bigint(20) unsigned NOT NULL,
`project_id` bigint(20) unsigned DEFAULT NULL,
`customer_id` bigint(20) unsigned DEFAULT NULL,
`opportunity_id` int(11) DEFAULT NULL,
`task_id` bigint(20) unsigned DEFAULT NULL,
`calendar_event_id` bigint(20) unsigned DEFAULT NULL,
`mandant_id` tinyint(4) DEFAULT NULL,
`time` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`size` float NOT NULL,
`mime_type` varchar(100) NOT NULL,
`filename` text NOT NULL,
`file` longblob NOT NULL,
`folder_id` int(11) DEFAULT NULL,
`user_id` int(11) DEFAULT NULL,
`is_public` tinyint(1) unsigned NOT NULL DEFAULT '0',
`description` text,
`file_link` varchar(500) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=104832 ;
CREATE TABLE IF NOT EXISTS `file_category` (
`id` int(11) NOT NULL,
`name` varchar(200) NOT NULL,
`parent` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=445 ;
CREATE TABLE IF NOT EXISTS `file_category_r` (
`id` bigint(20) unsigned NOT NULL,
`file_id` bigint(20) unsigned NOT NULL,
`category_id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=300346 ;
What am I doing wrong? The tables are quite big, is it possible the request is too heavy? I'm out of ideas, please help! Thanks!
select fc.name, count(*)
from file_category fc
inner join file_category_r fcr on fc.id = fcr.category_id
group by fc.name
Not quite sure about that "some files can have multiple entries in the files table, so I need to group them by files.filename", though.
Maybe you need something like
select fc.name, count(distinct f.filename)
from file_category fc
inner join file_category_r fcr on fc.id = fcr.category_id
inner join files f on fcr.file_id = f.id
group by fc.name
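A small sketch of the difference between the two counts, run through Python's sqlite3 as a stand-in for MySQL (an assumption), with the tables cut down to the relevant columns and a few made-up rows in which the same filename appears twice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (id INTEGER PRIMARY KEY, mandant_id INTEGER, filename TEXT NOT NULL);
CREATE TABLE file_category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE file_category_r (id INTEGER PRIMARY KEY, file_id INTEGER NOT NULL,
                              category_id INTEGER NOT NULL);
-- Hypothetical rows: 'a.pdf' is stored twice under different ids.
INSERT INTO files VALUES (1, 1, 'a.pdf'), (2, 1, 'a.pdf'), (3, 1, 'b.pdf');
INSERT INTO file_category VALUES (1, 'docs'), (2, 'images');
INSERT INTO file_category_r VALUES (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 3, 2);
""")

# Counts every relation row, so the duplicate 'a.pdf' is counted twice.
plain_counts = conn.execute("""
    SELECT fc.name, COUNT(*)
    FROM file_category fc
    INNER JOIN file_category_r fcr ON fc.id = fcr.category_id
    GROUP BY fc.name
    ORDER BY fc.name
""").fetchall()

# Counts each filename once per category.
distinct_counts = conn.execute("""
    SELECT fc.name, COUNT(DISTINCT f.filename)
    FROM file_category fc
    INNER JOIN file_category_r fcr ON fc.id = fcr.category_id
    INNER JOIN files f ON fcr.file_id = f.id
    GROUP BY fc.name
    ORDER BY fc.name
""").fetchall()

print(plain_counts)
print(distinct_counts)
```

Neither query filters on mandant_id; if that restriction is needed, a `WHERE f.mandant_id = 1` in the second query would be the place for it.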
Often, the use of in can result in inefficient query plans. You can try exists instead:
SELECT fc.*,
(SELECT COUNT(*)
FROM file_category_r fcr
WHERE fcr.category_id = fc.id AND
exists (select 1 from files f where f.mandant_id = 1 and fcr.file_id = f.id)
) as file_count
FROM file_category fc
ORDER BY name ASC;
Now, you should add indexes. Start with file_category_r(category_id, file_id) and files(id, mandant_id).
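As a rough check that the EXISTS form counts the same relations as a plain IN on files.id, here is a sketch using Python's sqlite3 in place of MySQL (an assumption; it shows the results match, not how MySQL plans either form). The tables are trimmed, the rows are made up, and the index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (id INTEGER PRIMARY KEY, mandant_id INTEGER);
CREATE TABLE file_category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE file_category_r (id INTEGER PRIMARY KEY, file_id INTEGER NOT NULL,
                              category_id INTEGER NOT NULL);
-- The two suggested indexes (names are made up for this sketch).
CREATE INDEX idx_fcr_category_file ON file_category_r (category_id, file_id);
CREATE INDEX idx_files_id_mandant ON files (id, mandant_id);
-- Hypothetical rows: file 2 belongs to a different mandant.
INSERT INTO files VALUES (1, 1), (2, 2), (3, 1);
INSERT INTO file_category VALUES (1, 'a'), (2, 'b');
INSERT INTO file_category_r VALUES (1, 1, 1), (2, 2, 1), (3, 3, 2), (4, 3, 1);
""")

in_counts = conn.execute("""
    SELECT fc.name,
           (SELECT COUNT(*)
            FROM file_category_r fcr
            WHERE fcr.category_id = fc.id
              AND fcr.file_id IN (SELECT f.id FROM files f WHERE f.mandant_id = 1)
           ) AS file_count
    FROM file_category fc
    ORDER BY fc.name
""").fetchall()

exists_counts = conn.execute("""
    SELECT fc.name,
           (SELECT COUNT(*)
            FROM file_category_r fcr
            WHERE fcr.category_id = fc.id
              AND EXISTS (SELECT 1 FROM files f
                          WHERE f.mandant_id = 1 AND fcr.file_id = f.id)
           ) AS file_count
    FROM file_category fc
    ORDER BY fc.name
""").fetchall()

print(in_counts, exists_counts)
```

Note that neither form deduplicates by filename, so this covers only the membership test, not the GROUP BY filename part of the question.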
I use HeidiSQL, not phpMyAdmin, and your query works fine here. Maybe phpMyAdmin has problems parsing your query.
Edit: also, there is a limit on query length. If your "in" statement is too long, MySQL will return an error, which phpMyAdmin should then display.
But if phpMyAdmin hangs, I'd try to execute your query with the mysql command-line client or another MySQL client like HeidiSQL.

select count, group by and having optimization

I have this query
SELECT
t2.counter_id,
t2.hash_counter,
count(1) AS cnt
FROM
table1 t1
RIGHT JOIN
table2 t2 USING(counter_id)
WHERE
t2.hash_id = 973
GROUP BY
t1.counter_id
HAVING
cnt < 8000
Here are the tables.
CREATE TABLE `table1` (
`id` varchar(255) NOT NULL,
`platform` varchar(32) DEFAULT NULL,
`version` varchar(10) DEFAULT NULL,
`edition` varchar(2) NOT NULL DEFAULT 'us',
`counter_id` int(11) NOT NULL,
`created_on` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `counter_id` (`counter_id`)
) ENGINE=InnoDB
CREATE TABLE `table2` (
`counter_id` int(11) NOT NULL AUTO_INCREMENT,
`hash_id` int(11) DEFAULT NULL,
`hash_counter` int(11) DEFAULT NULL,
PRIMARY KEY (`counter_id`),
UNIQUE KEY `counter_key` (`hash_id`,`hash_counter`)
) ENGINE=InnoDB
The "EXPLAIN" shows "Using index; Using temporary; Using filesort" for table t2. Is there any way to get rid of the temporary/filesort, or any other ideas about optimizing this guy?
Your comment above gives more insight into what you want. It is always better to explain more about what you are trying to achieve - just looking at the non-working SQL leads people down the wrong path.
So, you want to know which table2 rows have < 8000 table1 rows?
Why not this:
select *
from table2 as t2
where hash_id = 973
and 8000 > (select count(*) from table1 as t1 where t1.counter_id = t2.counter_id)
;
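A minimal sketch of the "fewer than N rows" check as a correlated subquery, written with the count on the left of the comparison so the direction is easy to read. It runs on Python's sqlite3 instead of MySQL (an assumption), uses trimmed tables and made-up rows, and takes a small threshold of 3 in place of 8000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id TEXT PRIMARY KEY, counter_id INTEGER NOT NULL);
CREATE INDEX counter_id ON table1 (counter_id);
CREATE TABLE table2 (counter_id INTEGER PRIMARY KEY, hash_id INTEGER, hash_counter INTEGER);
-- Hypothetical rows: counter 1 has five table1 rows, counter 2 has two, counter 3 has none.
INSERT INTO table1 VALUES ('a', 1), ('b', 1), ('c', 1), ('d', 1), ('e', 1), ('f', 2), ('g', 2);
INSERT INTO table2 VALUES (1, 973, 1), (2, 973, 2), (3, 973, 3), (4, 100, 1);
""")

threshold = 3  # stands in for 8000

sql = """
SELECT t2.counter_id
FROM table2 AS t2
WHERE t2.hash_id = 973
  AND (SELECT COUNT(*) FROM table1 AS t1
       WHERE t1.counter_id = t2.counter_id) < ?
ORDER BY t2.counter_id
"""
below_threshold = [row[0] for row in conn.execute(sql, (threshold,))]
print(below_threshold)
```

Counter 3 is included because it has zero matching rows, which the subquery reports as a count of 0.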

Mysql optimize slow query with explain

I'm working on MySQL 5.5.29-0ubuntu0.12.04.1.
I have the need to create a query that can sort results by date and by a score.
I read the documentation and the posts here on stackoverflow (specifically this) about how to optimize a query but I'm still struggling to do it well.
The key finding is that, to avoid the use of a temporary table, the ORDER BY or GROUP BY must contain only columns from the first table in the join queue, which is why I use the STRAIGHT_JOIN clause and two slightly different queries.
To avoid confusion, I'm going to assign a number to each query configuration:
1. order by date with STRAIGHT_JOIN clause
2. order by score with STRAIGHT_JOIN clause
3. order by date without STRAIGHT_JOIN clause
4. order by score without STRAIGHT_JOIN clause
Following is query 1, takes about 2.5 seconds to complete:
SELECT STRAIGHT_JOIN item.id AS id
FROM item
INNER JOIN score ON item.id = score.item_id
LEFT JOIN url ON item.url_id = url.id
LEFT JOIN doc ON url.doc_id = doc.id
INNER JOIN feed ON feed.id = item.feed_id
INNER JOIN user_feed ON feed.id = user_feed.feed_id AND score.user_id = user_feed.user_id
LEFT JOIN star ON item.id = star.item_id AND score.user_id = star.user_id
JOIN unseen ON item.id = unseen.item_id AND score.user_id = unseen.user_id
WHERE score.user_id = 1 AND user_feed.id = 7
ORDER BY zen_time DESC
LIMIT 0, 10
Following is query 2 (first join tables are inverted and the ordering column is different), takes only about 0.01 seconds to complete:
SELECT STRAIGHT_JOIN item.id AS id
FROM score
INNER JOIN item ON item.id = score.item_id
LEFT JOIN url ON item.url_id = url.id
LEFT JOIN doc ON url.doc_id = doc.id
INNER JOIN feed ON feed.id = item.feed_id
INNER JOIN user_feed ON feed.id = user_feed.feed_id AND score.user_id = user_feed.user_id
LEFT JOIN star ON item.id = star.item_id AND score.user_id = star.user_id
JOIN unseen ON item.id = unseen.item_id AND score.user_id = unseen.user_id
WHERE score.user_id = 1 AND user_feed.id = 7
ORDER BY score DESC
LIMIT 0, 10
Following are the EXPLAIN and profiler results for queries 1 to 4: [output not reproduced here]
Following are tables definitions:
CREATE TABLE `doc` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`md5` char(32) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `Md5_index` (`md5`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `feed` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`url` text NOT NULL,
`title` text,
PRIMARY KEY (`id`),
FULLTEXT KEY `Title_url_index` (`title`,`url`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `item` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`feed_id` bigint(20) unsigned NOT NULL,
`url_id` bigint(20) unsigned DEFAULT NULL,
`md5` char(32) NOT NULL,
PRIMARY KEY (`id`),
KEY `Md5_index` (`md5`),
KEY `Zen_time_index` (`zen_time`),
KEY `Feed_index` (`feed_id`),
KEY `Url_index` (`url_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `score` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned NOT NULL,
`item_id` bigint(20) unsigned NOT NULL,
`score` float DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `User_item_index` (`user_id`,`item_id`),
KEY Score_index (`score`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `star` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned NOT NULL,
`item_id` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `User_item_index` (`user_id`,`item_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `unseen` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned NOT NULL,
`item_id` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `User_item_index` (`user_id`,`item_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `url` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`doc_id` bigint(20) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY Doc_index (`doc_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `user` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_Email` (`email`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `user_feed` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned NOT NULL,
`feed_id` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `User_feed_index` (`user_id`,`feed_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Here are the row counts for the tables involved in the query:
Score: 68657
Item: 197602
Url: 198354
Doc: 186113
Feed: 754
User_feed: 721
Star: 0
Unseen: 150762
Which approach should I take since my program needs to be able to order results both by zen_time and score in the fastest way possible?
Due to the different query speeds I decided to make an even more accurate analysis based on the various results I want to achieve.
I need four result sets:
1. Select all the items from a specific feed, order them by SCORE.score (intelligent order)
2. Select all the items from a specific feed, order them by ITEM.zen_time (time order)
3. Select all the items, order them by SCORE.score (intelligent order)
4. Select all the items, order them by ITEM.zen_time (time order)
So the query has to be adapted to those conditions, and its variable parts are:
STRAIGHT_JOIN yes/no
First JOIN table score/item
WHERE condition on specific feed yes/no
ORDER BY score/zen_time
All of the tests have been executed with the SELECT SQL_NO_CACHE instruction.
Following are the results:
Now it's clear what I have to do:
1. No STRAIGHT_JOIN, first JOIN table SCORE
2. No STRAIGHT_JOIN, first JOIN table SCORE
3. STRAIGHT_JOIN (I did beat MySQL engine here :D ), first JOIN table SCORE
4. STRAIGHT_JOIN (I did beat MySQL engine here :D ), first JOIN table ITEM

trouble in find child field from primary field in mysql

I have two tables like below
CREATE TABLE IF NOT EXISTS `countries` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=196 ;
And another one
CREATE TABLE IF NOT EXISTS `students` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`admission_no` varchar(255) DEFAULT NULL,
`nationality_id` int(11) DEFAULT NULL,
`country_id` int(11) DEFAULT NULL,
`is_active` tinyint(1) DEFAULT '1',
`is_deleted` tinyint(1) DEFAULT '0',
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `admission_no` (`admission_no`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=2 ;
The problem is that I want to fetch the names for both nationality_id and country_id from the countries table. I have to use a LEFT JOIN for this, but I get the same name for both even when nationality_id and country_id are different, because I can only join on the table once. Could someone please help me solve this?
If I understand you correctly, you can achieve this by LEFT JOINING the same table twice, using aliases.
Something like
SELECT *
FROM students s LEFT JOIN
countries c ON s.country_id = c.id LEFT JOIN
countries n ON s.nationality_id = n.id
@astander There is a little bug in your query (the second alias for countries, n, is not used in the ON clause). Here is a corrected statement:
select s.Id, cNationality.Name, cCountry.Name
from Students as s
left outer join Countries as cNationality on cNationality.Id = s.Nationality_id
left outer join Countries as cCountry on cCountry.Id = s.Country_id
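Here is a small runnable sketch of the double join, using Python's sqlite3 as a stand-in for MySQL (an assumption) with trimmed tables and made-up rows. The column aliases nationality_name and country_name are not in the answers above; they are added here so the two name columns can be told apart in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE students (
    id INTEGER PRIMARY KEY,
    admission_no TEXT UNIQUE,
    nationality_id INTEGER,
    country_id INTEGER
);
-- Hypothetical rows: student 1 has different nationality and country,
-- student 2 has no country set.
INSERT INTO countries VALUES (1, 'France'), (2, 'Japan');
INSERT INTO students VALUES (1, 'A001', 2, 1), (2, 'A002', 1, NULL);
""")

# countries is joined twice under two aliases, one per foreign key.
rows = conn.execute("""
    SELECT s.id,
           n.name AS nationality_name,
           c.name AS country_name
    FROM students AS s
    LEFT JOIN countries AS n ON n.id = s.nationality_id
    LEFT JOIN countries AS c ON c.id = s.country_id
    ORDER BY s.id
""").fetchall()

print(rows)
```

Because both joins are LEFT JOINs, a student with a missing country_id or nationality_id still appears, with NULL (None in Python) in the corresponding name column.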