Trouble debugging a slow query - MySQL

I am having trouble debugging a slow query. When I take it apart, the parts perform relatively fast; let me break it down for you:
The first query, which is my subquery, groups all rows by lmu_id (currently only 2 unique ones) and returns the max(id), in other words the last inserted row per lmu_id.
SELECT max(id) FROM `position` GROUP by lmu_id
-> 15055,15091
2 total, Query took 0.0030 seconds
The outer query retrieves the full rows of those two positions; here I've manually inserted the ids (15055, 15091):
SELECT * FROM `position` WHERE id IN (15055,15091)
2 total, Query took 0.1169 sec
Not the fastest query, but still the blink of an eye.
Now my problem: I do not understand why, if I combine these two queries, the whole system crashes:
SELECT * FROM `position` AS p1 WHERE p1.id IN (SELECT max(id) FROM `position` AS p2 GROUP by p2.lmu_id)
It takes forever: 100% CPU, crashing; I lost patience after 2 minutes and ran service mysql restart.
For your reference, I did an EXPLAIN of the query:
EXPLAIN SELECT * FROM `position` AS p1 WHERE p1.id IN (SELECT max(p2.id) FROM `position` AS p2 GROUP by p2.lmu_id)
id  select_type         table  type   possible_keys  key                    key_len  ref   rows  Extra
1   PRIMARY             p1     ALL    NULL           NULL                   NULL     NULL  7613  Using where
2   DEPENDENT SUBQUERY  p2     index  NULL           position_lmu_id_index  5        NULL  1268  Using index
id is the primary key, and lmu_id is a foreign key and also indexed.
I'm really stumped. Why is the final query taking so long/crashing? What other things should I look into?

Joins can work too. Your EXPLAIN holds the clue: the subquery is marked DEPENDENT SUBQUERY, meaning MySQL re-evaluates it for every one of the ~7,613 rows in p1 instead of running it once, which is why the combined query takes so long. A derived-table join runs the grouping just once:
SELECT *
FROM `position` AS p1
INNER JOIN (SELECT max(id) AS id FROM `position` GROUP BY lmu_id) p2 ON (p1.id = p2.id)
Scott's answer is good too, as I find EXISTS tends to run quite fast as well. In general, avoid IN.
Also try the correlated groupwise-max form:
SELECT *
FROM `position` AS p1
WHERE p1.id = (SELECT max(id) FROM `position` WHERE lmu_id = p1.lmu_id)

I've found that using EXISTS runs much faster than IN subqueries.
http://dev.mysql.com/doc/refman/5.0/en/exists-and-not-exists-subqueries.html
Subqueries with EXISTS vs IN - MySQL
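For this question's table, the NOT EXISTS form of the groupwise-max keeps every row for which no newer row with the same lmu_id exists. A sketch, equivalent to the max(id)-per-lmu_id query above:
SELECT *
FROM `position` AS p1
WHERE NOT EXISTS
    (SELECT 1 FROM `position` AS p2
     WHERE p2.lmu_id = p1.lmu_id
       AND p2.id > p1.id)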


delete multiple rows in mysqldb

How can we optimize this DELETE query?
delete FROM student_score
WHERE lesson_id IS NOT null
AND id NOT IN(SELECT MaxID FROM temp)
ORDER BY id
LIMIT 1000
The subquery (SELECT MaxID FROM temp) returns about 35k rows; temp is a temporary table.
And SELECT * FROM student_score WHERE lesson_id IS NOT NULL returns around 500k rows.
I tried using LIMIT and ORDER BY clauses, but that didn't make it any faster.
IN (SELECT ...) is, in many situations, really inefficient.
Use a multi-table DELETE. This involves a LEFT JOIN ... IS NULL, which is much more efficient.
Once you have mastered that, you might be able to get rid of the temp and simply fold it into the query.
Also more efficient is
WHERE NOT EXISTS ( SELECT 1 FROM temp
WHERE student_score.id = temp.MaxID )
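Dropped into the original statement, the NOT EXISTS version reads:
DELETE FROM student_score
WHERE lesson_id IS NOT NULL
AND NOT EXISTS ( SELECT 1 FROM temp
WHERE student_score.id = temp.MaxID );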
Also, DELETEing a large number of rows is inherently slow. 1000 is not so bad; 35K is. The reason is the need to save all the potentially-deleted rows until "commit" time.
Other techniques for big deletes: http://mysql.rjweb.org/doc.php/deletebig
Note that one of them explains a more efficient way to walk through the PRIMARY KEY (via id). Note that your query may have to step over lots of ids that have lesson_id IS NULL; that is, the LIMIT 1000 is not doing what you expected.
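A minimal sketch of that PK-walking pattern, with an illustrative window of 1000 ids (the bounds advance on each pass until the maximum id is reached):
DELETE FROM student_score
WHERE id >= 1 AND id < 1001   -- move this window forward on each pass
AND lesson_id IS NOT NULL
AND NOT EXISTS ( SELECT 1 FROM temp
WHERE student_score.id = temp.MaxID );
Each pass touches a bounded slice of the table, so no single DELETE has to hold very many potentially-deleted rows.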
You can do it without ORDER BY:
DELETE FROM student_score
WHERE lesson_id IS NOT null
AND id NOT IN (SELECT MaxID FROM temp)
Or like this, using a LEFT JOIN, which is better optimized in terms of speed:
DELETE s
FROM student_score s
LEFT JOIN temp t1 ON s.id = t1.MaxID
WHERE s.lesson_id IS NOT NULL AND t1.MaxID IS NULL;

Query time that I can't understand in MySQL

I'm new to this platform; this is actually my first question. Sorry for my bad English, I am using a translator. Let me know if I have used inappropriate language.
My table is like this:
CREATE TABLE tbl_records (
id int(11) NOT NULL,
data_id int(11) NOT NULL,
value double NOT NULL,
record_time datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
ALTER TABLE tbl_records
ADD PRIMARY KEY (id),
ADD KEY data_id (data_id),
ADD KEY record_time (record_time);
ALTER TABLE tbl_records
MODIFY id int(11) NOT NULL AUTO_INCREMENT;
COMMIT;
My first query takes 0.0096 seconds:
SELECT b.* FROM tbl_records b
INNER JOIN
(SELECT MAX(id) AS id FROM tbl_records GROUP BY data_id) a
ON a.id=b.id;
My second query takes 2.4957 seconds:
SELECT MAX(id) AS id FROM tbl_records GROUP BY data_id;
When I run these queries over and over, the results are similar.
There are 20 million rows in the table.
Why is the one with the subquery faster?
Also, what I really need is MAX(record_time), but
SELECT b.* FROM tbl_records b
INNER JOIN
(SELECT MAX(record_time) AS id FROM tbl_records GROUP BY data_id) a
ON a.id=b.id
It takes minutes when I run it.
I also need records grouped hourly, daily, and monthly. I couldn't see much performance difference between GROUP BY SUBSTR(record_time,1,10) and GROUP BY DATE_FORMAT(record_time,'%Y%m%d'); both take minutes.
What am I doing wrong?
The first query can be simplified to
SELECT * FROM tbl_records
ORDER BY id DESC
LIMIT 1;
The second:
SELECT id FROM tbl_records
ORDER BY data_id DESC
LIMIT 1;
I don't know what the third is trying to do. This does not make sense: MAX(record_time) AS id -- it is a DATETIME that will subsequently be compared to an INT in ON a.id=b.id.
Another option for turning a DATETIME into a DATE is simply DATE(record_time). But it will not be significantly faster.
If the goal is to build daily counts and subtotals, then there is a much better way: build and incrementally maintain a summary table.
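As an illustrative sketch only (the table and column names here are hypothetical), a daily summary for tbl_records could be built once and then topped up periodically:
CREATE TABLE tbl_records_daily (
data_id INT NOT NULL,
dy DATE NOT NULL,
cnt INT UNSIGNED NOT NULL,
value_sum DOUBLE NOT NULL,
PRIMARY KEY (data_id, dy)
) ENGINE=InnoDB;
-- re-run periodically; re-aggregates only the current day
INSERT INTO tbl_records_daily (data_id, dy, cnt, value_sum)
SELECT data_id, DATE(record_time), COUNT(*), SUM(value)
FROM tbl_records
WHERE record_time >= CURDATE()
GROUP BY data_id, DATE(record_time)
ON DUPLICATE KEY UPDATE cnt = VALUES(cnt), value_sum = VALUES(value_sum);
Hourly and monthly reports can then read this small table instead of scanning 20 million rows.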
(responding to Comment)
The GROUP BY that you have is improper and probably incorrect. I took the liberty of changing from id to data_id:
SELECT b.*
FROM
( SELECT data_id, MAX(record_time) AS max_time
FROM tbl_records
GROUP BY data_id
) AS a
JOIN tbl_records AS b
ON a.data_id = b.data_id
AND a.max_time = b.record_time
And have
INDEX(data_id, record_time)
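For example:
ALTER TABLE tbl_records ADD INDEX (data_id, record_time);
With that index, the derived table a can be computed entirely from the index.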
Can there be duplicate times for one data_id? To discuss that and other "groupwise-max" queries, see http://mysql.rjweb.org/doc.php/groupwise_max

MySQL long-running query

I have a table with 38k rows, and I use this query to compare item ids from the items table with item ids from the posted_domains table.
select * from `items`
where `items`.`source_id` = 2 and `items`.`source_id` is not null
and not exists (select *
from `posted_domains`
where `posted_domains`.`item_id` = `items`.`id` and `domain_id` = 1)
order by `item_created_at` asc limit 1
This query took 8s. I don't know if it is a problem with my query or if my MySQL is badly configured. The query is generated by a Laravel relation like:
$items->doesntHave('posted', 'and', function ($q) use ($domain) {
$q->where('domain_id', $domain->id);
});
CORRELATED subqueries can be rather slow, as they are often executed repeatedly, once for each row in the outer query. This might be faster:
select *
from `items`
where `items`.`source_id` = 2
and `items`.`source_id` is not null
and `items`.`id` not in (
select DISTINCT item_id
from `posted_domains`
where `domain_id` = 1)
order by `item_created_at` asc
limit 1
I say "might" because subqueries in WHERE are also rather slow in MySQL.
This LEFT JOIN will probably be the fastest.
select *
from `items`
LEFT JOIN (
select DISTINCT item_id
from `posted_domains`
where `domain_id` = 1) AS subQ
ON items.id = subQ.item_id
where `items`.`source_id` = 2
and `items`.`source_id` is not null
and subQ.item_id is null
order by `item_created_at` asc
limit 1;
Since it is a "no matches" scenario, it technically doesn't even need to be a subquery, and might be faster as a direct LEFT JOIN; but that will depend on indexes, and possibly on the actual data values.
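A sketch of that direct LEFT JOIN (an anti-join):
SELECT `items`.*
FROM `items`
LEFT JOIN `posted_domains` pd
ON pd.item_id = `items`.`id`
AND pd.domain_id = 1
WHERE `items`.`source_id` = 2
AND pd.item_id IS NULL
ORDER BY `item_created_at` ASC
LIMIT 1;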

how to speed up query when using count and group by

I have two tables named seller and item. They are connected through a third table (seller_item) using an n:m foreign key relation.
Now I am trying to satisfy the requirement: "I as a seller want a list of my competitors with a count of items I am selling and they are selling as well".
So a list of all sellers with the count of overlapping items in relation to one specific seller.
Also I want this to be sorted by count and limited.
But the query uses a temporary table and a filesort, which is very slow.
Explain says:
Using where; Using index; Using temporary; Using filesort
How can I speed this up?
Here is the query:
SELECT
COUNT(*) AS itemCount,
s.sellerName
FROM
seller s,
seller_item si
WHERE
si.itemId IN
(SELECT itemId FROM seller_item WHERE sellerId = 4711)
AND
si.sellerId=s.id
GROUP BY
sellerName
ORDER BY
itemCount DESC
LIMIT 50;
The table definitions:
CREATE TABLE `seller` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`sellerName` varchar(50) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_index` (`sellerName`)
) ENGINE=InnoDB
contains about 200,000 rows
--
CREATE TABLE `item` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`itemName` varchar(20) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_index` (`itemName`)
) ENGINE=InnoDB
contains about 100,000,000 rows
--
CREATE TABLE `seller_item` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`sellerId` bigint(20) unsigned NOT NULL,
`itemId` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `sellerId` (`sellerId`,`itemId`),
KEY `item_id` (`itemId`),
CONSTRAINT `fk_1` FOREIGN KEY (`sellerId`) REFERENCES `seller` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
CONSTRAINT `fk_2` FOREIGN KEY (`itemId`) REFERENCES `item` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB
contains about 170,000,000 rows
The database is MySQL (Percona) 5.6.
Output of EXPLAIN:
+----+-------------+-------------+--------+----------------------+--------------+---------+---------------------+------+----------------------------------------------+
| id | select_type | table       | type   | possible_keys        | key          | key_len | ref                 | rows | Extra                                        |
+----+-------------+-------------+--------+----------------------+--------------+---------+---------------------+------+----------------------------------------------+
|  1 | SIMPLE      | s           | index  | PRIMARY,unique_index | unique_index | 152     | NULL                | 1    | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | si          | ref    | sellerId,item_id     | sellerId     | 8       | tmp.s.id            | 1    | Using index                                  |
|  1 | SIMPLE      | seller_item | eq_ref | sellerId,item_id     | sellerId     | 16      | const,tmp.si.itemId | 1    | Using where; Using index                     |
+----+-------------+-------------+--------+----------------------+--------------+---------+---------------------+------+----------------------------------------------+
I doubt it's feasible to make a query like that run fast in realtime on a database of your size, especially for sellers with lots of popular items in stock.
You should materialize it. Create a table like this
CREATE TABLE
matches
(
seller INT NOT NULL,
competitor INT NOT NULL,
matches INT NOT NULL,
PRIMARY KEY
(seller, competitor)
)
and update it in batches in a cron script:
DELETE
FROM matches
WHERE seller = :seller;

INSERT
INTO matches (seller, competitor, matches)
SELECT si.sellerId, sc.sellerId, COUNT(*) cnt
FROM seller_item si
JOIN seller_item sc
ON sc.itemId = si.itemId
AND sc.sellerId <> si.sellerId
WHERE si.sellerId = :seller
GROUP BY
si.sellerId, sc.sellerId
ORDER BY
cnt DESC
LIMIT 50;
You also need to make (sellerId, itemId) the PRIMARY KEY on seller_item. The way it is now, finding a seller by item requires two lookups instead of one: first id by itemId using KEY (itemId), then sellerId by id using the PRIMARY KEY (id).
I believe you're under a misimpression about your ability to eliminate the Using temporary; Using filesort steps to satisfy your query. Queries of the form
SELECT COUNT(*), grouping_value
FROM table
GROUP BY grouping_value
ORDER BY COUNT(*)
LIMIT n
always use a temporary in-memory result set, and always sort that resultset. That's because the result set doesn't exist anywhere until the query runs, and it has to be sorted before the LIMIT clause can be satisfied.
"Filesort" is somewhat misnamed. It doesn't necessarily mean the sorting is happening on a file in the file system, just that a temporary resultset is being sorted. If that resultset is massive, the sort can spill out of RAM into the filesystem, but it doesn't have to. Please read this. https://www.percona.com/blog/2009/03/05/what-does-using-filesort-mean-in-mysql/ Don't get distracted by the Using filesort item in your EXPLAIN results.
One of the tricks to getting better performance from this sort of query is to minimize the size of the sorted results. You've already filtered them down to the stuff you want; that's good.
But, you can still arrange to sort less stuff, by sorting just the seller.id and the count, then joining the (longer) sellerName in after you know the exact fifty rows you need. That also has the benefit of letting you do your aggregating with just the seller_item table, rather than with the resultset that comes from joining the two.
Here's what I mean. This subquery generates the list of fifty sellerId values you need. All it has to sort is the count and sellerId. That's faster than sorting the count and sellerName because there's less data, and fixed-length data, to shuffle around in the sort operation.
SELECT COUNT(*) AS itemCount,
sellerId
FROM seller_item
WHERE itemId IN
(SELECT itemId FROM seller_item WHERE sellerId = 4711)
GROUP BY sellerId
ORDER BY COUNT(*) DESC
LIMIT 50
Notice that this sorts a big result set, then discards most of it. It gives you the exact fifty seller id values you need.
You can make this even faster by filtering out more rows by adding HAVING COUNT(*) > 1 right after your GROUP BY clause, but that changes the meaning of your query and may not meet your business requirements.
Once you have those fifty items, you can retrieve the seller names. The whole query looks like this:
SELECT s.sellerName, c.itemCount
FROM seller s
JOIN (
SELECT COUNT(*) AS itemCount, sellerId
FROM seller_item
WHERE itemId IN
(SELECT itemId FROM seller_item WHERE sellerId = 4711)
GROUP BY sellerId
ORDER BY COUNT(*) DESC
LIMIT 50
) c ON c.sellerId = s.id
ORDER BY c.itemCount DESC
Your indexing effort should be spent trying to make the inner queries fast. The outer query will be fast no matter what; it's only handling fifty rows, and using an indexed id value to look up other values.
The inmost query is SELECT itemId FROM seller_item WHERE sellerId = 4711. This will benefit greatly from your existing (sellerId, itemId) compound index: it can random-access and then scan that index, which is very quick.
The SELECT COUNT(*)... query will benefit from a (itemId, sellerId) compound index. That part of your query is the hard and slow part, but still, this index will help.
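For example (this also makes the existing single-column item_id key redundant):
ALTER TABLE seller_item ADD INDEX (itemId, sellerId);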
Look, others have mentioned this, and so will I. Having both a unique composite key (sellerId, itemId) and a primary key id on that seller_item table is, with respect, incredibly wasteful.
It makes your updates and inserts slower.
It means your table is organized as a tree based on the meaningless id rather than the meaningful value pair.
If you make one of the two indexes I mentioned the primary key, and create the other one without making it unique, you'll have a much more efficient table. These many-to-many join tables don't need, and should not have, surrogate keys.
Reformulation
I think this is what you really wanted:
SELECT si2.sellerId, COUNT(DISTINCT si2.itemId) AS itemCount
FROM seller_item si1
JOIN seller_item si2 ON si2.itemId = si1.itemId
WHERE si1.sellerId = 4711
GROUP BY si2.sellerId
ORDER BY itemCount DESC
LIMIT 50;
(Note: DISTINCT is probably unnecessary.)
In words: For seller #4711, find the items he sells, then find which sellers are selling nearly the same set of items. (I did not try to filter out #4711 from the resultset.)
More efficient N:M
But there is still an inefficiency. Let's dissect your many-to-many mapping table (seller_item).
It has an id which is probably not used for anything. Get rid of it.
Then promote UNIQUE(sellerId, itemId) to PRIMARY KEY(sellerId, itemId).
Now change INDEX(itemId) to INDEX(itemId, sellerId) so that the last stage of the query can be "using index".
Blog discussing that further.
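Concretely, those three steps might be combined into one ALTER. A sketch; test it on a copy first, since the FOREIGN KEYs require an index starting with their column at all times:
ALTER TABLE seller_item
DROP COLUMN id,                          -- the surrogate key goes with it
ADD PRIMARY KEY (sellerId, itemId),      -- still covers fk_1 (sellerId)
DROP KEY sellerId,                       -- the old UNIQUE, now redundant
DROP KEY item_id,
ADD KEY item_seller (itemId, sellerId);  -- still covers fk_2 (itemId)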
You have a very large dataset; you have debugged your app. Consider removing the FOREIGN KEYs; they are somewhat costly.
Getting sellerName
It may be possible to JOIN to seller to get sellerName. But try it with just sellerId first. Then add the name. Verify that the count does not inflate (that often happens) and that the query does not slow down.
If either thing goes wrong, then do
SELECT s.sellerName, x.itemCount
FROM ( .. the above query .. ) AS x
JOIN seller AS s ON s.id = x.sellerId;
(Optionally you could add ORDER BY sellerName.)
I'm not sure how fast this would be on your database but I'd write the query like this.
select * from (
select seller.sellerName,
count(otherSellersItems.itemId) itemCount from (
select sellerId, itemId from seller_item where sellerId != 4711
) otherSellersItems
inner join (
select itemId from seller_item where sellerId = 4711
) thisSellersItems
on otherSellersItems.itemId = thisSellersItems.itemId
inner join seller
on otherSellersItems.sellerId = seller.id
group by seller.sellerName
) itemsSoldByOtherSellers
order by itemCount desc
limit 50 ;
Since we are limiting the (potentially large) resultset to at most 50 rows, I would put off getting the sellername until after we have the counts, so we only need to get 50 seller names.
First, we get the itemcount by sellerId:
SELECT so.sellerId
, COUNT(*) AS itemcount
FROM seller_item si
JOIN seller_item so
ON so.itemId = si.itemId
WHERE si.sellerId = 4711
GROUP BY so.sellerId
ORDER BY COUNT(*) DESC, so.sellerId DESC
LIMIT 50
For improved performance, I would make a suitable covering index available for the join to so. e.g.
CREATE UNIQUE INDEX seller_item_UX2 ON seller_item (itemId, sellerId)
By using a "covering index", MySQL can satisfy the query entirely from the index pages, without a need to visit the pages in the underlying table.
Once the new index is created, I would drop the index on the singleton itemId column (the item_id key), since that index is now redundant. (Any query that could make effective use of that index will be able to make effective use of the composite index which has itemId as the leading column.)
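For example:
ALTER TABLE seller_item DROP KEY item_id;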
There's no getting around a "Using filesort" operation. MySQL has to evaluate the COUNT() aggregate on each row, before it can perform a sort. There's no way (given the current schema) for MySQL to return the rows in order using an index to avoid a sort operation.
Once we have that set of (at most) fifty rows, then we can get the sellername.
To get the sellername, we could either use a correlated subquery in the SELECT list, or a join operation.
1) Using a correlated subquery in SELECT list, e.g.
SELECT so.sellerId
, ( SELECT s.sellerName
FROM seller s
WHERE s.id = so.sellerId
ORDER BY s.id, s.sellerName
LIMIT 1
) AS sellerName
, COUNT(*) AS itemcount
FROM seller_item si
JOIN seller_item so
ON so.itemId = si.itemId
WHERE si.sellerId = 4711
GROUP BY so.sellerId
ORDER BY COUNT(*) DESC, so.sellerId DESC
LIMIT 50
(We know that subquery will be executed (at most) fifty times, once for each row returned by the outer query. Fifty executions (with a suitable index available) isn't that bad, at least compared to 50,000 executions.)
Or, 2) using a join operation, e.g.
SELECT c.sellerId
, s.sellerName
, c.itemcount
FROM (
SELECT so.sellerId
, COUNT(*) AS itemcount
FROM seller_item si
JOIN seller_item so
ON so.itemId = si.itemId
WHERE si.sellerId = 4711
GROUP BY so.sellerId
ORDER BY COUNT(*) DESC, so.sellerId DESC
LIMIT 50
) c
JOIN seller s
ON s.id = c.sellerId
ORDER BY c.itemcount DESC, c.sellerId DESC
(Again, we know the inline view c will return (at most) fifty rows, and getting fifty sellerName values (using a suitable index) should be fast.)
SUMMARY TABLE
If we denormalized the implementation and added a summary table containing item_id (as the primary key) and a "count" of the number of sellers of that item_id, our query could take advantage of that.
As an illustration of what that might look like:
CREATE TABLE item_seller_count
( item_id BIGINT NOT NULL PRIMARY KEY
, seller_count BIGINT NOT NULL
) Engine=InnoDB
;
INSERT INTO item_seller_count (item_id, seller_count)
SELECT d.itemId
, COUNT(*)
FROM seller_item d
GROUP BY d.itemId
ORDER BY d.itemId
;
CREATE UNIQUE INDEX item_seller_count_IX1
ON item_seller_count (seller_count, item_id)
;
The new summary table will become "out of sync" when rows are inserted/updated/deleted from the seller_item table.
And populating this table would take resources. But having this available would speed up queries of the type we're working on.
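If the application maintains it at write time instead, each new seller_item row could bump the count. A sketch, assuming rows are only ever inserted (deletes would need the mirror-image decrement):
INSERT INTO item_seller_count (item_id, seller_count)
VALUES (?, 1)
ON DUPLICATE KEY UPDATE seller_count = seller_count + 1;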

Alter and optimize SQL query

I need to change this SQL query to NOT use a subquery with IN; I need this query to work faster.
Here is the query I am working on. The table has about 7 million rows.
SELECT `MovieID`, COUNT(*) AS `Count`
FROM `download`
WHERE `UserID` IN (
SELECT `UserID` FROM `download`
WHERE `MovieID` = 995
)
GROUP BY `MovieID`
ORDER BY `Count` DESC
Thanks
Something like this - but (in the event that you switch to an OUTER JOIN) make sure you're counting the right thing...
SELECT MovieID
, COUNT(*) ttl
FROM download x
JOIN download y
ON y.userid = x.userid
AND y.movieid = 995
GROUP
BY x.MovieID
ORDER
BY ttl DESC;
Use EXISTS instead; see Optimizing Subqueries with EXISTS Strategy:
Consider the following subquery comparison:
outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)

MySQL evaluates queries "from outside to inside." That is, it first obtains the value of the outer expression outer_expr, and then runs the subquery and captures the rows that it produces.
A very useful optimization is to “inform” the subquery that the only
rows of interest are those where the inner expression inner_expr is
equal to outer_expr. This is done by pushing down an appropriate
equality into the subquery's WHERE clause. That is, the comparison is
converted to this:
EXISTS (SELECT 1 FROM ... WHERE subquery_where AND outer_expr=inner_expr)

After the conversion, MySQL can use the pushed-down equality to limit the number of rows that it must examine when evaluating the subquery.
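Applied to this question's query, the EXISTS form might look like this (a sketch):
SELECT `MovieID`, COUNT(*) AS `Count`
FROM `download` d
WHERE EXISTS (
SELECT 1
FROM `download` u
WHERE u.`UserID` = d.`UserID`
AND u.`MovieID` = 995
)
GROUP BY `MovieID`
ORDER BY `Count` DESC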
Filter directly on MovieID; you do not need the subquery. It can be done by using MovieID = 995 in the WHERE clause. (Note, though, that this returns only the count for movie 995 itself, which is a narrower result than the original query, which counts every movie downloaded by the users of movie 995.)
SELECT `MovieID`, COUNT(*) AS `Count`
FROM `download`
WHERE `MovieID` = 995
GROUP BY `MovieID`
ORDER BY `Count` DESC