I run a simple SELECT (noted below) in a stored procedure against a table that has around 1,500 rows.
CREATE PROCEDURE `LoadCollectionItemProperty`(IN sId int(10))
BEGIN
SELECT *
FROM itemproperty
WHERE itemid IN
(SELECT itemid
FROM collectionitem
WHERE collectionid = sId AND removed ='0000-00-00 00:00:00');
END
This operation takes around 7 seconds. I inserted breakpoints and used F11 to determine that the lag starts at MySqlAdapter.Fill. Neither my computer nor the server hosting the MySQL database is short on resources, so I'm guessing it's the query itself.
collectionitem holds the two foreign keys linking an itemproperty to a collection. We feed the sproc sId (the PK of collection) so that the subquery returns all the itemids from a specific collection, and then we use the itemid (PK) in itemproperty.
Is there any way to speed up the process?
UPDATE
My issue was entirely due to improper indexing. Once I learned which columns to index, everything is extremely smooth! Thank you for your help.
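For anyone hitting the same problem, the kind of index that fixes this looks roughly like the sketch below (based on the columns in the query; the index name is illustrative):
-- Covers the subquery's WHERE clause and hands back itemid without
-- touching table rows; itemid is already the PK of itemproperty.
ALTER TABLE collectionitem
  ADD INDEX idx_collectionid_removed (collectionid, removed, itemid);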
You can try this, but it may not help much if your tables are missing indexes.
BEGIN
SELECT *
FROM itemproperty i
WHERE exists
(SELECT 1
FROM collectionitem c
WHERE collectionid = sId AND i.itemid = c.itemid AND removed ='0000-00-00 00:00:00');
END
Well, given it's the query (you should prove that by just running it at the prompt on the server):
Cut the query out of the sp and prefix it with EXPLAIN to see the query execution plan to confirm, but some things stand out straight off. Try
SELECT *
FROM itemproperty
INNER JOIN collectionitem ON collectionitem.itemid = itemproperty.itemid
WHERE collectionitem.collectionid = sId
  AND collectionitem.removed = '0000-00-00 00:00:00'
to get rid of the subquery.
Is removed a datetime? Is it indexed?
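For example, running the join version through EXPLAIN at the prompt (with a literal collection id substituted for sId) shows immediately whether an index is being used:
EXPLAIN
SELECT *
FROM itemproperty
INNER JOIN collectionitem ON collectionitem.itemid = itemproperty.itemid
WHERE collectionitem.collectionid = 1  -- substitute a real collection id
  AND collectionitem.removed = '0000-00-00 00:00:00';
-- In the output, key = NULL on either table means a full scan, which
-- would explain 7 seconds even on 1,500 rows.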
Related
I need to filter a list by whether the person has an appointment. This runs in 0.09 seconds.
select personid from persons p
where EXISTS (SELECT 1 FROM appointments a
WHERE a.personid = p.personid);
Since I use this in more than one query and it actually contains another condition, it seemed convenient to put the filter into a function, so I have
CREATE FUNCTION `has_appt`(pid INT) RETURNS tinyint(1)
BEGIN
RETURN
EXISTS (SELECT 1 FROM appointments WHERE personid = pid);
END
Then I can use
select personid from persons where has_appt(personid)
However, two unexpected things happen. First, the statement using the has_appt() function now takes 2.5 seconds to run. I know there is overhead to a function call, but this seems extreme. Second, if I run the statement repeatedly, it takes about 5 seconds longer each time, so by the 4th time, it is taking over 20 seconds. This happens regardless of how long I wait between tries, but storing the function again resets the time to 2.5 seconds. What can account for the progressive slowness? What state can be affected by simply running it multiple times?
I know the solution is to forget the function and just embed this into my queries, but I want to understand the principle so I can avoid making the same mistake again. Thanks in advance for your help.
I'm using MySQL 8 and Workbench.
Your original query can be replaced by, and sped up by,
SELECT personid FROM appointments;
But the query seems dumb -- why would you want a list of all the ids of people with appointments, but no info about them? Perhaps you over-simplified the query?
If a person might have multiple appointments, then this would be needed, and might not be as fast:
SELECT DISTINCT personid FROM appointments;
As for why the function is so slow... The optimizer does not see what is inside the function. So select personid from persons where has_appt(personid) walks through the entire persons table, calling the function for every row.
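So the fix is to inline the test. Since the question mentions an extra condition inside the real function, the inlined version would look something like this (the a.status = 'confirmed' condition is purely hypothetical, standing in for whatever the real one is):
SELECT p.personid
FROM persons p
WHERE EXISTS (SELECT 1
              FROM appointments a
              WHERE a.personid = p.personid
                AND a.status = 'confirmed');  -- hypothetical extra condition
-- Written inline, the optimizer can treat the EXISTS as a semi-join and
-- probe an index on appointments(personid) instead of calling an opaque
-- function once per row of persons.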
Hello again internet nerds!
Technically I have solved this problem, but want to know if there is a more optimal route I should take...
I have a large table (~4m rows) that is a collection of data, segmented using int "chip." There are 6 segments of data, so Chip IDs 1 through 6.
Within each of these 6 segments, I need to assign an iterative order integer, as it represents the exact location of the data on that segment.
My solution is (was) this:
# set iterative
set @i := 0;
# init
update `table` set `order` = (@i := @i + 1) where chip = 1;
This works. But it is so slow it sometimes triggers a timeout error. I need to run it 6 times, and it may be triggered on occasion by our application whenever necessary. Maybe I just need to allot more time in the MySQL settings to account for the slow query, or is there an optimal, simpler solution for this?
Thanks for the advice.
Edit:
I've found a solution that works accurately and takes ~50 seconds to complete. This handles the out-of-order updates I was experiencing.
I'm now using an ordered SELECT statement paired with the UPDATE, using a generated join table that iterates within a row_num column.
See:
set @count := 0;
update
`table` as target,
(select
(@count := @count + 1) as row_num,
t.*
from `table` as t
where chip = 1
order by t.id asc) as table_with_iterative
set target.`order` = table_with_iterative.row_num
where target.id = table_with_iterative.id;
I think that, if possible, you should assign the sequence numbers as you insert the rows into the table for the very first time. Strictly speaking, an SQL database does not promise that rows will initially appear in the table in the order of the INSERT statements.
One possibility that occurs to me is to use an auto-increment field in this table, because this will express the order in which the rows were added. But the field-values for any given chip, although ascending, might not be consecutive and undoubtedly would not start at 1.
If you did need such a field, I think I'd define a separate field (default value: NULL) to hold it. Then, a very simple stored procedure could query the rows for a given chip (ORDER BY auto-increment number) and assign a 1-based consecutive sequential number to that separate field. (A slightly more elaborate query could identify all the lists that don't have those numbers yet, and number all of them at once.)
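If the server happens to be MySQL 8.0 or later (the question doesn't say), a window function does the same numbering without user variables; a sketch using the question's placeholder names:
-- ROW_NUMBER() makes the ordering explicit and deterministic, unlike a
-- user-variable increment, whose evaluation order is not guaranteed.
UPDATE `table` AS target
JOIN (SELECT id,
             ROW_NUMBER() OVER (ORDER BY id) AS row_num
      FROM `table`
      WHERE chip = 1) AS numbered
  ON target.id = numbered.id
SET target.`order` = numbered.row_num;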
Problem Description
I have an audit table that contains history changes of some objects. The audit contains a unique audit event id, the id of the object being changed, the date of the change, the property that was changed, the before and after values, and other columns.
What I need to do is query the audit data and get the date the same field was previously changed on the same object. So I need to look at the audit a second time and, for each audit entry, add the previous similar entry with its date as the previous change date.
Schema & Data
The table schema has id (id) as the primary key and an index on the object id (parent_id); nothing else is indexed. In my test case I have roughly 150 objects with around 80k audit entries for them.
Solution
There are two obvious solutions: a subquery and a LEFT JOIN.
In the LEFT JOIN version I basically join the audit table to itself, with the join conditions making sure the object, field, and value changes correspond and that the joined changes are older than the current change. I select the max change date, and finally, to pick up only the single latest previous change, I group by id. If no previous change is found, I use the creation date of the object itself.
LEFT JOIN SQL
SELECT `audit`.`id` AS `id`,
`audit`.`parent_id` AS `parent_id`,
`audit`.`date_created` AS `date_created`,
COALESCE(MAX(`audit_prev`.`date_created`), `audit_parent`.`date_entered`) AS `date_created_before`,
`audit`.`field_name` AS `field_name`,
`audit`.`before_value_string` AS `before_value_string`,
`audit`.`after_value_string` AS `after_value_string`
FROM `opportunities_audit` `audit`
LEFT JOIN `opportunities_audit` `audit_prev`
ON(`audit`.`parent_id` = `audit_prev`.`parent_id`
AND `audit_prev`.`date_created` < `audit`.`date_created`
AND `audit_prev`.`after_value_string` = `audit`.`before_value_string`
AND `audit`.`field_name` = `audit_prev`.`field_name`)
LEFT JOIN `opportunities` `audit_parent` ON(`audit`.`parent_id` = `audit_parent`.`id`)
GROUP BY `audit`.`id`;
The subquery logic is rather similar, but instead of grouping and using the MAX function I simply order by date DESC and LIMIT 1:
SELECT `audit`.`id` AS `id`,
`audit`.`parent_id` AS `parent_id`,
`audit`.`date_created` AS `date_created`,
COALESCE((SELECT `audit_prev`.`date_created`
FROM `opportunities_audit` AS `audit_prev`
WHERE
(`audit_prev`.`parent_id` = `audit`.`parent_id`)
AND (`audit_prev`.`date_created` < `audit`.`date_created`)
AND (`audit_prev`.`after_value_string` = `audit`.`before_value_string`)
AND (`audit_prev`.`field_name` = `audit`.`field_name` )
ORDER BY `date_created` DESC
LIMIT 1
), `audit_parent`.`date_entered`) AS `date_created_before`,
`audit`.`field_name` AS `field_name`,
`audit`.`before_value_string` AS `before_value_string`,
`audit`.`after_value_string` AS `after_value_string`
FROM `opportunities_audit` `audit`
LEFT JOIN `opportunities` `audit_parent` ON(`audit`.`parent_id` = `audit_parent`.`id`);
Both queries produce identical result sets.
Issue
When I run the query in phpMyAdmin the solution with join takes roughly 2m30s to return the result. However, phpMyAdmin says the query took 0.04 seconds. When I run the subquery solution the result comes back immediately and the reported execution time by phpMyAdmin is something like 0.06 seconds.
So I have a hard time understanding where this difference in actual execution time comes from. My initial guess was that the problem was related to phpMyAdmin's automatic LIMIT on the returned data set: while the result has 80k rows, it only displays 25. But adding the LIMIT manually to the queries makes them both execute fast.
Also, running the queries from the command-line mysql tool returns the full result sets for both queries, the reported execution times correspond to the actual execution time, and the method using joins is still roughly 1.5x faster than the subquery.
From the profiler data it seems that the bulk of the wait time is spent on sending data: the 'Sending data' stage takes on the order of minutes while everything else is on the order of microseconds.
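For anyone reproducing this, the per-stage timings can be pulled with MySQL's profiler (deprecated in recent versions, but still the simplest way to get this breakdown):
SET profiling = 1;
-- run the query under test here
SHOW PROFILES;              -- recent statements with their total durations
SHOW PROFILE FOR QUERY 1;   -- per-stage breakdown; note that 'Sending data'
                            -- also covers reading rows, not just network I/O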
Still why would the behaviour of phpMyAdmin differ so greatly in the case of the two queries?
I have an issue with creating tables using the SELECT keyword (it runs very slow). The query is to take only the details of the animal with the latest entry date; that query will be used to inner join another query.
SELECT *
FROM amusementPart a
INNER JOIN (
SELECT DISTINCT name, type, cageID, dateOfEntry
FROM bigRegistrations
GROUP BY cageID
) r ON a.type = r.cageID
But because of the slow performance, someone suggested steps to improve it: 1) use a temporary table, 2) store the result and join it to the other statement.
use myzoo
CREATE TABLE animalRegistrations AS
SELECT DISTINCT name, type, cageID, MAX(dateOfEntry) as entryDate
FROM bigRegistrations
GROUP BY cageID
Unfortunately, it is still slow. If I run only the SELECT statement, the result shows in 1-2 seconds, but if I add the CREATE TABLE, the query takes ages (approx 25 minutes).
Any good approach to improve the query time?
Edit: the bigRegistrations table is around 3.5 million rows.
Please try the query below to achieve "take only the details of the animal with the latest entry date, to be inner joined to another query". The query you are using is not fetching records per your requirement, and this one will be faster:
SELECT a.*, b.name, b.type, b.cageID, b.dateOfEntry
FROM amusementPart a
INNER JOIN bigRegistrations b ON a.type = b.cageID
INNER JOIN (SELECT c.cageID, max(c.dateOfEntry) dateofEntry
FROM bigRegistrations c
GROUP BY c.cageID) t ON t.cageID = b.cageID AND t.dateofEntry = b.dateofEntry
Suggested indexing on cageID and dateofEntry
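In MySQL that suggestion would be created along these lines (the index name is illustrative):
-- A composite index lets both the GROUP BY cageID / MAX(dateOfEntry)
-- derived table and the join back on (cageID, dateOfEntry) be resolved
-- from the index alone.
ALTER TABLE bigRegistrations
  ADD INDEX idx_cage_entry (cageID, dateOfEntry);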
This is a multipart question.
Use a temporary table.
Don't use DISTINCT - group all columns to make them distinct (don't forget to check for an index).
Check the SQL execution plans.
Here you are not creating a temporary table. Try the following...
CREATE TEMPORARY TABLE IF NOT EXISTS animalRegistrations AS
SELECT name, type, cageID, MAX(dateOfEntry) as entryDate
FROM bigRegistrations
GROUP BY cageID
Have you tried running an EXPLAIN to see how the plan differs from one execution to the next?
Also, I have found that there can be locking issues in some databases when doing INSERT ... SELECT and table creation using SELECT. I ran this in MySQL, and it solved some deadlock issues I was having.
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
The reason the query runs so slow is probably that it is creating the temp table based on all 3.5 million rows, when really you only need a subset of those, i.e. the bigRegistrations that match your join to amusementPart. The first single SELECT statement is faster because SQL is smart enough to know it only needs to calculate the bigRegistrations where a.type = r.cageID.
I'd suggest that you don't need a temp table; your first query is quite simple. Rather, you may just need an index. You can determine this manually by studying the estimated execution plan, or by running your query through a database tuning advisor. My guess is you need to create an index similar to the one below. Notice I index by cageID first, since that is what you join to amusementPart, so that would help SQL narrow the results down the quickest. But I'm guessing a bit - view the query plan or tuning advisor to be sure.
CREATE INDEX IX_bigRegistrations ON bigRegistrations
(cageID, name, type, dateOfEntry)
Also, if you want the animal with the latest entry date, I think you want this query instead of the one you're using. I'm assuming the PK is all 4 columns.
SELECT name, type, cageID, dateOfEntry
FROM bigRegistrations BR
WHERE BR.dateOfEntry =
(SELECT MAX(BR1.dateOfEntry)
FROM bigRegistrations BR1
WHERE BR1.name = BR.name
AND BR1.type = BR.type
AND BR1.cageID = BR.cageID)
I have these two queries. As you can see, each does a lookup in TabRedemption on orderItemID. The SELECT takes a fraction of a second while the UPDATE takes ~30 seconds.
Why does MySQL resort to a full index scan in the UPDATE, and how can I stop this? The column already has a foreign key constraint and an index.
select RedemptionID from TabRedemption where orderItemID in
(SELECT OrderItemID FROM TabOrderDetails WHERE OrderId = 4559775);
UPDATE TabRedemption SET active = 1 where orderItemID in
(SELECT OrderItemID FROM TabOrderDetails WHERE OrderId = 4559775);
Strangely, if I resolve the subquery manually, it's fast.
UPDATE TabRedemption SET active = 1 where orderItemID in (2579027);
I've noticed that if I use an UPDATE with a JOIN it's fast, but I don't want to do that because it's not supported in the H2 database.
On a side note MS SQLServer does this fine.
The best workaround:
UPDATE TabRedemption
JOIN TabOrderDetails USING(orderItemID)
SET TabRedemption.active = 1
WHERE TabOrderDetails.OrderId = 4559775;
(or something close to that)
The answer is that SELECT and UPDATE use different parsers. The workaround is to add a second table to the UPDATE, because it will then use the SELECT parser.
The difference in parsers is being addressed by Oracle in MySQL 5.7.
Keep in mind that the pattern "IN ( SELECT ... )" optimizes poorly in many cases (although apparently not your case).
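If the two-table workaround is unavailable because the same statement must also run on H2, a correlated EXISTS is worth trying as a portable alternative (a sketch; whether it dodges the bad plan depends on the MySQL version):
-- EXISTS is often planned as a per-row index probe on TabOrderDetails
-- rather than the full index scan the IN (SELECT ...) form triggered.
UPDATE TabRedemption r
SET r.active = 1
WHERE EXISTS (SELECT 1
              FROM TabOrderDetails d
              WHERE d.OrderItemID = r.orderItemID
                AND d.OrderId = 4559775);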