I found some strange behavior with the following query:
UPDATE llx_socpeople SET no_email=1 WHERE rowid IN (SELECT source_id FROM llx_mailing_cibles where tag = "68d74c3bc618ebed67919ed5646d0ffb");
It takes 1 minute and 30 seconds.
When I split it up into two queries:
SELECT source_id FROM llx_mailing_cibles where tag = "68d74c3bc618ebed67919ed5646d0ffb";
The result is 10842.
UPDATE llx_socpeople SET no_email=1 WHERE rowid = 10842;
The result is returned in milliseconds.
Table llx_socpeople has about 7,000 records; llx_mailing_cibles has about 10,000 records.
MySQL Version is: 5.7.20-0ubuntu0.16.04.1
I have already tried to optimize/repair both tables, with no effect.
Any ideas?
Since the subquery is run for each row of the main query, a longer execution time is to be expected.
What I would suggest is to rely on an inner join to perform the update:
UPDATE llx_socpeople AS t1
INNER JOIN llx_mailing_cibles AS t2
ON t1.rowid = t2.source_id
SET t1.no_email=1
WHERE t2.tag = "68d74c3bc618ebed67919ed5646d0ffb";
This way you should get far better performance.
You can troubleshoot slow queries with MySQL's EXPLAIN statement. You can find more details on its dedicated page in the official documentation. It might help you discover missing indexes.
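For example, on MySQL 5.7 EXPLAIN can be run directly on the UPDATE to check which indexes it uses; shown here on the join version from above:
EXPLAIN UPDATE llx_socpeople AS t1
INNER JOIN llx_mailing_cibles AS t2 ON t1.rowid = t2.source_id
SET t1.no_email = 1
WHERE t2.tag = "68d74c3bc618ebed67919ed5646d0ffb";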
Related
We switched our database from MySQL 8 to MariaDB 10 a week ago and now we have massive performance problems. We figured out why: we often use subqueries in SELECT statements together with ORDER BY. Here is an example:
SELECT id, (SELECT id2 FROM table2 INNER JOIN [...] WHERE column.foreignkey = table.id) queryResult
FROM table
WHERE status = 5
ORDER BY column
LIMIT 10
Imagine there are 1,000,000 entries in table that match status = 5.
What happens in MySQL 8: ORDER BY and LIMIT execute first, and after that the subquery (10 rows affected).
What happens in MariaDB 10: the subquery executes first (1,000,000 rows affected), and after that ORDER BY and LIMIT.
Both queries return 10 rows, but under MariaDB 10 it is incredibly slow because of this. Why is this happening? And is there an option in MariaDB that we should activate to avoid this? I know from MySQL 8 that SELECT subqueries are executed early when they are referenced in ORDER BY; otherwise they are executed once the result set is there.
Info: if we do this, everything is fine:
SELECT *, (SELECT id2 FROM table2 INNER JOIN [...] WHERE column.foreignkey = outerTable.id)
FROM (
SELECT id
FROM table
WHERE status = 5
ORDER BY column
LIMIT 10
) outerTable
Thank you so much for any help.
This is because a table is by nature an unsorted bunch of rows.
A "table" (and subquery in the FROM clause too) is - according to the SQL standard - an unordered set of rows. Rows in a table (or in a subquery in the FROM clause) do not come in any specific order. That's why the optimizer can ignore the ORDER BY clause that you have specified. In fact, the SQL standard does not even allow the ORDER BY clause to appear in this subquery (we allow it, because ORDER BY ... LIMIT ... changes the result, the set of rows, not only their order).
(MariaDB manual)
So the optimizer removes and ignores the ORDER BY.
You already found a method to circumvent this, using LIMIT and ORDER BY in the subquery.
After searching and searching I finally found a solution that makes the MariaDB 10 database work the way I knew it from MySQL 8.
For those who have similar problems: set this each time you connect to the server, and everything works as it did in MySQL 8:
SET optimizer_use_condition_selectivity = 1
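This sets the variable for the current session only. To apply it server-wide, a minimal sketch (setting the global variable requires the SUPER privilege; the my.cnf lines are the equivalent persistent setting):
SET GLOBAL optimizer_use_condition_selectivity = 1;
-- or, in my.cnf:
-- [mysqld]
-- optimizer_use_condition_selectivity = 1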
Long version: the problem I described at the top was suddenly solved, and the subquery was executed as it had been in the past under MySQL 8. I had done exactly nothing!
But new problems soon appeared: we have a statistics page that was incredibly slow. I noticed that an index was missing and added it. I executed the query and it worked. Without the index, 100,000 rows were examined to find the results; after adding it, 38. Well done.
Then strange things started to happen: I executed the query again and the database didn't use the index. So I executed it again and again. This was the result:
1st query execution (I did it with ANALYZE): 100,000 rows affected
2nd query execution: 38 rows affected
3rd query execution: 38 rows affected
4th query execution: 100,000 rows affected
5th query execution: 100,000 rows affected
It was completely random, even in our SaaS solution! So I started to research how the optimizer determines an execution plan. I found this: optimizer_use_condition_selectivity.
The default for a MariaDB 10.4 server is 4, which means that histograms are used to estimate the result set. I watched a few videos about it and realized that this will not work in our case (although we stick to database normalization). Mode 1 works well:
use selectivity of index backed range conditions to calculate the cardinality of a partial join if the last joined table is accessed by full table scan or an index scan
I hope this will help some others who are despairing over this like I did.
As of 5.6, MariaDB and MySQL went off in different directions with the optimizer. MariaDB focused a lot on subqueries, though perhaps to the detriment of this particular query.
Do you have INDEX(status, column)? It would help most variants of this query.
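For reference, a sketch of the DDL for that index, using the placeholder names from the question (backticks because table and column are reserved words):
ALTER TABLE `table` ADD INDEX (status, `column`);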
Yes, the subquery has to be evaluated for every row before the ORDER BY. The subquery only seems to need id, so you can phrase this as:
SELECT id,
(SELECT id2 FROM table2 INNER JOIN [...] WHERE column.foreignkey = t.id) as queryResult
FROM (SELECT t.*
FROM table t
WHERE status = 5
ORDER BY column
LIMIT 10
) t
This evaluates the subquery only after the rows have been selected from the table.
I have trawled many of the similar responses on this site and have improved my code at several stages along the way. Unfortunately, this 3-row query still won't run.
I have one table with 100k+ rows and about 30 columns, which I can filter down to 3 rows (in this example) and then INNER JOIN across 21 small lookup tables.
In my first attempt, I was lazy and used implicit joins.
SELECT `master_table`.*, `lookup_table`.`data_point` x 21
FROM `lookup_table` x 21
WHERE `master_table`.`indexed_col` = "value"
AND `lookup_table`.`id` = `lookup_col` x 21
The query looked to be timing out:
#2013 - Lost connection to MySQL server during query
Following this, I tried being explicit about the joins.
SELECT `master_table`.*, `lookup_table`.`data_point` x 21
FROM `master_table`
INNER JOIN `lookup_table` ON `lookup_table`.`id` = `master_table`.`lookup_col` x 21
WHERE `master_table`.`indexed_col` = "value"
Still got the same result. I then realised that the query was probably trying to perform the joins first and only then filtering via the WHERE clause. So after a bit more research, I learned how to apply a subquery to perform the filter first and then perform the joins on the newly created table. This is where I got to, and it still returns the same error. Is there any way I can improve this query further?
SELECT `temp_table`.*, `lookup_table`.`data_point` x 21
FROM (SELECT * FROM `master_table` WHERE `indexed_col` = "value") as `temp_table`
INNER JOIN `lookup_table` ON `lookup_table`.`id` = `temp_table`.`lookup_col` x 21
Is this the best way to write this kind of query? I tested the subquery to ensure that it returns only a small table, and I can confirm that it returns only three rows.
First, at its simplest, you are looking for:
select
mt.*
from
Master_Table mt
where
mt.indexed_col = 'value'
That is probably instantaneous, provided you have an index on your master table with indexed_col in the first position (in case you have a compound index of many fields)…
Now, if I am understanding you correctly about your different lookup columns (21 in total): you have just simplified them in this post, but are actually doing something to the effect of
select
mt.*,
lt1.lookupDescription1,
lt2.lookupDescription2,
...
lt21.lookupDescription21
from
Master_Table mt
JOIN Lookup_Table1 lt1
on mt.lookup_col1 = lt1.pk_col1
JOIN Lookup_Table2 lt2
on mt.lookup_col2 = lt2.pk_col2
...
JOIN Lookup_Table21 lt21
on mt.lookup_col21 = lt21.pk_col21
where
mt.indexed_col = 'value'
I had a project well over a decade ago dealing with a similar situation... the master table had about 21+ million records and had to join to about 30+ lookup tables. The system crawled, and the query died after running for more than 24 hrs.
This too was on a MySQL server, and the fix was a single MySQL keyword...
Select STRAIGHT_JOIN mt.*, ...
By having your master table in the primary position, with the WHERE clause and its criteria directly on the master table, you are good. You know the relationships of the tables, so do the query in the exact order I presented it to you. STRAIGHT_JOIN effectively tells the optimizer: don't try to think for me and optimize based on a subsidiary table that may have a smaller record count, somehow assuming that will make the query faster... it won't.
Try the STRAIGHT_JOIN keyword. It took the query I was working on and finished it in about 1.5 hrs... it was returning all 21 million rows with all corresponding lookup key descriptions for final output, hence it still needed a longer duration than a 3-record result would.
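Applied to the query sketched above, the change is a single keyword (lookup tables abbreviated as before):
select STRAIGHT_JOIN
mt.*,
lt1.lookupDescription1,
...
lt21.lookupDescription21
from
Master_Table mt
JOIN Lookup_Table1 lt1
on mt.lookup_col1 = lt1.pk_col1
...
JOIN Lookup_Table21 lt21
on mt.lookup_col21 = lt21.pk_col21
where
mt.indexed_col = 'value'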
First, don't use a subquery. Write the query as:
SELECT mt.*, lt.`data_point`
FROM `master_table` mt INNER JOIN
`lookup_table` l
ON l.`id` = mt.`lookup_col`
WHERE mt.`indexed_col` = 'value';
The indexes that you want are master_table(indexed_col, lookup_col) and lookup_table(id, data_point).
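A sketch of the corresponding DDL, assuming the simplified table and column names used in this question:
ALTER TABLE master_table ADD INDEX (indexed_col, lookup_col);
ALTER TABLE lookup_table ADD INDEX (id, data_point);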
If you are still having performance problems, then there are multiple possibilities. High among them is that the result set is simply too big to return in a reasonable amount of time. To see if that is the case, you can use select count(*) to count the number of returned rows.
I have an issue creating a table using a SELECT statement (it runs very slowly). The query is meant to take only the details of the animal with the latest entry date; that query will then be inner joined to another query.
SELECT *
FROM amusementPart a
INNER JOIN (
SELECT DISTINCT name, type, cageID, dateOfEntry
FROM bigRegistrations
GROUP BY cageID
) r ON a.type = r.cageID
But because of the slow performance, someone suggested steps to improve it: 1) use a temporary table, 2) store the result, use it, and join it to the other statement.
use myzoo
CREATE TABLE animalRegistrations AS
SELECT DISTINCT name, type, cageID, MAX(dateOfEntry) as entryDate
FROM bigRegistrations
GROUP BY cageID
Unfortunately, it is still slow. If I run only the SELECT statement, the result is shown in 1-2 seconds. But if I add the CREATE TABLE, the query takes ages (approx. 25 minutes).
Any good approach to improve the query time?
Edit: the size of the bigRegistrations table is around 3.5 million rows.
Please try the query below. It achieves your goal of taking only the details of the animal with the latest entry date, which you can then inner join to another query; the query you are currently using does not fetch records as per that requirement, and this one should also be faster:
SELECT a.*, b.name, b.type, b.cageID, b.dateOfEntry
FROM amusementPart a
INNER JOIN bigRegistrations b ON a.type = b.cageID
INNER JOIN (SELECT c.cageID, max(c.dateOfEntry) dateofEntry
FROM bigRegistrations c
GROUP BY c.cageID) t ON t.cageID = b.cageID AND t.dateofEntry = b.dateofEntry
Suggested indexing on cageID and dateOfEntry:
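A sketch of that index, assuming the column names from the question:
ALTER TABLE bigRegistrations ADD INDEX (cageID, dateOfEntry);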
This is a multipart question.
Use a temporary table.
Don't use DISTINCT - GROUP BY all columns to make them distinct (don't forget to check for an index).
Check the SQL execution plans.
Here you are not creating a temporary table. Try the following...
CREATE TEMPORARY TABLE IF NOT EXISTS animalRegistrations AS
SELECT name, type, cageID, MAX(dateOfEntry) as entryDate
FROM bigRegistrations
GROUP BY cageID
Have you tried doing an EXPLAIN to see how the plan differs from one execution to the next?
Also, I have found that there can be locking issues in some databases when doing INSERT ... SELECT and table creation using SELECT. I ran this in MySQL, and it solved some deadlock issues I was having.
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
The reason the query runs so slowly is probably that it is creating the temp table based on all 3.5 million rows, when really you only need a subset of those, i.e. the bigRegistrations rows that match your join to amusementPart. The first single SELECT statement is faster because SQL is smart enough to know it only needs to calculate the bigRegistrations rows where a.type = r.cageID.
I'd suggest that you don't need a temp table; your first query is quite simple. Rather, you may just need an index. You can determine this manually by studying the estimated execution plan. My guess is that you need to create an index similar to the one below. Notice I index by cageID first, since that is what you join to amusementPart, so that helps narrow the results down the quickest. But I'm guessing a bit - view the query plan to be sure.
CREATE INDEX IX_bigRegistrations ON bigRegistrations
(cageID, name, type, dateOfEntry);
Also, if you want the animal with the latest entry date, I think you want this query instead of the one you're using. I'm assuming the PK is all 4 columns.
SELECT name, type, cageID, dateOfEntry
FROM bigRegistrations BR
WHERE BR.dateOfEntry =
(SELECT MAX(BR1.dateOfEntry)
FROM bigRegistrations BR1
WHERE BR1.name = BR.name
AND BR1.type = BR.type
AND BR1.cageID = BR.cageID)
From my Debian terminal, I am trying to execute a query like the following in the MySQL client:
SELECT *
FROM stop_times_lazio_set2_withtime2 AS B
WHERE EXISTS
(SELECT *
FROM stop_times_lazio_set2_emptytime2 AS A
WHERE B.trip_id=A.trip_id);
Table A contains around 3 million records.
Table B is a subset of A with around 400,000 records.
I'd like to select every record of A that has a "parent" row with the same id (yes, it's not a unique/primary id).
It has now been running for hours... I'm around 2 hours in and I still see just a blinking cursor... Is the query correct? I can't even access other MySQL clients like phpMyAdmin.
Is there any way to speed up the process?
Is there any way to check how many records have been processed while it runs?
I guess you have already indexed trip_id? There is another way of writing the query; maybe it helps:
SELECT *
FROM stop_times_lazio_set2_withtime2
WHERE trip_id IN (SELECT trip_id FROM stop_times_lazio_set2_emptytime2)
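If trip_id is not indexed yet, adding one on both tables should help either form of the query; a sketch:
ALTER TABLE stop_times_lazio_set2_withtime2 ADD INDEX (trip_id);
ALTER TABLE stop_times_lazio_set2_emptytime2 ADD INDEX (trip_id);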
I would expect a straight JOIN to be much much faster...
SELECT B.*
FROM stop_times_lazio_set2_withtime2 AS B
JOIN stop_times_lazio_set2_emptytime2 AS A ON B.trip_id=A.trip_id
Why not use a simpler query?
SELECT A.*
FROM stop_times_lazio_set2_emptytime2 AS A, stop_times_lazio_set2_withtime2 AS B
WHERE B.trip_id=A.trip_id;
With that many records, it will obviously take time.
You can process only a few records at a time by adding this at the end of the query:
LIMIT <beginning>, <number of records>
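For example, to fetch only the first 10,000 matching rows (the batch size here is just an illustration; note that without an ORDER BY the batching is not deterministic):
SELECT A.*
FROM stop_times_lazio_set2_emptytime2 AS A, stop_times_lazio_set2_withtime2 AS B
WHERE B.trip_id = A.trip_id
LIMIT 0, 10000;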
Have you tried a LEFT JOIN?
Sample:
SELECT columns FROM withtime
LEFT JOIN emptytime ON withtime.tripid = emptytime.tripid;
I have the SQL below; it runs for about 30 minutes, which is too long for me.
SELECT LPP.learning_project_pupilID, SL.serviceID, MAX(LPPO.start_date), SUM(LPPOT.license_mode_value) totalAssignedLicenses
FROM t_services_licenses SL
INNER JOIN t_pupils_offers_services POS ON POS.service_licenseID = SL.service_licenseID
INNER JOIN j_learning_projects_pupils_offers LPPO ON LPPO.learning_project_pupil_offerID = POS.learning_project_pupil_offerID
INNER JOIN j_learning_projects_pupils LPP ON LPPO.learning_project_pupilID = LPP.learning_project_pupilID
INNER JOIN j_learning_projects_pupils_offers_tracking LPPOT ON LPPOT.pupil_offer_serviceID = POS.pupil_offer_serviceID
INNER JOIN t_filters_items FI ON FI.itemID = LPP.learning_project_pupilID_for_filter_join
WHERE FI.filterID = '4dce2235-aafd-4ba2-b248-c137ad6ce8ca'
AND SL.serviceID IN ('OnlineConversationClasses', 'TwentyFourSeven')
GROUP BY LPP.learning_project_pupilID, SL.serviceID
The EXPLAIN result is below (tell me if you can't view the image):
http://images0.cnblogs.com/blog2015/47012/201508/140920298959608.png
I have viewed the profiling result; "copying temp data" wasted almost all of the time. I know this is caused by the GROUP BY, so I made the changes below to verify it:
I removed the MAX and SUM functions as well as the GROUP BY clause and ran it; it took only about 40 seconds, which is OK for us.
So I want to know: are there other methods to make the above SQL execute much faster?
You can find more info here: http://www.cnblogs.com/scy251147/p/4728995.html
EDIT:
From the EXPLAIN view, I can see that about 50,802 rows are filtered in the t_filters_items table, and unluckily this table is Using temporary to store temp data, which is not a good choice for me. I really don't like GROUP BY in MySQL very much.
Do not use CHARACTER SET utf8 on UUID columns; change it to ascii. Further discussion of UUIDs and how to shrink them further: http://mysql.rjweb.org/doc.php/uuid
Are there really 50K rows with FI.filterID = '4dce2235-aafd-4ba2-b248-c137ad6ce8ca'?
The GROUP BY spans two tables (LPP and SL), making it impossible to optimize. Can that be changed?
The SUM(...) is likely to have a bigger value than you expect. This is because of the JOINs. Try rewriting the computation of the SUM in a subquery; see the sketch after this list of points.
Are you using InnoDB? Is innodb_buffer_pool_size set to about 70% of available RAM?
Approximately how many rows in each table?
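Regarding the SUM point above: a sketch of the pre-aggregation idea, reusing the tables and aliases from the question (whether the totals come out right still depends on your data model):
SELECT LPP.learning_project_pupilID, SL.serviceID, MAX(LPPO.start_date), SUM(LPPOT.totalLicenses) totalAssignedLicenses
FROM t_services_licenses SL
INNER JOIN t_pupils_offers_services POS ON POS.service_licenseID = SL.service_licenseID
INNER JOIN j_learning_projects_pupils_offers LPPO ON LPPO.learning_project_pupil_offerID = POS.learning_project_pupil_offerID
INNER JOIN j_learning_projects_pupils LPP ON LPPO.learning_project_pupilID = LPP.learning_project_pupilID
INNER JOIN (SELECT pupil_offer_serviceID, SUM(license_mode_value) totalLicenses
            FROM j_learning_projects_pupils_offers_tracking
            GROUP BY pupil_offer_serviceID) LPPOT ON LPPOT.pupil_offer_serviceID = POS.pupil_offer_serviceID
INNER JOIN t_filters_items FI ON FI.itemID = LPP.learning_project_pupilID_for_filter_join
WHERE FI.filterID = '4dce2235-aafd-4ba2-b248-c137ad6ce8ca'
AND SL.serviceID IN ('OnlineConversationClasses', 'TwentyFourSeven')
GROUP BY LPP.learning_project_pupilID, SL.serviceID
This way each tracking row is summed exactly once per pupil_offer_serviceID before the other joins can multiply it.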