MySQL: joining to an empty temp table is very slow

Does anyone know why joining to an empty temp table is so slow? When the temp table has data, the query runs in 0.2 seconds, but when it is empty, the query takes 62 seconds to return an empty result. In my code, table1 is the empty table. Joining to an empty table should always produce an empty result, so why does this take so long?
drop table if exists table1;
CREATE TEMPORARY TABLE IF NOT EXISTS table1 AS
(
select
username, channelnumber, LINKEDCHANNELDATA.ID
from
voijavuusers.tbluserdata USERDATA
left join
voijavuusers.tbllinkedchanneldata LINKEDCHANNELDATA ON USERDATA.userguid = LINKEDCHANNELDATA.userguid
where
USERDATA.username = 'tatatata'
);
select
CALLDATA.id,
CALLDATA.chanid,
POPUPDATA.textboxfield1
from
trmsmain.tblcalldata CALLDATA
left join trmsmain.tblpopupdata POPUPDATA on CALLDATA.recordguid = POPUPDATA.recordguid
join
(select
username,
channelnumber,
ID
from
table1
where
username = 'tatatata') LINKEDCHANNELS ON CALLDATA.chanid = LINKEDCHANNELS.channelnumber
order by CALLDATA.id desc limit 100000;

This is a question about how the query is optimized, that is, about the query plan. I would suggest removing the subquery around table1. This should help the optimizer:
select cd.id, cd.chanid, pud.textboxfield1
from trmsmain.tblcalldata cd left join
trmsmain.tblpopupdata pud
on cd.recordguid = pud.recordguid join
table1 t1
on cd.chanid = t1.channelnumber
where t1.username = 'tatatata'
order by cd.id desc
limit 100000;
This doesn't guarantee that the optimizer will choose the right execution path, but it gives it more information to go on. It shouldn't really make a difference, but I would also be inclined to put table1 first in the from clause.
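MySQL's optimizer is free to reorder inner joins, so listing table1 first is not by itself guaranteed to change the plan. A minimal sketch, assuming you want to force table1 to be read first, uses STRAIGHT_JOIN, which always reads the left table before the right one; when table1 is empty there are then no rows with which to probe tblcalldata. It assumes tblcalldata has an index on chanid. Forcing this order generally means ORDER BY cd.id DESC needs a filesort instead of walking the id index, so compare the EXPLAIN output for the non-empty case as well.

```sql
-- Assumes trmsmain.tblcalldata has an index on chanid; without one,
-- every table1 row would trigger a full scan of tblcalldata.
EXPLAIN
SELECT cd.id, cd.chanid, pud.textboxfield1
FROM table1 t1
STRAIGHT_JOIN trmsmain.tblcalldata cd      -- read t1 first, then cd
        ON cd.chanid = t1.channelnumber
LEFT JOIN trmsmain.tblpopupdata pud
        ON cd.recordguid = pud.recordguid
WHERE t1.username = 'tatatata'
ORDER BY cd.id DESC
LIMIT 100000;
```

Drop the EXPLAIN keyword to run the query itself; in the EXPLAIN output, t1 should be listed first.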

Related

Optimize a subquery with COUNT and ORDER BY on the count

I am looking to optimize the query below, which has a subquery on a relation table and an ORDER BY on the subquery's count:
SELECT table1.*,
( SELECT COUNT(*)
FROM table2
WHERE table2.user_id=table1.id
AND table2.deleted = 0) AS table2_total
FROM table1
WHERE table1.parent_id = 0
ORDER BY table2_total DESC LIMIT 0, 50
This query works well, but it gets stuck when table2 has more than 50K rows. I also tried a LEFT JOIN instead of the subquery, but that is even slower:
SELECT table1.*,
COUNT(DISTINCT table2.id) as table2_total
FROM table1
LEFT JOIN table2 ON table2.user_id=table1.id
AND table2.deleted = 0
WHERE table1.parent_id = 0
GROUP BY table1.id
ORDER BY table2_total DESC LIMIT 0, 50
table2 already has indexes on the user_id and deleted columns.
Is there any way to optimize this query further?
As written, it will go through the entirety of table1 and probe table2 once for each row.
Add this composite index to table2: INDEX(user_id, deleted) and remove the INDEX(user_id) that you currently seem to have.
You can try adding indexes on table2.deleted and table1.parent_id. Note that each extra index slows down inserts somewhat.
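The index suggestions from the two answers above, written as DDL. This is a sketch: the names idx_user_deleted and idx_parent_id are hypothetical, and it assumes the existing single-column index is named user_id (MySQL's default name for an unnamed index on that column); SHOW INDEX FROM table2 lists the real names. With (user_id, deleted), the correlated COUNT(*) can be answered from the index alone.

```sql
-- Replace the single-column index with a composite one that matches
-- "user_id = table1.id AND deleted = 0" in the correlated subquery.
ALTER TABLE table2
  ADD INDEX idx_user_deleted (user_id, deleted),
  DROP INDEX user_id;

-- Index for the outer filter "table1.parent_id = 0".
ALTER TABLE table1
  ADD INDEX idx_parent_id (parent_id);
```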

Deleting duplicate rows on MySQL, getting a max row error

I am deleting duplicate rows in MySQL, keeping only the oldest row (lowest id), but I am getting a max row error:
DELETE n1
FROM item_audit n1, item_audit n2
WHERE n1.id > n2.id AND n1.description = n2.description
Keep in mind, with that join condition you are joining each row to every row before it (with the same description). This is one of those cases where a subquery will be much more effective than a join.
DELETE a
FROM item_audit a
WHERE (a.id, a.description) NOT IN
(SELECT * FROM
(
SELECT MIN(id), description
FROM item_audit
GROUP BY description
) AS realSubQ
)
Actually, assuming id is unique, it can be even simpler:
DELETE a
FROM item_audit a
WHERE a.id NOT IN
(SELECT * FROM
( SELECT MIN(id)
FROM item_audit
GROUP BY description
) AS realSubQ
)
As you discovered, MySQL needs to be "tricked" into being able to use the delete target in a subquery with the extra select * wrapper.
Alternatively, a join on the subquery could be used to reduce the size of the intermediate result set created behind the scenes.
DELETE a
FROM item_audit a
LEFT JOIN (SELECT MIN(id) AS firstId FROM item_audit GROUP BY description) AS aFirst
ON a.id = aFirst.firstId
WHERE aFirst.firstId IS NULL
;
If that fails, you can insert the first ids into a temp table and join against that instead.
CREATE TEMPORARY TABLE `old_ids`
SELECT MIN(ID) AS id
FROM item_audit
GROUP BY description;
DELETE a
FROM item_audit a
LEFT JOIN old_ids ON a.id = old_ids.id
WHERE old_ids.id IS NULL
;
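A LIMIT clause can be placed at the very end to accomplish an incremental delete, but only in the single-table form (DELETE FROM item_audit WHERE ...); MySQL rejects LIMIT in a multiple-table DELETE such as the DELETE a FROM item_audit a statements above. The last, temp table, version has the benefit that the subquery does not need to be re-evaluated after every incremental delete (and the temporary table can be indexed to speed things up as well).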
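A sketch of an incremental delete built on the old_ids temporary table from the last version. It uses the single-table DELETE form, since that is the form in which MySQL accepts LIMIT (and ORDER BY); the batch size of 10000 is arbitrary. Repeat the DELETE until ROW_COUNT() returns 0.

```sql
-- Index the temp table so each NOT IN lookup is a key lookup.
-- MIN(id) per description is unique, so a primary key fits.
ALTER TABLE old_ids ADD PRIMARY KEY (id);

-- Delete one batch of rows whose id is not the first id of its
-- description; run again until ROW_COUNT() reports 0.
DELETE FROM item_audit
WHERE id NOT IN (SELECT id FROM old_ids)
ORDER BY id
LIMIT 10000;

SELECT ROW_COUNT();
```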

MySQL index and "preparing" state

I have a complicated query:
SELECT * FROM
(
SELECT Transaction
FROM table1
WHERE
Transaction IN (SELECT Transaction FROM table2 WHERE Plugin='XXX' AND Server='XXX')
AND
Transaction NOT IN (SELECT Transaction FROM table1 WHERE Detail IN ('Monitor','Version','monitor','version'))
ORDER BY Date DESC, Millisecond DESC LIMIT 10)
AS res
I have an index on table1.Detail, and Transaction is the primary key of table2.
The database takes a while (5-10 seconds) to return a result. So I created another index on table2.Plugin. The query itself is faster now, but a "preparing" state shows up that also takes 5-10 seconds, so with the new index the total time does not change at all.
Can someone tell me what's going on and how I can optimize this query? Thank you!
Could you not simply rewrite the query as follows:
SELECT a.Transaction
FROM table1 a
INNER JOIN table2 b ON b.Transaction = a.Transaction
WHERE (b.Plugin='XXX' AND b.Server='XXX')
AND a.Detail NOT IN ('Monitor','Version','monitor','version')
ORDER BY a.Date DESC, a.Millisecond DESC LIMIT 10
This joins table2 directly and removes all the subqueries, which should make it much faster.
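One caveat on this rewrite: the original query drops a transaction if any of its table1 rows has one of the listed Detail values, whereas a.Detail NOT IN (...) only drops the matching rows, so other rows of an excluded transaction can still be returned. If the original behavior matters, the sketch below keeps the join but expresses the exclusion with NOT EXISTS. The composite index on table2 (Plugin, Server) and its name idx_plugin_server are assumptions, extending the Plugin index from the question. Because Transaction is the primary key of table2, the join cannot duplicate table1 rows.

```sql
-- Hypothetical index covering both equality filters on table2.
ALTER TABLE table2 ADD INDEX idx_plugin_server (Plugin, Server);

SELECT a.Transaction
FROM table1 a
INNER JOIN table2 b ON b.Transaction = a.Transaction
WHERE b.Plugin = 'XXX'
  AND b.Server = 'XXX'
  -- Exclude the whole transaction if any of its rows is a
  -- monitor/version entry, as the original NOT IN subquery did.
  AND NOT EXISTS (
        SELECT 1
        FROM table1 x
        WHERE x.Transaction = a.Transaction
          AND x.Detail IN ('Monitor', 'Version', 'monitor', 'version')
      )
ORDER BY a.Date DESC, a.Millisecond DESC
LIMIT 10;
```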

percentile by COUNT(DISTINCT) with correlated WHERE only works with a view (or without DISTINCT)

I've got a weird one, and I don't know if it's my syntax (which seems straightforward) or a bug (or just unsupported).
Here's my query that works but is needlessly slow:
UPDATE table1
SET table1column1 =
(SELECT COUNT(DISTINCT table2column1) FROM table2view WHERE table2column1 <= (SELECT table2column1 FROM table2 WHERE table2.id = table1.id) )
/
(SELECT COUNT(DISTINCT table2column1) FROM table2)
+ (SELECT COUNT(DISTINCT table2column2) FROM table2view WHERE table2column2 <= (SELECT table2column2 FROM table2 WHERE table2.id = table1.id) )
/
(SELECT COUNT(DISTINCT table2column2) FROM table2)
+ (SELECT COUNT(DISTINCT table2column3) FROM table2view WHERE table2column3 <= (SELECT table2column3 FROM table2 WHERE table2.id = table1.id) )
/ (SELECT COUNT(DISTINCT table2column3) FROM table2);
It's just the sum of three percentiles (of table2column1, table2column2, and table2column3) with duplicates removed.
Here's where it gets weird. I have to use a view for this to work on the subquery with the WHERE or it will only UPDATE the first row of table1, and set the rest of the rows' table1column1 to 0. That table2view is an exact duplicate of table2. Yeah, weird.
If I don't use DISTINCT, I can do it without the view. Does that make sense? Note: I have to have DISTINCT because I have lots of duplicates.
I tried making it SELECT only from the view, but that slowed it down worse.
Does anyone know what the problem is and the best way to rework this query so it doesn't take so long? It's in a TRIGGER, and the updated data is pretty on demand.
Many thanks in advance!
Details
I'm testing the speed in phpMyAdmin's command line.
I'm pretty sure the degradation is coming from the view since the more of the view and the less of the actual table I use, the slower it gets.
When I do the one without DISTINCT, it's lightning fast.
Only works on views?
OK, so I just set up a copy of table2. I tried first to do the original query substituting the view with the copy. No go.
I tried to do the query below with the copy instead of the view. No go.
Hopefully the introduction of these constants will better show what I'm trying to do.
SET @table2column1_distinct_count = (SELECT COUNT(DISTINCT table2column1) FROM table2);
SET @table2column2_distinct_count = (SELECT COUNT(DISTINCT table2column2) FROM table2);
SET @table2column3_distinct_count = (SELECT COUNT(DISTINCT table2column3) FROM table2);
UPDATE table1, table2
SET table1.table1column1 = (SELECT COUNT(DISTINCT table2column1) FROM table2view WHERE table2column1 <= table2.table2column1) / @table2column1_distinct_count
+ (SELECT COUNT(DISTINCT table2column2) FROM table2view WHERE table2column2 <= table2.table2column2) / @table2column2_distinct_count
+ (SELECT COUNT(DISTINCT table2column3) FROM table2view WHERE table2column3 <= table2.table2column3) / @table2column3_distinct_count
WHERE table1.id = table2.id;
Again, when I use table2 instead of the table2view, it only updates the first row properly and sets all other rows' table1.table1column1 = 0.
Math
I'm trying to set table1.table1column1 to the sum of the percentiles of table2column1, table2column2, and table2column3, matched by id.
I compute each percentile as (the number of distinct values of table2columnX that are <= the current row's table2columnX) / (the total number of distinct table2columnX values).
I use DISTINCT to get rid of the excessive duplicates.
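The calculation described above, written out as a formula. Here $D_k$ is the set of distinct values of table2columnk, and $x_{k,i}$ is the value of table2columnk in the table2 row whose id is $i$:

$$
\text{table1column1}_i = \sum_{k=1}^{3} \frac{\left|\{\, v \in D_k : v \le x_{k,i} \,\}\right|}{\left|D_k\right|}
$$

For example, if table2column1 holds 10, 10, 20 and 30, then $D_1 = \{10, 20, 30\}$, and a row whose table2column1 is 20 contributes $2/3$ to the sum.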
View
Here's the SELECT for the view. Does this help?
CREATE VIEW myTable.table2view AS SELECT
table2.table2column1 AS table2column1,
table2.table2column2 AS table2column2,
table2.table2column3 AS table2column3
FROM table2
GROUP BY table2.id;
Is there something special about the GROUP BY in the view's SELECT that makes this work (that I'm not seeing)?
I would probably say that the query is slow because it is repeatedly accessing the table when the trigger fires.
I am no SQL expert but I have tried to put together a query using temporary tables. You can see if it helps speed up the query. I have used different but similar sounding column names in my code sample below.
EDIT: There was a calculation error in my earlier code. Updated now.
SELECT COUNT(id) INTO @no_of_attempts FROM tb2;
-- DROP TABLE IF EXISTS S1Percentiles;
-- DROP TABLE IF EXISTS S2Percentiles;
-- DROP TABLE IF EXISTS S3Percentiles;
CREATE TEMPORARY TABLE S1Percentiles (
s1 FLOAT NOT NULL,
percentile FLOAT NOT NULL DEFAULT 0.00
);
CREATE TEMPORARY TABLE S2Percentiles (
s2 FLOAT NOT NULL,
percentile FLOAT NOT NULL DEFAULT 0.00
);
CREATE TEMPORARY TABLE S3Percentiles (
s3 FLOAT NOT NULL,
percentile FLOAT NOT NULL DEFAULT 0.00
);
INSERT INTO S1Percentiles (s1, percentile)
SELECT A.s1, ((COUNT(B.s1)/@no_of_attempts)*100)
FROM (SELECT DISTINCT s1 from tb2) A
INNER JOIN tb2 B
ON B.s1 <= A.s1
GROUP BY A.s1;
INSERT INTO S2Percentiles (s2, percentile)
SELECT A.s2, ((COUNT(B.s2)/@no_of_attempts)*100)
FROM (SELECT DISTINCT s2 from tb2) A
INNER JOIN tb2 B
ON B.s2 <= A.s2
GROUP BY A.s2;
INSERT INTO S3Percentiles (s3, percentile)
SELECT A.s3, ((COUNT(B.s3)/@no_of_attempts)*100)
FROM (SELECT DISTINCT s3 from tb2) A
INNER JOIN tb2 B
ON B.s3 <= A.s3
GROUP BY A.s3;
-- select * from S1Percentiles;
-- select * from S2Percentiles;
-- select * from S3Percentiles;
UPDATE tb1 A
INNER JOIN
(
SELECT B.tb1_id AS id, (C.percentile + D.percentile + E.percentile) AS sum FROM tb2 B
INNER JOIN S1Percentiles C
ON B.s1 = C.s1
INNER JOIN S2Percentiles D
ON B.s2 = D.s2
INNER JOIN S3Percentiles E
ON B.s3 = E.s3
) F
ON A.id = F.id
SET A.sum = F.sum;
-- SELECT * FROM tb1;
DROP TABLE S1Percentiles;
DROP TABLE S2Percentiles;
DROP TABLE S3Percentiles;
This records the percentile once for each score value and then updates the tb1 column from those tables, instead of recalculating the percentile for every student row.
You should also index columns s1, s2 and s3 for optimizing the queries on these columns.
Note: Please update the column names according to your db schema. Also note that each percentile calculation has been multiplied by 100 as I believe that percentile is usually calculated that way.
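Note that this answer's percentile counts every tb2 row (duplicates included) and divides by the total row count, while the question's definition counts distinct values. A sketch of the S1 step that follows the question's definition instead, using the same column names; @s1_distinct is a hypothetical variable name, S2 and S3 follow the same pattern, and it would replace the S1Percentiles INSERT above.

```sql
-- Number of distinct s1 values (the denominator in the question).
SET @s1_distinct = (SELECT COUNT(DISTINCT s1) FROM tb2);

-- For each distinct value A.s1, count the distinct values <= it
-- by joining the set of distinct values to itself.
INSERT INTO S1Percentiles (s1, percentile)
SELECT A.s1, (COUNT(*) / @s1_distinct) * 100
FROM (SELECT DISTINCT s1 FROM tb2) A
INNER JOIN (SELECT DISTINCT s1 FROM tb2) B
        ON B.s1 <= A.s1
GROUP BY A.s1;
```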

MySQL Join Best Practice on Large Data

table1_shard1 (1,000,000 rows per shard x 120 shards): columns id_user, hash
table2 (100,000 rows): columns value, hash
Desired output: id_user, hash, value
I am trying to find the fastest way to associate id_user with value from the tables above.
My current query ran for 30 hours without returning a result.
SELECT
table1_shard1.id_user, table1_shard1.hash, table2.value
FROM table1_shard1
LEFT JOIN table2 ON table1_shard1.hash=table2.hash
GROUP BY id_user
UNION
SELECT
table1_shard2.id_user, table1_shard2.hash, table2.value
FROM table1_shard2
LEFT JOIN table2 ON table1_shard2.hash=table2.hash
GROUP BY id_user
UNION
( ... )
UNION
SELECT
table1_shard120.id_user, table1_shard120.hash, table2.value
FROM table1_shard120
LEFT JOIN table2 ON table1_shard120.hash=table2.hash
GROUP BY id_user
Firstly, do you have indexes on the hash fields?
I think you should merge your tables into one before the query (at least temporarily):
CREATE TEMPORARY TABLE IF NOT EXISTS tmp_shards
SELECT * FROM table1_shard1;
INSERT INTO tmp_shards
SELECT * FROM table1_shard2;
# ...
Then run the main query:
SELECT
shd.id_user
, shd.hash
, tb2.value
FROM tmp_shards AS shd
LEFT JOIN table2 AS tb2 ON (shd.hash = tb2.hash)
GROUP BY shd.id_user
;
Not sure about the performance gain, but it will at least be more maintainable.
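Rather than writing 120 statements by hand, a stored procedure can build them with PREPARE. A sketch, assuming the shards are named table1_shard1 to table1_shard120 with identical columns; the procedure name merge_shards and the index name idx_hash are hypothetical, and the temporary table needs enough space for a full copy of all shards. DELIMITER is a directive of the mysql command-line client, not SQL.

```sql
-- If table2 has no index on hash yet (hypothetical index name):
ALTER TABLE table2 ADD INDEX idx_hash (hash);

DELIMITER //
CREATE PROCEDURE merge_shards()
BEGIN
  DECLARE i INT DEFAULT 2;

  -- Start from a fresh copy of the first shard.
  DROP TEMPORARY TABLE IF EXISTS tmp_shards;
  CREATE TEMPORARY TABLE tmp_shards SELECT * FROM table1_shard1;

  -- Append shards 2..120 with dynamically built INSERT ... SELECT.
  WHILE i <= 120 DO
    SET @sql = CONCAT('INSERT INTO tmp_shards SELECT * FROM table1_shard', i);
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    SET i = i + 1;
  END WHILE;

  -- Index the join column once all rows are loaded.
  ALTER TABLE tmp_shards ADD INDEX idx_hash (hash);
END //
DELIMITER ;

-- The temporary table stays available for the rest of the session.
CALL merge_shards();
```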