MySQL: Why can't my query finish? - mysql

I don't know why the query I've written doesn't give me any output and just can't finish. Here http://sqlfiddle.com/#!9/8656d2/1 is a sample of my database; in reality I have about 260k records (rows). You can see that the query works with the table in the link, but on my whole database something is wrong. I waited almost 30 minutes for any results, but the query runs interminably. I don't know what I should do now; what could be the reason for the described problem?

I don't actually know why the query isn't completing on your MySQL with 260K records. But I can speculate that the correlated subquery you have in the SELECT statement is the culprit. Let's look closely at that guy:
SELECT DATE_SUB(MAX(EVENT_TIME), INTERVAL 12 HOUR)
FROM my_table mt
WHERE
EVENT_TYPE = '2' AND
mt.ID = my_table.ID
You are basically telling MySQL to do a MAX() calculation across the entire table, for every record in your my_table table. Note that because the subquery is correlated, it might have to be run fresh for literally all 260K records. Hopefully you can see that 260K x 260K operations would be a bit slow.
If I'm correct, then a possible fix would be to rephrase your query using a join to a derived table which finds the max event time for each ID in your table. That subquery would be run once, and only once, and then it would be up to MySQL to find an efficient way to join it back to your original table. In any case, this approach should be eons faster than what you were using.
SELECT t1.*
FROM my_table t1
INNER JOIN
(
SELECT ID, MAX(EVENT_TIME) AS max_event_time
FROM my_table
WHERE EVENT_TYPE = '2'
GROUP BY ID
) t2
ON t1.ID = t2.ID AND
t1.EVENT_TIME BETWEEN
DATE_SUB(t2.max_event_time, INTERVAL 12 HOUR) AND
t2.max_event_time
WHERE t1.EVENT_TYPE != 3
ORDER BY t1.ID;
Here is a link to your updated Fiddle:
Demo
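If the rewrite alone is still slow at 260K rows, a composite index covering the filter, grouping, and max columns should help the derived table. A minimal sketch, assuming my_table really has the ID, EVENT_TYPE, and EVENT_TIME columns used above:
-- Hypothetical supporting index: the derived table can then resolve its
-- WHERE / GROUP BY / MAX() from the index instead of scanning the full table.
ALTER TABLE my_table ADD INDEX idx_type_id_time (EVENT_TYPE, ID, EVENT_TIME);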

Related

Are SQL subqueries more efficient?

Trying to understand this: code efficiency increased more than 10x when I stopped using a subquery. Table2 has about 5000 rows, while table1 is pretty huge, a few hundred thousand rows.
Original Statement
SELECT *
FROM table1
WHERE indexedCol IN (
SELECT indexedCol
FROM table2
WHERE iCol2 = "somevalue"
)
So somehow this is way more efficient.
SELECT *
FROM table1
WHERE indexedCol IN
(*comma separated result of SELECT FROM table2*)
Is there something I am missing here? Or is a subquery never a good idea?
The real issue is: is the sub-query correlated? What do I mean by that? It is correlated if the sub-query references table1. If it doesn't, then the answer is simple -- if you have two queries
SELECT *
FROM table1
and
SELECT indexedCol
FROM table2
WHERE iCol2 = "somevalue"
The time it takes to run one of them is less than the time it takes to run both of them. This could be even worse (as suggested in the comments) if one of them is run for every row.
This query could be rewritten to use a join like this:
SELECT *
FROM TABLE1
JOIN TABLE2 on TABLE1.indexedCol = TABLE2.indexedCol and TABLE2.iCol2 = 'some value'
Which will probably solve your problem.
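Older MySQL versions (before the 5.6 semi-join optimizations) tended to execute IN (SELECT ...) as a dependent subquery that is re-evaluated per outer row, which would explain the 10x difference you measured. If the JOIN changes your row count (table2 might contain duplicate indexedCol values), an EXISTS form is a possible alternative; this is just a sketch assuming the same table and column names, with indexedCol indexed on table2 as its name suggests:
SELECT t1.*
FROM table1 AS t1
WHERE EXISTS
(
-- the inner lookup can use the index on table2.indexedCol
SELECT 1
FROM table2 AS t2
WHERE t2.iCol2 = 'somevalue'
AND t2.indexedCol = t1.indexedCol
);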

Keep the result of a correlated sub-query

Imagine I have a query like this:
Select
(SELECT a FROM table_10 LIMIT 1) AS sb1,
(SELECT a FROM table_11 WHERE a=sb1 LIMIT 1) AS sb2,
(SELECT a FROM table_12 WHERE a=sb2 LIMIT 1) AS sb3
FROM my_table WHERE 1
As far as I found out, the values for sb1, sb2 and sb3 are not saved in memory, and when the second sub-query refers to sb1 it re-runs the first sub-query. When the third sub-query refers to sb2, the second sub-query re-runs, and thus the first one re-runs many times.
My reason for thinking this is that when I hard-code the result instead of referring to sb1 and sb2, I see a huge difference in query time. (Like 30 seconds!)
My first question: Am I right?
My second question: How can I force mysql to keep the value in sb1 and sb2 and not to run the query each time?
My third question: If I'm not right, then what is causing this difference in time and performance?
How can I force mysql to keep the value in sb1 and sb2 and not to run the query each time?
Convert your correlated queries to JOIN. Formally (ignoring ambiguities) it will be
Select
table_10.a AS sb1,
table_11.a AS sb2,
table_12.a AS sb3
FROM my_table
CROSS JOIN table_10
INNER JOIN table_11 ON a=sb1
INNER JOIN table_12 ON a=sb2
WHERE 1
LIMIT 1
PS. LIMIT without ORDER BY makes no sense, both in the original code and in the provided one.
PPS. Specify table alias for EACH column name.
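Putting the PS and PPS together, here is a sketch of the same rewrite with every column qualified and a placeholder ORDER BY (it assumes each table's column really is named a):
SELECT
t10.a AS sb1,
t11.a AS sb2,
t12.a AS sb3
FROM my_table AS m
CROSS JOIN table_10 AS t10
INNER JOIN table_11 AS t11 ON t11.a = t10.a
INNER JOIN table_12 AS t12 ON t12.a = t11.a
-- replace t10.a with whatever ordering defines which single row you want
ORDER BY t10.a
LIMIT 1;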

Fast query when executed individually, slow when nested

I'm trying to find the most recent entry time at a bunch of specific dates. When I run
select max(ts) as maxts from factorprice where ts <= '2011-1-5'
It returns very quickly.
EXPLAIN gives select_type SIMPLE and "Select tables optimized away".
But when I run
select (select max(ts) from factorprice where ts <= dates.dt) as maxts, dates.dt
from
trends.dates where dates.dt in ('2011-1-6');
It takes a long time to return (~10 seconds).
Explain gives:
select_type=PRIMARY table=dates rows=506 Extra=Using where
select_type=DEPENDENT SUBQUERY table=factorprice type=index possible_keys=PRIMARY key=PRIMARY keylen=8
rows=26599224 Extra=Using where; Using index
This query also takes a long time (10 sec)
select dt, max(ts) as maxts from factorprice as f inner join trends.dates as d
where ts <= dt and dt in ('2011-1-6')
group by dt;
Explain gives:
select_type=SIMPLE table=d type=ALL rows=509 Extra=Using where
select_type=SIMPLE table=f type=range possible_keys=PRIMARY key=PRIMARY keylen=8
rows=26599224 Extra=Using where; Using index
I'd like to do this same operation on many different dates. Is there a way I can do that efficiently?
It looks like this bug:
http://bugs.mysql.com/bug.php?id=32665
Maybe if you create an index on dates.dt, it will go away.
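If so, the fix is a one-liner; this sketch assumes dt is a plain, not-yet-indexed column on trends.dates. With the index, the IN ('2011-1-6') filter touches one row instead of all 506, so the dependent subquery only fires once.
ALTER TABLE trends.dates ADD INDEX idx_dt (dt);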
This part of your SQL is a dependent query
select max(ts) from factorprice where ts <= dates.dt
which is executed for each row in the resultset. So the total time is approximately the time of the standalone query times the rows in the result set.
Judging from your EXPLAIN output, this query is visiting 506 rows in the dates table and then, for each of those rows, over 26 million rows in the factorprice table. 10 seconds to do all of that isn't too bad.
My guess is that you are inadvertently creating a CROSS JOIN situation where every row of one table is matched up with every row in another table.

MySQL query optimization

I have a join query that seems to fetch slowly. How can I optimize it, or is this reasonable?
time to execute
29 total, Query took 1.6956 sec
mysql query
SELECT SQL_CALC_FOUND_ROWS
t2.AuctionID ,t2.product_name ,t3.user_name ,t1.date_time ,t1.owned_price
,t2.specific_product_id
FROM table_user_ownned_auction AS t1
INNER JOIN table_product AS t2 ON t1.specific_product_id=t2.specific_product_id
INNER JOIN table_user_information AS t3 ON t3.user_id=t1.user_id
ORDER BY ownned_id DESC
Here's the explain output
Looking at the explain output, your problem is in the second line: the join with table t1.
Put an index on t1.specific_product_id and t2.specific_product_id.
The first line has only 3 rows in it; using filesort on that is actually faster than using an index because it saves on I/O time.
The following code will add an index to t2.specific_product_id.
ALTER TABLE table_product ADD INDEX spi(specific_product_id);
Because you only have 29 rows of output, using the index should make your query nearly instantaneous.
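A sketch of the companion index for the other side of that join, using the table name from the query (skip it if specific_product_id is already the primary key there):
ALTER TABLE table_user_ownned_auction ADD INDEX spi_owned (specific_product_id);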
If you want to understand the performance issues of a query, just use the EXPLAIN keyword in front of your query:
EXPLAIN SELECT SQL_CALC_FOUND_ROWS
t2.AuctionID ,t2.product_name ,t3.user_name ,t1.date_time ,t1.owned_price
,t2.specific_product_id
FROM table_user_ownned_auction AS t1
INNER JOIN table_product AS t2 ON t1.specific_product_id=t2.specific_product_id
INNER JOIN table_user_information AS t3 ON t3.user_id=t1.user_id
ORDER BY ownned_id DESC
It will tell you important information about your query. The most important columns are "key" and "Extra".
If "key" is NULL you need an index. Mostly for columns that are used in WHERE or GROUP BY or ORDER BY statements. "Extra" tells you about resource-consuming (CPU or Memory) operations.
So, add an index on "ownned_id" (which I presume should be "owned_id") and explain it again. Then look at the performance gain.
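A sketch of that index, assuming ownned_id is a column of table_user_ownned_auction (the table the ORDER BY most plausibly refers to):
ALTER TABLE table_user_ownned_auction ADD INDEX idx_ownned_id (ownned_id);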
If you have problems, I can help you better if you paste the EXPLAIN output.
Looking at your EXPLAIN output, the join type is ALL for every table, which is very bad if you have more than 10,000 rows in a table. I strongly advise adding an index on these columns:
t1.specific_product_id
t2.specific_product_id
t3.user_id
t1.user_id
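A sketch of those indexes, using the table names from the query above (skip any column that is already a primary key):
ALTER TABLE table_user_ownned_auction ADD INDEX idx_spi (specific_product_id);
ALTER TABLE table_user_ownned_auction ADD INDEX idx_user (user_id);
ALTER TABLE table_product ADD INDEX idx_spi (specific_product_id);
ALTER TABLE table_user_information ADD INDEX idx_user (user_id);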
Should your table reach 10,000 rows, you should be able to see a performance boost. For more information, please see this video from 00:00 to 02:04. As you can see in the video, before indexing the query has to search more than 90,000 rows of data, and after indexing it searches fewer than 5 rows. Hope it helps.
https://www.youtube.com/edit?o=U&video_id=ojyEcNMAj8k

How do I optimize this query?

The following query gets the info that I need. However, I noticed that as the tables grow, my code gets slower and slower. I'm guessing it's this query. Can this be written a different way to make it more efficient? I've heard a lot about using joins instead of subqueries; however, I don't "get" how to do it.
SELECT * FROM
(SELECT MAX(T.id) AS MAXid
FROM transactions AS T
GROUP BY T.position
ORDER BY T.position) AS result1,
(SELECT T.id AS id, T.symbol, T.t_type, T.degree, T.position, T.shares, T.price, T.completed, T.t_date,
DATEDIFF(CURRENT_DATE, T.t_date) AS days_past,
IFNULL(SUM(S.shares), 0) AS subtrans_shares,
T.shares - IFNULL(SUM(S.shares),0) AS due_shares,
(SELECT IFNULL(SUM(IF(SO.t_type = 'sell', -SO.shares, SO.shares )), 0)
FROM subtransactions AS SO WHERE SO.symbol = T.symbol) AS owned_shares
FROM transactions AS T
LEFT OUTER JOIN subtransactions AS S
ON T.id = S.transid
GROUP BY T.id
ORDER BY T.position) AS result2
WHERE MAXid = id
Your code:
(SELECT MAX(T.id) AS MAXid
FROM transactions AS T [<--- here ]
GROUP BY T.position
ORDER BY T.position) AS result1,
(SELECT T.id AS id, T.symbol, T.t_type, T.degree, T.position, T.shares, T.price, T.completed, T.t_date,
DATEDIFF(CURRENT_DATE, T.t_date) AS days_past,
IFNULL(SUM(S.shares), 0) AS subtrans_shares,
T.shares - IFNULL(SUM(S.shares),0) AS due_shares,
(SELECT IFNULL(SUM(IF(SO.t_type = 'sell', -SO.shares, SO.shares )), 0)
FROM subtransactions AS SO WHERE SO.symbol = T.symbol) AS owned_shares
FROM transactions AS T [<--- here ]
Notice the [<--- here ] marks I added to your code.
The first T is not in any way related to the second T. They have the same correlation alias, they refer to the same table, but they're entirely independent selects and results.
So what you're doing in the first, uncorrelated, subquery is getting the max id for all positions in transactions.
And then you're joining all transaction.position.max(id)s to result2 (and result2 happens to be a join of all transaction.positions to subtransactions). (And the internal ORDER BY is pointless and costly, too, but that's not the main problem.)
You're joining every transaction.position.max(id) to every (whatever result2 selects).
So you're getting a Cartesian join -- every result1 joined to every result2, unconditionally (nothing tells the database, for example, that they ought to be joined by (max) id or by position).
So if you have ten unique position.max(id)s in transaction, you're getting 100 rows. 1000 unique positions, a million rows. Etc.
On Edit, after getting home: Ok, you're not Cartesianing; the "where MAXid = id" does join result1 to result2. But you're still rolling up all rows of transaction in both queries.
When you want to write a complicated query like this, it's a lot easier if you compose it out of simpler views. In particular, you can test each view on its own to make sure you're getting reasonable results, and then just join the views.
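For instance, a minimal sketch of that idea, with a hypothetical view standing in for result1:
-- one simple, independently testable piece: the max id per position
CREATE VIEW max_trans_per_position AS
SELECT position, MAX(id) AS max_id
FROM transactions
GROUP BY position;
-- the outer query then just joins the view back to the detail rows
SELECT t.*
FROM transactions AS t
INNER JOIN max_trans_per_position AS m ON t.id = m.max_id;
The subtransactions roll-up (result2) could become a second view joined the same way, so each piece can be checked on its own before the final join.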
I would split the query into smaller chunks, probably using a stored proc. For example, get the max ids from transactions and put them in a temporary table, then join that with subtransactions. This will make it easier for you and the optimizer to work out what is going on.
Also, without knowing what indexes are on your tables, it is hard to offer more advice.
Put a benchmark function in the code, then time each section of the code to determine where the slowdown is happening. Oftentimes the slowdown happens in a different query than you first guess. Determine the correct query that needs to be optimized before posting to stackoverflow.