I have a table like this:
id | OpenDate | CloseDate
------------------------------------------------
1 | 2013-01-16 07:30:48 | 2013-01-16 10:49:48
2 | 2013-01-16 08:30:00 | NULL
I need to get a combined result like this:
id | date | type
---------------------------------
1 | 2013-01-16 07:30:48 | Open
1 | 2013-01-16 10:49:48 | Close
2 | 2013-01-16 08:30:00 | Open
I used UNION to get the above output (can this be achieved without UNION?):
SELECT id,date,type FROM(
SELECT id,`OpenDate` as date, 'Open' as 'type' FROM my_table
UNION ALL
SELECT id,`CloseDate` as date, 'Close' as 'type' FROM my_table
)AS `tab` LIMIT 0,15
I am getting the desired output, but performance is a problem: I have 4000 records in my table, and the UNION combines them into around 8000 records, which makes the site very slow to load (more than 13 sec). How can I optimize this query to speed it up?
I also tried LIMIT in the sub-query, but then the pagination offset does not work as it should. Please help me resolve this.
Update
EXPLAIN result
id | select_type | table | type | key | key_len | ref | rows | Extra
1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | 8858 |
2 | DERIVED | orders | index | OpenDate | 4 | NULL | 4588 | Using index
3 | UNION | orders | index | CloseDate | 4 | NULL | 4588 | Using index
NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL |
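I would do something like the following:
SELECT
t1.id,
IF(_.act, t1.OpenDate, t1.CloseDate) AS `date`,
IF(_.act, 'Open', 'Close') AS `type`
FROM my_table t1
-- two-row helper table: every row of my_table is emitted once per act value
JOIN (SELECT 1 AS act UNION ALL SELECT 0) AS _
-- keep every Open row, but a Close row only when the record has been closed
WHERE _.act = 1 OR t1.CloseDate IS NOT NULL;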
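The row-doubling idea can be tried out in a small sketch. This one uses Python's sqlite3 module rather than MySQL, so IF() becomes CASE, and the output column names event_date and event_type, the filter on NULL close dates and the ORDER BY are my own assumptions for illustration, not part of the original answer.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER PRIMARY KEY, OpenDate TEXT, CloseDate TEXT);
    INSERT INTO my_table VALUES
        (1, '2013-01-16 07:30:48', '2013-01-16 10:49:48'),
        (2, '2013-01-16 08:30:00', NULL);
""")

# Cross join against a two-row helper table so each source row appears twice:
# once as its Open event (act = 1) and once as its Close event (act = 0).
rows = conn.execute("""
    SELECT t1.id,
           CASE WHEN a.act = 1 THEN t1.OpenDate ELSE t1.CloseDate END AS event_date,
           CASE WHEN a.act = 1 THEN 'Open' ELSE 'Close' END AS event_type
    FROM my_table AS t1
    CROSS JOIN (SELECT 1 AS act UNION ALL SELECT 0) AS a
    WHERE a.act = 1 OR t1.CloseDate IS NOT NULL   -- assumption: skip unclosed rows
    ORDER BY t1.id, a.act DESC                    -- assumption: stable order for paging
    LIMIT 15 OFFSET 0
""").fetchall()

for row in rows:
    print(row)
```

Pagination with LIMIT and OFFSET is only repeatable when the query has a deterministic ORDER BY, which is why one is included here.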
My question is how I can speed this up. There should be a more elegant way to handle it.
The inner SELECT runs in 0.038 sec, but the whole query takes 6.007 sec, and I don't know how to improve its performance.
select * FROM table
where number1 in (
SELECT number1
FROM table
WHERE `date` = 'yyyy-mm-dd'
AND value1 = 'variable1'
AND value2 = 1
)
The thing is that I need all rows from the same table that share a number1 with a row whose value2 is 1.
So from a table like this:
id|number1| value1 | value2
1 | 11403 | exempl1 | null
2 | 11404 | exempl1 | 1
3 | 11404 | exempl1 | null
4 | 11405 | exempl1 | null
5 | 11405 | exempl1 | null
I get only this:
id|number1| value1 | value2
2 | 11404 | exempl1 | 1
3 | 11404 | exempl1 | null
You can convert it into a correlated subquery with EXISTS. The MySQL optimizer may be able to use indexes (if defined) in this case.
SELECT t1.*
FROM `table` AS t1
WHERE EXISTS( SELECT 1
FROM `table` AS t2
WHERE t2.number1 = t1.number1 AND
t2.`date` = 'yyyy-mm-dd' AND
t2.value1 = 'variable1' AND
t2.value2 = 1 )
If indexes are not defined, you can define a composite index on (number1, date, value1, value2) for better performance.
ALTER TABLE `table`
ADD INDEX comp_idx1(number1, `date`, value1, value2);
Another optimization possibility is to fetch only those columns which are really required in your application code. Avoid using SELECT *. You may read: Why is SELECT * considered harmful?
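To check that the EXISTS rewrite returns the same rows as the original IN query, here is a minimal sketch using Python's sqlite3 module instead of MySQL. The table name readings, the date values and the explicit column list are hypothetical stand-ins I chose for illustration (the question's real table and date are elided), and nothing here measures MySQL's actual performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "readings" is a made-up name; the question calls its table just "table".
    CREATE TABLE readings (
        id INTEGER PRIMARY KEY, number1 INTEGER, date TEXT,
        value1 TEXT, value2 INTEGER
    );
    INSERT INTO readings VALUES
        (1, 11403, '2020-01-01', 'exempl1', NULL),
        (2, 11404, '2020-01-01', 'exempl1', 1),
        (3, 11404, '2020-01-01', 'exempl1', NULL),
        (4, 11405, '2020-01-01', 'exempl1', NULL),
        (5, 11405, '2020-01-01', 'exempl1', NULL);
    -- Composite index in the same column order as suggested above.
    CREATE INDEX comp_idx1 ON readings (number1, date, value1, value2);
""")
params = ("2020-01-01", "exempl1")

# Original shape: uncorrelated IN subquery.
in_rows = conn.execute("""
    SELECT id, number1, value1, value2 FROM readings
    WHERE number1 IN (SELECT number1 FROM readings
                      WHERE date = ? AND value1 = ? AND value2 = 1)
    ORDER BY id
""", params).fetchall()

# Rewritten shape: correlated EXISTS, selecting only the needed columns.
exists_rows = conn.execute("""
    SELECT t1.id, t1.number1, t1.value1, t1.value2 FROM readings AS t1
    WHERE EXISTS (SELECT 1 FROM readings AS t2
                  WHERE t2.number1 = t1.number1 AND t2.date = ?
                    AND t2.value1 = ? AND t2.value2 = 1)
    ORDER BY t1.id
""", params).fetchall()

print(exists_rows)
```

Both queries should return the two rows with number1 = 11404, matching the expected output in the question.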
I am a newbie to MySQL. It may look like a dumb question, but I have been trying this for 3 hours. Here is what I am trying to do:
SELECT
MERCHANT_ID,
IFNULL(COUNT(SUBSCRIBE_ID),0)
FROM SUBSCRIBE_TABLE
WHERE
MERCHANT_ID IS NULL OR
MERCHANT_ID IN(1000000000066,1000000000104,1000000000103,1000000000105)
GROUP BY MERCHANT_ID
ORDER BY
FIND_IN_SET(MERCHANT_ID,'1000000000066,1000000000104,1000000000103,1000000000105');
And the output is:
+------------------+---------------------------------+
| MERCHANT_ID | IFNULL(COUNT(SUBSCRIBE_ID),0) |
+------------------+---------------------------------+
| 1000000000066 | 2 |
| 1000000000103 | 1 |
+------------------+---------------------------------+
But I am expecting the following:
+------------------+---------------------------------+
| MERCHANT_ID | IFNULL(COUNT(SUBSCRIBE_ID),0) |
+------------------+---------------------------------+
| 1000000000066 | 2 |
| 1000000000104 | 0 |
| 1000000000103 | 1 |
| 1000000000105 | 0 |
+------------------+---------------------------------+
I tried adding MERCHANT_ID IS NULL, but I am not able to get the result with a default value.
You will only get records that are actually in SUBSCRIBE_TABLE. If you want to get records for all your ids, you have to "create a temporary table" (or use a subquery with UNION in this case) with those values first, and then join your results to it.
Your query could look like this:
SELECT
merchant_id,
COUNT(subscribe_id)
FROM
(SELECT 1000000000066 AS merchant_id, 1 AS SortKey
UNION ALL
SELECT 1000000000104 AS merchant_id, 2 AS SortKey
UNION ALL
SELECT 1000000000103 AS merchant_id, 3 AS SortKey
UNION ALL
SELECT 1000000000105 AS merchant_id, 4 AS SortKey
) AS temp
LEFT JOIN subscribe_table USING (merchant_id)
GROUP BY merchant_id, SortKey
ORDER BY SortKey ASC
I replaced your FIND_IN_SET with the column SortKey in the subquery. COUNT(subscribe_id) counts only non-NULL values and returns 0 if none are found, so you don't need the IFNULL around it.
If you have more than those 4 merchant_ids you might want to look into doing the same thing with a temporary table. See here for examples:
Mysql: Create inline table within select statement?
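As a rough sketch of the same idea for an arbitrary list of ids, the inline table can be generated from a list. This example uses Python's sqlite3 module rather than MySQL; the variable names, the wanted_ids alias and the sort_key column are my own, and the three sample subscriptions are made up to reproduce the counts in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscribe_table (
        subscribe_id INTEGER PRIMARY KEY,
        merchant_id  INTEGER
    );
    -- Made-up sample rows: two subscriptions for ...066, one for ...103.
    INSERT INTO subscribe_table (merchant_id) VALUES
        (1000000000066), (1000000000066), (1000000000103);
""")

# The ids we want in the report, in the order they should appear.
wanted = [1000000000066, 1000000000104, 1000000000103, 1000000000105]

# Build "SELECT ? AS merchant_id, ? AS sort_key UNION ALL SELECT ?, ? ...".
# Only placeholders are concatenated; the values are bound as parameters.
selects = ["SELECT ? AS merchant_id, ? AS sort_key"] + ["SELECT ?, ?"] * (len(wanted) - 1)
inline_table = " UNION ALL ".join(selects)
params = []
for sort_key, merchant_id in enumerate(wanted, start=1):
    params += [merchant_id, sort_key]

# LEFT JOIN keeps every wanted id; COUNT over the nullable side yields 0
# for ids that have no subscriptions.
rows = conn.execute(f"""
    SELECT wanted_ids.merchant_id, COUNT(s.subscribe_id)
    FROM ({inline_table}) AS wanted_ids
    LEFT JOIN subscribe_table AS s ON s.merchant_id = wanted_ids.merchant_id
    GROUP BY wanted_ids.merchant_id, wanted_ids.sort_key
    ORDER BY wanted_ids.sort_key
""", params).fetchall()

for row in rows:
    print(row)
```

For very long id lists a real temporary table is likely the better fit, since databases limit the number of bound parameters and UNION terms in one statement.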
For example, I have created three indexes:
click_date - transaction table, daily_metric table
order_date - transaction table
I want to check whether my query uses these indexes. I ran EXPLAIN and got this result:
+----+--------------+--------------+-------+---------------+------------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+--------------+-------+---------------+------------+---------+------+--------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 668 | Using temporary; Using filesort |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 645 | |
| 2 | DERIVED | <derived4> | ALL | NULL | NULL | NULL | NULL | 495 | |
| 4 | DERIVED | transaction | ALL | order_date | NULL | NULL | NULL | 291257 | Using where; Using temporary; Using filesort |
| 3 | DERIVED | daily_metric | range | click_date | click_date | 3 | NULL | 812188 | Using where; Using temporary; Using filesort |
| 5 | UNION | <derived7> | ALL | NULL | NULL | NULL | NULL | 495 | |
| 5 | UNION | <derived6> | ALL | NULL | NULL | NULL | NULL | 645 | Using where; Not exists |
| 7 | DERIVED | transaction | ALL | order_date | NULL | NULL | NULL | 291257 | Using where; Using temporary; Using filesort |
| 6 | DERIVED | daily_metric | range | click_date | click_date | 3 | NULL | 812188 | Using where; Using temporary; Using filesort |
| NULL | UNION RESULT | <union2,5> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+-------+---------------+------------+---------+------+--------+----------------------------------------------+
In the EXPLAIN result I see that the order_date index of the transaction table is not used. Do I understand that correctly?
The click_date index of the daily_metric table was used, correct?
Please tell me how to read the EXPLAIN result to see whether my indexes are being used properly by the query.
My query:
SELECT
partner_id,
the_date,
SUM(clicks) as clicks,
SUM(total_count) as total_count,
SUM(count) as count,
SUM(total_sum) as total_sum,
SUM(received_sum) as received_sum,
SUM(partner_fee) as partner_fee
FROM (
SELECT
clicks.partner_id,
clicks.click_date as the_date,
clicks,
orders.total_count,
orders.count,
orders.total_sum,
orders.received_sum,
orders.partner_fee
FROM
(SELECT
partner_id, click_date, sum(clicks) as clicks
FROM
daily_metric WHERE DATE(click_date) BETWEEN '2013-04-01' AND '2013-04-30'
GROUP BY partner_id , click_date) as clicks
LEFT JOIN
(SELECT
partner_id,
DATE(order_date) as order_dates,
SUM(order_sum) as total_sum,
SUM(customer_paid_sum) as received_sum,
SUM(partner_fee) as partner_fee,
count(*) as total_count,
count(CASE
WHEN status = 1 THEN 1
ELSE NULL
END) as count
FROM
transaction WHERE DATE(order_date) BETWEEN '2013-04-01' AND '2013-04-30'
GROUP BY DATE(order_date) , partner_id) as orders ON orders.partner_id = clicks.partner_id AND clicks.click_date = orders.order_dates
UNION ALL SELECT
orders.partner_id,
orders.order_dates as the_date,
clicks,
orders.total_count,
orders.count,
orders.total_sum,
orders.received_sum,
orders.partner_fee
FROM
(SELECT
partner_id, click_date, sum(clicks) as clicks
FROM
daily_metric WHERE DATE(click_date) BETWEEN '2013-04-01' AND '2013-04-30'
GROUP BY partner_id , click_date) as clicks
RIGHT JOIN
(SELECT
partner_id,
DATE(order_date) as order_dates,
SUM(order_sum) as total_sum,
SUM(customer_paid_sum) as received_sum,
SUM(partner_fee) as partner_fee,
count(*) as total_count,
count(CASE
WHEN status = 1 THEN 1
ELSE NULL
END) as count
FROM
transaction WHERE DATE(order_date) BETWEEN '2013-04-01' AND '2013-04-30'
GROUP BY DATE(order_date) , partner_id) as orders ON orders.partner_id = clicks.partner_id AND clicks.click_date = orders.order_dates
WHERE
clicks.partner_id is NULL
ORDER BY the_date DESC
) as t
GROUP BY the_date ORDER BY the_date DESC LIMIT 50 OFFSET 0
Although I can't explain what EXPLAIN has dumped, I thought there must be an easier solution than what you have, and came up with the following. I would suggest the indexes below to optimize your existing query for the WHERE date range and the grouping by partner.
Additionally, when a query applies a FUNCTION to a column, such as your DATE(order_date) and DATE(click_date), it can't take advantage of an index on that column. To let the index be used, compare the raw column against the full date/time range, from 12:00am on the first day up to 11:59:59pm on the last. I typically do this via
x >= someDate AND x < firstDayAfterRange
which in your example would be (notice "less than May 1st", which covers everything up to April 30th at 11:59:59pm)
click_date >= '2013-04-01' AND click_date < '2013-05-01'
Table Index
transaction (order_date, partner_id)
daily_metric (click_date, partner_id)
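The effect of wrapping an indexed column in a function can be seen in a query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in for MySQL's EXPLAIN; the index name idx_click_date and the plan() helper are hypothetical, and the plan wording differs between engines and versions, so treat it as an illustration of the principle only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_metric (partner_id INTEGER, click_date TEXT, clicks INTEGER);
    -- Same column order as the suggested index: (click_date, partner_id).
    CREATE INDEX idx_click_date ON daily_metric (click_date, partner_id);
""")

def plan(sql):
    """Return the planner's description of how it would run the query."""
    # The last column of each EXPLAIN QUERY PLAN row is a readable step.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Function applied to the column: the planner cannot seek into the index.
wrapped = plan("""
    SELECT partner_id, clicks FROM daily_metric
    WHERE date(click_date) BETWEEN '2013-04-01' AND '2013-04-30'
""")

# Half-open range on the raw column: the index can be used for a range seek.
ranged = plan("""
    SELECT partner_id, clicks FROM daily_metric
    WHERE click_date >= '2013-04-01' AND click_date < '2013-05-01'
""")

print("function on column:", wrapped)
print("half-open range:   ", ranged)
```

In SQLite I would expect the first plan to be a full SCAN and the second a SEARCH using the index; in MySQL the analogous signal is the type column changing from ALL to range and the key column being filled in.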
Now, an adjustment. Since your clicks table may have entries that the transactions don't, and vice versa, I would adjust the query to first build a pre-query of all possible date/partner combinations, then LEFT JOIN it to the respective aggregate queries, such as:
SELECT
AllPartners.Partner_ID,
AllPartners.the_Date,
coalesce( clicks.clicks, 0 ) Clicks,
coalesce( orders.total_count, 0 ) TotalCount,
coalesce( orders.count, 0 ) OrderCount,
coalesce( orders.total_sum, 0 ) OrderSum,
coalesce( orders.received_sum, 0 ) ReceivedSum,
coalesce( orders.partner_fee, 0 ) PartnerFee
from
( select distinct
dm.partner_id,
DATE( dm.click_date ) as the_Date
FROM
daily_metric dm
WHERE
dm.click_date >= '2013-04-01' AND dm.click_date < '2013-05-01'
UNION
select
t.partner_id,
DATE(t.order_date) as the_Date
FROM
transaction t
WHERE
t.order_date >= '2013-04-01' AND t.order_date < '2013-05-01' ) AllPartners
LEFT JOIN
( SELECT
dm.partner_id,
DATE( dm.click_date ) sumDate,
sum( dm.clicks) as clicks
FROM
daily_metric dm
WHERE
dm.click_date >= '2013-04-01' AND dm.click_date < '2013-05-01'
GROUP BY
dm.partner_id,
DATE( dm.click_date ) ) as clicks
ON AllPartners.partner_id = clicks.partner_id
AND AllPartners.the_date = clicks.sumDate
LEFT JOIN
( SELECT
t.partner_id,
DATE(t.order_date) as sumDate,
SUM(t.order_sum) as total_sum,
SUM(t.customer_paid_sum) as received_sum,
SUM(t.partner_fee) as partner_fee,
count(*) as total_count,
count(CASE WHEN t.status = 1 THEN 1 ELSE NULL END) as COUNT
FROM
transaction t
WHERE
t.order_date >= '2013-04-01' AND t.order_date < '2013-05-01'
GROUP BY
t.partner_id,
DATE(t.order_date) ) as orders
ON AllPartners.partner_id = orders.partner_id
AND AllPartners.the_date = orders.sumDate
order by
AllPartners.the_date DESC
limit 50 offset 0
This way, the first query uses the index to quickly get all possible combinations from EITHER table. Each LEFT JOIN then matches AT MOST one row per partner and date. If a row is found, its numbers are used; if not, COALESCE() turns the NULL into zero.
CLARIFICATION.
Just as you built your pre-aggregate queries "clicks" and "orders", "AllPartners" is the alias for the result of the SELECT DISTINCT of partners and dates within the date range you are interested in. Its resulting columns are "partner_id" and "the_date", and they are the basis for joining to the aggregates "clicks" and "orders". Since the alias "AllPartners" always has these two columns, I take them from there for the field list: the other aliases are LEFT JOINed and may have no matching row.
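The key-set-then-LEFT-JOIN pattern can be exercised on a few rows. This is a reduced sketch using Python's sqlite3 module, not the full MySQL query: the table is named transactions because TRANSACTION is a keyword in SQLite, only a few of the aggregate columns are kept, the aliases k, c and o are mine, and the sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_metric (partner_id INTEGER, click_date TEXT, clicks INTEGER);
    CREATE TABLE transactions (partner_id INTEGER, order_date TEXT, order_sum REAL);
    -- Invented data: partner 2 has clicks only, partner 3 has orders only.
    INSERT INTO daily_metric VALUES
        (1, '2013-04-01', 10), (1, '2013-04-01', 5), (2, '2013-04-02', 7);
    INSERT INTO transactions VALUES
        (1, '2013-04-01 09:15:00', 20.0), (3, '2013-04-02 18:40:00', 50.0);
""")

rows = conn.execute("""
    SELECT k.partner_id, k.the_date,
           COALESCE(c.clicks, 0), COALESCE(o.total_count, 0), COALESCE(o.total_sum, 0)
    FROM (  -- every (partner, day) that appears in EITHER table
          SELECT partner_id, date(click_date) AS the_date FROM daily_metric
          WHERE click_date >= :lo AND click_date < :hi
          UNION
          SELECT partner_id, date(order_date) FROM transactions
          WHERE order_date >= :lo AND order_date < :hi) AS k
    LEFT JOIN (  -- clicks per partner and day
          SELECT partner_id, date(click_date) AS d, SUM(clicks) AS clicks
          FROM daily_metric
          WHERE click_date >= :lo AND click_date < :hi
          GROUP BY partner_id, date(click_date)) AS c
      ON c.partner_id = k.partner_id AND c.d = k.the_date
    LEFT JOIN (  -- orders per partner and day
          SELECT partner_id, date(order_date) AS d,
                 COUNT(*) AS total_count, SUM(order_sum) AS total_sum
          FROM transactions
          WHERE order_date >= :lo AND order_date < :hi
          GROUP BY partner_id, date(order_date)) AS o
      ON o.partner_id = k.partner_id AND o.d = k.the_date
    ORDER BY k.the_date DESC, k.partner_id
    LIMIT 50 OFFSET 0
""", {"lo": "2013-04-01", "hi": "2013-05-01"}).fetchall()

for row in rows:
    print(row)
```

Partners that appear in only one of the two tables still get a row, with zeros for the side that is missing, which is what the LEFT JOIN plus RIGHT JOIN union in the question was emulating.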
Suppose I have a table like this:
id | price | group1
1 | 6 | some_group
2 | 7 | some_group
3 | 8 | some_group
4 | 9 | some_other_group
If I want to select the lowest price grouped by group1
I can just do this:
SELECT id, min(price), group1 FROM some_table GROUP BY group1;
The problem is when I have a table which is not sorted by price like this:
id | price | group1
1 | 8 | some_group
2 | 7 | some_group
3 | 6 | some_group
4 | 9 | some_other_group
Then my query returns this result set:
id | price | group1
1 | 6 | some_group
4 | 9 | some_other_group
The problem is that I get 1 in the id column but the id of the row with the price of 6 is not 1 but 3.
My question is: how can I get the values from the row that contains the minimum price when I use GROUP BY?
I tried this:
SELECT f.id, min(f.price), f.group1 FROM (SELECT * FROM some_table ORDER BY price) f
GROUP BY f.group1;
but this is really slow and if I have multiple columns and aggregations it may fail.
Please note that the names above are just for demonstration purposes. My real query looks like this:
SELECT depdate, retdate, min(totalprice_eur) price FROM
(SELECT * FROM flight_results
WHERE (
fromcity = 30001350
AND tocity = 30001249
AND website = 80102118
AND roundtrip = 1
AND serviceclass = 1
AND depdate > date(now()))
ORDER BY totalprice_eur) F
WHERE (
fromcity = 30001350
AND tocity = 30001249
AND website = 80102118
AND roundtrip = 1
AND serviceclass = 1
AND depdate > date(now()))
GROUP BY depdate,retdate
and there is a concatenated primary key including website, fromcity, tocity, roundtrip, depdate, and retdate. There are no other indexes.
Explain says:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 2837 Using where; Using temporary; Using filesort
2 DERIVED flight_results ALL PRIMARY NULL NULL NULL 998378 Using filesort
You can do this instead:
SELECT t1.id, t1.price, t1.group1
FROM some_table AS t1
INNER JOIN
(
SELECT min(price) minprice, group1
FROM some_table
GROUP BY group1
) AS t2 ON t1.price = t2.minprice AND t1.group1 = t2.group1;
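To see that the join picks up the id of the cheapest row, here is a small check using Python's sqlite3 module in place of MySQL, loaded with the unsorted sample table from the question. The ORDER BY is my addition to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE some_table (id INTEGER PRIMARY KEY, price INTEGER, group1 TEXT);
    INSERT INTO some_table VALUES
        (1, 8, 'some_group'),
        (2, 7, 'some_group'),
        (3, 6, 'some_group'),
        (4, 9, 'some_other_group');
""")

# Step 1 (derived table): the minimum price per group.
# Step 2 (join): fetch the full rows whose price equals that minimum.
rows = conn.execute("""
    SELECT t1.id, t1.price, t1.group1
    FROM some_table AS t1
    INNER JOIN (
        SELECT group1, MIN(price) AS minprice
        FROM some_table
        GROUP BY group1
    ) AS t2 ON t1.group1 = t2.group1 AND t1.price = t2.minprice
    ORDER BY t1.id
""").fetchall()

for row in rows:
    print(row)
```

If several rows in a group share the minimum price, this approach returns all of them, so an extra tie-breaker is needed when exactly one row per group is required.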
Could you please help me optimize this query? I've spent lots of time on it and still cannot rephrase it to be fast enough (say, running in a matter of seconds, not minutes as it does now).
The query:
SELECT m.my_id, m.my_value, m.my_timestamp
FROM (
SELECT my_id, MAX(my_timestamp) AS most_recent_timestamp
FROM my_table
WHERE my_timestamp < '2011-03-01 08:00:00'
GROUP BY my_id
) as tmp
LEFT OUTER JOIN my_table m
ON tmp.my_id = m.my_id AND tmp.most_recent_timestamp = m.my_timestamp
ORDER BY m.my_timestamp;
my_table is defined as follows:
CREATE TABLE my_table (
my_id INTEGER NOT NULL,
my_value VARCHAR(4000),
my_timestamp TIMESTAMP default CURRENT_TIMESTAMP NOT NULL,
INDEX MY_ID_IDX (my_id),
INDEX MY_TIMESTAMP_IDX (my_timestamp),
INDEX MY_ID_MY_TIMESTAMP_IDX (my_id, my_timestamp)
);
The goal of this query is to select the most recent my_value for each my_id before some timestamp. my_table contains ~100 million entries and the query takes ~8 minutes to run.
explain:
+----+-------------+-------------+-------+------------------------------------------------+-------------------------+---------+---------------------------+-------+---------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------+------------------------------------------------+-------------------------+---------+---------------------------+-------+---------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 90721 | Using temporary; Using filesort |
| 1 | PRIMARY | m | ref | MY_ID_IDX,MY_TIMESTAMP_IDX,MY_ID_TIMESTAMP_IDX | MY_TIMESTAMP_IDX | 4 | tmp.most_recent_timestamp | 1 | Using where |
| 2 | DERIVED | my_table | range | MY_TIMESTAMP_IDX | MY_ID_MY_TIMESTAMP_IDX | 8 | NULL | 61337 | Using where; Using index for group-by |
+----+-------------+-------------+-------+------------------------------------------------+-------------------------+---------+---------------------------+-------+---------------------------------------+
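If I understand correctly, you should be able to drop the nested select completely and move the WHERE clause to the main query. Ordering by my_timestamp descending with LIMIT 1 is not enough on its own, because you need one row per my_id, so the query uses MAX and GROUP BY instead.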
SELECT my_id, my_value, max(my_timestamp)
FROM my_table
WHERE my_timestamp < '2011-03-01 08:00:00'
GROUP BY my_id
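*edit: added MAX and GROUP BY. Note that my_value is neither aggregated nor grouped, so MySQL may take it from any row in the group, not necessarily the one with the latest timestamp, and it rejects the query when ONLY_FULL_GROUP_BY is enabled.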
A trick to get the most recent record is to use ORDER BY together with LIMIT 1 instead of a MAX aggregation with a self-join.
Something like this (not tested):
SELECT m.my_id, m.my_value, m.my_timestamp
FROM my_table m
WHERE my_timestamp < '2011-03-01 08:00:00'
ORDER BY m.my_timestamp DESC
LIMIT 1
;
Update: the above doesn't work because a grouping is required.
Another solution uses WHERE ... IN with a subselect instead of the JOIN you've used.
It could be faster; please test with your data.
SELECT m.my_id, m.my_value, m.my_timestamp
FROM my_table m
WHERE ( m.my_id, m.my_timestamp ) IN (
SELECT i.my_id, MAX(i.my_timestamp)
FROM my_table i
WHERE i.my_timestamp < '2011-03-01 08:00:00'
GROUP BY i.my_id
)
ORDER BY m.my_timestamp;
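The tuple IN form can be checked on a few rows. This sketch uses Python's sqlite3 module instead of MySQL and assumes an SQLite build with row-value support (3.15 or later); the sample values are invented, and it says nothing about how the query performs on 100 million rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (
        my_id        INTEGER NOT NULL,
        my_value     TEXT,
        my_timestamp TEXT NOT NULL
    );
    CREATE INDEX my_id_my_timestamp_idx ON my_table (my_id, my_timestamp);
    -- Invented rows: id 1 has entries before and after the cutoff,
    -- id 2 only before it, id 3 only after it.
    INSERT INTO my_table VALUES
        (1, 'a-old',  '2011-02-27 10:00:00'),
        (1, 'a-new',  '2011-02-28 23:00:00'),
        (1, 'a-late', '2011-03-01 09:00:00'),
        (2, 'b-only', '2011-01-15 12:00:00'),
        (3, 'c-late', '2011-03-02 00:00:00');
""")

cutoff = "2011-03-01 08:00:00"

# The subquery finds (my_id, latest timestamp before the cutoff) pairs;
# the outer query fetches the full rows that match one of those pairs.
rows = conn.execute("""
    SELECT m.my_id, m.my_value, m.my_timestamp
    FROM my_table AS m
    WHERE (m.my_id, m.my_timestamp) IN (
        SELECT i.my_id, MAX(i.my_timestamp)
        FROM my_table AS i
        WHERE i.my_timestamp < ?
        GROUP BY i.my_id
    )
    ORDER BY m.my_timestamp
""", (cutoff,)).fetchall()

for row in rows:
    print(row)
```

Unlike the single-query GROUP BY variant, my_value here always comes from the same row as the maximum timestamp, because the outer query matches on the full (my_id, my_timestamp) pair.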
I notice in the explain plan that the optimizer is using the MY_ID_MY_TIMESTAMP_IDX index for the sub-query, but not the outer query.
You may be able to speed it up using an index hint. I also updated the ON clause to refer to tmp.most_recent_timestamp using its alias.
SELECT m.my_id, m.my_value, m.my_timestamp
FROM (
SELECT my_id, MAX(my_timestamp) AS most_recent_timestamp
FROM my_table
WHERE my_timestamp < '2011-03-01 08:00:00'
GROUP BY my_id
) as tmp
LEFT OUTER JOIN my_table m use index (MY_ID_MY_TIMESTAMP_IDX)
ON tmp.my_id = m.my_id AND tmp.most_recent_timestamp = m.my_timestamp
ORDER BY m.my_timestamp;