I see a big difference in speed between these two queries; the first one runs in 0.3 seconds, and the second in 76 seconds.
The first query selects only the key, whereas the second selects one additional field, an int(11). I can substitute any other field for that second field, with the same result. Why is selecting only the key so much faster?
Can anyone possibly explain the huge difference in speed? I'm stumped by this.
Q1:
SELECT ID
FROM TRMSMain.tblcalldata
WHERE (CallStarted BETWEEN '2014/06/13' AND '2014/06/13 23:59:59')
ORDER BY ID DESC LIMIT 0, 50
Q2:
SELECT ID, Chanid
FROM TRMSMain.tblcalldata
WHERE (CallStarted BETWEEN '2014/06/13' AND '2014/06/13 23:59:59')
ORDER BY ID DESC LIMIT 0, 50
Regards
I suppose you have an index on your CallStarted column and ID is the primary key. Your first query can do a so-called range scan on that index, and retrieve the row identities quickly because secondary indexes on InnoDB also include the primary key.
So it ends up doing just a little bit of work.
Your second query has to fetch data from the base table as well; in particular it has to fetch the Chanid column for every matching row. It then has to sort that whole set, keep the 50 rows at the end, and discard the rest of the sort.
Do a deferred join to pick up the extra columns. That is, just sort the ID numbers, then grab the rest of the data you need from the table. That way you only have to grab 50 rows' worth of data.
Like so:
SELECT a.ID, a.Chanid, a.WhatEver, a.WhatElse
FROM TRMSMain.tblcalldata a
JOIN (
SELECT ID
FROM TRMSMain.tblcalldata
WHERE CallStarted BETWEEN '2014/06/13' AND '2014/06/13 23:59:59'
ORDER BY ID DESC
LIMIT 0, 50
) b ON a.ID = b.ID
ORDER BY a.ID DESC
You know the inner query is fast; you've proven that. The JOIN merely exploits that quickness (based on good use of indexes) to get the detail data for the rows it needs.
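The deferred-join shape can be sketched end-to-end. The snippet below is only a stand-in, using SQLite from Python rather than MySQL, with made-up data and ISO date strings, but it shows that the deferred join returns exactly the same rows as the direct query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tblcalldata (
    ID INTEGER PRIMARY KEY,
    Chanid INTEGER,
    CallStarted TEXT)""")
# Secondary index on CallStarted, as assumed in the answer above
conn.execute("CREATE INDEX idx_started ON tblcalldata (CallStarted)")
conn.executemany(
    "INSERT INTO tblcalldata (ID, Chanid, CallStarted) VALUES (?, ?, ?)",
    [(i, i % 7, "2014-06-13 %02d:00:00" % (i % 24)) for i in range(1, 1001)],
)

# Direct query: filter, sort the full rows, then keep 50
direct = conn.execute("""
    SELECT ID, Chanid FROM tblcalldata
    WHERE CallStarted >= '2014-06-13' AND CallStarted < '2014-06-14'
    ORDER BY ID DESC LIMIT 50""").fetchall()

# Deferred join: sort only IDs in the subquery, then fetch the
# remaining columns for just those 50 rows
deferred = conn.execute("""
    SELECT a.ID, a.Chanid
    FROM tblcalldata a
    JOIN (SELECT ID FROM tblcalldata
          WHERE CallStarted >= '2014-06-13' AND CallStarted < '2014-06-14'
          ORDER BY ID DESC LIMIT 50) b ON a.ID = b.ID
    ORDER BY a.ID DESC""").fetchall()

assert direct == deferred  # same result, far less row data dragged through the sort
```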
Pro tip: Avoid BETWEEN for date/time ranges, because it is inclusive at both ends and therefore handles the end of the range poorly. The following performs just as well and avoids the 23:59:59 nonsense.
WHERE CallStarted >= '2014/06/13'
AND CallStarted < '2014/06/13' + INTERVAL 1 DAY
It grabs records starting at midnight on June 13, and gets them all up to but not including (<) midnight on the next day.
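The pitfall shows up with fractional seconds. A minimal sketch in SQLite via Python (a stand-in for MySQL; SQLite has no INTERVAL arithmetic, so the next-day bound is a literal):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ts TEXT)")
# A row with fractional seconds just before midnight
conn.execute("INSERT INTO t VALUES ('2014-06-13 23:59:59.5')")

# Inclusive BETWEEN up to 23:59:59 misses it
between = conn.execute(
    "SELECT COUNT(*) FROM t WHERE ts BETWEEN '2014-06-13' AND '2014-06-13 23:59:59'"
).fetchone()[0]

# Half-open range up to (but not including) the next midnight catches it
half_open = conn.execute(
    "SELECT COUNT(*) FROM t WHERE ts >= '2014-06-13' AND ts < '2014-06-14'"
).fetchone()[0]

print(between, half_open)  # 0 1
```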
Related
I need to know how many orders were made for each product within a day, by product id. I tried selecting every product_today.id and counting the matching rows in the second table via product_today_order.hid. I now have 20k+ rows of data, and this query alone took 10+ seconds.
Is there any way to make the query faster?
SELECT t.id,(select count(o.hid) from product_today_order o where o.hid=t.id) as zid
FROM product_today t
where date(t.dtime)='2021-11-26'
group by t.id
5 tips:
Probably the main slowdown is the un-sargable date(t.dtime)='...'. Change that to
WHERE t.dtime >= '2021-11-26'
AND t.dtime < '2021-11-26' + INTERVAL 1 DAY
Also, get rid of the GROUP BY. It is unnecessary (if t.id is the PRIMARY KEY).
Do you have an index on t that starts with dtime?
Do you need to check o.hid for being not-NULL? If not, simply say COUNT(*).
Do you have an index on o that starts with hid?
I have around 6 million rows in the table and I am querying it with the query below.
SELECT * FROM FD_CPC_HISTORICAL_DATA
WHERE id IN (
    SELECT MAX(id)
    FROM FD_CPC_HISTORICAL_DATA
    WHERE fb_ads_account_id = 1462257067274960
      AND created_at BETWEEN '2019-12-13 00:00:00' AND '2019-12-13 23:59:59'
    GROUP BY source_text
) \G
I have created indexes on fb_ads_account_id, created_at, and source_text; id is the primary key.
My question is: why does this query take around 9 seconds to get the result even though I have these indexes?
Is there any other way to create this query more efficient?
Here is the output of MySQL's EXPLAIN command:
This is your query:
SELECT hd.*
FROM FD_CPC_HISTORICAL_DATA hd
WHERE hd.id IN (SELECT MAX(hd2.id)
FROM FD_CPC_HISTORICAL_DATA hd2
WHERE hd2.fb_ads_account_id = 1462257067274960 AND
hd2.created_at >= '2019-12-13' AND
hd2.created_at < '2019-12-14'
GROUP BY source_text
);
I would recommend writing this as:
SELECT hd.*
FROM FD_CPC_HISTORICAL_DATA hd
WHERE hd.fb_ads_account_id = 1462257067274960 AND
hd.id = (SELECT MAX(hd2.id)
FROM FD_CPC_HISTORICAL_DATA hd2
WHERE hd2.fb_ads_account_id = hd.fb_ads_account_id AND
hd2.source_text = hd.source_text AND
hd2.created_at >= '2019-12-13' AND
hd2.created_at < '2019-12-14'
);
For this query, you want an index on FD_CPC_HISTORICAL_DATA(fb_ads_account_id, source_text, created_at).
This query can probably be performed without a subquery against the same table, i.e.:
SELECT * FROM FD_CPC_HISTORICAL_DATA
WHERE fb_ads_account_id=1462257067274960
AND created_at BETWEEN '2019-12-13 00:00:00' AND '2019-12-13 23:59:59'
ORDER BY id DESC LIMIT 1
if you want the max ID, or something similar; I am not sure you need the GROUP BY to get the desired result.
I think the index is exactly what you need. The part of the EXPLAIN that confuses me is that the (guesstimated?) number of rows from the subquery is so different from the one in the primary query.
To be honest, I'm not very familiar with MySQL, but in MS SQL Server I would try first dumping the results of the subquery into a temporary table, putting a unique clustered index on it, and then selecting everything from the original table joined to that temporary table on the ID column. (Don't use IN; use JOIN, since there can't be any duplicates in the temporary table.)
This might also show where all the time is being spent.
My guess is that this is mostly a statistics issue, but I don't really know how to force an update of the statistics on the index in MySQL.
(there is some talk about FLUSH TABLE in https://dzone.com/articles/updating-innodb-table-statistics-manually but it seems to come with some downsides too, use with care)
SELECT f.*
FROM
( SELECT source_text, MAX(created_at) AS mx
FROM FD_CPC_HISTORICAL_DATA
WHERE fb_ads_account_id=1462257067274960
AND created_at >= '2019-12-13'
AND created_at < '2019-12-13' + INTERVAL 1 DAY
GROUP BY source_text
) AS x
JOIN FD_CPC_HISTORICAL_DATA AS f
ON f.fb_ads_account_id = 1462257067274960
AND f.source_text = x.source_text
AND f.created_at = x.mx
Then you need this composite index:
INDEX(fb_ads_account_id, source_text, created_at) -- in this order
If this does not quite work because of duplicate entries with the same created_at, then a tweak may be possible.
I have an InnoDB table Items with a multi-column non-unique index (group_id, type_id, expiry_date).
If I make the query
SELECT * FROM Items WHERE group_id = 1 AND type_id IN (1,2,3) AND expiry_date BETWEEN '2017-01-01' AND '2018-01-01'
will the index work well, given that I'm using IN on the second field of the index and additionally a range on the third, or would I benefit from splitting it into the following?
SELECT * FROM Items WHERE group_id = 1 AND type_id = 1 AND expiry_date BETWEEN '2017-01-01' AND '2018-01-01'
UNION
SELECT * FROM Items WHERE group_id = 1 AND type_id = 2 AND expiry_date BETWEEN '2017-01-01' AND '2018-01-01'
UNION
SELECT * FROM Items WHERE group_id = 1 AND type_id = 3 AND expiry_date BETWEEN '2017-01-01' AND '2018-01-01'
EXPLAIN shows identical query plans for both queries, but I have quite a small test table and am not sure the query optimiser will behave the same way on a large amount of data.
And how, in general, does an index work when IN/OR/BETWEEN is used on two consecutive fields of the index?
For your second query, use union all rather than union. You always want union all, unless you want to incur overhead for removing duplicates.
I would guess that you would benefit from the second query on larger data. I don't think MySQL supports skip-scans on indexes, so the index is only being used for group_id and type_id, but not directly for the date.
What version of MySQL/MariaDB? There have been optimizations recently; I don't know whether they would help here.
You have a possible bug: AND expiry_date BETWEEN '2017-01-01' AND '2018-01-01' includes one extra day. Change it to
AND expiry_date >= '2017-01-01'
AND expiry_date < '2017-01-01' + INTERVAL 1 YEAR
(This counts as a single 'range' test. BETWEEN is also a range test, but it is inclusive at both ends, hence the 'bug'.)
I would simply have two composite indexes (if I could not find the real answer to your Question):
(group_id, type_id, expiry_date)
(group_id, expiry_date)
Case 1: The Optimizer can get past IN: then the first index works.
Case 2: The optimizer cannot get past IN: Then one of these happens:
The IN list has only one item. Then it is converted from IN to =, and the first index is optimal, with all 3 columns used.
The optimizer decides the first index is better -- small IN list, large date range.
The optimizer decides that the date range is better (smaller range) and picks the second index.
The UNION approach may or may not be better in this situation. There is a lot of overhead of gathering the data into a temp table. The temp table was recently eliminated, but only for certain cases of UNION ALL.
Yes, use UNION ALL. That eliminates a sort and possibly an extra temp table.
Test with a large dataset. For under 1K rows, the performance is not likely to matter.
Rule of Thumb in ordering columns in an index:
= test(s)
IN, if any
one 'range' (BETWEEN, <, etc), if any
Consider making a "covering" index.
My Cookbook
There are other optimizations that depend on what is in the * in SELECT *.
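On the covering-index suggestion: SQLite (used here via Python as a stand-in) explicitly labels a covering index in its query plan, which makes the idea easy to see. Column names are borrowed from the question; the index name idx is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Items (
    id INTEGER PRIMARY KEY, group_id INT, type_id INT, expiry_date TEXT)""")
conn.execute("CREATE INDEX idx ON Items (group_id, type_id, expiry_date)")

# Every column the query touches is in the index, so the table itself
# is never read; the plan says "COVERING INDEX" instead of plain "INDEX"
detail = conn.execute("""EXPLAIN QUERY PLAN
    SELECT type_id, expiry_date FROM Items WHERE group_id = 1""").fetchall()[0][3]
assert "COVERING INDEX" in detail
```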
I have a very big unindexed table called table with rows like this:
IP entrypoint timestamp
171.128.123.179 /page-title/?kw=abc 2016-04-14 11:59:52
170.45.121.111 /another-page/?kw=123 2016-04-12 04:13:20
169.70.121.101 /a-third-page/ 2016-05-12 09:43:30
I want to make the fastest query that, given 30 IPs and one date, will search rows as far back as a week before that date and return, for each IP, the most recent row whose entrypoint contains "?kw=". So I want distinct entrypoints, but only the most recent one per IP.
I'm stuck on this. I know it's probably a relatively simple INNER JOIN, but I don't know the fastest way to do it.
By the way: I can't add an index right now because the table is very big and on a db that serves a website. I'm going to replace it with an indexed table, don't worry.
Rows from the table
SELECT ...
FROM very_big_unindexed_table t
only within the past week...
WHERE t.timestamp >= NOW() - INTERVAL 1 WEEK
that contains '?kw=' in the entry point
AND t.entrypoint LIKE '%?kw=%'
only the latest row for each IP. There's a couple of approaches to that. A correlated subquery on a very big unindexed table is going to eat your lunch and your lunch box. And without an index, there's no getting around a full scan of the table and a "Using filesort" operation.
Given the unfortunate circumstances, our best bet for performance is likely going to be getting the set whittled down as small as we can, and then perform the sort, and avoid any join operations (back to that table) and avoid correlated subqueries.
So, let's start with something like this, to return all of the rows from the past week with '?kw=' in entry point. This is going to be full scan of the table, and a sort operation...
SELECT t.ip
, t.timestamp
, t.entrypoint
FROM very_big_unindexed_table t
WHERE t.timestamp >= NOW() - INTERVAL 1 WEEK
AND t.entrypoint LIKE '%?kw=%'
ORDER BY t.ip DESC, t.timestamp DESC
We can use an unsupported trick with user-defined variables. (The MySQL Reference Manual specifically warns against using a pattern like this, because the behavior is officially undefined. Unofficially, the optimizer in MySQL 5.1 and 5.5, at least, is very predictable.)
I think this is going to be about as good as you are going to get, if the number of rows from the past week are significant subset of the entire table. This is going to create a sizable intermediate resultset (derived table), if there are lot of rows that satisfy the predicates.
SELECT q.ip
, q.entrypoint
, q.timestamp
FROM (
SELECT IF(t.ip = @prev_ip, 0, 1) AS new_ip
, @prev_ip := t.ip AS ip
, t.timestamp AS timestamp
, t.entrypoint AS entrypoint
FROM (SELECT @prev_ip := NULL) i
CROSS
JOIN very_big_unindexed_table t
WHERE t.timestamp >= NOW() - INTERVAL 1 WEEK
AND t.entrypoint LIKE '%?kw=%'
ORDER BY t.ip DESC, t.timestamp DESC
) q
WHERE q.new_ip
Execution of that query will require (in terms of what's going to take the time)
a full scan of the table (there's no way to get around that)
a sort operation (again, there's no way around that)
materializing a derived table containing all of the rows that satisfy the predicates
a pass through the derived table to pull out the "latest" row for each IP
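For what it's worth, on MySQL 8.0+ (or any engine with window functions) the "latest row per IP" step no longer needs the user-variable trick; ROW_NUMBER() expresses it directly. A sketch using SQLite 3.25+ via Python as a stand-in, with sample rows adapted from the question (the week window is omitted to keep the data static):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ip TEXT, entrypoint TEXT, timestamp TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", [
    ("171.128.123.179", "/page-title/?kw=abc",   "2016-04-14 11:59:52"),
    ("171.128.123.179", "/page-title/?kw=xyz",   "2016-04-15 08:00:00"),
    ("170.45.121.111",  "/another-page/?kw=123", "2016-04-12 04:13:20"),
    ("169.70.121.101",  "/a-third-page/",        "2016-05-12 09:43:30"),  # no ?kw=
])

# Number each IP's matching rows newest-first, then keep row 1 per IP
rows = conn.execute("""
    SELECT ip, entrypoint, timestamp FROM (
        SELECT ip, entrypoint, timestamp,
               ROW_NUMBER() OVER (PARTITION BY ip ORDER BY timestamp DESC) AS rn
        FROM logs
        WHERE entrypoint LIKE '%?kw=%'
    ) sub WHERE rn = 1
    ORDER BY ip""").fetchall()

for r in rows:
    print(r)
```

Without an index this is still a full scan plus a sort, of course; the gain is clarity and defined behavior, not speed.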
I have a table "A" with a "date" field. I want to make a select query and order the rows with previous dates in a descending order, and then, the rows with next dates in ascending order, all in the same query. Is it possible?
For example, table "A":
id date
---------------------
a march-20
b march-21
c march-22
d march-23
e march-24
I'd like to get, having as a starting date "march-22", this result:
id date
---------------------
c march-22
b march-21
a march-20
d march-23
e march-24
In one query, because I'm currently doing it with two queries and it's slow; the only difference between them is the sorting, and the joins I have to do are a bit "heavy".
Thanks a lot.
You could use something like this -
SELECT *
FROM test
ORDER BY IF(
date <= '2012-03-22',
DATEDIFF('2000-01-01', date),
DATEDIFF(date, '2000-01-01')
);
Here is a link to a test on SQL Fiddle - http://sqlfiddle.com/#!2/31a3f/13
Update: the UNION answer below is wrong, sorry :(
From documentation:
However, use of ORDER BY for individual SELECT statements implies nothing about the order in which the rows appear in the final result because UNION by default produces an unordered set of rows. Therefore, the use of ORDER BY in this context is typically in conjunction with LIMIT, so that it is used to determine the subset of the selected rows to retrieve for the SELECT, even though it does not necessarily affect the order of those rows in the final UNION result. If ORDER BY appears without LIMIT in a SELECT, it is optimized away because it will have no effect anyway.
This should do the trick. I'm not 100% sure about adding an ORDER BY inside a UNION...
SELECT * FROM A where date <= now() ORDER BY date DESC
UNION SELECT * FROM A where date > now() ORDER BY date ASC
I think the real question here is how to do the joining only once. Create a temporary table with the result of the join, and run the two selects against that table. That way the heavy work is paid for once, at creation time, not on each of the two select queries.
CREATE TABLE tmp SELECT ... JOIN -- do the heavy lifting here
With this you can run the two select statements as you originally did.
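A minimal sketch of that flow, using SQLite from Python with toy data standing in for the heavy join (a plain SELECT plays its part here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id TEXT, date TEXT)")
conn.executemany("INSERT INTO A VALUES (?, ?)", [
    ("a", "2012-03-20"), ("b", "2012-03-21"), ("c", "2012-03-22"),
    ("d", "2012-03-23"), ("e", "2012-03-24"),
])

# Pay for the heavy join once (here a plain SELECT stands in for it)
conn.execute("CREATE TEMP TABLE tmp AS SELECT id, date FROM A")

# Then run the two cheap, differently sorted selects against tmp
before = conn.execute(
    "SELECT id FROM tmp WHERE date <= '2012-03-22' ORDER BY date DESC").fetchall()
after = conn.execute(
    "SELECT id FROM tmp WHERE date > '2012-03-22' ORDER BY date ASC").fetchall()
print([r[0] for r in before + after])  # ['c', 'b', 'a', 'd', 'e']
```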