I am making a MySQL query of a table with thousands of records. What I'm really trying to do is find the next and previous rows that surround a particular ID. The issue is that when the table is sorted in a specific way, there is no correlation between IDs (I can't just search for id > $current_id LIMIT 1, for example, because the ID in the next row might or might not actually be higher). Here is an example:
ID   Name   Date
4    Fred   1999-01-04
6    Bill   2002-04-02
7    John   2002-04-02
3    Sara   2002-04-02
24   Beth   2007-09-18
1    Dawn   2007-09-18
Say I know I want the records that come directly before and after John (ID = 7). In this case, the ID of the record after that row is actually a lower number. The table is sorted by date first and then by name, but there are many entries with the same date, so I can't just look for the next date either. What is the best approach to find, in this case, the row before and (separately) the row after ID 7?
Thank you for any help.
As others have suggested, you can use window functions for this, but I would use LEAD() and LAG() instead of ROW_NUMBER().
SELECT *
FROM (
    SELECT
        *,
        LAG(ID) OVER (ORDER BY `Date` ASC, `Name` ASC) `prev`,
        LEAD(ID) OVER (ORDER BY `Date` ASC, `Name` ASC) `next`
    FROM `tbl`
) t
WHERE `ID` = 7;
With thousands of records (very small) this should be very fast, but if you expect the table to grow to hundreds of thousands or even millions of rows, you should try to limit the amount of work done in the inner query. Sorting millions of rows and assigning prev and next values to all of them, just to use one row, would be excessive.
Assuming your example of John (ID = 7), you could use the Date to constrain the inner query. If the adjacent records are always within one month, you could do something like this -
SELECT *
FROM (
    SELECT
        *,
        LAG(ID) OVER (ORDER BY `Date` ASC, `Name` ASC) `prev`,
        LEAD(ID) OVER (ORDER BY `Date` ASC, `Name` ASC) `next`
    FROM `tbl`
    WHERE `Date` BETWEEN '2002-04-02' - INTERVAL 1 MONTH
                     AND '2002-04-02' + INTERVAL 1 MONTH
) t
WHERE `ID` = 7;
Without knowing more detail about the distribution of your data, I am only guessing but hopefully you get the idea.
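If it helps to see the LAG()/LEAD() idea end to end, here is a small self-contained sketch using Python's sqlite3 module with the sample data from the question. SQLite (3.25+) implements these window functions with the same semantics as MySQL 8; the table and column names match the query above, and everything else here is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ID INTEGER, Name TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(4, "Fred", "1999-01-04"), (6, "Bill", "2002-04-02"),
     (7, "John", "2002-04-02"), (3, "Sara", "2002-04-02"),
     (24, "Beth", "2007-09-18"), (1, "Dawn", "2007-09-18")],
)

# Same shape as the query above: compute prev/next IDs in (Date, Name)
# sort order, then pick out the row for ID = 7 (John).
row = conn.execute("""
    SELECT prev, next
    FROM (
        SELECT ID,
               LAG(ID)  OVER (ORDER BY Date, Name) AS prev,
               LEAD(ID) OVER (ORDER BY Date, Name) AS next
        FROM tbl
    )
    WHERE ID = 7
""").fetchone()

print(row)  # (6, 3) - John sits between Bill (6) and Sara (3)
```

Note that the "next" ID really is the lower number 3, exactly the situation described in the question.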
You can use the window function ROW_NUMBER() for this: ROW_NUMBER() OVER (ORDER BY ...) numbers every row consecutively in that order. Now when you search for your ID you also get its row number. For example, you search for ID = 7 and you get row number 35. You can then search for row numbers 34 and 36 to get the rows directly before and after the one with ID 7.
This is what comes to mind:
SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY `date`, `name`) AS row_num
    FROM people
) p1
WHERE row_num > (
    SELECT row_num
    FROM (
        SELECT *, ROW_NUMBER() OVER (ORDER BY `date`, `name`) AS row_num
        FROM people
    ) p2
    WHERE p2.id = 7
)
ORDER BY row_num
LIMIT 1;
Using the row number window function, you can compare two views of the table against id = 7 and get the row you need. This returns the row after id = 7; flipping the comparison to row_num < ... with ORDER BY row_num DESC gives the row before. You can change the condition in the subquery to suit your needs, e.g., p2.name = 'John' and p2.date = '2002-04-02'.
Here's a dbfiddle demonstrating: https://www.db-fiddle.com/f/mpQBcijLFRWBBUcWa3UcFY/2
Alternatively, you can simplify the syntax a bit and avoid the redundancy using a CTE like this:
WITH p AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY `date`, `name`) AS row_num
    FROM people
)
SELECT *
FROM p
WHERE row_num > (SELECT row_num FROM p WHERE p.id = 7)
ORDER BY row_num
LIMIT 1;
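As a quick sanity check of the CTE version, here is a self-contained sketch using Python's sqlite3 module (SQLite 3.25+ supports ROW_NUMBER() and CTEs just like MySQL 8) with the sample data from the question. On that data, the row after John (id = 7) should be Sara (id = 3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(4, "Fred", "1999-01-04"), (6, "Bill", "2002-04-02"),
     (7, "John", "2002-04-02"), (3, "Sara", "2002-04-02"),
     (24, "Beth", "2007-09-18"), (1, "Dawn", "2007-09-18")],
)

# CTE version: number the rows once, then pick the row whose
# row_num is one past the row holding id = 7.
after = conn.execute("""
    WITH p AS (
        SELECT *, ROW_NUMBER() OVER (ORDER BY date, name) AS row_num
        FROM people
    )
    SELECT id, name
    FROM p
    WHERE row_num > (SELECT row_num FROM p WHERE id = 7)
    ORDER BY row_num
    LIMIT 1
""").fetchone()

print(after)  # (3, 'Sara')
```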
I'm trying to write a query to get some trend stats, but it is really slow: the execution time is around 134 seconds.
I have a MySQL table called table_1.
Below is the CREATE statement:
CREATE TABLE `table_1` (
  `id` bigint(11) NOT NULL AUTO_INCREMENT,
  `original_id` bigint(11) DEFAULT NULL,
  `invoice_num` bigint(11) DEFAULT NULL,
  `registration` timestamp NULL DEFAULT NULL,
  `paid_amount` decimal(10,6) DEFAULT NULL,
  `cost_amount` decimal(10,6) DEFAULT NULL,
  `profit_amount` decimal(10,6) DEFAULT NULL,
  `net_amount` decimal(10,6) DEFAULT NULL,
  `customer_id` bigint(11) DEFAULT NULL,
  `recipient_id` text,
  `cashier_name` text,
  `sales_type` text,
  `sales_status` text,
  `sales_location` text,
  `invoice_duration` text,
  `store_id` double DEFAULT NULL,
  `is_cash` int(11) DEFAULT NULL,
  `is_card` int(11) DEFAULT NULL,
  `brandid` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_registration_compound` (`id`,`registration`)
) ENGINE=InnoDB AUTO_INCREMENT=47420958 DEFAULT CHARSET=latin1;
I have set a compound index made of id+registration.
Below is the query:
SELECT
    store_id,
    CONCAT('[', GROUP_CONCAT(tot SEPARATOR ','), ']') timeline_transactions,
    SUM(tot) AS total_transactions,
    CONCAT('[', GROUP_CONCAT(totalRevenues SEPARATOR ','), ']') timeline_revenues,
    SUM(totalRevenues) AS revenues,
    CONCAT('[', GROUP_CONCAT(totalProfit SEPARATOR ','), ']') timeline_profit,
    SUM(totalProfit) AS profit,
    CONCAT('[', GROUP_CONCAT(totalCost SEPARATOR ','), ']') timeline_costs,
    SUM(totalCost) AS costs
FROM (
    SELECT
        t1.md,
        COALESCE(SUM(t1.amount + t2.revenues), 0) AS totalRevenues,
        COALESCE(SUM(t1.amount + t2.profit), 0) AS totalProfit,
        COALESCE(SUM(t1.amount + t2.costs), 0) AS totalCost,
        COALESCE(SUM(t1.amount + t2.tot), 0) AS tot,
        t1.store_id
    FROM (
        SELECT a.store_id, b.md, b.amount
        FROM (SELECT DISTINCT store_id FROM table_1) AS a
        CROSS JOIN (
            SELECT DATE_FORMAT(a.DATE, "%m") AS md, '0' AS amount
            FROM (
                SELECT curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) MONTH AS Date
                FROM (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS a
                CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS b
                CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS c
            ) a
            WHERE a.Date >= '2019-01-01' AND a.Date <= '2019-01-14'
            GROUP BY md
        ) AS b
    ) t1
    LEFT JOIN (
        SELECT
            COUNT(epl.invoice_num) AS tot,
            SUM(paid_amount) AS revenues,
            SUM(profit_amount) AS profit,
            SUM(cost_amount) AS costs,
            store_id,
            date_format(epl.registration, '%m') md
        FROM table_1 epl
        GROUP BY store_id, date_format(epl.registration, '%m')
    ) t2 ON t2.md = t1.md AND t2.store_id = t1.store_id
    GROUP BY t1.md, t1.store_id
) AS t3
GROUP BY store_id
ORDER BY total_transactions DESC
Below is the EXPLAIN:
Maybe I should change the registration column from TIMESTAMP to DATETIME?
About 90% of your execution time will be used to execute GROUP BY store_id, date_format(epl.registration, '%m').
Unfortunately, you cannot use an index to group by a derived value, and since this is vital to your report, you need to precalculate this. You can do this by adding that value to your table, e.g. using a generated column:
alter table table_1 add md varchar(2) as (date_format(registration, '%m')) stored
I kept the varchar format you used for the month here, you could also use a number (e.g. tinyint) for the month.
This requires MySQL 5.7, otherwise you can use triggers to achieve the same thing:
alter table table_1 add md varchar(2) null;

create trigger tri_table_1 before insert on table_1
    for each row set new.md = date_format(new.registration,'%m');
create trigger tru_table_1 before update on table_1
    for each row set new.md = date_format(new.registration,'%m');
Then add an index, preferably a covering index, starting with store_id and md, e.g.
create index idx_table_1_storeid_md on table_1
(store_id, md, invoice_num, paid_amount, profit_amount, cost_amount)
If you have other, similar reports, you may want to check if they use additional columns and could profit from covering more columns. The index will require about 1.5GB of storage space (and how long it takes your drive to read 1.5GB will basically single-handedly define your execution time, short of caching).
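The essence of the precalculate-and-index idea can be sketched without MySQL: materialize the month once per row and index it, so the GROUP BY reads plain indexed columns instead of re-deriving the month for every one of 47 million rows. Below is a minimal, self-contained illustration using Python's sqlite3 module; the md column is filled in by the application at insert time, standing in for the BEFORE INSERT trigger (the sample rows and amounts are invented for the demo, while the table, column, and index names follow the answer above).

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE table_1 (
        id INTEGER PRIMARY KEY,
        registration TEXT,
        paid_amount REAL,
        store_id INTEGER,
        md TEXT           -- precalculated month, kept in sync on insert
    )
""")

def insert_invoice(registration, paid_amount, store_id):
    # Stand-in for the BEFORE INSERT trigger: derive md once, at write time.
    md = datetime.fromisoformat(registration).strftime("%m")
    conn.execute(
        "INSERT INTO table_1 (registration, paid_amount, store_id, md)"
        " VALUES (?, ?, ?, ?)",
        (registration, paid_amount, store_id, md),
    )

for reg, amount, store in [
    ("2019-01-03 10:00:00", 10.0, 1),
    ("2019-01-10 12:00:00", 20.0, 1),
    ("2019-02-01 09:00:00", 5.0, 1),
    ("2019-01-05 11:00:00", 7.5, 2),
]:
    insert_invoice(reg, amount, store)

# Covering index starting with (store_id, md), as suggested above.
conn.execute(
    "CREATE INDEX idx_table_1_storeid_md ON table_1 (store_id, md, paid_amount)"
)

# The expensive GROUP BY now groups by plain indexed columns.
rows = conn.execute("""
    SELECT store_id, md, SUM(paid_amount) AS revenues
    FROM table_1
    GROUP BY store_id, md
    ORDER BY store_id, md
""").fetchall()
print(rows)  # [(1, '01', 30.0), (1, '02', 5.0), (2, '01', 7.5)]
```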
Then change your query to group by this new indexed column, e.g.
...
SUM(cost_amount) AS costs,
store_id,
md -- instead of date_format(epl.registration, '%m') md
FROM table_1 epl
GROUP BY store_id, md -- instead of date_format(epl.registration, '%m')
)t2 ...
This index will also take care of another 9% of your execution time, the SELECT DISTINCT store_id FROM table_1, which profits from an index starting with store_id.
Now that 99% of your query is taken care of, some further remarks:
the subquery b and your date range where a.Date >='2019-01-01' and a.Date <= '2019-01-14' might not do what you think they do. You should run the part SELECT DATE_FORMAT(a.DATE, "%m") as md, ... group by md separately to see what it does. In its current state, it will give you a single row with the tuple '01', 0, representing "january", so it is basically a complicated way of writing select '01', 0. Unless today is the 15th or later, in which case it returns nothing (which is probably unintended).
Particularly, it will not limit the invoice dates to that specific range, but to all invoices that are from (the whole) january of any year. If that is what you intended, you should (additionally) add that filter directly, e.g. by using FROM table_1 epl where epl.md = '01' GROUP BY ..., reducing your execution time by an additional factor of about 12. So (apart from the 15th and up-problem), with your current range you should get the same result if you use
...
SUM(cost_amount) AS costs,
store_id,
md
FROM table_1 epl
WHERE md = '01'
GROUP BY store_id, md
)t2 ...
For different date ranges you will have to adjust that term. And to emphasize my point, this is significantly different from filtering invoices by their date, e.g.
...
SUM(cost_amount) AS costs,
store_id,
md
FROM table_1 epl
WHERE epl.registration >='2019-01-01'
and epl.registration <= '2019-01-14'
GROUP BY store_id, md
)t2 ...
which you may (or may not) have tried to do. You would need a different index in that case though (and it would be a slightly different question).
there might be some additional optimizations, simplifications or beautifications in the rest of your query. For example, group BY t1.md, t1.store_id looks redundant and/or wrong (indicating you are actually not on MySQL 5.7), and the b-subquery can only give you values 1 to 12, so generating 1000 dates and reducing them again could be simplified. But since these parts operate on about 100 rows, they will not affect execution time significantly, and I haven't checked them in detail. Some of it is probably due to getting the right output format, or to generalizations (although, if you are dynamically grouping by formats other than month, you will need other indexes/columns, but that would be a different question).
An alternative way to precalculate your values would be a summary table, where you e.g. run your inner query (the expensive group by) once a day, store the result in a table, and then reuse it (by selecting from this table instead of doing the group by). This is especially viable for data like invoices that never change (although otherwise you can use triggers to keep the summary table up to date).

It also becomes more viable if you have several scenarios, e.g. if your user can decide to group by weekday, year, month or zodiac sign, since otherwise you would need to add an index for each of those. It becomes less viable if you need to dynamically limit your invoice range (to e.g. 2019-01-01 ... 2019-01-14).

If you need to include the current day in your report, you can still precalculate and then add the values for the current date from the base table (which should only involve a very limited number of rows, and is fast if you have an index starting with your date column), or use triggers to update your summary table on the fly.
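A minimal sketch of that summary-table idea, using Python's sqlite3 module for a self-contained illustration (the summary table name and the sample invoice rows are invented for the demo; on MySQL the CREATE TABLE ... AS SELECT step would be the nightly job):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (
        id INTEGER PRIMARY KEY,
        registration TEXT,
        paid_amount REAL,
        store_id INTEGER
    );
    INSERT INTO table_1 (registration, paid_amount, store_id) VALUES
        ('2019-01-03', 10.0, 1),
        ('2019-01-10', 20.0, 1),
        ('2019-02-01',  5.0, 1),
        ('2019-01-05',  7.5, 2);

    -- Summary table: the expensive GROUP BY is run once (e.g. nightly)
    -- and its result is stored.
    CREATE TABLE summary_by_store_month AS
        SELECT store_id,
               strftime('%m', registration) AS md,
               COUNT(*)         AS tot,
               SUM(paid_amount) AS revenues
        FROM table_1
        GROUP BY store_id, strftime('%m', registration);
""")

# Reports now select from the small precomputed table instead of
# aggregating millions of invoice rows on every request.
rows = conn.execute(
    "SELECT store_id, md, tot, revenues FROM summary_by_store_month"
    " ORDER BY store_id, md"
).fetchall()
print(rows)  # [(1, '01', 2, 30.0), (1, '02', 1, 5.0), (2, '01', 1, 7.5)]
```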
With PRIMARY KEY(id), having INDEX(id, anything) is virtually useless.
See if you can avoid nesting subqueries.
Consider building that 'date' table permanently and have a PRIMARY KEY(md) on it. Currently, neither subquery has an index on the join column (md).
You may have the "explode-implode" syndrome. This is where JOINs expand the number of rows, only to have the GROUP BY collapse them.
Don't use COUNT(xx) unless you need to check xx for being NULL. Simply do COUNT(*).
store_id double -- Really?
TIMESTAMP vs DATETIME -- they perform about the same; don't bother changing it.
Since you are only looking at 2019-01, get rid of
date_format(epl.registration, '%m')
That, alone, may speed it up a lot. (However, you lose generality.)
mytable has an auto-incrementing id column which is an integer, and for all intents and purposes in this case you can safely assume that the higher ID represents a more recent value. mytable also has an indexed column called group_id which is a foreign key to the groups table.
I want a quick and dirty query to select the 5 most recent rows for each group_id from mytable.
If there were only three groups, this would be easy, as I could do this:
(SELECT * FROM `mytable` WHERE `group_id` = 1 ORDER BY `id` DESC LIMIT 5)
UNION ALL
(SELECT * FROM `mytable` WHERE `group_id` = 2 ORDER BY `id` DESC LIMIT 5)
UNION ALL
(SELECT * FROM `mytable` WHERE `group_id` = 3 ORDER BY `id` DESC LIMIT 5)
However, there is not a fixed number of groups. Groups are determined by what's in the groups table, so there is an indeterminate number of them.
My thoughts so far:
I could grab a CURSOR on the groups table and build a new SQL query string, then EXECUTE it. However, that seems really messy and I'm hoping there's a better way of doing it.
I could grab a CURSOR on the groups table and insert things into a temporary table, then select from that. However, that also seems really messy.
I don't know if I could just grab a CURSOR and then start returning rows directly from there. Is there perhaps something similar to SQL Server's #table type variables?
What I'm hoping most of all is that I'm overthinking this and there is a way to do this in a SELECT statement.
Getting the n most recent rows per group is best handled by window functions in other RDBMSs (SQL Server, PostgreSQL, Oracle, etc.), but unfortunately MySQL (before 8.0) doesn't have window functions. As an alternative, you can use user-defined variables to assign a rank to the rows that belong to the same group. In this case the ORDER BY group_id, id DESC is important to order the results properly per group:
SELECT c.*
FROM (
    SELECT *,
        @r := CASE WHEN @g = group_id THEN @r + 1 ELSE 1 END rownum,
        @g := group_id
    FROM mytable
    CROSS JOIN (SELECT @g := NULL, @r := 0) t
    ORDER BY group_id, id DESC
) c
WHERE c.rownum <= 5
The above query will give you the 5 most recent rows for each group_id. If you want more than 5 rows, just change the filter of the outer query to your desired number: WHERE c.rownum <= n.
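For what it's worth, on engines that do have window functions (MySQL 8.0+, or SQLite 3.25+ as used in this self-contained sketch via Python's sqlite3 module), the same result drops out of ROW_NUMBER() partitioned by group; the user-variable trick above is the classic pre-8.0 workaround. The sample rows here are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, group_id INTEGER)")
# 8 rows in group 1 and 3 rows in group 2; higher id = more recent.
conn.executemany(
    "INSERT INTO mytable (id, group_id) VALUES (?, ?)",
    [(i, 1) for i in range(1, 9)] + [(i, 2) for i in range(9, 12)],
)

# ROW_NUMBER() restarts at 1 for each group_id, counting down from the
# most recent id, so rn <= 5 keeps the 5 newest rows per group.
rows = conn.execute("""
    SELECT id, group_id
    FROM (
        SELECT id, group_id,
               ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY id DESC) AS rn
        FROM mytable
    )
    WHERE rn <= 5
    ORDER BY group_id, id DESC
""").fetchall()

print(rows)  # 5 newest ids for group 1, all 3 rows for group 2
```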
tbl_contacts:

user_id - int
contact_id - int
first_name - varchar
last_name - varchar
date_backup - TIMESTAMP
I have lots of data and I want to get the latest records from the database.
Currently I have data with two different dates: 2014-02-12 04:47:39 and 2014-01-12 04:47:39. I have 125 records in total, of which 5 have the date 2014-01-12 04:47:39 and the rest have 2014-02-12 04:47:39. I am using the query below to get the data with the latest date, but somehow it returns all the data. I have been trying for a long time and can't achieve my goal. If anyone has any idea, please help me.
Query
SELECT `contact_id`, `user_id`, `date_backup`, `first_name`, `last_name`
FROM tbl_contacts
WHERE `date_backup` IN (
    SELECT MAX(`date_backup`)
    FROM tbl_contacts
    WHERE `user_id` = 1
    GROUP BY `contact_id`
)
ORDER BY `contact_id` ASC, `date_backup` DESC
By using ORDER BY date_backup DESC, I get the old data at the end of the list. But I just don't want the old date records at all if a new date record is available.
Use MySQL's UNIX_TIMESTAMP() with the ORDER BY clause.
SELECT `contact_id`, `user_id`, `date_backup`, `first_name`, `last_name`
FROM tbl_contacts
WHERE `date_backup` IN (
    SELECT MAX(`date_backup`)
    FROM tbl_contacts
    WHERE `user_id` = 1
    GROUP BY `contact_id`
)
ORDER BY UNIX_TIMESTAMP(`date_backup`) DESC, `contact_id` ASC

If required, replace every date_backup with UNIX_TIMESTAMP(`date_backup`).
LIMIT will do the trick in a MySQL database (in other databases it would probably be the TOP clause). So use LIMIT 10:
SELECT `contact_id`, `user_id`, `date_backup`, `first_name`, `last_name`
FROM tbl_contacts
WHERE `date_backup` IN (
    SELECT MAX(`date_backup`)
    FROM tbl_contacts
    WHERE `user_id` = 1
    GROUP BY `contact_id`
)
ORDER BY `contact_id` ASC, `date_backup` DESC
LIMIT 10
If you want ten of the most recent ones.
The guide for the LIMIT clause can be found at MySQL reference
Similar kind of issue
Just add LIMIT 1 at the end of your query to select only the first line of results.
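A note on why the original query returns old rows: the subquery produces one MAX(date_backup) per contact, and IN accepts any row whose date matches any of those maxima, so a contact that only exists in the old backup lets every old row through. One way to keep only each contact's newest row is to join the per-contact maximum back on contact_id as well. A self-contained sketch of that idea using Python's sqlite3 module (column names from the question, sample rows invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tbl_contacts (
        user_id INTEGER, contact_id INTEGER,
        first_name TEXT, date_backup TEXT
    )
""")
conn.executemany(
    "INSERT INTO tbl_contacts VALUES (?, ?, ?, ?)",
    [
        (1, 100, "Ann", "2014-01-12 04:47:39"),   # old backup of Ann
        (1, 100, "Ann", "2014-02-12 04:47:39"),   # newer backup of Ann
        (1, 200, "Bob", "2014-02-12 04:47:39"),
    ],
)

# Join each row against its own contact's newest date_backup, so older
# duplicates of the same contact are filtered out.
rows = conn.execute("""
    SELECT c.contact_id, c.first_name, c.date_backup
    FROM tbl_contacts c
    JOIN (
        SELECT contact_id, MAX(date_backup) AS max_backup
        FROM tbl_contacts
        WHERE user_id = 1
        GROUP BY contact_id
    ) m ON m.contact_id = c.contact_id AND m.max_backup = c.date_backup
    ORDER BY c.contact_id
""").fetchall()

print(rows)  # only the 2014-02-12 rows remain
```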
In a table that has the columns 'product_id', 'price', ... (and others), I need to obtain the record with the lowest unique price (a price that occurs only once) for a given product_id.
I tried different statements without getting the expected result. Finally I came up with this statement (for product_id = 2):
SELECT `price` , `product_id`
FROM bids
WHERE `product_id` = 2
GROUP BY `price`
HAVING COUNT( `price` ) = 1
ORDER BY `price`
LIMIT 1;
But having recourse to LIMIT and ORDER BY seems inelegant. How can I obtain the same record using the MIN() function, or something else?
This should work because you are already specifying the product_id to analyze:
SELECT MIN(t1.price) AS price, t1.product_id
FROM
(
SELECT price, product_id
FROM bids
WHERE product_id = 1
GROUP BY price, product_id
HAVING COUNT(price) = 1
) t1
Notes: MIN/MAX vs ORDER BY and LIMIT
Skydreamer, I'm not sure, but as I understand it the OP wants the first unique value.
If the prices are 1,1,2,2,3 the query should return the row with the price of 3.
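To make that comment concrete: with prices 1, 1, 2, 2, 3 for a product, the HAVING COUNT(price) = 1 filter leaves only the price 3, so MIN() over the surviving prices also returns 3. A quick self-contained check of that logic using Python's sqlite3 module (table name from the question, sample bids invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bids (product_id INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO bids VALUES (?, ?)",
    [(2, 1), (2, 1), (2, 2), (2, 2), (2, 3)],
)

# MIN() over the prices that occur exactly once for this product:
# 1 and 2 each occur twice, so only 3 survives the HAVING filter.
lowest_unique = conn.execute("""
    SELECT MIN(price)
    FROM (
        SELECT price
        FROM bids
        WHERE product_id = 2
        GROUP BY price
        HAVING COUNT(price) = 1
    )
""").fetchone()[0]

print(lowest_unique)  # 3.0
```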