I am looking for the most efficient way to find the next or previous ID of the following query:
SELECT *
FROM transactions
ORDER
BY CASE order_status
WHEN 'order_accepted' THEN 1
WHEN 'processing_order' THEN 2
WHEN 'order_send_mailer' THEN 3
WHEN 'order_send' THEN 4
WHEN 'order_received' THEN 5
WHEN 'order_refunded' THEN 6
ELSE 7 END
, id DESC limit 1;
I tried adding a where id > '$id' or where id < '$id' claus to the query but it didn't give me te next or previous ID I was looking for.
For those that need some explanation of what I am trying to do: It's to go to the next or previous order by case with a forward of backward button.
What it currently looks like:
-id- -order_status-
9399 order_accepted
9398 processing_order
9363 processing_order
9403 order_send_mailer
9318 order_send
9346 order_received
9345 order_received
9050 order_refunded
The next ID for example of 9403 would be 9363 and previous ID would be 9318
Change your order_status into an enum column. This will save disk space and make sorting by order_status simpler and faster.
-- Add a new version of the column using an enum.
-- These strings are aliases for ordered numbers.
-- 'order_accepted' is 1, 'processing_order' is 2, etc.
alter table transactions add column enum_order_status enum(
'order_accepted',
'processing_order',
'order_send_mailer',
'order_send',
'order_received',
'order_refunded'
) not null;
-- Copy the status into the new enum column.
-- MySQL will translate the string into the number for you.
update transactions
set enum_order_status = order_status;
-- Drop the old column.
alter table transactions drop column order_status;
-- Rename the new enum column.
alter table transactions rename column enum_order_status to order_status;
-- Index it.
create index transactions_order_status on transactions(order_status);
-- Enjoy your vastly simplified and much faster query.
select *
from transactions
order by order_status, id desc
That's not actually necessary, but it makes everything much simpler.
With that out of the way, use the window functions lead and lag to refer to the previous and next rows in a query.
select
id, order_status,
lead(id) over w, lead(order_status) over w,
lag(id) over w, lag(order_status) over w
from transactions
window w as (order by order_status, id desc);
Note, window functions were added in MySQL 8. If you're using an older version I recommend upgrading ASAP; MySQL 8 has many big improvements. Otherwise you can simulate it with correlated subqueries and self-joins.
If you want the previous and next rows of a specific row, use the technique from this answer. We add row_numbers to the table in the desired order, and then fetch 9403 and its previous and next row by row number.
-- Add a row number to your table in the desired order.
with ordered_transactions as (
select
*, row_number() over w as rn
from transactions
window w as (order by order_status, id desc)
)
select *
from ordered_transactions
-- Find the row number for ID 9403, then add -1, 0, and 1.
-- If 9403 is row number 5 you'll fetch row numbers 4, 5, and 6.
where ot.rn in (
select rn+i
from ordered_transactions ot
-- All this is doing is making us three "rows" where i = -1, 0, and 1.
cross join (SELECT -1 AS i UNION ALL SELECT 0 UNION ALL SELECT 1) cj
where ot.id = 9403
);
Try it.
Related
I want to improve my current query. So I have this table called Incomes. Where I have a sourceId varchar field. I have a single SELECT for the fields I need, but I needed to add an extra field called isFirstTime to represent if it was the first time on the row on what that sourceId was used. This is my current query:
SELECT DISTINCT
`income`.*,
CASE WHEN (
SELECT
`income2`.id
FROM
`income` as `income2`
WHERE
`income2`."sourceId" = `income`."sourceId"
ORDER BY
`income2`.created asc
LIMIT 1
) = `income`.id THEN true ELSE false END
as isFirstIncome
FROM
`income` as `income`
WHERE `income`.incomeType IN ('passive', 'active') AND `income`.status = 'paid'
ORDER BY `income`.created desc
LIMIT 50
The query works but slows down if I keep increasing the LIMIT or OFFSET. Any suggestions?
UPDATE 1:
Added WHERE statements used on the original query
UPDATE 2:
MYSQL version 5.7.22
You can achieve it using Ordered Analytical Function.
You can use ROW_NUMBER or RANK to get the desired result.
Below query will give the desired output.
SELECT *,
CASE
WHEN Row_number()
OVER(
PARTITION BY sourceid
ORDER BY created ASC) = 1 THEN true
ELSE false
END AS isFirstIncome
FROM income
WHERE incomeType IN ('passive', 'active') AND status = 'paid'
ORDER BY created desc
DB Fiddle: See the result here
My first thought is that isFirstIncome should be an extra column in the table. It should be populated as the data is inserted.
If you don't like that, let's try to optimize the query...
Let's avoid doing the subquery more than 50 times. This requires turning the query inside-out. (It's like "explode-implode", where the query gathers lots of stuff, then sorts it and throws most of the rows away.)
To summarize:
do the least amount of effort to just identify the 5 rows.
JOIN to whatever tables are needed (including itself if appropriate); this is to get any other columns desired (including isFirstIncome).
SELECT i3.*,
( ... using i3 ... ) as isFirstIncome
FROM (
SELECT i1.id, i1.sourceId
FROM `income` AS i1
WHERE i1.incomeType IN ('passive', 'active')
AND i1.status = 'paid'
ORDER BY i1.created DESC
LIMIT 50
) AS i2
JOIN income AS i3 USING(id)
ORDER BY i2.created DESC -- yes, repeated
(I left out the computation of isFirstIncome; it is discussed in other Answers. But note that it will be executed at most 50 times.)
(The aliases -- i1, i2, i3 -- are numbered in the order they will be "used"; this is to assist in following the SQL.)
To assist in performance, add
INDEX(status, incomeType, created, id, sourceId)
It should help with my formulation, but probably not for the other versions. Your version would benefit from
INDEX(sourceId, created, id)
I need to limit records based on percentage but MYSQL does not allow that. I need 10 percent User Id of (count(User Id)/max(Total_Users_bynow)
My code is as follows:
select * from flavia.TableforThe_top_10percent_of_the_user where `User Id` in (select distinct(`User Id`) from flavia.TableforThe_top_10percent_of_the_user group by `User Id` having count(distinct(`User Id`)) <= round((count(`User Id`)/max(Total_Users_bynow))*0.1)*count(`User Id`));
Kindly help.
Consider splitting your problem in pieces. You can use user variables to get what you need. Quoting from this question's answers:
You don't have to solve every problem in a single query.
So... let's get this done. I'll not put your full query, but some examples:
-- Step 1. Get the total of the rows of your dataset
set #nrows = (select count(*) from (select ...) as a);
-- --------------------------------------^^^^^^^^^^
-- The full original query (or, if possible a simple version of it) goes here
-- Step 2. Calculate how many rows you want to retreive
-- You may use "round()", "ceiling()" or "floor()", whichever fits your needs
set #limrows = round(#nrows * 0.1);
-- Step 3. Run your query:
select ...
limit #limrows;
After checking, I found this post which says that my above approach won't work. There's, however, an alternative:
-- Step 1. Get the total of the rows of your dataset
set #nrows = (select count(*) from (select ...) as a);
-- --------------------------------------^^^^^^^^^^
-- The full original query (or, if possible a simple version of it) goes here
-- Step 2. Calculate how many rows you want to retreive
-- You may use "round()", "ceiling()" or "floor()", whichever fits your needs
set #limrows = round(#nrows * 0.1);
-- Step 3. (UPDATED) Run your query.
-- You'll need to add a "rownumber" column to make this work.
select *
from (select #rownum := #rownum+1 as rownumber
, ... -- The rest of your columns
from (select #rownum := 0) as init
, ... -- The rest of your FROM definition
order by ... -- Be sure to order your data
) as a
where rownumber <= #limrows
Hope this helps (I think it will work without a quirk this time)
Before anything, I am not looking for a re-write. This was presented to me, and I can't seem to figure out if this is a bug in general or some kind of syntactic craziness that occurs due to the peculiarity of the script. Okay with that said on with the setup:
Microsoft SQL Server Standard Edition (64-bit)
Version 10.50.2500.0
On a table located in a generic database, defined as:
CREATE TABLE [dbo].[Regions](
[RegionID] [int] NOT NULL,
[RegionGroupID] [int] NOT NULL,
[IsDefault] [bit] NOT NULL,
CONSTRAINT [PK_Regions] PRIMARY KEY CLUSTERED
(
[RegionID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
insert some values:
INSERT INTO [dbo].[Regions]
([RegionID],[RegionGroupID],[IsDefault])
VALUES
(0,1,0),
(1,1,0),
(2,1,0),
(3,2,0),
(4,2,0),
(5,2,0),
(6,3,0),
(7,3,0),
(8,3,0)
Now run the query (to select a single from each group, remember no rewrite suggestions!):
SELECT RXXID FROM (
SELECT
RXX.RegionID as RXXID,
ROW_NUMBER() OVER (PARTITION BY RXX.RegionGroupID ORDER BY RXX.RegionGroupID) AS RXXNUM
FROM Regions as RXX
) AS tmp
WHERE tmp.RXXNUM = 1
You should get:
RXXID
-----------
0
3
6
Now stick that inside an update statement (with a preset to 0 and a select all after):
UPDATE Regions SET IsDefault = 0
UPDATE Regions
SET IsDefault = 1
WHERE RegionID IN (
SELECT RXXID FROM (
SELECT
RXX.RegionID as RXXID,
ROW_NUMBER() OVER (PARTITION BY RXX.RegionGroupID ORDER BY RXX.RegionGroupID) AS RXXNUM
FROM Regions as RXX
) AS tmp
WHERE tmp.RXXNUM = 1
)
SELECT * FROM Regions
ORDER BY RegionGroupID
and get this result:
RegionID RegionGroupID IsDefault
----------- ------------- ---------
0 1 1
1 1 1
2 1 1
3 2 1
4 2 1
5 2 1
6 3 1
7 3 1
8 3 1
zomg wtf lamaz?
While I don't claim to be a SQL guru, this seems neither proper nor correct. And to make things more crazy, if you drop the primary key it seems to work:
Drop primary key:
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[Regions]') AND name = N'PK_Regions')
ALTER TABLE [dbo].[Regions] DROP CONSTRAINT [PK_Regions]
And re-run update statement set, result:
RegionID RegionGroupID IsDefault
----------- ------------- ---------
0 1 1
1 1 0
2 1 0
3 2 1
4 2 0
5 2 0
6 3 1
7 3 0
8 3 0
Isn't that a b?
Does anyone have any clue what is going on here? My guess is some kind of sub-query caching and is this a bug? It sure doesn't seem like what SQL should be doing?
Just update as a CTE directly:
WITH tmp AS (
SELECT
RegionID as RXXID,
RegionGroupID,
IsDefault,
ROW_NUMBER() OVER (PARTITION BY RegionGroupID ORDER BY RegionID) AS RXXNUM
FROM Regions
)
UPDATE tmp SET IsDefault = 1 WHERE RXXNUM = 1
select * from Regions
Added more columns to illustrate. You can see this on http://sqlfiddle.com/#!3/03913/9
Not 100% sure what is going on in your example, but since you partition and order by the same column, you're not really certain to get the same order back, since they are all tied. Shouldn't you order by RegionID or some other column, as i did on sqlfiddle?
Back to your question:
If you change your UPDATE (with the clustered index) to a SELECT, you'll get all 9 rows back.
If you drop the PK, and do the SELECT, you only get 3 rows. Back to your update statement. Inspecting the execution plans show that they differ slightly:
What you can see here is that in the first (with PK) query, you'll scan the clustered index for the outer reference, note that it does not have the alias RXX. Then for each row in the top, do a lookup to the RXX. And yes, because of your row number ordering, every RegionID can be row_number() 1 for each RegionGroupID. SQL Server would know this based on your PK, i guess, and can say that For every RegionID, this RegionID can be row number 1. Therefore the statement is rather valid.
In the second query, there is no index, and you get a table scan on Regions, then it builds a probe table using the RXX, and joins differently (single pass, ROW_NUMBER() can only be 1 for one row per regiongroupid now). This way in that scan, every RegionID has only one ROW_NUMBER(), though you cannot be 100% certain it'll be the same every time.
This means:
Using your subquery which doesn't have a deterministic order for every execution, you should avoid using a multiple pass (NESTED LOOP) join type, but a single pass (MERGE OR HASH) join.
To fix this without changing the structure of your query, add OPTION (HASH JOIN) or OPTION (MERGE JOIN) to the first UPDATE:
So, you'll need the following update statement (when you have the PK):
UPDATE Regions SET IsDefault = 0
UPDATE Regions
SET IsDefault = 1
WHERE RegionID IN (
SELECT RXXID FROM (
SELECT
RXX.RegionID as RXXID,
ROW_NUMBER() OVER (PARTITION BY RXX.RegionGroupID ORDER BY RXX.RegionGroupID) AS RXXNUM
FROM Regions as RXX
) AS tmp
WHERE tmp.RXXNUM = 1
)
OPTION (HASH JOIN)
SELECT * FROM Regions
ORDER BY RegionGroupID
Here are the execution plans using these two join types (note actual number of rows: 3 in the properties):
Your query in plain language is something like:
For each row in Regions check if RegionID exists in some sub query. Meaning that the sub query is executed for each row in Regions. (I know that is not the case but it is the semantics of the query).
Since you are using RegionGroupID as order and partition you really have no idea what RegionID will be returned so it might very well be a new ID for each time the sub-query is checked against.
Update:
Doing the update with a join against the derived table instead instead of using in changes the semantics of the query and it changed the result as well.
This works as expected:
UPDATE R
SET IsDefault = 1
FROM Regions as R
inner join
(
SELECT RXXID FROM (
SELECT
RXX.RegionID as RXXID,
ROW_NUMBER() OVER (PARTITION BY RXX.RegionGroupID ORDER BY RXX.RegionGroupID) AS RXXNUM
FROM Regions as RXX
) AS tmp
WHERE tmp.RXXNUM = 1
) as C
on R.RegionID = C.RXXID
The issue here is related to another question I had...
I have millions of records, and the ID of each of those records is auto-incremented, unfortunately sometimes the ID that is generated is sometimes thrown away so there are many many gaps between IDs.
I want to find the gaps, and re-use the ids that were abandoned.
What's an efficient way to do this in MySQL?
First of all, what advantage are you trying to get by reusing the skipped values? An ordinary INT UNSIGNED will let you count up to 4,294,967,295. With "millions of records" your database would have to grow a thousand times over before running out of valid IDs. (And then using a BIGINT UNSIGNED will bump you up to 18,446,744,073,709,551,615 values.)
Trying to recycle values MySQL has skipped is likely to use up a lot of your time trying to compensate for something that really doesn't bother MySQL in the first place.
With that said, you can find missing IDs with something like:
SELECT id + 1
FROM the_table
WHERE NOT EXISTS (SELECT 1 FROM the_table t2 WHERE t2.id = the_table.id + 1);
This will find only the first missing number in each sequence (e.g., if you have {1, 2, 3, 8, 10} it will find {4,9}) but it's likely to be efficient, and of course once you've filled in an ID you can always run it again.
The following will return a row for each gap in the integer field "n" in mytab:
/* cs will contain 1 row for each contiguous sequence of integers in mytab.n
and will have the start of that chain.
ce will contain the end of that chain */
create temporary table cs (row int auto_increment primary key, n int);
create temporary table ce like cs;
insert into cs (n) select n from mytab where n-1 not in (select n from mytab) order by n;
insert into ce (n) select n from mytab where n+1 not in (select n from mytab) order by n;
select ce.n + 1 as bgap, cs.n - 1 as egap
from cs, ce where cs.row = ce.row + 1;
If instead of the gaps you want the contiguous chains then the final select should be:
select cs.n as bchain, ce.n as echain from cs,ce where cs.row=ce.row;
This solution is better, in case you need to include the first element as 1:
SELECT
1 AS gap_start,
MIN(e.id) - 1 AS gap_end
FROM
factura_entrada e
WHERE
NOT EXISTS(
SELECT
1
FROM
factura_entrada
WHERE
id = 1
)
LIMIT 1
UNION
SELECT
a.id + 1 AS gap_start,
MIN(b.id)- 1 AS gap_end
FROM
factura_entrada AS a,
factura_entrada AS b
WHERE
a.id < b.id
GROUP BY
a.id
HAVING
gap_start < MIN(b.id);
If you are using an MariaDB you have a faster option
SELECT * FROM seq_1_to_50000 where seq not in (select col from table);
docs: https://mariadb.com/kb/en/mariadb/sequence/
I'd like to use a single SQL query (in MySQL) to find the record which comes after one that I specify.
I.e., if the table has:
id, fruit
-- -----
1 apples
2 pears
3 oranges
I'd like to be able to do a query like:
SELECT * FROM table where previous_record has id=1 order by id;
(clearly that's not real SQL syntax, I'm just using pseudo-SQL to illustrate what I'm trying to achieve)
which would return:
2, pears
My current solution is just to fetch all the records, and look through them in PHP, but that's slower than I'd like. Is there a quicker way to do it?
I'd be happy with something that returned two rows -- i.e. the one with the specified value and the following row.
EDIT: Sorry, my question was badly worded. Unfortunately, my definition of "next" is not based on ID, but on alphabetical order of fruit name. Hence, my example above is wrong, and should return oranges, as it comes alphabetically next after apples. Is there a way to do the comparison on strings instead of ids?
After the question's edit and the simplification below, we can change it to
SELECT id FROM table WHERE fruit > 'apples' ORDER BY fruit LIMIT 1
SELECT * FROM table WHERE id > 1 ORDER BY id LIMIT 1
Even simpler
UPDATE:
SELECT * FROM table WHERE fruit > 'apples' ORDER BY fruit LIMIT 1
So simple, and no gymnastics required
Select * from Table
where id =
(Select Max(id) from Table
where id < #Id)
or, based on the string #fruitName = 'apples', or 'oranges' etc...
Select * from Table
where id =
(Select Max(id) from Table
where id < (Select id from Table
Where fruit = #fruitName))
I'm not familiar with the MySQL syntax, but with SQL Server you can do something with "top", for example:
SELECT TOP 1 * FROM table WHERE id > 1 ORDER BY id;
This assumes that the id field is unique. If it is not unique (say, a foreign key), you can do something similar and then join back against the same table.
Since I don't use MySQL, I am not sure of the syntax, but would imagine it to be similar.
Unless you specify a sort order, I don't believe the concepts of "previous" or "next" are available to you in SQL. You aren't guaranteed a particular order by the RDBMS by default. If you can sort by some column into ascending or descending order that's another matter.
This should work. The string 'apples' will need to be a parameter.
Fill in that parameter with a string, and this query will return the entire record for the first fruit after that item, in alphabetical order.
Unlike the LIMIT 1 approach, this should be platform-independent.
--STEP THREE: Get the full record w/the ID we found in step 2
select *
from
fruits fr
,(
--STEP TWO: Get the ID # of the name we found in step 1
select
min(vendor_id) min_id
from
fruits fr1
,(
--STEP ONE: Get the next name after "apples"
select min(name) next_name
from fruits frx
where frx.name > 'apples'
) minval
where fr1.name = minval.next_name
) x
where fr.vendor_id = x.min_id;
The equivalent to the LIMIT 1 approach in Oracle (just for reference) would be this:
select *
from
(
select *
from fruits frx
where frx.name > 'apples'
order by name
)
where rownum = 1
I don't know MySQL SQL but I still try
select n.id
from fruit n
, fruit p
where n.id = p.id + 1;
edit:
select n.id, n.fruitname
from fruits n
, fruits p
where n.id = p.id + 1;
edit two:
Jason Lepack has said that that doesn't work when there are gaps and that is true and I should read the question better.
I should have used analytics to sort the results on fruitname
select id
, fruitname
, lead(id) over (order by fruitname) id_next
, lead(fruitname) over (order by fruitname) fruitname_next
from fruits;
If you are using MS SQL Server 2008 (not sure if available for previous versions)...
In the event that you are trying to find the next record and you do not have a unique ID to reference in an applicable manner, try using ROW_NUMBER(). See this link
Depending on how savvy your T-SQL skill is, you can create row numbers based on your sorting order. Then you can find more than just the previous and next record. Utilize it in views or sub-queries to find another record relative to the current record's row number.
SELECT cur.id as id, nxt.id as nextId, prev.id as prevId FROM video as cur
LEFT JOIN video as nxt ON nxt.id > cur.id
LEFT JOIN video as prev ON prev.id < cur.id
WHERE cur.id = 12
ORDER BY prev.id DESC, nxt.id ASC
LIMIT 1
If you want the item with previous and next item this query lets you do just that.
This also allows You to have gaps in the data!
How about this:
Select * from table where id = 1 + 1