Why doesn't this sub-query seem to work? - sql-server-2008

Before anything, I am not looking for a re-write. This was presented to me, and I can't figure out whether this is a bug in general or some kind of syntactic craziness that occurs due to the peculiarity of the script. Okay, with that said, on with the setup:
Microsoft SQL Server Standard Edition (64-bit)
Version 10.50.2500.0
On a table located in a generic database, defined as:
CREATE TABLE [dbo].[Regions](
[RegionID] [int] NOT NULL,
[RegionGroupID] [int] NOT NULL,
[IsDefault] [bit] NOT NULL,
CONSTRAINT [PK_Regions] PRIMARY KEY CLUSTERED
(
[RegionID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
insert some values:
INSERT INTO [dbo].[Regions]
([RegionID],[RegionGroupID],[IsDefault])
VALUES
(0,1,0),
(1,1,0),
(2,1,0),
(3,2,0),
(4,2,0),
(5,2,0),
(6,3,0),
(7,3,0),
(8,3,0)
Now run the query (to select a single row from each group; remember, no rewrite suggestions!):
SELECT RXXID FROM (
SELECT
RXX.RegionID as RXXID,
ROW_NUMBER() OVER (PARTITION BY RXX.RegionGroupID ORDER BY RXX.RegionGroupID) AS RXXNUM
FROM Regions as RXX
) AS tmp
WHERE tmp.RXXNUM = 1
You should get:
RXXID
-----------
0
3
6
Now stick that inside an update statement (with a preset to 0 and a select all after):
UPDATE Regions SET IsDefault = 0
UPDATE Regions
SET IsDefault = 1
WHERE RegionID IN (
SELECT RXXID FROM (
SELECT
RXX.RegionID as RXXID,
ROW_NUMBER() OVER (PARTITION BY RXX.RegionGroupID ORDER BY RXX.RegionGroupID) AS RXXNUM
FROM Regions as RXX
) AS tmp
WHERE tmp.RXXNUM = 1
)
SELECT * FROM Regions
ORDER BY RegionGroupID
and get this result:
RegionID RegionGroupID IsDefault
----------- ------------- ---------
0 1 1
1 1 1
2 1 1
3 2 1
4 2 1
5 2 1
6 3 1
7 3 1
8 3 1
zomg wtf lamaz?
While I don't claim to be a SQL guru, this seems neither proper nor correct. And to make things more crazy, if you drop the primary key it seems to work:
Drop primary key:
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[Regions]') AND name = N'PK_Regions')
ALTER TABLE [dbo].[Regions] DROP CONSTRAINT [PK_Regions]
And re-run update statement set, result:
RegionID RegionGroupID IsDefault
----------- ------------- ---------
0 1 1
1 1 0
2 1 0
3 2 1
4 2 0
5 2 0
6 3 1
7 3 0
8 3 0
Isn't that a b?
Does anyone have any clue what is going on here? My guess is some kind of sub-query caching. Is this a bug? It sure doesn't seem like what SQL should be doing.

Just update as a CTE directly:
WITH tmp AS (
SELECT
RegionID as RXXID,
RegionGroupID,
IsDefault,
ROW_NUMBER() OVER (PARTITION BY RegionGroupID ORDER BY RegionID) AS RXXNUM
FROM Regions
)
UPDATE tmp SET IsDefault = 1 WHERE RXXNUM = 1
select * from Regions
Added more columns to illustrate. You can see this on http://sqlfiddle.com/#!3/03913/9
Not 100% sure what is going on in your example, but since you partition and order by the same column, you're not guaranteed to get the same order back, because all rows within a partition are tied. Shouldn't you order by RegionID or some other column, as I did on sqlfiddle?
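As a minimal sketch of that suggestion, here is the UPDATE from the question with only the ORDER BY inside OVER() changed. It assumes RegionID is unique (it is the primary key in the question), so each row gets exactly one possible row number regardless of the join strategy the optimizer picks.

```sql
-- Same statements as in the question; only the ORDER BY inside OVER() differs.
UPDATE Regions SET IsDefault = 0;

UPDATE Regions
SET IsDefault = 1
WHERE RegionID IN (
    SELECT RXXID FROM (
        SELECT
            RXX.RegionID AS RXXID,
            ROW_NUMBER() OVER (
                PARTITION BY RXX.RegionGroupID
                ORDER BY RXX.RegionID      -- unique tie-breaker, so the numbering is deterministic
            ) AS RXXNUM
        FROM Regions AS RXX
    ) AS tmp
    WHERE tmp.RXXNUM = 1
);

SELECT * FROM Regions
ORDER BY RegionGroupID, RegionID;
```

With the sample rows from the question, the lowest RegionID in each group (0, 3 and 6) is the only candidate for row number 1.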
Back to your question:
If you change your UPDATE (with the clustered index) to a SELECT, you'll get all 9 rows back.
If you drop the PK and do the SELECT, you only get 3 rows. Back to your update statement. Inspecting the execution plans shows that they differ slightly:
What you can see here is that in the first query (with the PK), you scan the clustered index for the outer reference; note that it does not have the alias RXX. Then, for each row in the top input, it does a lookup into RXX. And yes, because of your row number ordering, every RegionID can be row_number() 1 within its RegionGroupID. SQL Server would know this based on your PK, I guess, and can say that for every RegionID, this RegionID can be row number 1. Therefore the result is technically valid.
In the second query, there is no index, and you get a table scan on Regions, then it builds a probe table using the RXX, and joins differently (single pass, ROW_NUMBER() can only be 1 for one row per regiongroupid now). This way in that scan, every RegionID has only one ROW_NUMBER(), though you cannot be 100% certain it'll be the same every time.
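To reproduce the 9-row versus 3-row difference described above, the UPDATE can be turned into a SELECT along these lines. This is only a sketch: the alias R and the outer ORDER BY are additions for readability, and the row count you see depends on the plan the optimizer chooses.

```sql
-- The UPDATE from the question rewritten as a SELECT, so the matched rows are visible.
SELECT R.RegionID, R.RegionGroupID
FROM Regions AS R
WHERE R.RegionID IN (
    SELECT RXXID FROM (
        SELECT
            RXX.RegionID AS RXXID,
            ROW_NUMBER() OVER (PARTITION BY RXX.RegionGroupID ORDER BY RXX.RegionGroupID) AS RXXNUM
        FROM Regions AS RXX
    ) AS tmp
    WHERE tmp.RXXNUM = 1
)
ORDER BY R.RegionGroupID, R.RegionID;
```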
This means:
With a subquery like yours, whose ordering is not deterministic from one execution to the next, you should avoid a multiple-pass (NESTED LOOP) join type and use a single-pass (MERGE or HASH) join instead.
To fix this without changing the structure of your query, add OPTION (HASH JOIN) or OPTION (MERGE JOIN) to the first UPDATE:
So, you'll need the following update statement (when you have the PK):
UPDATE Regions SET IsDefault = 0
UPDATE Regions
SET IsDefault = 1
WHERE RegionID IN (
SELECT RXXID FROM (
SELECT
RXX.RegionID as RXXID,
ROW_NUMBER() OVER (PARTITION BY RXX.RegionGroupID ORDER BY RXX.RegionGroupID) AS RXXNUM
FROM Regions as RXX
) AS tmp
WHERE tmp.RXXNUM = 1
)
OPTION (HASH JOIN)
SELECT * FROM Regions
ORDER BY RegionGroupID
Here are the execution plans using these two join types (note actual number of rows: 3 in the properties):

Your query in plain language is something like:
For each row in Regions check if RegionID exists in some sub query. Meaning that the sub query is executed for each row in Regions. (I know that is not the case but it is the semantics of the query).
Since you are using RegionGroupID as order and partition you really have no idea what RegionID will be returned so it might very well be a new ID for each time the sub-query is checked against.
Update:
Doing the update with a join against the derived table instead of using IN changes the semantics of the query, and it changed the result as well.
This works as expected:
UPDATE R
SET IsDefault = 1
FROM Regions as R
inner join
(
SELECT RXXID FROM (
SELECT
RXX.RegionID as RXXID,
ROW_NUMBER() OVER (PARTITION BY RXX.RegionGroupID ORDER BY RXX.RegionGroupID) AS RXXNUM
FROM Regions as RXX
) AS tmp
WHERE tmp.RXXNUM = 1
) as C
on R.RegionID = C.RXXID
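Another way to make sure the sub-query is evaluated exactly once is to materialize its result before the UPDATE. The following is a sketch only; the temp table name #DefaultRegions is made up for illustration. Which row of each group is picked is still arbitrary because of the tied ORDER BY, but exactly one row per group ends up in the temp table.

```sql
-- Evaluate the ranking once and store the chosen IDs.
SELECT tmp.RXXID
INTO #DefaultRegions               -- hypothetical temp table name
FROM (
    SELECT
        RXX.RegionID AS RXXID,
        ROW_NUMBER() OVER (PARTITION BY RXX.RegionGroupID ORDER BY RXX.RegionGroupID) AS RXXNUM
    FROM Regions AS RXX
) AS tmp
WHERE tmp.RXXNUM = 1;

UPDATE Regions SET IsDefault = 0;

-- The IN list is now a fixed set of rows, independent of the join strategy.
UPDATE Regions
SET IsDefault = 1
WHERE RegionID IN (SELECT RXXID FROM #DefaultRegions);

DROP TABLE #DefaultRegions;
```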

Related

Create new unique value in column

I have a (MYSQL) table in the following format; assume the name of the table is mytable:
id   name   group
123  name1  1
124  name2  2
125  name3  1
126  name4
id is unique and auto-increments. name is a unique string, group is just an integer
I now want to assign name4 to a new group that does not exist yet, so the group for name4 cannot be 1 or 2 in this example.
The result could, for example, be:
id   name   group
126  name4  3
At the moment I am sorting by group descending and just insert the highest number + 1 manually, but I was wondering if there was a better/quicker way to generate a new, unique value in a column. group has no other constraints, besides being an integer.
I am using the MySQL Workbench, so I can work with both SQL commands, as well as Workbench-specific options, if there are any.
If anything is unclear I'll gladly provide clarification.
In MySQL 8.0, you can get help with two window functions:
MAX, to retrieve the maximum "group" value
ROW_NUMBER, to retrieve the incremental value for each NULL existing in your table.
You can then sum up these two values and update your table where your "group" field is null.
WITH cte AS (
SELECT id, name, MAX(group_) OVER() + ROW_NUMBER() OVER(PARTITION BY group_ IS NULL ORDER BY name) AS new_group
FROM tab
)
UPDATE tab
INNER JOIN cte
ON tab.id = cte.id AND tab.name = cte.name
SET tab.group_ = cte.new_group
WHERE tab.group_ IS NULL;
Check the demo here.
In MySQL 5.X you can instead use a variable, initialized with your maximum "group" value, then updated incrementally inside the UPDATE statement, in the SET clause.
SET @maxgroup = NULL;
SELECT MAX(group_) INTO @maxgroup FROM tab;
UPDATE tab
SET group_ = (@maxgroup := @maxgroup + 1)
WHERE group_ IS NULL
ORDER BY id;
Check the demo here.
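For the single-row case described in the question (only name4 needs a new group), a simpler two-statement sketch may be enough. It uses the table name mytable and the column name group from the question; group is a reserved word in MySQL, so it is quoted with backticks. The variable name @next_group is made up for illustration.

```sql
-- Compute the next unused group number once (1 if the column holds no values yet).
SELECT COALESCE(MAX(`group`), 0) + 1 INTO @next_group FROM mytable;

-- Assign it to the row that should start the new group.
UPDATE mytable
SET `group` = @next_group
WHERE name = 'name4';
```

With the sample rows the highest existing group is 2, so name4 gets group 3. Note that two sessions running this at the same time could compute the same number; wrap it in a transaction with suitable locking if that matters.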

MySQL - Add flag column to identify the first payment

I want to improve my current query. I have a table called Incomes with a sourceId varchar field. I have a single SELECT for the fields I need, but I needed to add an extra field called isFirstTime to indicate whether the row is the first one on which that sourceId was used. This is my current query:
SELECT DISTINCT
`income`.*,
CASE WHEN (
SELECT
`income2`.id
FROM
`income` as `income2`
WHERE
`income2`."sourceId" = `income`."sourceId"
ORDER BY
`income2`.created asc
LIMIT 1
) = `income`.id THEN true ELSE false END
as isFirstIncome
FROM
`income` as `income`
WHERE `income`.incomeType IN ('passive', 'active') AND `income`.status = 'paid'
ORDER BY `income`.created desc
LIMIT 50
The query works but slows down if I keep increasing the LIMIT or OFFSET. Any suggestions?
UPDATE 1:
Added WHERE statements used on the original query
UPDATE 2:
MYSQL version 5.7.22
You can achieve this with an ordered analytical (window) function; note that these require MySQL 8.0 or later.
You can use ROW_NUMBER or RANK to get the desired result.
The query below gives the desired output.
SELECT *,
CASE
WHEN Row_number()
OVER(
PARTITION BY sourceid
ORDER BY created ASC) = 1 THEN true
ELSE false
END AS isFirstIncome
FROM income
WHERE incomeType IN ('passive', 'active') AND status = 'paid'
ORDER BY created desc
DB Fiddle: See the result here
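Since the question mentions MySQL 5.7.22, where window functions are not available, here is a hedged alternative sketch that uses only a grouped derived table. It assumes that sourceId is never NULL and that no two rows of the same sourceId share the earliest created value; with ties, every tied row would be flagged.

```sql
SELECT i.*,
       (i.created = f.first_created) AS isFirstIncome   -- 1 when this row is the earliest for its sourceId
FROM income AS i
JOIN (
    -- Earliest created timestamp per sourceId, computed once for the whole table.
    SELECT sourceId, MIN(created) AS first_created
    FROM income
    GROUP BY sourceId
) AS f ON f.sourceId = i.sourceId
WHERE i.incomeType IN ('passive', 'active')
  AND i.status = 'paid'
ORDER BY i.created DESC
LIMIT 50;
```

Like the original correlated subquery, the derived table looks at all rows of income, not only the paid passive/active ones.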
My first thought is that isFirstIncome should be an extra column in the table. It should be populated as the data is inserted.
If you don't like that, let's try to optimize the query...
Let's avoid doing the subquery more than 50 times. This requires turning the query inside-out. (It's like "explode-implode", where the query gathers lots of stuff, then sorts it and throws most of the rows away.)
To summarize:
do the least amount of effort to just identify the 50 rows.
JOIN to whatever tables are needed (including itself if appropriate); this is to get any other columns desired (including isFirstIncome).
SELECT i3.*,
( ... using i3 ... ) as isFirstIncome
FROM (
SELECT i1.id, i1.sourceId
FROM `income` AS i1
WHERE i1.incomeType IN ('passive', 'active')
AND i1.status = 'paid'
ORDER BY i1.created DESC
LIMIT 50
) AS i2
JOIN income AS i3 USING(id)
ORDER BY i3.created DESC -- yes, repeated
(I left out the computation of isFirstIncome; it is discussed in other Answers. But note that it will be executed at most 50 times.)
(The aliases -- i1, i2, i3 -- are numbered in the order they will be "used"; this is to assist in following the SQL.)
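For illustration, here is one way the pieces could fit together, plugging the correlated subquery from the question into the inside-out shape. Treat it as a sketch rather than the answerer's exact query; the alias i4 is an addition.

```sql
SELECT i3.*,
       (
           -- Same "earliest row for this sourceId" lookup as in the question,
           -- but evaluated only for the 50 rows that survive the LIMIT.
           SELECT i4.id
           FROM income AS i4
           WHERE i4.sourceId = i3.sourceId
           ORDER BY i4.created ASC
           LIMIT 1
       ) = i3.id AS isFirstIncome
FROM (
    -- Step 1: find just the ids of the 50 newest matching rows.
    SELECT i1.id
    FROM income AS i1
    WHERE i1.incomeType IN ('passive', 'active')
      AND i1.status = 'paid'
    ORDER BY i1.created DESC
    LIMIT 50
) AS i2
JOIN income AS i3 USING (id)       -- Step 2: fetch the full rows
ORDER BY i3.created DESC;          -- Step 3: restore the display order
```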
To assist in performance, add
INDEX(status, incomeType, created, id, sourceId)
It should help with my formulation, but probably not for the other versions. Your version would benefit from
INDEX(sourceId, created, id)

Find next or previous ID when query contains multiple cases

I am looking for the most efficient way to find the next or previous ID of the following query:
SELECT *
FROM transactions
ORDER
BY CASE order_status
WHEN 'order_accepted' THEN 1
WHEN 'processing_order' THEN 2
WHEN 'order_send_mailer' THEN 3
WHEN 'order_send' THEN 4
WHEN 'order_received' THEN 5
WHEN 'order_refunded' THEN 6
ELSE 7 END
, id DESC limit 1;
I tried adding a where id > '$id' or where id < '$id' clause to the query, but it didn't give me the next or previous ID I was looking for.
For those who need some explanation of what I am trying to do: it's to go to the next or previous order, by case, with a forward or backward button.
What it currently looks like:
-id- -order_status-
9399 order_accepted
9398 processing_order
9363 processing_order
9403 order_send_mailer
9318 order_send
9346 order_received
9345 order_received
9050 order_refunded
The next ID for example of 9403 would be 9363 and previous ID would be 9318
Change your order_status into an enum column. This will save disk space and make sorting by order_status simpler and faster.
-- Add a new version of the column using an enum.
-- These strings are aliases for ordered numbers.
-- 'order_accepted' is 1, 'processing_order' is 2, etc.
alter table transactions add column enum_order_status enum(
'order_accepted',
'processing_order',
'order_send_mailer',
'order_send',
'order_received',
'order_refunded'
) not null;
-- Copy the status into the new enum column.
-- MySQL will translate the string into the number for you.
update transactions
set enum_order_status = order_status;
-- Drop the old column.
alter table transactions drop column order_status;
-- Rename the new enum column.
alter table transactions rename column enum_order_status to order_status;
-- Index it.
create index transactions_order_status on transactions(order_status);
-- Enjoy your vastly simplified and much faster query.
select *
from transactions
order by order_status, id desc
That's not actually necessary, but it makes everything much simpler.
With that out of the way, use the window functions lead and lag to refer to the previous and next rows in a query.
select
id, order_status,
lead(id) over w, lead(order_status) over w,
lag(id) over w, lag(order_status) over w
from transactions
window w as (order by order_status, id desc);
Note, window functions were added in MySQL 8. If you're using an older version I recommend upgrading ASAP; MySQL 8 has many big improvements. Otherwise you can simulate it with correlated subqueries and self-joins.
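On a version without window functions, a neighbour lookup can be sketched with a self-join. This assumes order_status has already been converted to the enum shown above; adding 0 to an enum yields its index number, which keeps the comparison numeric. The literal 9403 stands in for whatever id the application passes.

```sql
-- Row that comes right after id 9403 in the (order_status, id DESC) ordering.
SELECT t.id, t.order_status
FROM transactions AS t
JOIN transactions AS cur ON cur.id = 9403
WHERE (t.order_status + 0) > (cur.order_status + 0)
   OR (t.order_status = cur.order_status AND t.id < cur.id)
ORDER BY t.order_status ASC, t.id DESC
LIMIT 1;

-- Row that comes right before id 9403: flip the comparisons and the sort directions.
SELECT t.id, t.order_status
FROM transactions AS t
JOIN transactions AS cur ON cur.id = 9403
WHERE (t.order_status + 0) < (cur.order_status + 0)
   OR (t.order_status = cur.order_status AND t.id > cur.id)
ORDER BY t.order_status DESC, t.id ASC
LIMIT 1;
```

With the sample listing from the question, the first query finds 9318 and the second finds 9363 (the question labels these "previous" and "next" respectively, so map them to the buttons accordingly).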
If you want the previous and next rows of a specific row, use the technique from this answer. We add row_numbers to the table in the desired order, and then fetch 9403 and its previous and next row by row number.
-- Add a row number to your table in the desired order.
with ordered_transactions as (
select
*, row_number() over w as rn
from transactions
window w as (order by order_status, id desc)
)
select *
from ordered_transactions
-- Find the row number for ID 9403, then add -1, 0, and 1.
-- If 9403 is row number 5 you'll fetch row numbers 4, 5, and 6.
where rn in (
select rn+i
from ordered_transactions ot
-- All this is doing is making us three "rows" where i = -1, 0, and 1.
cross join (SELECT -1 AS i UNION ALL SELECT 0 UNION ALL SELECT 1) cj
where ot.id = 9403
);
Try it.

select unique GROUP_CONCAT-ed rows based on different column

Given table can have following rows.
i.e. for a given filename, there can be two unique version_id(s).
file_id version_id filename
1 OS_v1 abc.update
1 App_v1 abc.update
2 OS_v2 xyz.update
2 App_v2 xyz.update
3 OS_v1 abc(1).update
3 App_v1 abc(1).update
PRIMARY KEY (`version_id`, `filename`)
How can I detect that no two different filenames have the same combination of OS and App versions?
In the given example, row set with file_id=3 is a duplicate of file_id=1.
Note: It's easy to define separate columns for OS and App version, but that requires a lot of code changes which we don't want to go through.
Question: is there a SELECT query which would return just file_id = 1 and file_id = 2 and omit file_id = 3 ?
So far I have come up with this query which selects a combination of version_id grouped by filename, but row-2 is a duplicate of row-1
SELECT DISTINCT(GROUP_CONCAT(version_id SEPARATOR '-')) ,
filename
FROM schema_name.table_name
GROUP BY filename;
Returns :
concat_version patch_filename
OS_V1-APP_V1 xyz.update
OS_V2-APP_V2 abc(1).update
OS_V1-APP_V1 abc.update
If you are using MySQL 8.0, you can take advantage of window function ROW_NUMBER() :
SELECT x.file_id, x.version_id, x.filename
FROM (
SELECT t.*, ROW_NUMBER() OVER(PARTITION BY version_id ORDER BY file_id) rn
FROM master_logs.system_patches t
) x
WHERE x.rn = 1
The inner query assigns a row number to each record in version_id groups, ordered by file_id, and the outer query filters in records with row number 1.
With earlier versions of MySQL, one typical solution is to use a correlated subquery with a NOT EXISTS condition to filter out unwanted records :
SELECT t.file_id, t.version_id, t.filename
FROM master_logs.system_patches t
WHERE NOT EXISTS (
SELECT 1
FROM master_logs.system_patches t1
WHERE t1.version_id = t.version_id AND t1.file_id < t.file_id
)
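A variant that stays closer to the GROUP_CONCAT attempt in the question is to build one version signature per file and keep the lowest file_id for each signature. This is a sketch under the assumption that file_id identifies a file one-to-one with filename; the table name is taken from the queries above, and the aliases per_file and versions are made up.

```sql
SELECT MIN(per_file.file_id) AS file_id,
       per_file.versions
FROM (
    -- One row per file; ORDER BY inside GROUP_CONCAT makes the signature
    -- independent of the order in which the rows are stored.
    SELECT file_id,
           GROUP_CONCAT(version_id ORDER BY version_id SEPARATOR '-') AS versions
    FROM master_logs.system_patches
    GROUP BY file_id
) AS per_file
GROUP BY per_file.versions;
```

For the sample rows, file_id 1 and 3 share the signature App_v1-OS_v1, so only file_id 1 and file_id 2 are returned.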

How to find the next record after a specified one in SQL?

I'd like to use a single SQL query (in MySQL) to find the record which comes after one that I specify.
I.e., if the table has:
id, fruit
-- -----
1 apples
2 pears
3 oranges
I'd like to be able to do a query like:
SELECT * FROM table where previous_record has id=1 order by id;
(clearly that's not real SQL syntax, I'm just using pseudo-SQL to illustrate what I'm trying to achieve)
which would return:
2, pears
My current solution is just to fetch all the records, and look through them in PHP, but that's slower than I'd like. Is there a quicker way to do it?
I'd be happy with something that returned two rows -- i.e. the one with the specified value and the following row.
EDIT: Sorry, my question was badly worded. Unfortunately, my definition of "next" is not based on ID, but on alphabetical order of fruit name. Hence, my example above is wrong, and should return oranges, as it comes alphabetically next after apples. Is there a way to do the comparison on strings instead of ids?
After the question's edit and the simplification below, we can change it to
SELECT id FROM table WHERE fruit > 'apples' ORDER BY fruit LIMIT 1
SELECT * FROM table WHERE id > 1 ORDER BY id LIMIT 1
Even simpler
UPDATE:
SELECT * FROM table WHERE fruit > 'apples' ORDER BY fruit LIMIT 1
So simple, and no gymnastics required
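The question also says that getting two rows back (the specified one plus the following one) would be acceptable. A small variation covers that, assuming fruit names are unique and the given name exists in the table. The table is literally called table in the question, which is a reserved word, so it is quoted with backticks here.

```sql
-- Returns the 'apples' row followed by the alphabetically next fruit.
SELECT *
FROM `table`
WHERE fruit >= 'apples'
ORDER BY fruit
LIMIT 2;
```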
Select * from Table
where id =
(Select Max(id) from Table
where id < @Id)
or, based on the string @fruitName = 'apples', or 'oranges' etc...
Select * from Table
where id =
(Select Max(id) from Table
where id < (Select id from Table
Where fruit = @fruitName))
I'm not familiar with the MySQL syntax, but with SQL Server you can do something with "top", for example:
SELECT TOP 1 * FROM table WHERE id > 1 ORDER BY id;
This assumes that the id field is unique. If it is not unique (say, a foreign key), you can do something similar and then join back against the same table.
Since I don't use MySQL, I am not sure of the syntax, but would imagine it to be similar.
Unless you specify a sort order, I don't believe the concepts of "previous" or "next" are available to you in SQL. You aren't guaranteed a particular order by the RDBMS by default. If you can sort by some column into ascending or descending order that's another matter.
This should work. The string 'apples' will need to be a parameter.
Fill in that parameter with a string, and this query will return the entire record for the first fruit after that item, in alphabetical order.
Unlike the LIMIT 1 approach, this should be platform-independent.
--STEP THREE: Get the full record w/the ID we found in step 2
select *
from
fruits fr
,(
--STEP TWO: Get the ID # of the name we found in step 1
select
min(vendor_id) min_id
from
fruits fr1
,(
--STEP ONE: Get the next name after "apples"
select min(name) next_name
from fruits frx
where frx.name > 'apples'
) minval
where fr1.name = minval.next_name
) x
where fr.vendor_id = x.min_id;
The equivalent to the LIMIT 1 approach in Oracle (just for reference) would be this:
select *
from
(
select *
from fruits frx
where frx.name > 'apples'
order by name
)
where rownum = 1
I don't know MySQL SQL but I still try
select n.id
from fruit n
, fruit p
where n.id = p.id + 1;
edit:
select n.id, n.fruitname
from fruits n
, fruits p
where n.id = p.id + 1;
edit two:
Jason Lepack has said that that doesn't work when there are gaps and that is true and I should read the question better.
I should have used analytics to sort the results on fruitname
select id
, fruitname
, lead(id) over (order by fruitname) id_next
, lead(fruitname) over (order by fruitname) fruitname_next
from fruits;
If you are using MS SQL Server 2008 (not sure if available for previous versions)...
In the event that you are trying to find the next record and you do not have a unique ID to reference in an applicable manner, try using ROW_NUMBER(). See this link
Depending on how savvy your T-SQL skill is, you can create row numbers based on your sorting order. Then you can find more than just the previous and next record. Utilize it in views or sub-queries to find another record relative to the current record's row number.
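As a rough sketch of that idea in T-SQL, the rows can be numbered in alphabetical order and then joined to their successor. The table name fruits and the columns id and fruit are assumptions based on the question's example; id is used as a tie-breaker in case two rows share a name.

```sql
WITH numbered AS (
    -- Number every row in alphabetical order of the fruit name.
    SELECT id, fruit,
           ROW_NUMBER() OVER (ORDER BY fruit, id) AS rn
    FROM fruits
)
SELECT nxt.id, nxt.fruit
FROM numbered AS cur
JOIN numbered AS nxt
    ON nxt.rn = cur.rn + 1         -- use cur.rn - 1 for the previous row
WHERE cur.fruit = 'apples';
```

With the three sample rows, the row after apples in alphabetical order is oranges, which matches the edited question.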
SELECT cur.id as id, nxt.id as nextId, prev.id as prevId FROM video as cur
LEFT JOIN video as nxt ON nxt.id > cur.id
LEFT JOIN video as prev ON prev.id < cur.id
WHERE cur.id = 12
ORDER BY prev.id DESC, nxt.id ASC
LIMIT 1
If you want the item with previous and next item this query lets you do just that.
This also allows you to have gaps in the data!
How about this:
Select * from table where id = 1 + 1