Given table can have following rows.
i.e. for a given filename, there can be two unique version_id(s).
file_id version_id filename
1 OS_v1 abc.update
1 App_v1 abc.update
2 OS_v2 xyz.update
2 App_v2 xyz.update
3 OS_v1 abc(1).update
3 App_v1 abc(1).update
PRIMARY KEY (`version_id`, `filename`)
How to detect there are no two different filename's having same combination of OS_App (versions) ?
In the given example, row set with file_id=3 is a duplicate of file_id=1.
Note: It's easy to define separate columns for OS and App version, but that requires a lot of code change which we dont wanted to go through.
Question: is there a SELECT query which would return just file_id = 1 and file_id = 2 and omit file_id = 3 ?
So far I have come up with this query which selects a combination of version_id grouped by filename, but row-2 is a duplicate of row-1
SELECT DISTINCT(GROUP_CONCAT(version_id SEPARATOR '-')) ,
filename
FROM schema_name.table_name
GROUP BY filename;
Returns :
concat_version patch_filename
OS_V1-APP_V1 xyz.update
OS_V2-APP_V2 abc(1).update
OS_V1-APP_V1 abc.update
Question: Is there a SELECT query which would return just file_id = 1 and file_id = 2 and omit file_id = 3
If you are using MySQL 8.0, you can take advantage of window function ROW_NUMBER() :
SELECT x.file_id, x.version_id, x.filename
FROM (
SELECT t.*, ROW_NUMBER() OVER(PARTITION BY version_id ORDER BY file_id) rn
FROM master_logs.system_patches t
) x
WHERE x.rn = 1
The inner query assigns a row number to each record in version_id groups, ordered by file_id, and the outer query filters in records with row number 1.
With earlier versions of MySQL, one typical solution is to use a correlated subquery with a NOT EXISTS condition to filter out unwanted records :
SELECT t.file_id, t.version_id, t.filename
FROM master_logs.system_patches t
WHERE NOT EXISTS (
SELECT 1
FROM master_logs.system_patches t1
WHERE t1.version_id = t.version_id AND t1.file_id < t.file_id
)
Related
I have a (MYSQL) table in the following format; assume the name of the table is mytable:
id
name
group
123
name1
1
124
name2
2
125
name3
1
126
name4
id is unique and auto-increments. name is a unique string, group is just an integer
I now want to assign name4 to a new group that does not exist yet, so the group for name4cannot be 1 or 2 in this example.
The result could,for example, be:
id
name
group
126
name4
3
At the moment I am sorting by group descending and just insert the highest number + 1 manually, but I was wondering if there was a better/quicker way to generate a new, unique value in a column. group has no other constraints, besides being an integer.
I am using the MySQL Workbench, so I can work with both SQL commands, as well as Workbench-specific options, if there are any.
If anything is unclear I'll gladly provide clarification.
In MySQL 8.0, you can get help with two window functions:
MAX, to retrieve the maximum "group" value
ROW_NUMBER, to retrieve the incremental value for each NULL existing in your table.
You can then sum up these two values and update your table where your "group" field is null.
WITH cte AS (
SELECT id, name, MAX(group_) OVER() + ROW_NUMBER() OVER(PARTITION BY group_ IS NULL ORDER BY name) AS new_group
FROM tab
)
UPDATE tab
INNER JOIN cte
ON tab.id = cte.id AND tab.name = cte.name
SET tab.group_ = cte.new_group
WHERE tab.group_ IS NULL;
Check the demo here.
In MySQL 5.X you can instead use a variable, initialized with your maximum "group" value, then updated incrementally inside the UPDATE statement, in the SET clause.
SET #maxgroup = NULL;
SELECT MAX(group_) INTO #maxgroup FROM tab;
UPDATE tab
SET group_ = (#maxgroup:= #maxgroup + 1)
WHERE group_ IS NULL;
ORDER BY id;
Check the demo here.
In my query I use join table category_attributes. Let's assume we have such rows:
category_id|attribute_id
1|1
1|2
1|3
I want to have the query which suites the two following needs. I have a variable (php) of allowed attribute_id's. If the array is subset of attribute_id then category_id should be selected, if not - no results.
First case:
select * from category_attributes where (1,2,3,4) in category_attributes.attribute_id
should give no results.
Second case
select * from category_attributes where (1,2,3) in category_attributes.attribute_id
should give all three rows (see dummy rows at the beginning).
So I would like to have reverse side of what standard SQL in does.
Solution
Step 1: Group the data by the field you want to check.
Step 2: Left join the list of required values with the records obtained in the previous step.
Step 3: Now we have a list with required values and corresponding values from the table. The second column will be equal to required value if it exist in the table and NULL otherwise.
Count null values in the right column. If it is equal to 0, then it means table contains all the required values. In that case return all records from the table. Otherwise there must be at least one required value is missing in the table. So, return no records.
Sample
Table "Data":
Required values:
10, 20, 50
Query:
SELECT *
FROM Data
WHERE (SELECT Count(*)
FROM (SELECT D.value
FROM (SELECT 10 AS value
UNION
SELECT 20 AS value
UNION
SELECT 50 AS value) T
LEFT JOIN (SELECT value
FROM Data
GROUP BY value) D
ON ( T.value = D.value )) J
WHERE value IS NULL) = 0;
You can use group by and having:
select ca.category_id
from category_attributes ca
where ca.attribute_id in (1, 2, 3, 4)
group by ca.category_id
having count(*) = 4; -- "4" is the size of the list
This assumes that the table has no duplicates (which is typical for attribute mapping tables). If that is a possibility, use:
having count(distinct ca.attribute_id) = 4
You can aggregate attribute_id into array and compare two array from php.
SELECT category_id FROM
(select category_id, group_concat(attribute_id) as attributes from category_attributes
order by attribute_id) t WHERE t.attributes = (1, 2, 3);
But you need to find another way to compare arrays or make sure that array is always sorted.
Lets say I have a table, myTable, like this:
ID1 Value ID2
1 6.5064 3
2 7.9000 3
3 9.9390 3
4 8.6585 3
What I'm trying to do is SELECT each of those Value's for a given ID2. However, the number of rows returned for Value can change. So, if ID2 = 2, only 1 row might get returned. If drID = 4, 3 rows might get returned.
The part of my query that is trying to handle this is nested, so when I run it I get a "Subquery returns more than 1 row" error. Any idea how I can select a variable number of rows in this way?
Thanks in advance!
Edit: here is what I have so far, and the commented out portion is what I expected to select those values for me, but it throws the above mentioned error:
SELECT drDateTime AS Date,
(SELECT fncName FROM functionlist
WHERE datarecord.fncID = functionlist.fncID) AS FunctionName,
(SELECT alText FROM alarmlevellist
WHERE datarecord.alID = alarmlevellist.alID) AS AlarmDescription
#(SELECT rdValue FROM rawdata
#WHERE datarecord.drID = rawdata.drID)
FROM datarecord
WHERE alID IS NOT NULL AND drSumFlag = 1;
You should show your query.
One common place this problem occurs is in where (or having) clauses. A solution to this problem is to use in rather than =, if the subquery is in the where clause. If you have something like:
where id = (select id2 . . .)
Then change it to:
where id in (select id2 . . .)
Use a join instead of a subquery. I would probably use a join for all tables but that is up to you.
SELECT drDateTime AS Date,
(SELECT fncName FROM functionlist WHERE datarecord.fncID = functionlist.fncID) AS FunctionName,
(SELECT alText FROM alarmlevellist WHERE datarecord.alID = alarmlevellist.alID) AS AlarmDescription,
rawData.drID
FROM datarecord
INNER JOIN rawdata
ON datarecord.drID = rawdata.drID)
WHERE alID IS NOT NULL AND drSumFlag = 1;
Before anything, I am not looking for a re-write. This was presented to me, and I can't seem to figure out if this is a bug in general or some kind of syntactic craziness that occurs due to the peculiarity of the script. Okay with that said on with the setup:
Microsoft SQL Server Standard Edition (64-bit)
Version 10.50.2500.0
On a table located in a generic database, defined as:
CREATE TABLE [dbo].[Regions](
[RegionID] [int] NOT NULL,
[RegionGroupID] [int] NOT NULL,
[IsDefault] [bit] NOT NULL,
CONSTRAINT [PK_Regions] PRIMARY KEY CLUSTERED
(
[RegionID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
insert some values:
INSERT INTO [dbo].[Regions]
([RegionID],[RegionGroupID],[IsDefault])
VALUES
(0,1,0),
(1,1,0),
(2,1,0),
(3,2,0),
(4,2,0),
(5,2,0),
(6,3,0),
(7,3,0),
(8,3,0)
Now run the query (to select a single from each group, remember no rewrite suggestions!):
SELECT RXXID FROM (
SELECT
RXX.RegionID as RXXID,
ROW_NUMBER() OVER (PARTITION BY RXX.RegionGroupID ORDER BY RXX.RegionGroupID) AS RXXNUM
FROM Regions as RXX
) AS tmp
WHERE tmp.RXXNUM = 1
You should get:
RXXID
-----------
0
3
6
Now stick that inside an update statement (with a preset to 0 and a select all after):
UPDATE Regions SET IsDefault = 0
UPDATE Regions
SET IsDefault = 1
WHERE RegionID IN (
SELECT RXXID FROM (
SELECT
RXX.RegionID as RXXID,
ROW_NUMBER() OVER (PARTITION BY RXX.RegionGroupID ORDER BY RXX.RegionGroupID) AS RXXNUM
FROM Regions as RXX
) AS tmp
WHERE tmp.RXXNUM = 1
)
SELECT * FROM Regions
ORDER BY RegionGroupID
and get this result:
RegionID RegionGroupID IsDefault
----------- ------------- ---------
0 1 1
1 1 1
2 1 1
3 2 1
4 2 1
5 2 1
6 3 1
7 3 1
8 3 1
zomg wtf lamaz?
While I don't claim to be a SQL guru, this seems neither proper nor correct. And to make things more crazy, if you drop the primary key it seems to work:
Drop primary key:
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[Regions]') AND name = N'PK_Regions')
ALTER TABLE [dbo].[Regions] DROP CONSTRAINT [PK_Regions]
And re-run update statement set, result:
RegionID RegionGroupID IsDefault
----------- ------------- ---------
0 1 1
1 1 0
2 1 0
3 2 1
4 2 0
5 2 0
6 3 1
7 3 0
8 3 0
Isn't that a b?
Does anyone have any clue what is going on here? My guess is some kind of sub-query caching and is this a bug? It sure doesn't seem like what SQL should be doing?
Just update as a CTE directly:
WITH tmp AS (
SELECT
RegionID as RXXID,
RegionGroupID,
IsDefault,
ROW_NUMBER() OVER (PARTITION BY RegionGroupID ORDER BY RegionID) AS RXXNUM
FROM Regions
)
UPDATE tmp SET IsDefault = 1 WHERE RXXNUM = 1
select * from Regions
Added more columns to illustrate. You can see this on http://sqlfiddle.com/#!3/03913/9
Not 100% sure what is going on in your example, but since you partition and order by the same column, you're not really certain to get the same order back, since they are all tied. Shouldn't you order by RegionID or some other column, as i did on sqlfiddle?
Back to your question:
If you change your UPDATE (with the clustered index) to a SELECT, you'll get all 9 rows back.
If you drop the PK, and do the SELECT, you only get 3 rows. Back to your update statement. Inspecting the execution plans show that they differ slightly:
What you can see here is that in the first (with PK) query, you'll scan the clustered index for the outer reference, note that it does not have the alias RXX. Then for each row in the top, do a lookup to the RXX. And yes, because of your row number ordering, every RegionID can be row_number() 1 for each RegionGroupID. SQL Server would know this based on your PK, i guess, and can say that For every RegionID, this RegionID can be row number 1. Therefore the statement is rather valid.
In the second query, there is no index, and you get a table scan on Regions, then it builds a probe table using the RXX, and joins differently (single pass, ROW_NUMBER() can only be 1 for one row per regiongroupid now). This way in that scan, every RegionID has only one ROW_NUMBER(), though you cannot be 100% certain it'll be the same every time.
This means:
Using your subquery which doesn't have a deterministic order for every execution, you should avoid using a multiple pass (NESTED LOOP) join type, but a single pass (MERGE OR HASH) join.
To fix this without changing the structure of your query, add OPTION (HASH JOIN) or OPTION (MERGE JOIN) to the first UPDATE:
So, you'll need the following update statement (when you have the PK):
UPDATE Regions SET IsDefault = 0
UPDATE Regions
SET IsDefault = 1
WHERE RegionID IN (
SELECT RXXID FROM (
SELECT
RXX.RegionID as RXXID,
ROW_NUMBER() OVER (PARTITION BY RXX.RegionGroupID ORDER BY RXX.RegionGroupID) AS RXXNUM
FROM Regions as RXX
) AS tmp
WHERE tmp.RXXNUM = 1
)
OPTION (HASH JOIN)
SELECT * FROM Regions
ORDER BY RegionGroupID
Here are the execution plans using these two join types (note actual number of rows: 3 in the properties):
Your query in plain language is something like:
For each row in Regions check if RegionID exists in some sub query. Meaning that the sub query is executed for each row in Regions. (I know that is not the case but it is the semantics of the query).
Since you are using RegionGroupID as order and partition you really have no idea what RegionID will be returned so it might very well be a new ID for each time the sub-query is checked against.
Update:
Doing the update with a join against the derived table instead instead of using in changes the semantics of the query and it changed the result as well.
This works as expected:
UPDATE R
SET IsDefault = 1
FROM Regions as R
inner join
(
SELECT RXXID FROM (
SELECT
RXX.RegionID as RXXID,
ROW_NUMBER() OVER (PARTITION BY RXX.RegionGroupID ORDER BY RXX.RegionGroupID) AS RXXNUM
FROM Regions as RXX
) AS tmp
WHERE tmp.RXXNUM = 1
) as C
on R.RegionID = C.RXXID
I'd like to use a single SQL query (in MySQL) to find the record which comes after one that I specify.
I.e., if the table has:
id, fruit
-- -----
1 apples
2 pears
3 oranges
I'd like to be able to do a query like:
SELECT * FROM table where previous_record has id=1 order by id;
(clearly that's not real SQL syntax, I'm just using pseudo-SQL to illustrate what I'm trying to achieve)
which would return:
2, pears
My current solution is just to fetch all the records, and look through them in PHP, but that's slower than I'd like. Is there a quicker way to do it?
I'd be happy with something that returned two rows -- i.e. the one with the specified value and the following row.
EDIT: Sorry, my question was badly worded. Unfortunately, my definition of "next" is not based on ID, but on alphabetical order of fruit name. Hence, my example above is wrong, and should return oranges, as it comes alphabetically next after apples. Is there a way to do the comparison on strings instead of ids?
After the question's edit and the simplification below, we can change it to
SELECT id FROM table WHERE fruit > 'apples' ORDER BY fruit LIMIT 1
SELECT * FROM table WHERE id > 1 ORDER BY id LIMIT 1
Even simpler
UPDATE:
SELECT * FROM table WHERE fruit > 'apples' ORDER BY fruit LIMIT 1
So simple, and no gymnastics required
Select * from Table
where id =
(Select Max(id) from Table
where id < #Id)
or, based on the string #fruitName = 'apples', or 'oranges' etc...
Select * from Table
where id =
(Select Max(id) from Table
where id < (Select id from Table
Where fruit = #fruitName))
I'm not familiar with the MySQL syntax, but with SQL Server you can do something with "top", for example:
SELECT TOP 1 * FROM table WHERE id > 1 ORDER BY id;
This assumes that the id field is unique. If it is not unique (say, a foreign key), you can do something similar and then join back against the same table.
Since I don't use MySQL, I am not sure of the syntax, but would imagine it to be similar.
Unless you specify a sort order, I don't believe the concepts of "previous" or "next" are available to you in SQL. You aren't guaranteed a particular order by the RDBMS by default. If you can sort by some column into ascending or descending order that's another matter.
This should work. The string 'apples' will need to be a parameter.
Fill in that parameter with a string, and this query will return the entire record for the first fruit after that item, in alphabetical order.
Unlike the LIMIT 1 approach, this should be platform-independent.
--STEP THREE: Get the full record w/the ID we found in step 2
select *
from
fruits fr
,(
--STEP TWO: Get the ID # of the name we found in step 1
select
min(vendor_id) min_id
from
fruits fr1
,(
--STEP ONE: Get the next name after "apples"
select min(name) next_name
from fruits frx
where frx.name > 'apples'
) minval
where fr1.name = minval.next_name
) x
where fr.vendor_id = x.min_id;
The equivalent to the LIMIT 1 approach in Oracle (just for reference) would be this:
select *
from
(
select *
from fruits frx
where frx.name > 'apples'
order by name
)
where rownum = 1
I don't know MySQL SQL but I still try
select n.id
from fruit n
, fruit p
where n.id = p.id + 1;
edit:
select n.id, n.fruitname
from fruits n
, fruits p
where n.id = p.id + 1;
edit two:
Jason Lepack has said that that doesn't work when there are gaps and that is true and I should read the question better.
I should have used analytics to sort the results on fruitname
select id
, fruitname
, lead(id) over (order by fruitname) id_next
, lead(fruitname) over (order by fruitname) fruitname_next
from fruits;
If you are using MS SQL Server 2008 (not sure if available for previous versions)...
In the event that you are trying to find the next record and you do not have a unique ID to reference in an applicable manner, try using ROW_NUMBER(). See this link
Depending on how savvy your T-SQL skill is, you can create row numbers based on your sorting order. Then you can find more than just the previous and next record. Utilize it in views or sub-queries to find another record relative to the current record's row number.
SELECT cur.id as id, nxt.id as nextId, prev.id as prevId FROM video as cur
LEFT JOIN video as nxt ON nxt.id > cur.id
LEFT JOIN video as prev ON prev.id < cur.id
WHERE cur.id = 12
ORDER BY prev.id DESC, nxt.id ASC
LIMIT 1
If you want the item with previous and next item this query lets you do just that.
This also allows You to have gaps in the data!
How about this:
Select * from table where id = 1 + 1