I have a table in which the data is stored as:
I want only the Branch column to be updated so that the sequence becomes continuous. That is, the rows with ID 1 to 4 would keep the same value, the rows with ID 5 to 7 would have Branch 3, rows 8 and 9 would have Branch 4, rows 10 to 12 would have Branch 5, and so on.
My desired output would look like this:
I don't want the rows to be reordered: they should keep their current sequence, with a continuously increasing ID column, and only the Branch column should be renumbered.
I tried doing it with a loop, but that code is becoming so large and error-prone that I am looking for a more direct approach.
Is it possible through CTEs or any other approach?
How can I do so?
SQL DEMO
I use more columns than necessary just to show what is going on. rn is only there to show how grp is created; you only need grp for the final result.
The idea is to create a group sequence based on Id. Then, using DENSE_RANK(), you get your desired sequence.
This assumes Id is a sequential number without holes. If your Id has holes, you need to use ROW_NUMBER() to create a gap-free sequence first.
WITH cte as (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [Branch] ORDER BY [Id]) as rn,
[Id] - ROW_NUMBER() OVER (PARTITION BY [Branch] ORDER BY [Id]) as grp
FROM Table1
)
SELECT *, DENSE_RANK() OVER (ORDER BY grp) as new_branch
FROM cte
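As a complementary sketch (a different technique from the grp/DENSE_RANK() query, not the answer's method), islands can also be numbered by flagging every row whose Branch differs from the previous row's Branch and taking a running sum of those flags. This does not depend on Id being gap-free, and it still works if the same Branch value shows up again later, a case where grp values from different Branch partitions may coincide. The sample rows are hypothetical, Python's sqlite3 module is used only to make the example runnable, and on SQL Server LAG() requires version 2012 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Id INTEGER PRIMARY KEY, Branch INTEGER)")

# Hypothetical data: Branch 2 appears twice, in two separate runs.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(1, 2), (2, 2), (3, 2), (4, 2),
     (5, 4), (6, 4), (7, 4),
     (8, 2), (9, 2),
     (10, 7), (11, 7), (12, 7)],
)

query = """
WITH flagged AS (
    SELECT Id, Branch,
           -- 1 when the row starts a new run of Branch values, else 0
           CASE WHEN Branch = LAG(Branch) OVER (ORDER BY Id) THEN 0 ELSE 1 END AS is_new
    FROM Table1
)
SELECT Id, Branch,
       SUM(is_new) OVER (ORDER BY Id) AS island  -- running count of runs so far
FROM flagged
ORDER BY Id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The island column starts at 1; if the numbering should start from the first Branch value instead, add that value minus one as an offset.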
OUTPUT
I have three columns in a MySQL table: Id, name and mark. All rows are distinct from each other.
I use the SQL statements below. In both statements, I don't use order by inside the window function; I only have a partition and a rows frame.
Ideally they should give the same results in the column derived from the window function, but the first one always gives the maximum mark over the whole partition, whereas the second one compares the previous row, the current row and the next row, and gives the expected result. The first one is really weird: even though I specify unbounded preceding and current row, it in fact considers the whole window rather than the given frame.
Can someone please help.
Statement-1:
select *
,max(mark) over( partition by name rows between unbounded preceding and current row) as w_f
from ( select * from student order by name, mark asc) a
Statement-2:
select *
,max(mark) over( partition by name rows between 1 preceding and 1 following) as w_f
from ( select * from student order by name, mark asc) a
A rows (or range) frame without an order by clause does not make sense: how do you define which row is preceding or following if you don't specify which column(s) should be used for ordering?
Also note that the order by clause in the subquery probably does not do what you expect it to do. There is no guarantee that the inner sort propagates to the outer query.
In the absence of sample data and desired results, it is a bit unclear what you are actually trying to do. Assuming that you have an ordering column id, the first query could be phrased as:
select s.*,
max(mark) over(partition by name order by id) as w_f
from student s
order by name, id
When an order by is present, rows between unbounded preceding and current row matches the default window frame, so it can be omitted (strictly speaking the default is range between unbounded preceding and current row, which is equivalent if you have a unique sorting key).
And the second query would go like:
select s.*,
max(mark) over(partition by name order by id rows between 1 preceding and 1 following) as w_f
from student s
order by name, id
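To see how the two frames behave once an order by is in place, here is a minimal sketch that computes both side by side. It uses Python's sqlite3 module purely to be runnable; the student rows are made up, and id is assumed to be the ordering column, as in the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, mark INTEGER)")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "a", 50), (2, "a", 70), (3, "a", 60), (4, "a", 40),
     (5, "b", 90), (6, "b", 80)],
)

query = """
SELECT id, name, mark,
       -- maximum over all rows of the partition up to and including this one
       MAX(mark) OVER (PARTITION BY name ORDER BY id
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_max,
       -- maximum over the previous, current and next row only
       MAX(mark) OVER (PARTITION BY name ORDER BY id
                       ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS neighbour_max
FROM student
ORDER BY name, id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With an explicit ordering, the two frames differ on the last row for name a: the running maximum stays at 70, while the three-row frame only sees 60 and 40.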
I'm running MySQL 5.5 and trying to essentially return only 1 record for each of the values in my IN clause. I can't use DISTINCT because there are multiple distinct records attached to each code from the IN clause (namely, cost will be different). Below is a dummy query of what I was trying to do, but it doesn't work in 5.5 because of the ROW_NUMBER() function.
'1b' may have multiple records with differing cost values. title should always be the same across every record with the same codes value.
Any thoughts?
SELECT codes, name_place, title, cost
FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY codes) rn
FROM MyDB.MyTable
)
WHERE codes IN ('1b', '1c', '1d', '1e')
AND rn = 1;
I have a table "History" with about 300.000 rows, which is filled with new data daily. I want to keep only the last two lines of every refSchema/refId combination.
Currently I do it this way:
First Step:
SELECT refSchema,refId FROM History GROUP BY refSchema,refId
With this statement I get all combinations (which are about 40.000).
Second Step:
I run a foreach which looks up for the existing rows for the query above like this:
SELECT id
FROM History
WHERE refSchema = ? AND refId = ? AND state = 'done'
ORDER BY importedAt DESC
LIMIT 2,2000
Please keep in mind that I want to keep the last two rows in my table, which is why I use LIMIT 2,2000. If I find matching rows, I put the ids in an array called idList.
Final Step
I delete all ids in the array like this:
DELETE FROM History WHERE id in ($idList)
This all seems not to be the best performance, because I have to check every combination with an extra query. Is there a way to have one delete statement that does the magic to avoid the 40.000 extra queries?
Edit Update: I use AWS Aurora DB
If you are using MySQL 8+, then one conceptually simple way to proceed here is to use a CTE to number the rows within each refSchema/refId group, newest first. Then delete every row whose number is greater than two. Match on id rather than on the schema/id pair, because every pair has at least one row you want to retain:
WITH cte AS (
SELECT id, ROW_NUMBER() OVER (PARTITION BY refSchema, refId ORDER BY importedAt DESC) rn
FROM History
)
DELETE
FROM History
WHERE id IN (SELECT id FROM cte WHERE rn > 2);
If you can't use a CTE, then try inlining it as a derived table:
DELETE
FROM History
WHERE id IN (
SELECT id
FROM
(
SELECT id, ROW_NUMBER() OVER (PARTITION BY refSchema, refId ORDER BY importedAt DESC) rn
FROM History
) t
WHERE rn > 2
);
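To check the per-group logic on a small data set, here is a minimal sketch that ranks the rows within each refSchema/refId pair by importedAt and deletes every id ranked third or later in a single statement. It runs against an in-memory SQLite database through Python's sqlite3 module, so it only illustrates the idea; the rows are hypothetical, and behaviour on MySQL or Aurora should be verified separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE History (id INTEGER PRIMARY KEY, refSchema TEXT, "
    "refId INTEGER, importedAt TEXT)"
)

# Hypothetical rows: three refSchema/refId pairs with 4, 1 and 3 rows.
conn.executemany(
    "INSERT INTO History VALUES (?, ?, ?, ?)",
    [(1, "s", 1, "2024-01-01"), (2, "s", 1, "2024-01-02"),
     (3, "s", 1, "2024-01-03"), (4, "s", 1, "2024-01-04"),
     (5, "s", 2, "2024-01-01"),
     (6, "t", 1, "2024-01-01"), (7, "t", 1, "2024-01-02"),
     (8, "t", 1, "2024-01-03")],
)

# Rank rows per pair, newest first, and delete everything ranked 3 or lower.
conn.execute("""
DELETE FROM History
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY refSchema, refId
                                  ORDER BY importedAt DESC) AS rn
        FROM History
    ) AS ranked
    WHERE rn > 2
)
""")

remaining = [r[0] for r in conn.execute("SELECT id FROM History ORDER BY id")]
print(remaining)
```

Each pair keeps at most its two newest rows, and pairs with fewer than three rows are left untouched.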
I have a very big subquery:
(SELECT Id, Count [...] FROM Something) Counts
I want to create a score for each Id that is the count divided by the max count.
I tried:
SELECT Id, Count/(SELECT MAX(Count)) AS Score
FROM (SELECT Id, Count [...] FROM Something) Counts
But this only returns the first row!
If I do a GROUP BY Id, all scores are equal to 1 (because the maximum is taken for each Id, and not for all Ids).
Do you know what I can do please? I know that in some contexts we can embed a subquery in a WITH clause, but this is not valid in MySQL.
I believe this is what you need:
Select Id, (Count/(SELECT MAX(Count) FROM Something)) As Score FROM Something
Explanation:
I believe you want to take max of all counts in the table. In order to do so you need to perform a subquery on the entire set of the table, versus limiting it to a specific id, or grouping it. When you performed your group by operating on ID, assuming each Id is unique, it is effectively returning Id, Count/Count. As you know any non-zero number divided by itself is 1.
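A small runnable sketch of the same idea: the maximum comes from a scalar subquery over the whole set, so every row is divided by the same value. The Counts table and its Cnt column are hypothetical stand-ins for the big subquery, and Python's sqlite3 module is used only for the demonstration. The 1.0 * factor is there because SQLite divides integers as integers; MySQL's / operator already returns a decimal result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Counts (Id INTEGER PRIMARY KEY, Cnt INTEGER)")

# Hypothetical per-Id counts.
conn.executemany("INSERT INTO Counts VALUES (?, ?)", [(1, 4), (2, 8), (3, 2)])

query = """
SELECT Id,
       -- the scalar subquery is evaluated over all rows, not per Id
       1.0 * Cnt / (SELECT MAX(Cnt) FROM Counts) AS Score
FROM Counts
ORDER BY Id
"""
scores = conn.execute(query).fetchall()
print(scores)
```

If the counts come from an expensive subquery, storing them in a temporary table first avoids evaluating that subquery twice.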
I need to select sample rows from a set. For example, if my select query returns x rows and x is greater than 50, I want only 50 rows returned: not just the top 50, but 50 that are evenly spread out over the result set. The table in this case records routes: GPS locations + DateTime.
I am ordering on DateTime and need a reasonable sample of the Latitude & Longitude values.
Thanks in advance
[ SQL Server 2008 ]
To get sample rows in SQL Server, use this query:
SELECT TOP 50 * FROM Table
ORDER BY NEWID();
If you want to get every n-th row (10th, in this example), try this query:
SELECT * From
(
SELECT *, (Dense_Rank() OVER (ORDER BY Column ASC)) AS Rank
FROM Table
) AS Ranking
WHERE Rank % 10 = 0;
More examples of queries selecting random rows for other popular RDBMS can be found here: http://www.petefreitag.com/item/466.cfm
Every n'th row to get 50:
SELECT *
FROM (
SELECT t.*, row_number() over() AS rn
FROM table t
) numbered
WHERE MOD(rn, (SELECT Count(*) FROM table) / 50) = 0
FETCH FIRST 50 ROWS ONLY
And if you want a random sample, go with jimmy_keen's answer.
UPDATE:
In regard to the requirement for it to run on MS SQL, I think it should be changed to this (no MS SQL Server around to test though):
SELECT TOP 50 *
FROM (
SELECT t.*, row_number() over(ORDER BY [DateTime]) AS rn, (SELECT count(*) FROM table) / 50 AS step
FROM table t
) AS numbered
WHERE rn % step = 0
ORDER BY rn
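As an alternative sketch for the evenly spread case (a different technique from the modulo queries above), NTILE() can split the ordered rows into as many buckets as rows are wanted, and the first row of each bucket is then returned. This yields exactly the requested number of rows whenever the table has at least that many. The Route table, its column names and its contents are hypothetical; Python's sqlite3 module is used only to make the example runnable, and NTILE() and ROW_NUMBER() are both available in SQL Server 2008.

```python
import sqlite3

SAMPLE_SIZE = 5  # the question asks for 50; a small value keeps the demo readable

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Route (Id INTEGER PRIMARY KEY, RecordedAt TEXT, "
    "Latitude REAL, Longitude REAL)"
)

# Hypothetical track of 23 GPS fixes, one per minute.
conn.executemany(
    "INSERT INTO Route VALUES (?, ?, ?, ?)",
    [(i, f"2024-01-01 00:{i:02d}:00", 50.0 + i / 100, 8.0 + i / 100)
     for i in range(1, 24)],
)

# SAMPLE_SIZE is a trusted integer constant, so formatting it into the SQL is safe here.
query = f"""
WITH bucketed AS (
    SELECT Id, RecordedAt, Latitude, Longitude,
           NTILE({SAMPLE_SIZE}) OVER (ORDER BY RecordedAt) AS bucket
    FROM Route
),
ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY bucket ORDER BY RecordedAt) AS pos
    FROM bucketed
)
SELECT Id, RecordedAt, Latitude, Longitude
FROM ranked
WHERE pos = 1          -- first row of each bucket
ORDER BY RecordedAt
"""
sample = conn.execute(query).fetchall()
sample_ids = [row[0] for row in sample]
print(sample_ids)
```

Picking the middle row of each bucket instead of the first would centre the sample a little better; that only changes the pos condition.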
I suggest that you add a calculated column to your resultset on selection that is obtained as a random number, and then select the top 50 sorted by that column. That will give you a random sample.
For example:
SELECT TOP 50 *, RAND(Id) AS Random
FROM SourceData
ORDER BY Random
where SourceData is your source data table or view. This assumes T-SQL on SQL Server 2008, by the way. It also assumes that you have an Id column with unique ids on your data source. If your ids are very low numbers, it is a good practice to multiply them by a large integer before passing them to RAND, like this:
RAND(Id * 10000000)
If you want a statistically correct sample, tablesample is the wrong solution. A good solution, as I described here, based on a Microsoft Research paper, is to create a materialized view over your table which includes an additional column like
CAST( ROW_NUMBER() OVER (...) AS BYTE ) AS RAND_COL_. You can then add an index on this column, plus other interesting columns, and get statistically correct samples for your queries fairly quickly (by using WHERE RAND_COL_ = 1).