I want to combine rows into a single row when the data is consecutive. I've looked into gaps-and-islands solutions, and I know I used to do this regularly in SQL Server. The solution escapes me, but I recall using something like ROW_NUMBER() OVER (PARTITION BY groupId, name, email, phone ORDER BY id) - ROW_NUMBER() OVER (ORDER BY id) AS seq as part of the calculation. I was not able to get this to work.
Here is a sample data set:
Desired Result:
Any help would be greatly appreciated.
If you only want the last row in each group, you can use lead():
select t.*
from (select t.*,
lead(id) over (order by id) as next_id,
lead(id) over (partition by groupid, name, email, phone order by id) as next_id_grp
from t
) t
where next_id_grp is null or next_id_grp <> next_id;
This compares the next id overall with the next id among rows in the same group. When these differ, or when there is no next id in the group, the row is the last row of its run of consecutive rows.
This is better than the row_number() approach, because that approach typically requires an aggregation.
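To make the comparison concrete, here is a minimal sketch that runs both ideas against an in-memory SQLite database from Python (window functions need SQLite 3.25 or later). The rows are hypothetical, since the question's sample data set is not shown, and the lead() query returns only `id` for brevity. The lead() query flags the last row of each run directly, while the row-number difference from the question needs a GROUP BY to collapse each run.

```python
import sqlite3

# Hypothetical rows (id, groupid, name, email, phone); the question's sample
# data set was not included, so these are made up to form three runs.
ROWS = [
    (1, 10, "ann", "ann@example.com", "111"),
    (2, 10, "ann", "ann@example.com", "111"),
    (3, 20, "bob", "bob@example.com", "222"),
    (4, 10, "ann", "ann@example.com", "111"),
    (5, 10, "ann", "ann@example.com", "111"),
    (6, 10, "ann", "ann@example.com", "111"),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (id integer primary key, groupid integer,"
        " name text, email text, phone text)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?, ?)", ROWS)
    return conn


def last_ids_lead(conn):
    # A row ends its run when the next id in its group differs from the
    # next id overall, or when there is no next id in its group.
    sql = """
        select id
        from (select t.*,
                     lead(id) over (order by id) as next_id,
                     lead(id) over (partition by groupid, name, email, phone
                                    order by id) as next_id_grp
              from t
             ) t
        where next_id_grp is null or next_id_grp <> next_id
        order by id
    """
    return [row[0] for row in conn.execute(sql)]


def islands_row_number(conn):
    # The difference of the two row numbers is constant within a run,
    # so a GROUP BY on it yields one row per run: (first id, last id, size).
    sql = """
        select min(id), max(id), count(*)
        from (select t.*,
                     row_number() over (partition by groupid, name, email, phone
                                        order by id)
                   - row_number() over (order by id) as grp
              from t
             ) t
        group by groupid, name, email, phone, grp
        order by min(id)
    """
    return conn.execute(sql).fetchall()


conn = make_db()
print(last_ids_lead(conn))       # [2, 3, 6]
print(islands_row_number(conn))  # [(1, 2, 2), (3, 3, 1), (4, 6, 3)]
```

Both report the same run ends (ids 2, 3 and 6). The row-number version also gives the first id and size of each run, which is what you pay the aggregation for.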
Let's say I have a table with the following rows/values:
I need to select each value in amount only once when it is duplicated. In this example, I'd want to show the amount for A, B and C only once. The SQL result should then look like this:
Use the LAG() function to compare the previous amount with the current row's amount for each name.
-- MySQL (v8.0)
SELECT t.name
, CASE WHEN t.amount = t.prev_val THEN '' ELSE amount END amount
FROM (SELECT *
, LAG(amount) OVER (PARTITION BY name ORDER BY name) prev_val
FROM test) t
See the fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=8c1af9afcadf2849a85ad045df7ed580
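Here is a minimal sketch of the LAG() approach run on SQLite from Python (3.25 or later). The data is hypothetical, and it assumes an extra `id` column so that "previous row" is well defined: ordering by name inside a partition by name leaves the order within each name arbitrary.

```python
import sqlite3

# Hypothetical rows (id, name, amount); the question's table was not included.
ROWS = [(1, "A", 100), (2, "A", 100), (3, "B", 200),
        (4, "C", 300), (5, "C", 300), (6, "C", 300)]


def blank_repeats_lag(conn):
    # lag() fetches the previous amount for the same name (ordered by the
    # assumed id column); equal consecutive amounts are replaced by ''.
    sql = """
        select t.name,
               case when t.amount = t.prev_val then '' else t.amount end as amount
        from (select *,
                     lag(amount) over (partition by name order by id) as prev_val
              from test) t
        order by t.id
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table test (id integer primary key, name text, amount integer)")
conn.executemany("insert into test values (?, ?, ?)", ROWS)
print(blank_repeats_lag(conn))
# [('A', 100), ('A', ''), ('B', 200), ('C', 300), ('C', ''), ('C', '')]
```

Note that LAG() only blanks an amount equal to the one directly before it for the same name; non-adjacent repeats would still be shown.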
You can handle situations like this with different functions, depending on what you need:
Case 1: Each name has the same amount on every row:
select distinct name, amount from [table name]
Case 2: You have duplicates with different values for each name, and you want to pick the highest value. Use min() if you need the minimum instead.
select name, max(amount) from [table name] group by 1
Case 3 (the one you need): show the amount once and blanks for the remaining duplicates.
ROW_NUMBER() numbers the rows within each name in order of amount; since the amounts are the same, it simply counts up. You can then use IF to create a new column that is blank wherever RANK_ > 1. This also covers the case where you want to show just the minimum value per name and blanks for the rest of that name's rows.
With ranking as (
select
*,
ROW_NUMBER() OVER(PARTITION BY NAME ORDER BY AMOUNT) AS RANK_
from [table]
)
SELECT
*,
IF(RANK_ > 1,"",AMOUNT) AS NEW_AMOUNT
FROM ranking
Case 4: You need to show the maximum and blanks for the other rows of each name.
Just change the ORDER BY clause of ROW_NUMBER() to DESC. This gives rank 1 to the highest amount per name, and the remaining rows get blanks.
With ranking as (
select
*,
ROW_NUMBER() OVER(PARTITION BY NAME ORDER BY AMOUNT DESC) AS RANK_
from [table]
)
SELECT
*,
IF(RANK_ > 1,"",AMOUNT) AS NEW_AMOUNT
FROM ranking
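Cases 3 and 4 differ only in the sort direction inside ROW_NUMBER(). The sketch below runs both on SQLite from Python (3.25 or later), using CASE in place of MySQL's IF() so the SQL stays portable. The table name `t` and the rows are hypothetical; name A has two different amounts so the two cases give visibly different results.

```python
import sqlite3

# Hypothetical rows (name, amount).
ROWS = [("A", 100), ("A", 150), ("B", 200), ("C", 300), ("C", 300)]


def blank_duplicates(conn, descending=False):
    # Case 3 (ascending): rank 1 is the smallest amount per name.
    # Case 4 (descending): rank 1 is the largest amount per name.
    # direction is one of two fixed strings, so the f-string is safe here.
    direction = "desc" if descending else "asc"
    sql = f"""
        with ranking as (
            select *,
                   row_number() over (partition by name order by amount {direction}) as rank_
            from t
        )
        select name, case when rank_ > 1 then '' else amount end as new_amount
        from ranking
        order by name, rank_
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table t (name text, amount integer)")
conn.executemany("insert into t values (?, ?)", ROWS)
print(blank_duplicates(conn))
# [('A', 100), ('A', ''), ('B', 200), ('C', 300), ('C', '')]
print(blank_duplicates(conn, descending=True))
# [('A', 150), ('A', ''), ('B', 200), ('C', 300), ('C', '')]
```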
If you are using MySQL 8, you can use row_number() for this:
with x as (
select *, row_number() over(partition by name order by amount) rn
from t
)
select name, case when rn=1 then amount else '' end amount
from x
See example Fiddle
The other answers are missing a really important point: SQL tables represent unordered sets, so a result has no guaranteed order unless there is an explicit order by.
The data that you have provided has rows that are exact duplicates. For this reason, I think the best approach uses row_number() and an order by in the outer query:
select name, (case when seqnum = 1 then amount end) as amount
from (select t.*,
row_number() over (partition by name, amount) as seqnum
from t
) t
order by name, seqnum;
Note that MySQL does not require an order by argument for row_number().
More commonly, though, you would have some other column (say a date or id) that would be used for ordering. I should also emphasize that this type of formatting is often handled at the application layer and not in the database.
I am trying to write a SQL query. I want to check whether there are any duplicates for an SNO+SG combination. If such a duplicate is found, we select the record that does not have the flag Y (we don't want to select the record where the Flag column is filled for that duplicate combination).
This should be followed only when a duplicate combination is found.
I have attached a sample input and output here:
Can someone help me out on this?
You can use row_number() and count():
select t.*
from (select t.*,
row_number() over (partition by sno, sg order by (flag = 'Y') asc) as seqnum,  -- rows without flag 'Y' (including NULL) sort first
count(*) over (partition by sno, sg) as cnt
from t
) t
where cnt > 1 and seqnum = 1;
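A minimal sketch on SQLite from Python (3.25 or later), with hypothetical rows since the sample input was only an image. It assumes the goal stated in the question, keeping the row without flag Y, so the sort on `(flag = 'Y')` is ascending: NULL and 0 (no flag or another flag) come before 1 ('Y'). The hypothetical `keep_singletons` switch shows the other reading of "only when a duplicate combination is found", where non-duplicated combinations are kept as they are.

```python
import sqlite3

# Hypothetical rows (sno, sg, flag).
ROWS = [(1, "a", None), (1, "a", "Y"), (2, "b", "Y"),
        (3, "c", None), (4, "d", "N"), (4, "d", "Y")]


def pick_unflagged(conn, keep_singletons=False):
    # Within each (sno, sg), seqnum = 1 is a row without flag 'Y' when one exists.
    # cnt > 1 restricts the output to duplicated combinations only.
    where = "seqnum = 1" if keep_singletons else "cnt > 1 and seqnum = 1"
    sql = f"""
        select sno, sg, flag
        from (select t.*,
                     row_number() over (partition by sno, sg
                                        order by (flag = 'Y') asc) as seqnum,
                     count(*) over (partition by sno, sg) as cnt
              from t
             ) t
        where {where}
        order by sno
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table t (sno integer, sg text, flag text)")
conn.executemany("insert into t values (?, ?, ?)", ROWS)
print(pick_unflagged(conn))
# [(1, 'a', None), (4, 'd', 'N')]
print(pick_unflagged(conn, keep_singletons=True))
# [(1, 'a', None), (2, 'b', 'Y'), (3, 'c', None), (4, 'd', 'N')]
```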
I have data looking like this:
Is it possible to find, ordering by id (ASC), the maximum number of times the value 0 occurs in a row?
So the expected result would be: 0 occurs 3 times in a row; and if we count value 1, 2 times in a row.
In PHP I could loop over all the values with foreach and count the occurrences, but I'm looking for a solution that does this in the database.
Kind regards
Mark
You can use a gaps-and-islands technique for this, using the difference between row numbers to build groups of consecutive records that have the same value.
If ids are always incrementing by 1 (with no gaps):
select sl, max(no_rec)
from (
select t.*, count(*) over(partition by sl, id - rn) no_rec
from (
select t.*, row_number() over(partition by sl order by id) rn
from mytable t
) t
) t
group by sl
Otherwise, we can generate a fake autoincremented id with row_number():
select sl, max(no_rec)
from (
select t.*, count(*) over(partition by sl, rn1 - rn2) no_rec
from (
select
t.*,
row_number() over(order by id) rn1,
row_number() over(partition by sl order by id) rn2
from mytable t
) t
) t
group by sl
Note: this uses window functions, which require MySQL 8.0. In earlier versions, this kind of problem is much more cumbersome to solve.
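Here is the second query (the one that tolerates gaps in id) run against SQLite from Python (3.25 or later). The rows are hypothetical, chosen with gaps in id and with runs that match the question's expectation: 0 appears at most 3 times in a row and 1 at most 2 times.

```python
import sqlite3

# Hypothetical rows (id, sl) with gaps in id.
ROWS = [(1, 0), (2, 0), (4, 1), (5, 1), (7, 0), (8, 0), (9, 0), (12, 1)]

# rn1 - rn2 is constant within each run of equal sl values, so counting per
# (sl, rn1 - rn2) gives the run length; max() per sl gives the longest run.
SQL = """
    select sl, max(no_rec)
    from (
        select t.*, count(*) over (partition by sl, rn1 - rn2) as no_rec
        from (
            select t.*,
                   row_number() over (order by id) as rn1,
                   row_number() over (partition by sl order by id) as rn2
            from mytable t
        ) t
    ) t
    group by sl
    order by sl
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (id integer primary key, sl integer)")
conn.executemany("insert into mytable values (?, ?)", ROWS)
longest_runs = conn.execute(SQL).fetchall()
print(longest_runs)  # [(0, 3), (1, 2)]
```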
I'm having a problem with grouping specific columns into one. When I use GROUP BY, the last row always gets selected when it should be the first row.
The main query is:
SELECT cpme_id,
medicine_main_tbl.med_id,
Concat(med_name, ' (', med_dosage, ') ', med_type) AS Medicine,
med_purpose,
med_quantity,
med_expiredate
FROM medicine_main_tbl
JOIN medicine_inventory_tbl
ON medicine_main_tbl.med_id = medicine_inventory_tbl.med_id
WHERE Coalesce(med_quantity, 0) != 0
AND Abs(Datediff(med_expiredate, Now()))
ORDER BY med_expiredate;
SELECT without GROUP BY
If I GROUP BY using any duplicate column value (in this case, I used med_id):
SELECT with GROUP BY
I'm trying to get this output
Expected Output
The output should only be the first two from the first query. Obviously, I cannot use LIMIT.
Since you are using MariaDB, I recommend using ROW_NUMBER here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY med_id ORDER BY med_expireDate) rn
FROM yourTable
)
SELECT cpme_id, med_id, Medicine, med_purpose, med_quantity, med_expireDate
FROM cte
WHERE rn = 1;
This assumes that the "first" row for a given medicine is the one having the earliest expire date. This was the only interpretation of your data which agreed with the expected output.
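A minimal sketch of the same ROW_NUMBER() filter on SQLite from Python (3.25 or later). The rows are hypothetical, `yourTable` stands in for the joined result, and `medicine` is a plain text column here rather than the CONCAT expression from the question. Dates are ISO strings so that text ordering matches date ordering.

```python
import sqlite3

# Hypothetical rows (cpme_id, med_id, medicine, med_expiredate).
ROWS = [
    (1, 101, "Medicine A", "2024-03-01"),
    (2, 101, "Medicine A", "2024-01-15"),
    (3, 102, "Medicine B", "2024-02-10"),
    (4, 102, "Medicine B", "2024-05-20"),
]

# rn = 1 marks the row with the earliest expiry date for each med_id.
SQL = """
    with cte as (
        select *, row_number() over (partition by med_id order by med_expiredate) as rn
        from yourTable
    )
    select cpme_id, med_id, medicine, med_expiredate
    from cte
    where rn = 1
    order by med_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table yourTable (cpme_id integer, med_id integer,"
             " medicine text, med_expiredate text)")
conn.executemany("insert into yourTable values (?, ?, ?, ?)", ROWS)
first_rows = conn.execute(SQL).fetchall()
print(first_rows)
# [(2, 101, 'Medicine A', '2024-01-15'), (3, 102, 'Medicine B', '2024-02-10')]
```

Unlike a bare GROUP BY, which may return values from any row in the group, this picks a well-defined row per med_id.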
The following query works fine without ', MAX(Row)':
WITH QResult AS
(SELECT
ROW_NUMBER() OVER (ORDER BY Ad_Date DESC) AS Row,
*
FROM [vw_ads]
)
SELECT *, MAX(Row)
FROM QResult
When MAX(Row) is added, SQL Server 2008 throws the following error:
Column 'QResult.Row' is invalid in the select list because it is not
contained in either an aggregate function or the GROUP BY clause.
When using an aggregate function like SUM, COUNT or MAX, and you want to also select other columns from your data, then you need to group your data by the other column(s) used in your query.
So you need to write something like:
WITH QResult AS
(SELECT
ROW_NUMBER() OVER (ORDER BY Ad_Date DESC) AS Row,
*
FROM [vw_ads]
)
SELECT Col1, Col2, MAX(Row)
FROM QResult
GROUP BY Col1, Col2
This also means you need to explicitly spell out the columns you want - a good idea in any case. You cannot use * in a GROUP BY clause.
Update: based on your comment, I guess what you really want is something like this:
(see Update #2 - Martin Smith's suggestion is even better than my original idea here)
WITH QResult AS
(SELECT
ROW_NUMBER() OVER (ORDER BY Ad_Date DESC) AS Row,
*
FROM [vw_ads]
)
SELECT
Col1, Col2,
MaxRow = (SELECT MAX(Row) FROM QResult)
FROM QResult
This will give you the maximum value of Row from the CTE, the same value, for each row of your result set.
Update #2: Martin Smith's suggestion would be this:
WITH QResult AS
(SELECT
ROW_NUMBER() OVER (ORDER BY Ad_Date DESC) AS Row,
*
FROM [vw_ads]
)
SELECT
Col1, Col2,
MAX(Row) OVER()
FROM QResult
And of course, this works too, and it is even more efficient than my solution. Thanks, Martin!
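The two updates should return the same maximum on every row. Here is a minimal check on SQLite from Python (3.25 or later) rather than SQL Server 2008; `vw_ads` is a hypothetical table with made-up rows, and the row-number column is named `rn` instead of `Row`.

```python
import sqlite3

# Hypothetical stand-in for [vw_ads]: (ad_id, ad_date).
ROWS = [(1, "2024-01-01"), (2, "2024-03-01"), (3, "2024-02-01")]

CTE = """
    with QResult as (
        select row_number() over (order by ad_date desc) as rn, *
        from vw_ads
    )
"""
# Update #1: scalar subquery over the CTE.
SUBQUERY_SQL = CTE + """
    select ad_id, rn, (select max(rn) from QResult) as max_row
    from QResult order by rn
"""
# Update #2: windowed aggregate over the whole result set.
WINDOW_SQL = CTE + """
    select ad_id, rn, max(rn) over () as max_row
    from QResult order by rn
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table vw_ads (ad_id integer, ad_date text)")
conn.executemany("insert into vw_ads values (?, ?)", ROWS)
via_subquery = conn.execute(SUBQUERY_SQL).fetchall()
via_window = conn.execute(WINDOW_SQL).fetchall()
print(via_subquery)                # [(2, 1, 3), (3, 2, 3), (1, 3, 3)]
print(via_window == via_subquery)  # True
```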
You will need to decide why you are obtaining MAX(Row). Is it the max row by Ad_Date? Is it the max row overall?
If you change it to:
WITH QResult AS (SELECT ROW_NUMBER() OVER (ORDER BY Ad_Date DESC) AS Row,* FROM [vw_ads])
SELECT Ad_Date, MAX(Row) from QResult
GROUP BY Ad_Date
...that will return the max row per Ad_Date, which is what I'm assuming you are looking for.