This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Combine multiple results in a subquery into a single comma-separated value
Concat groups in SQL Server
I want to be able to get the duplicates removed:
SELECT Count(Data) as Cnt, Id
FROM [db].[dbo].[View_myView]
Group By Data
HAVING Count(Data) > 1
In MySQL it was as simple as this:
SELECT Count(Data) AS Cnt, group_concat(Id)
FROM View_myView
Group By Data
Having Cnt > 1
Does anyone know of a solution? Examples are a plus!
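To see what the MySQL query above returns, here is a sketch using Python's built-in sqlite3 module. SQLite also has a GROUP_CONCAT aggregate, so the query shape carries over, but this is SQLite and not SQL Server syntax. The table contents below are hypothetical sample rows.

```python
import sqlite3

# Hypothetical stand-in for View_myView: (Id, Data) rows where "x" and "y" repeat.
rows = [(1, "x"), (2, "x"), (3, "y"), (4, "z"), (5, "y"), (6, "x")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE View_myView (Id INTEGER, Data TEXT)")
conn.executemany("INSERT INTO View_myView (Id, Data) VALUES (?, ?)", rows)

# Count each Data value, concatenate the Ids that share it,
# and keep only the values that occur more than once.
query = """
    SELECT Data, COUNT(Data) AS Cnt, GROUP_CONCAT(Id) AS Ids
    FROM View_myView
    GROUP BY Data
    HAVING COUNT(Data) > 1
    ORDER BY Data
"""

duplicates = {}
for data, cnt, ids in conn.execute(query):
    # GROUP_CONCAT does not guarantee element order, so sort the Ids here.
    duplicates[data] = (cnt, sorted(int(i) for i in ids.split(",")))

print(duplicates)  # {'x': (3, [1, 2, 6]), 'y': (2, [3, 5])}
```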
In SQL Server 2005 and newer, you can use a CTE (Common Table Expression) with the ROW_NUMBER function to eliminate duplicates:
;WITH LastPerUser AS
(
SELECT
ID, UserID, ClassID, SchoolID, Created,
ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY Created DESC) AS RowNum
FROM dbo.YourTable
)
SELECT
ID, UserID, ClassID, SchoolID, Created
FROM LastPerUser
WHERE RowNum = 1
This CTE "partitions" your data by UserID, and for each partition the ROW_NUMBER function hands out sequential numbers, starting at 1 and ordered by Created DESC. So the latest row for each UserID gets RowNum = 1, which is what the SELECT statement after the CTE picks out.
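The partition-and-number idea can be tried out locally with a small sketch. It assumes SQLite 3.25 or newer, which supports ROW_NUMBER() OVER (...). The rows are hypothetical, and ClassID and SchoolID are left out for brevity.

```python
import sqlite3

# Hypothetical sample rows: (ID, UserID, Created); ClassID and SchoolID are left out.
rows = [
    (1, 10, "2012-01-01"),
    (2, 10, "2012-03-01"),
    (3, 20, "2012-02-01"),
    (4, 10, "2012-02-15"),
    (5, 20, "2012-01-20"),
]

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or newer
conn.execute("CREATE TABLE YourTable (ID INTEGER PRIMARY KEY, UserID INTEGER, Created TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)

latest = conn.execute("""
    WITH LastPerUser AS (
        SELECT ID, UserID, Created,
               ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Created DESC) AS RowNum
        FROM YourTable
    )
    SELECT ID, UserID, Created
    FROM LastPerUser
    WHERE RowNum = 1  -- the newest row in each UserID partition
    ORDER BY UserID
""").fetchall()

print(latest)  # [(2, 10, '2012-03-01'), (3, 20, '2012-02-01')]
```

ISO-formatted date strings are used so that ordering the text column also orders by date.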
Using the same CTE, you can also easily delete duplicates:
;WITH LastPerUser AS
(
SELECT
ID, UserID, ClassID, SchoolID, Created,
ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY Created DESC) AS RowNum
FROM dbo.YourTable
)
DELETE t
FROM dbo.YourTable t
INNER JOIN LastPerUser cte ON t.ID = cte.ID
WHERE cte.RowNum > 1
The same principle applies: you "group" (or partition) your data by some criteria, number the rows within each partition consecutively, and the DELETE weeds out every row whose partitioned row number is greater than 1.
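Here is a sketch of the same delete, run against SQLite with hypothetical rows. SQLite does not support SQL Server's DELETE ... FROM join form, so this version swaps in an IN subquery that selects the IDs numbered above 1.

```python
import sqlite3

# Hypothetical sample rows: (ID, UserID, Created).
rows = [
    (1, 10, "2012-01-01"),
    (2, 10, "2012-03-01"),
    (3, 20, "2012-02-01"),
    (4, 10, "2012-02-15"),
    (5, 20, "2012-01-20"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER PRIMARY KEY, UserID INTEGER, Created TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)

# Number the rows within each UserID partition (newest first) and delete
# every row that did not get number 1.
conn.execute("""
    DELETE FROM YourTable
    WHERE ID IN (
        SELECT ID
        FROM (
            SELECT ID,
                   ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Created DESC) AS RowNum
            FROM YourTable
        )
        WHERE RowNum > 1
    )
""")
conn.commit()

remaining = conn.execute("SELECT ID, UserID FROM YourTable ORDER BY ID").fetchall()
print(remaining)  # [(2, 10), (3, 20)]
```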
Just use DISTINCT to remove duplicates. It sounds like you were using group_concat to join duplicates without actually wanting to use its value. In that case, MySQL also has DISTINCT, which you could have used:
SELECT DISTINCT Count(Data) as Cnt, Id
FROM [db].[dbo].[View_myView]
GROUP BY Id
HAVING Count(Data) > 1
Also, your original query selects Id without grouping by it or aggregating it, which SQL Server rejects; I think you meant to group by Id. I corrected it in the example above.
This question already has answers here:
SQL select only rows with max value on a column [duplicate]
(27 answers)
Closed 2 months ago.
I have a table named Work_Items like this:
Assume there are lots of Names (e.g., E, F, G, H, I, etc.) and their respective Date and Produced Items in this table. It's a massive table, so I'd like to write an optimised query.
From this table, I want to query the latest record for each of A, B, C, and D.
I was using the following query:
SELECT * FROM Work_Items WHERE Name IN ('A','B','C','D') ORDER BY Date DESC LIMIT 4 OFFSET 0
The problem with this query is that, since I'm ordering by Date, the latest 4 records I get are:
I want to get this result:
Please help me in modifying the query. Thanks.
On MySQL 8+, we can use ROW_NUMBER:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Date DESC) rn
FROM Work_Items
WHERE Name IN ('A', 'B', 'C', 'D')
)
SELECT Name, Date, ProducedItems
FROM cte
WHERE rn = 1
ORDER BY Name;
You can use an inner join as follows; it works on any MySQL version:
select w.name, w.`date`, w.ProducedItems
from Work_Items w
inner join (
select name, max(date) as `date`
from Work_Items
group by name
) as s on s.name = w.name and s.`date` = w.`date` ;
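The sketch below contrasts the question's ORDER BY ... LIMIT 4 query with the join-to-maximum approach. It runs on SQLite, and the Work_Items rows are invented so that A has two recent entries. The Name filter is moved into the subquery here, which is an assumption about what the asker wants.

```python
import sqlite3

# Hypothetical Work_Items rows: (Name, Date, ProducedItems), ISO dates so text sorts by date.
rows = [
    ("A", "2021-01-05", 7),
    ("A", "2021-01-09", 3),
    ("B", "2021-01-08", 4),
    ("C", "2021-01-02", 9),
    ("C", "2021-01-07", 1),
    ("D", "2021-01-01", 6),
    ("E", "2021-01-10", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Work_Items (Name TEXT, Date TEXT, ProducedItems INTEGER)")
conn.executemany("INSERT INTO Work_Items VALUES (?, ?, ?)", rows)

# The question's approach: the 4 newest rows overall, which can repeat a Name.
naive = conn.execute("""
    SELECT Name FROM Work_Items
    WHERE Name IN ('A', 'B', 'C', 'D')
    ORDER BY Date DESC
    LIMIT 4
""").fetchall()

# Join each row back to its Name's maximum Date instead.
latest = conn.execute("""
    SELECT w.Name, w.Date, w.ProducedItems
    FROM Work_Items w
    INNER JOIN (
        SELECT Name, MAX(Date) AS MaxDate
        FROM Work_Items
        WHERE Name IN ('A', 'B', 'C', 'D')
        GROUP BY Name
    ) s ON s.Name = w.Name AND s.MaxDate = w.Date
    ORDER BY w.Name
""").fetchall()

print([n for (n,) in naive])  # ['A', 'B', 'C', 'A']
print(latest)
```

If two rows for the same Name share the maximum Date, the join returns both of them.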
Let's say I have a table with the following rows/values:
I need a way to select the values in amount, but only once if they're duplicated. So from this example I'd want to show the amount only once each for A, B, and C. The SQL result should then look like this:
Use the LAG() function to compare the previous row's amount with the current row's amount for each name.
-- MySQL (v8.0)
SELECT t.name
, CASE WHEN t.amount = t.prev_val THEN '' ELSE amount END amount
FROM (SELECT *
, LAG(amount) OVER (PARTITION BY name ORDER BY name) prev_val
FROM test) t
Please check the fiddle at https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=8c1af9afcadf2849a85ad045df7ed580
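Here is a sketch of the LAG() approach on SQLite. It assumes a hypothetical id column so that "previous row" has a defined order. The answer above orders the window by name, which gives no deterministic order inside each name partition.

```python
import sqlite3

# Hypothetical rows: (id, name, amount). The id column is an assumption added
# so that LAG() has a well-defined "previous row" within each name.
rows = [(1, "A", 10), (2, "A", 10), (3, "B", 20), (4, "C", 30), (5, "C", 30), (6, "C", 30)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)

# The first row of each name has prev_val = NULL, so the comparison is not true
# and the amount is shown; later rows equal to their predecessor become ''.
result = conn.execute("""
    SELECT t.name,
           CASE WHEN t.amount = t.prev_val THEN '' ELSE t.amount END AS amount
    FROM (
        SELECT *,
               LAG(amount) OVER (PARTITION BY name ORDER BY id) AS prev_val
        FROM test
    ) t
    ORDER BY t.id
""").fetchall()

print(result)  # [('A', 10), ('A', ''), ('B', 20), ('C', 30), ('C', ''), ('C', '')]
```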
You can handle situations like these with different functions, depending on what you need:
Case 1: You have the same values per name:
select distinct name, amount from [table name]
Case 2: You have duplicates with different values for each name, and you want to pick the one with the highest value. Use min() if you need the minimum one to show up instead.
select name, max(amount) from [table name] group by 1
Case 3: The one you need, with blanks for the rest of the duplicates.
ROW_NUMBER() numbers the rows within each name, ordered by amount; because the duplicated values are identical, they simply get consecutive numbers. You can then use IF to create a new column that is blank wherever RANK_ > 1. Because the ordering is ascending, this also covers the case where you want to show just the minimum value and leave the rest of that name's rows blank.
With ranking as (
select
*,
ROW_NUMBER() OVER(PARTITION BY NAME ORDER BY AMOUNT) AS RANK_
from [table]
)
SELECT
*,
IF(RANK_ > 1,"",AMOUNT) AS NEW_AMOUNT
FROM ranking
Case 4: You need to show the maximum and leave the other rows for that name blank.
Just change the ORDER BY clause of ROW_NUMBER() to DESC. This assigns rank 1 to the highest amount per name, and the remaining rows get blanks.
With ranking as (
select
*,
ROW_NUMBER() OVER(PARTITION BY NAME ORDER BY AMOUNT DESC) AS RANK_
from [table]
)
SELECT
*,
IF(RANK_ > 1,"",AMOUNT) AS NEW_AMOUNT
FROM ranking
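Cases 3 and 4 differ only in the sort direction inside ROW_NUMBER(), as this SQLite sketch shows. It uses CASE where the MySQL answer uses IF, and the rows and the helper name first_amount_only are hypothetical.

```python
import sqlite3

# Hypothetical rows: (name, amount), with differing amounts for A and a duplicate for B.
rows = [("A", 5), ("A", 9), ("A", 7), ("B", 4), ("B", 4)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)


def first_amount_only(descending: bool) -> list:
    """Show one amount per name (the min, or the max if descending) and blank the rest."""
    direction = "DESC" if descending else "ASC"  # a fixed keyword, never user input
    return conn.execute(f"""
        WITH ranking AS (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY name ORDER BY amount {direction}) AS rank_
            FROM t
        )
        SELECT name,
               CASE WHEN rank_ > 1 THEN '' ELSE amount END AS new_amount
        FROM ranking
        ORDER BY name, rank_
    """).fetchall()


print(first_amount_only(descending=False))  # [('A', 5), ('A', ''), ('A', ''), ('B', 4), ('B', '')]
print(first_amount_only(descending=True))   # [('A', 9), ('A', ''), ('A', ''), ('B', 4), ('B', '')]
```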
If you are using MySQL 8, you can use row_number() for this:
with x as (
select *, row_number() over(partition by name order by amount) rn
from t
)
select name, case when rn=1 then amount else '' end amount
from x
See example Fiddle
The other answers are missing a really important point: a SQL query returns an unordered set unless there is an explicit ORDER BY.
The data that you have provided has rows that are exact duplicates. For this reason, I think the best approach uses row_number() and an ORDER BY in the outer query:
select name, (case when seqnum = 1 then amount end) as amount
from (select t.*,
row_number() over (partition by name, amount) as seqnum
from t
) t
order by name, seqnum;
Note that MySQL does not require an order by argument for row_number().
More commonly, though, you would have some other column (say a date or id) that would be used for ordering. I should also emphasize that this type of formatting is often handled at the application layer and not in the database.
How can I identify the last row of each distinct set of values in a field, flagging it in an alias column (with "1", for example)?
For this example, I need to know where each group in the ordered GROUP column (CARS, COLORS, DRINKS, FRUITS) ends.
Check my intended result on this image:
My base query:
SELECT * FROM `MY_DB` ORDER BY `ITEM`, `GROUP` ASC
For starters: rows of a SQL table are unordered; there is no inherent ordering of rows. For your question to make sense, you need a column that defines the ordering of the rows in each group. I assumed id.
Then: in MySQL 8.0, one option uses window functions:
select t.*,
(row_number() over(partition by grp order by id desc) = 1) as last_group_flag
from mytable t
In earlier versions, you could use a subquery:
select t.*,
(id = (select max(t1.id) from mytable t1 where t1.grp = t.grp)) as last_group_flag
from mytable t
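To check that the window-function flag and the subquery flag agree, here is a sketch that runs both against SQLite. It follows the answer in using id for ordering and grp instead of group; the rows themselves are hypothetical.

```python
import sqlite3

# Hypothetical rows: (id, grp, item); id defines the order of rows within each grp.
rows = [
    (1, "CARS", "Audi"),
    (2, "CARS", "BMW"),
    (3, "COLORS", "Red"),
    (4, "DRINKS", "Tea"),
    (5, "DRINKS", "Water"),
    (6, "FRUITS", "Apple"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, grp TEXT, item TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

# Window-function version: the highest id per grp gets row number 1,
# and the comparison yields 1 (true) or 0 (false).
window_flags = conn.execute("""
    SELECT id,
           (ROW_NUMBER() OVER (PARTITION BY grp ORDER BY id DESC) = 1) AS last_group_flag
    FROM mytable
    ORDER BY id
""").fetchall()

# Subquery version: compare each id with the maximum id of its grp.
subquery_flags = conn.execute("""
    SELECT id,
           (id = (SELECT MAX(t1.id) FROM mytable t1 WHERE t1.grp = t.grp)) AS last_group_flag
    FROM mytable t
    ORDER BY id
""").fetchall()

print(window_flags)                    # [(1, 0), (2, 1), (3, 1), (4, 0), (5, 1), (6, 1)]
print(window_flags == subquery_flags)  # True
```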
Note: group is a language keyword, hence not a good choice for a column name. I used grp instead in the query.
You need to order by the item column within each group to find the last record per distinct value of the group column.
Use row_number as follows:
select t.*,
Case when row_number() over(partition by `group`
order by item desc) = 1
then 1 else 0 end as last_group_flag
from your_table t
I'm having a problem with grouping specific columns into one. When I use GROUP BY, the last row always gets selected when it should be the first row.
The main query is:
SELECT cpme_id,
medicine_main_tbl.med_id,
Concat(med_name, ' (', med_dosage, ') ', med_type) AS Medicine,
med_purpose,
med_quantity,
med_expiredate
FROM medicine_main_tbl
JOIN medicine_inventory_tbl
ON medicine_main_tbl.med_id = medicine_inventory_tbl.med_id
WHERE Coalesce(med_quantity, 0) != 0
AND Abs(Datediff(med_expiredate, Now()))
ORDER BY med_expiredate;
SELECT without GROUP BY
If I GROUP BY using any duplicate column value (in this case, I used med_id):
SELECT with GROUP BY
I'm trying to get this output
Expected Output
The output should only be the first two from the first query. Obviously, I cannot use LIMIT.
Since you are using MariaDB, I recommend using ROW_NUMBER here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY med_id ORDER BY med_expireDate) rn
FROM yourTable
)
SELECT cpme_id, med_id, Medicine, med_purpose, med_quantity, med_expireDate
FROM cte
WHERE rn = 1;
This assumes that the "first" row for a given medicine is the one having the earliest expire date. This was the only interpretation of your data which agreed with the expected output.
This question already has answers here:
SQL select only rows with max value on a column [duplicate]
(27 answers)
Closed 4 years ago.
My data looks somewhat like the above.
I want to find the latest entry, i.e. the one with the maximum log_id, for each group of job_id, run_id, start_hour, end_hour.
I am trying to use the query below, but unfortunately it returns the minimum log_id record from each group instead of the maximum.
Please help
select * from
(select * from job_monitor_log order by job_id,log_id)t1
group by job_id,run_id,start_hour,end_hour having max(log_id);
Note: the query should be for MySQL.
Expected output:
One canonical way to do this is to join to a subquery which finds the latest log_id value for each group as you have defined it:
SELECT j1.*
FROM job_monitor_log j1
INNER JOIN
(
SELECT job_id, run_id, start_hour, end_hour, MAX(log_id) AS max_log_id
FROM job_monitor_log
GROUP BY job_id, run_id, start_hour, end_hour
) j2
ON j1.job_id = j2.job_id AND
j1.run_id = j2.run_id AND
j1.start_hour = j2.start_hour AND
j1.end_hour = j2.end_hour AND
j1.log_id = j2.max_log_id;
If you can use MySQL 8 or later, then you may use analytic (window) functions here:
SELECT log_id, job_id, run_id, run_Date, start_hour, end_hour, job_status
FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY job_id, run_id, start_hour, end_hour
ORDER BY log_id DESC) rn
FROM job_monitor_log
) t
WHERE rn = 1;
If there could be two or more records per group which are tied regarding having the max log_id value, then you may replace ROW_NUMBER with RANK or DENSE_RANK to include all such ties.
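The difference between ROW_NUMBER and RANK on ties can be seen in a small SQLite sketch. One hypothetical group is given two rows that share its maximum log_id. The helper name latest_per_group is illustrative only.

```python
import sqlite3

# Hypothetical job_monitor_log rows: (log_id, job_id, run_id, start_hour, end_hour, job_status).
# Group (job 1, run 1, 0-1) deliberately has two rows tied on its maximum log_id (3).
rows = [
    (1, 1, 1, 0, 1, "RUNNING"),
    (2, 1, 1, 0, 1, "FAILED"),
    (3, 1, 1, 0, 1, "RETRY"),
    (3, 1, 1, 0, 1, "SUCCESS"),
    (4, 2, 1, 0, 1, "SUCCESS"),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE job_monitor_log (
    log_id INTEGER, job_id INTEGER, run_id INTEGER,
    start_hour INTEGER, end_hour INTEGER, job_status TEXT)""")
conn.executemany("INSERT INTO job_monitor_log VALUES (?, ?, ?, ?, ?, ?)", rows)


def latest_per_group(ranking_fn: str) -> list:
    """Return the rows ranked 1 per group using ROW_NUMBER, RANK or DENSE_RANK."""
    if ranking_fn not in ("ROW_NUMBER", "RANK", "DENSE_RANK"):
        raise ValueError(ranking_fn)  # only known function names are put into the SQL
    return conn.execute(f"""
        SELECT log_id, job_id, job_status
        FROM (
            SELECT *,
                   {ranking_fn}() OVER (PARTITION BY job_id, run_id, start_hour, end_hour
                                        ORDER BY log_id DESC) AS rn
            FROM job_monitor_log
        ) t
        WHERE rn = 1
        ORDER BY job_id, job_status
    """).fetchall()


print(latest_per_group("ROW_NUMBER"))  # one row per group; which tied row wins is arbitrary
print(latest_per_group("RANK"))        # [(3, 1, 'RETRY'), (3, 1, 'SUCCESS'), (4, 2, 'SUCCESS')]
```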
To cover all bases, we could also use a correlated subquery approach, which is along the lines of what you were originally trying to do:
SELECT log_id, job_id, run_id, run_Date, start_hour, end_hour, job_status
FROM job_monitor_log j1
WHERE log_id = (SELECT MAX(j2.log_id)
FROM job_monitor_log j2
WHERE j1.job_id = j2.job_id AND
j1.run_id = j2.run_id AND
j1.start_hour = j2.start_hour AND
j1.end_hour = j2.end_hour);
This would include all ties for the maximum log_id value per group. However, this is probably the least performant of the three queries given. Sometimes, though, when using things like ORM frameworks, we might need to express the query as shown above.
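The following SQLite sketch checks that the correlated subquery and the join-to-MAX query keep the same rows, ties included. The sample data is hypothetical and includes one group tied on its maximum log_id. It does not say anything about relative performance.

```python
import sqlite3

# Hypothetical job_monitor_log rows: (log_id, job_id, run_id, start_hour, end_hour, job_status).
rows = [
    (1, 1, 1, 0, 1, "RUNNING"),
    (2, 1, 1, 0, 1, "FAILED"),
    (3, 1, 1, 0, 1, "RETRY"),
    (3, 1, 1, 0, 1, "SUCCESS"),
    (4, 2, 1, 0, 1, "SUCCESS"),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE job_monitor_log (
    log_id INTEGER, job_id INTEGER, run_id INTEGER,
    start_hour INTEGER, end_hour INTEGER, job_status TEXT)""")
conn.executemany("INSERT INTO job_monitor_log VALUES (?, ?, ?, ?, ?, ?)", rows)

# Correlated subquery: keep a row if its log_id equals its group's maximum.
corr = conn.execute("""
    SELECT log_id, job_id, job_status
    FROM job_monitor_log j1
    WHERE log_id = (SELECT MAX(j2.log_id)
                    FROM job_monitor_log j2
                    WHERE j1.job_id = j2.job_id AND
                          j1.run_id = j2.run_id AND
                          j1.start_hour = j2.start_hour AND
                          j1.end_hour = j2.end_hour)
    ORDER BY job_id, job_status
""").fetchall()

# Join to a per-group MAX(log_id) subquery.
joined = conn.execute("""
    SELECT j1.log_id, j1.job_id, j1.job_status
    FROM job_monitor_log j1
    INNER JOIN (
        SELECT job_id, run_id, start_hour, end_hour, MAX(log_id) AS max_log_id
        FROM job_monitor_log
        GROUP BY job_id, run_id, start_hour, end_hour
    ) j2
    ON j1.job_id = j2.job_id AND j1.run_id = j2.run_id AND
       j1.start_hour = j2.start_hour AND j1.end_hour = j2.end_hour AND
       j1.log_id = j2.max_log_id
    ORDER BY j1.job_id, j1.job_status
""").fetchall()

print(corr)            # [(3, 1, 'RETRY'), (3, 1, 'SUCCESS'), (4, 2, 'SUCCESS')]
print(corr == joined)  # True
```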