We are running a simple grouping query to find the duplicates added by the next set of items inserted into our db table.
SET @old_set_id = 71, @new_set_id = 72;
SELECT id,
request_id,
data_capture_id AS temp_id,
COUNT(data_capture_id) AS item_count
FROM my_table
WHERE request_id = @old_set_id OR request_id = @new_set_id
GROUP BY data_capture_id
This results in a table like:
id request_id temp_id item_count
----------------------------------------
3 71 2324345 1
4 71 6786867 2
8 72 5276345 1
For each duplicate we need the id of the second item in the group, i.e. what is the id under request_id 72 for the duplicate record 6786867? Currently, it displays the id from the first set.
Try to start from this query:
select
t1.id,
t1.record_col_1,
t2.request_id,
count(t1.data_capture_id) as item_count
from (
select id, request_id, data_capture_id, record_col_1
from my_table
order by request_id limit 123456789
) t1
inner join (
select request_id, record_col_1
from my_table
order by request_id limit 123456789
) t2 ON t2.record_col_1 = t1.record_col_1 and t2.request_id > t1.request_id
group by t1.record_col_1
having item_count > 1
Our first sub-query ensures that the data is sorted by request_id before we group it. We need this junk limit 123456789 because MySQL's optimizer is free to ignore ORDER BY in sub-queries unless a LIMIT is present (a hacky but common workaround). The second sub-query, also sorted, lets us fetch the higher request_id from a pair with the same record_col_1. Finally, we collapse the data by record_col_1 and keep only the duplicates.
I'm not sure if it will work, but try it.
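Leaving the LIMIT workaround aside, the core of the question — get the id from the newer set for each duplicated data_capture_id — can be checked with a plain self-join. A minimal sketch on SQLite standing in for MySQL; the table and column names come from the question, the data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER, request_id INTEGER, data_capture_id INTEGER)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (3, 71, 2324345),   # only in the old set
    (4, 71, 6786867),   # duplicated across both sets
    (8, 72, 5276345),   # only in the new set
    (9, 72, 6786867),   # the row whose id we actually want
])

# Pair each old-set row with its newer-set duplicate and
# return the id from the newer set.
result = conn.execute("""
    SELECT t2.id, t1.data_capture_id
    FROM my_table t1
    JOIN my_table t2
      ON t2.data_capture_id = t1.data_capture_id
     AND t2.request_id > t1.request_id
    WHERE t1.request_id = 71 AND t2.request_id = 72
""").fetchall()
print(result)  # [(9, 6786867)]
```

The same self-join runs unchanged on MySQL, without any dependence on sub-query ordering.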
Criteria:
1) unique combination of 2 columns (column1, column2)
2) keep the oldest one out of that combination
3) records might be identical, i.e. same column1, column2 and creation date; in that case keep the one with the lesser id.
e.g. data is as below:
ID  column1  column2  creation_date (dd-mm-yyyy)
1   11       aa       10-05-2016
2   11       aa       11-06-2016
3   12       bb       10-05-2017
4   12       bb       20-05-2017
5   12       cc       10-05-2016
6   12       cc       11-05-2017
7   13       dd       10-01-2018
8   13       dd       10-01-2018
I need to keep records with id: 1,3,5,7
The approach I am thinking of is:
a) first write a select query to get the required records (in this example 1,3,5,7)
b) then write an update query to change their status to deleted (soft delete)
Please also suggest any better approach to fulfil the criteria.
Additional information:
*total number of records: 11k
*I don't want to read records directly from the table; I already have a query which fetches only the required data, and I need to run this on those records
*The final aim is to set the status of the duplicate records to deleted and append the word deleted to those records
This is really straightforward if you use analytic functions. The query has three parts:
A) Assign a rank to each record like this:
Group records by column1 and column2. Within each group, sort the records first by creation_date and then by ID. Assign 1 to the first record, 2 to the second and so on.
B) Keep only the duplicates, i.e. the records with a newer creation_date and/or higher ID. The record with rnk = 1 is the one you want to keep; records with rnk > 1 are the duplicates.
C) Using ROWID, delete the duplicates
delete
from your_table
where rowid in ( -- (C)
select duplicate_rowid
from (select rowid as duplicate_rowid
,row_number() over( -- (A)
partition by column1, column2 -- Your criterion 1
order by creation_date asc -- Your criterion 2
,id asc -- Your criterion 3
) as rnk
from your_table
)
where rnk > 1 -- (B)
);
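The delete can be sanity-checked on SQLite, which also supports ROW_NUMBER() and a rowid. A minimal sketch with the question's sample data (dates rewritten as ISO strings so they compare correctly; in the real table creation_date would be a DATE column):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INTEGER, column1 TEXT, column2 TEXT, creation_date TEXT)"
)
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", [
    (1, '11', 'aa', '2016-05-10'),
    (2, '11', 'aa', '2016-06-11'),
    (3, '12', 'bb', '2017-05-10'),
    (4, '12', 'bb', '2017-05-20'),
    (5, '12', 'cc', '2016-05-10'),
    (6, '12', 'cc', '2017-05-11'),
    (7, '13', 'dd', '2018-01-10'),
    (8, '13', 'dd', '2018-01-10'),
])

# Rank rows within each (column1, column2) group; rnk = 1 is the
# oldest (ties broken by lowest id), everything else is a duplicate.
conn.execute("""
    DELETE FROM your_table
    WHERE rowid IN (
        SELECT duplicate_rowid
        FROM (SELECT rowid AS duplicate_rowid,
                     ROW_NUMBER() OVER (
                         PARTITION BY column1, column2
                         ORDER BY creation_date ASC, id ASC
                     ) AS rnk
              FROM your_table)
        WHERE rnk > 1
    )
""")
kept = [r[0] for r in conn.execute("SELECT id FROM your_table ORDER BY id")]
print(kept)  # [1, 3, 5, 7]
```

The surviving ids match the expected 1, 3, 5, 7 from the question.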
So final queries which worked for my question are as below:
1) to get count of records/ to get required columns:
SELECT --COUNT(*) -- use this to get the count of records
ID, COLUMN1, COLUMN2,CREATION_DATE --required columns
FROM
MY_TABLE
WHERE
ROWID IN(
select duplicate_rowid
from (select rowid as duplicate_rowid
,row_number() over(
partition by COLUMN1, COLUMN2 -- criterion 1
ORDER BY CREATION_DATE ASC -- criterion 2
,ID ASC -- criterion 3
) AS RNK
from MY_TABLE
)
WHERE (RNK > 1 and COLUMN1 IS NOT NULL and COLUMN2 IS NOT NULL)
);
2) to update records with status=deleted and append _deleted word to column1 values:
UPDATE MY_TABLE
SET STATUS='deleted' , COLUMN1=CONCAT(COLUMN1,'_deleted')
WHERE
ROWID IN(
select duplicate_rowid
from (select rowid as duplicate_rowid
,row_number() over(
partition by COLUMN1, COLUMN2 -- criterion 1
ORDER BY CREATION_DATE ASC -- criterion 2
,ID ASC -- criterion 3
) AS RNK
from MY_TABLE
)
WHERE (RNK > 1 and COLUMN1 IS NOT NULL and COLUMN2 IS NOT NULL)
);
Sample table:
id------user_id------grade_id------time_stamp
1---------100----------1001---------2013-08-29 15:07:38
2---------101----------1002---------2013-08-29 16:07:38
3---------100----------1001---------2013-08-29 17:07:38
4---------102----------1003---------2013-08-29 18:07:38
5---------103----------1004---------2013-08-29 19:07:38
6---------105----------1002---------2013-08-29 20:07:38
7---------100----------1002---------2013-08-29 21:07:38
I want to select rows with user_id = 100, grouped by grade_id, but only if the row's time_stamp is the least for that particular grade_id.
so, from the above table, it should be:
row 1 because its time_stamp is least for that value of grade_id(1001)
but not row 3, because I only want 1 row per grade_id
also not row 7, because for that grade_id (1002) the least time_stamp does not belong to user_id 100.
I tried few things, which are too basic and obviously not worth posting.
Thank You
You could try nested queries:
SELECT grade_id, COUNT(grade_id)
FROM SAMPLE_TABLE ST
WHERE time_stamp = (SELECT MIN(time_stamp)
                    FROM SAMPLE_TABLE STT
                    WHERE STT.grade_id = ST.grade_id)
  AND user_id = 100
GROUP BY grade_id;
In this case, the nested query gives you the minimum timestamp for each specific 'grade_id', which you can then use in your WHERE filter.
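A quick check of this correlated-subquery approach on SQLite (table and column names as in the question; the last sample row is given id 7):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_table (id INTEGER, user_id INTEGER, grade_id INTEGER, time_stamp TEXT)"
)
conn.executemany("INSERT INTO sample_table VALUES (?, ?, ?, ?)", [
    (1, 100, 1001, '2013-08-29 15:07:38'),
    (2, 101, 1002, '2013-08-29 16:07:38'),
    (3, 100, 1001, '2013-08-29 17:07:38'),
    (4, 102, 1003, '2013-08-29 18:07:38'),
    (5, 103, 1004, '2013-08-29 19:07:38'),
    (6, 105, 1002, '2013-08-29 20:07:38'),
    (7, 100, 1002, '2013-08-29 21:07:38'),
])

# A row survives only if it holds the group minimum time_stamp
# AND belongs to user 100.
rows = conn.execute("""
    SELECT st.id, st.grade_id
    FROM sample_table st
    WHERE st.time_stamp = (SELECT MIN(stt.time_stamp)
                           FROM sample_table stt
                           WHERE stt.grade_id = st.grade_id)
      AND st.user_id = 100
""").fetchall()
print(rows)  # [(1, 1001)]
```

Only row 1 comes back: row 7 is user 100 but does not hold the minimum time_stamp for grade 1002.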
SELECT t.*
FROM tableX AS t
JOIN
( SELECT grade_id, MIN(time_stamp) AS time_stamp
FROM tableX
GROUP BY grade_id
) AS g
ON g.grade_id = t.grade_id
AND g.time_stamp = t.time_stamp
WHERE t.user_id = 100 ;
Here's a query:
SELECT *
FROM table
WHERE id = 1
OR id = 100
OR id = 50
Note that I provided the ids in this order: 1,100,50.
I want the rows to come back in that order: 1,100,50.
Currently, it comes back 1,50,100, i.e. in ascending order. Assume the rows in the table were also inserted in ascending order.
Use the MySQL specific FIND_IN_SET function:
SELECT t.*
FROM table t
WHERE t.id IN (1, 100, 50)
ORDER BY FIND_IN_SET(t.id, '1,100,50')
Another way to approach this would put the list in a subquery:
select table.*
from table join
(select 1 as id, 1 as ordering union all
select 100 as id, 2 as ordering union all
select 50 as id, 3 as ordering
) list
on table.id = list.id
order by list.ordering
You can just do this with ORDER BY:
ORDER BY
id = 1 DESC, id = 100 DESC, id = 50 DESC
In ascending order 0 sorts before 1, hence the DESC: each comparison evaluates to 1 for the matching row and 0 otherwise.
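A minimal check of this boolean-ordering trick on SQLite, which treats comparison results as 0/1 the same way MySQL does:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (50,), (100,)])

# Each `id = X` term is 1 for the matching row and 0 otherwise;
# sorting those terms DESC, in order, reproduces the wanted sequence.
order = [r[0] for r in conn.execute("""
    SELECT id FROM t
    WHERE id IN (1, 100, 50)
    ORDER BY id = 1 DESC, id = 100 DESC, id = 50 DESC
""")]
print(order)  # [1, 100, 50]
```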
Try this
SELECT *
FROM new
WHERE ID =1
OR ID =100
OR ID =50
ORDER BY ID=1 DESC,ID=100 DESC,ID=50 DESC ;
http://www.sqlfiddle.com/#!2/796e2/5
... WHERE id IN (x,y,z) ORDER BY FIELD(id, x, y, z)
I have a table with the following structure.
tbl
id name
1 AAA
2 BBB
3 BBB
4 BBB
5 AAA
6 CCC
select name, count(name) c from tbl
group by name having c > 1
The query returns this result:
AAA(2) duplicate
BBB(3) duplicate
CCC(1) not duplicate
The names that are duplicated are AAA and BBB. The final result I want is the count of these duplicate records.
Result should be like this:
Total duplicate products (2)
The approach is to have a nested query that has one line per duplicate, and an outer query returning just the count of the results of the inner query.
SELECT count(*) AS duplicate_count
FROM (
SELECT name FROM tbl
GROUP BY name HAVING COUNT(name) > 1
) AS t
Use an IF expression to get your desired output:
SELECT name,
       COUNT(*) AS times,
       IF(COUNT(*) > 1, 'duplicated', 'not duplicated') AS duplicated
FROM <MY_TABLE>
GROUP BY name
Output:
AAA 2 duplicated
BBB 3 duplicated
CCC 1 not duplicated
For List:
SELECT COUNT(`name`) AS adet, name
FROM `tbl` WHERE `status`=1 GROUP BY `name`
ORDER BY `adet` DESC
For Total Count:
SELECT COUNT(*) AS Total
FROM (SELECT COUNT(name) AS cou FROM tbl GROUP BY name HAVING cou>1 ) AS virtual_tbl
-- Total: 5
Why not just wrap this in a sub-query:
SELECT Count(*) TotalDups
FROM
(
select Name, Count(*)
from yourTable
group by name
having Count(*) > 1
) x
See SQL Fiddle with Demo
The accepted answer counts the number of names that have duplicates, not the number of surplus duplicate rows. If you want to count the actual number of duplicates, use this:
SELECT COALESCE(SUM(rows) - count(1), 0) as dupes FROM(
SELECT COUNT(1) as rows
FROM `yourtable`
GROUP BY `name`
HAVING rows > 1
) x
What this does is total the rows belonging to duplicated names, then subtract the number of names that have duplicates. The reason is that the GROUP BY total is not all duplicates: one record in each of those groups is the unique original row.
Fiddle: http://sqlfiddle.com/#!2/29639a/3
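The arithmetic can be verified against the sample data from earlier in the thread. A sketch on SQLite (the alias rows is renamed to n here, since ROWS is a reserved word in newer engines):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [
    (1, 'AAA'), (2, 'BBB'), (3, 'BBB'), (4, 'BBB'), (5, 'AAA'), (6, 'CCC'),
])

# AAA appears 2x and BBB 3x: 5 rows sit in duplicate groups, of which
# 2 are the originals, so the number of surplus duplicates is 3.
dupes = conn.execute("""
    SELECT COALESCE(SUM(n) - COUNT(1), 0) AS dupes
    FROM (SELECT COUNT(1) AS n
          FROM tbl
          GROUP BY name
          HAVING n > 1) x
""").fetchone()[0]
print(dupes)  # 3
```

Compare with the accepted answer's query, which would return 2 (the number of duplicated names) on the same data.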
SQL code is:
SELECT VERSION_ID, PROJECT_ID, VERSION_NO, COUNT(VERSION_NO) AS dup_cnt
FROM MOVEMENTS
GROUP BY VERSION_NO
HAVING (dup_cnt > 1 && PROJECT_ID = 11660)
I'm using this query for my own table in PHP, but it only gives me one result, whereas I'd like the number of duplicates per username. Is that possible?
SELECT count(*) AS duplicate_count
FROM (
SELECT username FROM login_history
GROUP BY username HAVING COUNT(time) > 1
) AS t;
Consider this classical setup:
entry table:
id (int, PK)
title (varchar 255)
entry_category table:
entry_id (int)
category_id (int)
category table:
id (int, PK)
title (varchar 255)
Which basically means entries can be in one or more categories (entry_category is a many-to-many join table)
Now I need to query 6 unique categories along with 1 unique entry from each of these categories, chosen at RANDOM!
EDIT: To clarify: the purpose of this is to display 6 random categories with 1 random entry per category.
A correct result set would look like this:
category_id entry_id
10 200
20 300
30 400
40 500
50 600
60 700
This would be incorrect as there are duplicates in the category_id column:
category_id entry_id
10 300
20 300
...
And this is incorrect as there are duplicates in the entry_id column:
category_id entry_id
20 300
20 400
...
How can I query this?
If I use this simple query with order by rand, the result contains duplicated rows:
select c.id, e.id
from category c
inner join entry_category ec on ec.category_id = c.id
inner join entry e on e.id = ec.entry_id
group by c.id
order by rand()
Performance is at the moment not the most important factor, but I would need a reliably working query for this, and the above is pretty much useless and does not do what I want at all.
EDIT: as an aside, the above query is no better with select distinct ... and no group by. That still includes duplicate rows, since distinct only ensures that the combination of c.id and e.id is unique.
EDIT: one solution I found, but probably slow as hell on larger datasets:
select t1.e_id, t2.c_id
from (select e.id as e_id
      from entry e
      order by rand()) t1
inner join (select ec.entry_id as e_id, ec.category_id as c_id
            from entry_category ec
            group by e_id
            order by rand()) t2 on t2.e_id = t1.e_id
group by t2.c_id
order by rand()
SELECT category_id, entity_id
FROM (
SELECT category_id,
#ce :=
(
SELECT entity_id
FROM category_entity cei
WHERE cei.category_id = ced.category_id
AND NOT FIND_IN_SET(entity_id, #r)
ORDER BY
RAND()
LIMIT 1
) AS entity_id,
(
SELECT #r := CAST(CONCAT_WS(',', #r, #ce) AS CHAR)
)
FROM (
SELECT #r := ''
) vars,
(
SELECT DISTINCT category_id
FROM category_entity
ORDER BY
RAND()
LIMIT 15
) ced
) q
WHERE entity_id IS NOT NULL
LIMIT 6
This solution is not a piece of code I'd be proud of, since it relies on black magic of session variables in MySQL to keep the recursion stack. However, it works.
Also it's not perfectly random and can in fact yield fewer than 6 values (if entity_ids are duplicated across the categories too often). In this case, you can increase the value of 15 in the innermost query.
Create a unique index or a PRIMARY KEY on category_entity (category_id, entity_id) for this to work fast.
Seems to me that a good way to do this is to pick 6 distinct values from each set, shuffle each list of values (each list individually), and then glue the lists together into a two-column result.
To randomize which six you get, shuffle the entire list of each type of value, and grab the first six.
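The plain glue-two-shuffled-lists idea only works if any category/entry pairing is acceptable. A Python sketch of a greedy variant that also respects the entry_category links (all data here is invented; the names follow the question's schema):

```python
import random

# Hypothetical link data: category_id -> list of entry_ids.
entry_category = {
    10: [200, 300],
    20: [300, 400],
    30: [400, 500],
    40: [500, 600],
    50: [600, 700],
    60: [700, 800],
    70: [800, 900],
}

def pick_pairs(links, wanted=6, seed=None):
    """Shuffle the categories, then for each category pick a random
    entry that has not been used yet, stopping at `wanted` pairs."""
    rng = random.Random(seed)
    categories = list(links)
    rng.shuffle(categories)
    used_entries, pairs = set(), []
    for cat in categories:
        candidates = [e for e in links[cat] if e not in used_entries]
        if candidates:
            entry = rng.choice(candidates)
            used_entries.add(entry)
            pairs.append((cat, entry))
        if len(pairs) == wanted:
            break
    return pairs

pairs = pick_pairs(entry_category, wanted=6, seed=42)
# Every pair is a real link, and no category or entry repeats.
assert all(e in entry_category[c] for c, e in pairs)
assert len({c for c, _ in pairs}) == len(pairs)
assert len({e for _, e in pairs}) == len(pairs)
```

Like the session-variable query above, this greedy pick can return fewer than 6 pairs when entries overlap heavily across categories; retrying with a new shuffle is the simple fix.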