How to find duplicates and gaps in this scenario in MySQL

Hi, I have a table that looks like this:
-----------------------------------------------------------
| id | group_id | source_id | target_id | sortsequence |
-----------------------------------------------------------
| 2 | 1 | 2 | 4 | 1 |
-----------------------------------------------------------
| 4 | 1 | 20 | 2 | 1 |
-----------------------------------------------------------
| 5 | 1 | 2 | 14 | 1 |
-----------------------------------------------------------
| 7 | 1 | 2 | 7 | 3 |
-----------------------------------------------------------
| 20 | 2 | 20 | 4 | 3 |
-----------------------------------------------------------
| 21 | 2 | 20 | 4 | 1 |
-----------------------------------------------------------
Scenario
There are two scenarios that need to be handled.
The sortsequence value should be unique for each combination of source_id and group_id. For example, all records with group_id = 1 AND source_id = 2 should have distinct sortsequence values. In the example above, the records with id = 2 and id = 5, which both have group_id = 1 and source_id = 2, share the same sortsequence, 1. These are faulty records, and I need to find them.
When group_id and source_id are the same, the sortsequence values should be continuous, with no gaps. For example, in the table above the records with id = 20 and id = 21 have the same group_id and source_id, and their sortsequence values are 3 and 1. The values are unique, but there is a gap in the sequence. I need to find these records as well.
My effort so far
I have written this query:
SELECT source_id,`group_id`,GROUP_CONCAT(id) AS children
FROM
table
GROUP BY source_id,
sortsequence,
`group_id`
HAVING COUNT(*) > 1
This query only addresses scenario 1. How can I handle scenario 2? Is there a way to do it in the same query, or do I have to write another one for the second scenario?
By the way, the query will be dealing with millions of records, so performance must be very good.

I got the answer from Tere J's comments. The following query covers both of the criteria mentioned above.
SELECT
source_id, `group_id`, GROUP_CONCAT(id) AS faultyIDS
FROM
table
GROUP BY
source_id,group_id
HAVING
COUNT(DISTINCT sortsequence) <> COUNT(sortsequence) OR COUNT(sortsequence) <> MAX(sortsequence) OR MIN(sortsequence) <> 1
Maybe it can help others.
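To see why the three HAVING conditions work together, here is a minimal sketch that runs the same query against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name links is hypothetical (the question only calls it "table"). Taken together, the three checks pass only when a group's values are exactly 1..n with no repeats.

```python
import sqlite3

# Sample rows from the question:
# (id, group_id, source_id, target_id, sortsequence)
ROWS = [
    (2, 1, 2, 4, 1),
    (4, 1, 20, 2, 1),
    (5, 1, 2, 14, 1),
    (7, 1, 2, 7, 3),
    (20, 2, 20, 4, 3),
    (21, 2, 20, 4, 1),
]

QUERY = """
SELECT source_id, group_id, GROUP_CONCAT(id) AS faulty_ids
FROM links
GROUP BY source_id, group_id
HAVING COUNT(DISTINCT sortsequence) <> COUNT(sortsequence)  -- a value is repeated
    OR COUNT(sortsequence) <> MAX(sortsequence)             -- count and highest value disagree
    OR MIN(sortsequence) <> 1                               -- sequence does not start at 1
"""


def find_faulty_groups():
    """Return the (source_id, group_id) pairs whose sortsequence is not 1..n."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE links (id INTEGER PRIMARY KEY, group_id INT, "
        "source_id INT, target_id INT, sortsequence INT)"
    )
    conn.executemany("INSERT INTO links VALUES (?, ?, ?, ?, ?)", ROWS)
    faulty = {(source, group) for source, group, _ in conn.execute(QUERY)}
    conn.close()
    return faulty
```

On the sample data this flags the group with source_id = 2, group_id = 1 (duplicate value 1) and the group with source_id = 20, group_id = 2 (values 1 and 3, so 2 is missing). The single-row group with source_id = 20, group_id = 1 passes.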

Try this query; it handles both of the cases you mentioned in the question.
SELECT
a.*
FROM
tbl a
INNER JOIN
(select
@rn:=IF(@prevG = group_id AND @prevS = source_id, @rn + 1, 1) As rId,
@prevG:=group_id AS group_id,
@prevS:=source_id AS source_id,
id,
sortsequence
FROM
tbl
join
(select @rn:=0, @prevS:=0, @prevG:=0)b
order by group_id, source_id, id) b
ON a.id = b.id AND a.SORTSEQUENCE <> b.RID;

Related

How to select rows, using group by with minimum field values?

Today I have posted a question and got a good answer: Stuck in building mysql query.
I thought it helped me, but I've discovered that it returns wrong data. So I'm reposting the question here with the answer I received, and I will explain why it is not working for me.
Example of data:
id | item_id | user_id | bid_price
----------------------------------
1 | 1 | 11 | 1
2 | 1 | 12 | 2
3 | 1 | 13 | 3
4 | 1 | 14 | 1
5 | 1 | 15 | 4
6 | 2 | 16 | 2
7 | 2 | 17 | 1
8 | 3 | 18 | 2
9 | 3 | 19 | 3
10 | 3 | 18 | 2
Expected result:
id | item_id | user_id | bid_price
----------------------------------
1 | 1 | 11 | 1
7 | 2 | 17 | 1
8 | 3 | 18 | 2
Offered solution:
select m.id, m.item_id, m.user_id, m.bid_price
from my_table m
inner join (
select item_id, min(id) min_id, min(bid_price) min_price
from my_table
where item_id IN (1,2,3)
group by item_id
) t on t.item_id = m.item_id
and t.min_price= m.bid_price
and t.min_id = m.id
The problem:
In the subquery, the minimum id is taken over the whole item_id group and does not depend on the minimum bid_price.
In other words, the minimum id is selected without regard to the price field at all. So the result contains the minimum price and the minimum id of the group, but these may not come from the same row! The id can belong to a row with a different bid_price value.
How can this query be adjusted? Thank you in advance!
SELECT min(m.id) AS id, m.item_id, m.user_id, m.bid_price
FROM my_table m
INNER JOIN (
SELECT item_id, min(bid_price) AS min_price
FROM my_table
GROUP BY item_id
) t ON t.item_id = m.item_id
AND t.min_price= m.bid_price
GROUP BY item_id
Output
id item_id user_id bid_price
1 1 11 1
7 2 17 1
8 3 18 2
Live Demo
http://sqlfiddle.com/#!9/a52dc6/13
SELECT DISTINCT
t1.item_id,
t1.bid_price
FROM tab1 t1
WHERE NOT exists(SELECT 1
FROM tab1 t2
WHERE t2.item_id = t1.item_id
AND t2.bid_price < t1.bid_price)
AND t1.item_id IN (1, 2, 3);
http://sqlfiddle.com/#!9/615e0a/5
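A hedged sketch of one way to return the whole row (including user_id) for the lowest bid per item, with ties broken by the smallest id. It first finds the minimum price per item, then the smallest id among the rows at that price, and finally joins back on id so every selected column comes from one row. Selecting user_id without grouping or aggregating it is likely to be rejected when MySQL's ONLY_FULL_GROUP_BY mode is on (the default since 5.7.5); joining back on id avoids that. The sketch runs on Python's sqlite3 as a stand-in for MySQL, and the alias first_low is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (id, item_id, user_id, bid_price)
BIDS = [
    (1, 1, 11, 1), (2, 1, 12, 2), (3, 1, 13, 3), (4, 1, 14, 1), (5, 1, 15, 4),
    (6, 2, 16, 2), (7, 2, 17, 1),
    (8, 3, 18, 2), (9, 3, 19, 3), (10, 3, 18, 2),
]

QUERY = """
SELECT r.id, r.item_id, r.user_id, r.bid_price
FROM my_table r
JOIN (
    -- smallest id among the rows that carry the item's minimum price
    SELECT MIN(m.id) AS id
    FROM my_table m
    JOIN (
        SELECT item_id, MIN(bid_price) AS min_price
        FROM my_table
        GROUP BY item_id
    ) t ON t.item_id = m.item_id AND t.min_price = m.bid_price
    GROUP BY m.item_id
) first_low ON first_low.id = r.id
ORDER BY r.item_id
"""


def lowest_bids():
    """Return one full row per item: the lowest bid, earliest id on ties."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER PRIMARY KEY, item_id INT, "
        "user_id INT, bid_price INT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", BIDS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

For the sample data this returns the rows with id 1, 7 and 8, which matches the expected result in the question.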

Sum of Counted records that calculated using "group by" with condition and "group by"

I'm sorry for the fuzzy title of this question.
I have 2 tables in my database and want to count the records of first_table using "group by" on a foreign key id that appears in a column of second_table (which stores ids as a comma-separated list such as "1,2,3,4,5").
id | name | fk_id
1 | john | 1
2 | mike | 1
3 | jane | 2
4 | tailor | 1
5 | jane | 3
6 | tailor | 5
7 | jane | 4
8 | tailor | 5
9 | jane | 5
10 | tailor | 5
id | name | fk_ids | s_fk_id
1 | xxx | 1,5,6 | 1
2 | yyy | 2,3 | 1
3 | zzz | 9 | 1
4 | www | 7,8 | 1
I wrote the following query, but it is not working properly and displays wrong numbers.
I WANT TO:
1-Count records in first_table group by "fk_id"
2-Sum the counted records which exists in "fk_ids"
3-Display the sum result (sum of related counts) grouped by id.
select sum(if(FIND_IN_SET(`fk_id`, `fk_ids`)>0,`count`,0)) `sum`, `count`, `from`.`fk_id`, `second_table`.* FROM `second_table`
LEFT JOIN
(
SELECT `fk_id`, count(*) `count`
FROM `first_table`
group BY `fk_id`
) AS `from`
ON FIND_IN_SET(`fk_id`, `fk_ids`)>0
WHERE `second_table`.`s_fk_id`=1
GROUP BY `id`
ORDER by `count` DESC
This table has many data and we have no plan to change the structure.
Edit:
Desired output:
id | name | sum
1 | xxx | 7 (3+4+0)
2 | yyy | 2 (1+1)
3 | zzz | 0 (0)
4 | www | 0 (0+0)
After two holidays I came back to work and found out that the FIND_IN_SET function does not work properly with strings that contain spaces.
The problem was that I had been ignoring the spaces too (same as this question).
Finally, this query worked:
select sum(`count`) `sum`, `count`, `from`.`fk_id`, `second_table`.* FROM `second_table`
LEFT JOIN
(
SELECT `fk_id`, count(*) `count`
FROM `first_table`
group BY `fk_id`
) AS `from`
ON FIND_IN_SET(`fk_id`, replace(`fk_ids`,' ',''))>0
WHERE `second_table`.`s_fk_id`=1
GROUP BY `id`
ORDER by `count` DESC
And the magic is replace(`fk_ids`,' ','').
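A small sketch of why the spaces matter. MySQL's FIND_IN_SET compares the needle against each comma-separated element as is, so in the list '1, 5, 6' the second element is ' 5' (with a leading space) and does not match '5'. SQLite has no FIND_IN_SET, so the sketch registers a rough Python emulation under that name; the emulation and the fk_ids values with spaces are assumptions made for illustration, since the question's table shows the lists without spaces.

```python
import sqlite3


def find_in_set(needle, csv):
    """Rough stand-in for MySQL FIND_IN_SET: 1-based position, 0 if absent."""
    if needle is None or csv is None:
        return None
    items = str(csv).split(",")
    needle = str(needle)
    return items.index(needle) + 1 if needle in items else 0


# (id, name, fk_id) as in the question
FIRST = [
    (1, "john", 1), (2, "mike", 1), (3, "jane", 2), (4, "tailor", 1),
    (5, "jane", 3), (6, "tailor", 5), (7, "jane", 4), (8, "tailor", 5),
    (9, "jane", 5), (10, "tailor", 5),
]
# (id, name, fk_ids, s_fk_id); spaces after the commas are assumed
SECOND = [
    (1, "xxx", "1, 5, 6", 1), (2, "yyy", "2, 3", 1),
    (3, "zzz", "9", 1), (4, "www", "7, 8", 1),
]

QUERY = """
SELECT s.id, s.name, COALESCE(SUM(c.cnt), 0) AS total
FROM second_table s
LEFT JOIN (
    SELECT fk_id, COUNT(*) AS cnt FROM first_table GROUP BY fk_id
) c ON FIND_IN_SET(c.fk_id, {list_expr}) > 0
WHERE s.s_fk_id = 1
GROUP BY s.id, s.name
ORDER BY s.id
"""


def sums(strip_spaces):
    """Sum the per-fk_id counts for each second_table row."""
    list_expr = "REPLACE(s.fk_ids, ' ', '')" if strip_spaces else "s.fk_ids"
    conn = sqlite3.connect(":memory:")
    conn.create_function("FIND_IN_SET", 2, find_in_set)
    conn.execute("CREATE TABLE first_table (id INT, name TEXT, fk_id INT)")
    conn.execute(
        "CREATE TABLE second_table (id INT, name TEXT, fk_ids TEXT, s_fk_id INT)"
    )
    conn.executemany("INSERT INTO first_table VALUES (?, ?, ?)", FIRST)
    conn.executemany("INSERT INTO second_table VALUES (?, ?, ?, ?)", SECOND)
    rows = conn.execute(QUERY.format(list_expr=list_expr)).fetchall()
    conn.close()
    return rows
```

With strip_spaces=True the totals are 7, 2, 0 and 0, as in the desired output. With strip_spaces=False only the first element of each list matches, so xxx drops to 3 and yyy to 1. COALESCE is used here so that rows with no matches show 0 instead of NULL.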

SQL select rows which are identical in two values in a way that retains Edit features in output

Apologies if the answer is dead obvious but in spite of a lot of research and trying out different commands, the solution escapes me (I'm more of a lexicographer than a dev).
We have a table which for various reasons has ended up with some rows which have duplicated values in critical cells. A mockup looks like this:
Unique_ID | E_ID | Date | User_ID | V_value
1 | 500 | 2012-05-12 | 23 | 3
2 | 501 | 2012-05-12 | 23 | 3
3 | 501 | 2012-05-13 | 23 | 1
4 | 502 | 2012-05-13 | 23 | 2
5 | 503 | 2012-05-12 | 23 | 2
6 | 7721 | 2012-05-22 | 8845 | 3
7 | 7722 | 2012-05-22 | 8845 | 3
8 | 7722 | 2012-05-22 | 8845 | 3
9 | 7723 | 2012-05-22 | 8845 | 3
So the rows I need as output are Unique_ID 2 & 3 and 7 & 8 as they are identical as regards the E_ID and User_ID field. The values of the other fields are not relevant to our problem. So what I want is this, ideally:
Unique_ID | E_ID | Date | User_ID | V_value
2 | 501 | 2012-05-12 | 23 | 3
3 | 501 | 2012-05-13 | 23 | 1
7 | 7722 | 2012-05-22 | 8845 | 3
8 | 7722 | 2012-05-22 | 8845 | 3
For reasons to do with the data, I need the output to appear with the Edit features (in particular the tick-box or at least the Delete feature) because I need to go through the table manually and discard one or the other duplicate based on decisions/conditions that can't be determined with SQL commands.
The closest I have come is this:
SELECT *
FROM ( SELECT E_ID, User_ID, COUNT(Unique_ID)
AS V_Count
FROM TableName
GROUP BY E_ID, User_ID
ORDER BY E_ID )
AS X
WHERE V_Count > 1
ORDER BY User_ID ASC, E_ID ASC
which does give me the rows with the duplications but because I'm creating the V_Count column to give me the duplicates:
E_ID | User_ID | V_Count
501 | 23 | 2
7722 | 8845 | 2
the output does not give me the Delete option I need - it says it's because there is no unique ID and I get that, as it puts them together in the same row. Is there a way to do this without losing the Unique_ID so I don't lose the Delete function?
You can use aggregation to check whether a given user_id and e_id combination has more than one row. Then join the result with your table to get all the columns.
select t1.*
from tablename t1
join (
select e_id,
user_id
from tablename
group by e_id,
user_id
having count(*) > 1
) t2
on t1.e_id = t2.e_id
and t1.user_id = t2.user_id
Which can be more cleanly expressed using the USING clause as:
select *
from tablename t1
join (
select e_id,
user_id
from tablename
group by e_id,
user_id
having count(*) > 1
) t2 using (e_id, user_id)
A sort-of simple method uses exists:
select t.*
from tablename t
where exists (select 1
from tablename t2
where t2.e_id = t.e_id and t2.date = t.date and
t2.user_id = t.user_id and t2.v_value = t.v_value and
t2.unique_id <> t.unique_id
);
An alternative way that puts each combination on a single row with all the ids is:
select e_id, date, user_id, v_value,
group_concat(unique_id) as unique_ids
from tablename t
group by e_id, date, user_id, v_value
having count(*) > 1;
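For comparison, here is a minimal sketch that runs two of the approaches above on the mockup data, using Python's sqlite3 as a stand-in. The question says only E_ID and User_ID define a duplicate, so a check that also compares date and v_value can miss pairs such as Unique_ID 2 and 3, whose dates and values differ. The function names are made up for illustration.

```python
import sqlite3

# Mockup rows from the question: (unique_id, e_id, date, user_id, v_value)
ROWS = [
    (1, 500, "2012-05-12", 23, 3),
    (2, 501, "2012-05-12", 23, 3),
    (3, 501, "2012-05-13", 23, 1),
    (4, 502, "2012-05-13", 23, 2),
    (5, 503, "2012-05-12", 23, 2),
    (6, 7721, "2012-05-22", 8845, 3),
    (7, 7722, "2012-05-22", 8845, 3),
    (8, 7722, "2012-05-22", 8845, 3),
    (9, 7723, "2012-05-22", 8845, 3),
]

# Duplicates defined by (e_id, user_id) only; every row keeps its unique_id.
BY_KEY = """
SELECT t1.unique_id
FROM tablename t1
JOIN (
    SELECT e_id, user_id
    FROM tablename
    GROUP BY e_id, user_id
    HAVING COUNT(*) > 1
) t2 ON t1.e_id = t2.e_id AND t1.user_id = t2.user_id
ORDER BY t1.unique_id
"""

# Duplicates defined by every non-id column (stricter than the question asks).
BY_ALL_COLUMNS = """
SELECT t.unique_id
FROM tablename t
WHERE EXISTS (
    SELECT 1 FROM tablename t2
    WHERE t2.e_id = t.e_id AND t2.date = t.date
      AND t2.user_id = t.user_id AND t2.v_value = t.v_value
      AND t2.unique_id <> t.unique_id
)
ORDER BY t.unique_id
"""


def _run(query):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablename (unique_id INTEGER PRIMARY KEY, e_id INT, "
        "date TEXT, user_id INT, v_value INT)"
    )
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?, ?)", ROWS)
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


def duplicates_by_key():
    return _run(BY_KEY)


def duplicates_by_all_columns():
    return _run(BY_ALL_COLUMNS)
```

On the mockup, the join on (e_id, user_id) returns Unique_ID 2, 3, 7 and 8, which is the output the question asks for. The all-columns check returns only 7 and 8.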

I need to get the average for every 3 records in one table and update column in separate table

Table Mytable1
Id | Actual
1 | 10020
2 | 12203
3 | 12312
4 | 12453
5 | 13211
6 | 12838
7 | 10129
Using the following syntax:
SELECT AVG(Actual), CEIL((@rank:=@rank+1)/3) AS rank FROM mytable1 Group BY rank;
Produces the following type of result:
| AVG(Actual) | rank |
+-------------+------+
| 12835.5455 | 1 |
| 12523.1818 | 2 |
| 12343.3636 | 3 |
I would like to take AVG(Actual) column and UPDATE a second existing table Mytable2
Id | Predict |
1 | 11133
2 | 12312
3 | 13221
I would like to get the following where the Actual value matches the ID as RANK
Id | Predict | Actual
1 | 11133 | 12835.5455
2 | 12312 | 12523.1818
3 | 13221 | 12343.3636
IMPORTANT REQUIREMENT
I need to set an offset much like the following syntax:
SELECT @rank := @rank + 1 AS Id , Mytable2.Actual FROM Mytable LIMIT 3 OFFSET 4;
Please note that the average numbers in the examples are made up.
You can join your existing query in the UPDATE statement:
UPDATE Table2 T2
JOIN (
SELECT AVG(Actual) as AverageValue,
CEIL((@rank:=@rank+1)/3) AS rank
FROM Table1, (select @rank:=0) t
Group BY rank )T1
on T2.id = T1.rank
SET Actual = T1.AverageValue
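The same idea can be sketched without user variables, whose evaluation order is not guaranteed when there is no ORDER BY. The sketch below is an assumption-laden illustration on Python's sqlite3: it derives each row's position by counting ids less than or equal to its own, puts every three positions in one bucket with integer division, and writes each bucket's average into the row of mytable2 whose id equals the bucket number. The Actual column on mytable2 and the last sample value 10129 are assumed.

```python
import sqlite3

# (id, actual) from the question; the seventh value is assumed to be 10129
ACTUALS = [
    (1, 10020), (2, 12203), (3, 12312), (4, 12453),
    (5, 13211), (6, 12838), (7, 10129),
]
# (id, predict) from the question
PREDICTS = [(1, 11133), (2, 12312), (3, 13221)]

UPDATE = """
UPDATE mytable2
SET actual = (
    SELECT AVG(r.actual)
    FROM (
        -- position of the row in id order, bucketed three at a time
        SELECT a.actual,
               ((SELECT COUNT(*) FROM mytable1 b WHERE b.id <= a.id) + 2) / 3 AS grp
        FROM mytable1 a
    ) r
    WHERE r.grp = mytable2.id
)
"""


def averaged_predictions():
    """Return (id, predict, actual) after filling actual with 3-row averages."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable1 (id INTEGER PRIMARY KEY, actual INTEGER)")
    conn.execute(
        "CREATE TABLE mytable2 (id INTEGER PRIMARY KEY, predict INTEGER, actual REAL)"
    )
    conn.executemany("INSERT INTO mytable1 VALUES (?, ?)", ACTUALS)
    conn.executemany("INSERT INTO mytable2 (id, predict) VALUES (?, ?)", PREDICTS)
    conn.execute(UPDATE)
    rows = conn.execute(
        "SELECT id, predict, actual FROM mytable2 ORDER BY id"
    ).fetchall()
    conn.close()
    return rows
```

The correlated COUNT is quadratic, so it only suits small tables. On MySQL 8 or later, ROW_NUMBER() OVER (ORDER BY id) would be the more natural way to get the position.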

how to write this self join based on three columns

Hello there, I have the following table:
------------------------------------------
| id | language | parentid | no_daughter |
------------------------------------------
| 1 | 1 | 0 | 2 |
------------------------------------------
| 1 | 1 | 0 | 2 |
------------------------------------------
| 2 | 1 | 1 | 1 |
------------------------------------------
| 2 | 2 | 1 | 1 |
------------------------------------------
| 3 | 1 | 1 | 0 |
------------------------------------------
| 3 | 2 | 1 | 0 |
------------------------------------------
| 4 | 1 | 2 | 0 |
------------------------------------------
| 4 | 2 | 2 | 0 |
------------------------------------------
| 5 | 1 | 2 | 0 |
------------------------------------------
| 5 | 2 | 2 | 1 |
-----------------------------------------
| 5 | 1 | 4 | 1 |
------------------------------------------
| 5 | 2 | 4 | 1 |
------------------------------------------
Scenario
Every record has more than one row in the table, one per language id. parentid identifies the parent of the record. The no_daughter column states how many children the record has. In the ideal scenario, if no_daughter is 2 for id = 1, then 1 should be the parentid of two records in the same table. If a record exists more than once because of language, it is counted as one record.
My Problem
I need to find the records where the no_daughter value is not correct. If no_daughter is 2, there must be two records whose parentid is that id. In the case above, the record with id = 1 is valid. But the record with id = 2 is not valid, because no_daughter = 1 while it actually has 2 daughters. The same is the case with id = 4.
Can anybody tell me how I can find these faulty records?
Updated after answers
Ken Clark and shola have given answers that return the same result. For example, shola's query is:
SELECT DISTINCT
id
FROM
tbl_info t
INNER JOIN
(SELECT
parentid,
COUNT(DISTINCT id) AS childs
FROM
tbl_info
GROUP BY parentid) AS parentchildrelation
ON t.id = parentchildrelation.parentid
AND t.no_daughters != parentchildrelation.childs
This query returns the ids that are used as parentid somewhere in the table but have a wrong no_daughter value. It does not return ids that have a value in the no_daughter column but are not used as parentid anywhere in the table. For example, id = 5 has no_daughter = 1 but is not used as parentid in the table, so it is also a faulty record. The query above does not capture such records.
Any help will be much appreciated.
Try this:
SELECT DISTINCT
id
FROM
tbl_info t
Left JOIN
(SELECT
parentid,
COUNT(DISTINCT id) AS childs
FROM
tbl_info
GROUP BY parentid) AS parentchildrelation
ON t.id = parentchildrelation.parentid
Where t.no_daughters != COALESCE(parentchildrelation.childs, 0)
Try this:
SELECT id FROM tinfo t inner join
(SELECT parentid, COUNT(distinct language ) as childs FROM tinfo group by parentid) as summary
on t.id=summary.parentid and t.no_daughters!= summary.childs
Try this:
Select Distinct * From tablename t
Left Join
(
Select COUNT(t1.Id) Doughter,t1.parentid,t1.language From tablename t1 Group By t1.parentid,t1.language
)tbl
On t.id=tbl.parentid And tbl.language=t.language And t.no_daughter<>tbl.Doughter
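The case raised in the question's update (a record that claims daughters but is never used as parentid) can be sketched as follows. With a LEFT JOIN, such a record gets NULL for the child count, and comparing against NULL filters the row out, so the sketch treats a missing count as 0 with COALESCE. It runs on Python's sqlite3 as a stand-in for MySQL, uses the sample rows as printed in the question, and takes the column name no_daughter from the table header (the answers spell it no_daughters).

```python
import sqlite3

# Sample rows from the question: (id, language, parentid, no_daughter)
ROWS = [
    (1, 1, 0, 2), (1, 1, 0, 2),
    (2, 1, 1, 1), (2, 2, 1, 1),
    (3, 1, 1, 0), (3, 2, 1, 0),
    (4, 1, 2, 0), (4, 2, 2, 0),
    (5, 1, 2, 0), (5, 2, 2, 1),
    (5, 1, 4, 1), (5, 2, 4, 1),
]

QUERY = """
SELECT DISTINCT t.id
FROM tbl_info t
{join} (
    -- language variants of one child count once
    SELECT parentid, COUNT(DISTINCT id) AS childs
    FROM tbl_info
    GROUP BY parentid
) p ON t.id = p.parentid
WHERE t.no_daughter <> COALESCE(p.childs, 0)
ORDER BY t.id
"""


def faulty_ids(include_childless):
    """Return ids whose no_daughter disagrees with the actual child count.

    include_childless=True uses LEFT JOIN, so ids that never appear as
    parentid are compared against a child count of 0.
    """
    join = "LEFT JOIN" if include_childless else "JOIN"
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl_info (id INT, language INT, parentid INT, no_daughter INT)"
    )
    conn.executemany("INSERT INTO tbl_info VALUES (?, ?, ?, ?)", ROWS)
    ids = [row[0] for row in conn.execute(QUERY.format(join=join))]
    conn.close()
    return ids
```

On the sample rows the inner join reports ids 2 and 4, while the LEFT JOIN with COALESCE also reports id 5, which has no_daughter = 1 in some rows but no children.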