How to find duplicate rows with similar part of string - mysql

I have thousands of rows in a table. Some rows have similar names and can be categorized into the same group. For example:
Table : Birds_Name
+-------+---------------------+
|ID |Name |
+-------+---------------------+
|1 |Blue Peckwood |
+-------+---------------------+
|2 |North Peckwood |
+-------+---------------------+
|3 |Northern Peckwood |
+-------+---------------------+
|4 |Northern Peckwood |
+-------+---------------------+
|5 |Red Heron |
+-------+---------------------+
|6 |Red Heron |
+-------+---------------------+
In the table above there should be 2 groups of birds: Peckwood and Heron.
But I ran this MySQL query:
SELECT *
FROM birds_name
WHERE name IN (
SELECT name
FROM birds_name
GROUP BY name
HAVING COUNT(*) > 1
)
After running the query, this is what I get:
+-------+---------------------+
|3 |Northern Peckwood |
+-------+---------------------+
|4 |Northern Peckwood |
+-------+---------------------+
|5 |Red Heron |
+-------+---------------------+
|6 |Red Heron |
+-------+---------------------+
Actually, I expect any rows that share a similar string to be selected (in this case, Peckwood), so there should be only 2 groups: Peckwood and Heron.
Is it possible to do so? And how should I adapt the MySQL query to achieve it?
Regards.

Try this
SELECT SUBSTRING_INDEX(name,' ',-1),count(*)
FROM birds_name
GROUP BY SUBSTRING_INDEX(name,' ',-1) HAVING count(*)>0;
See the manual for the SUBSTRING_INDEX function in MySQL.
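To also get back the individual rows (not just the group counts), the same last-word expression can be reused in the original IN (...) subquery. Below is a minimal sketch that can be run without a MySQL server, using Python's built-in sqlite3 module. SQLite has no SUBSTRING_INDEX, so the hypothetical helper last_word stands in for SUBSTRING_INDEX(name, ' ', -1); in MySQL you would write that expression directly. It also assumes the group is always the last word of the name.

```python
import sqlite3


def last_word(name):
    # Hypothetical helper: stand-in for MySQL's SUBSTRING_INDEX(name, ' ', -1).
    return name.rsplit(" ", 1)[-1]


conn = sqlite3.connect(":memory:")
conn.create_function("last_word", 1, last_word)
conn.execute("CREATE TABLE birds_name (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO birds_name (id, name) VALUES (?, ?)",
    [
        (1, "Blue Peckwood"),
        (2, "North Peckwood"),
        (3, "Northern Peckwood"),
        (4, "Northern Peckwood"),
        (5, "Red Heron"),
        (6, "Red Heron"),
    ],
)

# Rows whose last word is shared by at least one other row.
duplicate_rows = conn.execute(
    """
    SELECT id, name
    FROM birds_name
    WHERE last_word(name) IN (
        SELECT last_word(name)
        FROM birds_name
        GROUP BY last_word(name)
        HAVING COUNT(*) > 1
    )
    ORDER BY id
    """
).fetchall()

# One line per group with its size.
group_counts = conn.execute(
    """
    SELECT last_word(name) AS grp, COUNT(*)
    FROM birds_name
    GROUP BY grp
    ORDER BY grp
    """
).fetchall()

print(duplicate_rows)
print(group_counts)
```

With the sample data, all six rows fall into one of the two groups (Peckwood and Heron).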

Can you try this.
SELECT count(id),name
FROM birds_name
group by name
having count(id) >1
Thanks
SQL Fiddle

I think you can separate those words using MySQL String functions, like below:
mysql> SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
-> 'www.mysql'
mysql> SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2);
-> 'mysql.com'
Then, use it in the GROUP BY clause of your query.
UPDATE :
Here is my SQLFiddle.

Related

Counting whole DB while searching for specific SQL

I have a table in db for customers and their glasses
customer_inventory_tbl:
SELECT * FROM customer_inventory_tbl
+-------+-------+-------+
|id(pk) | name | spex |
+-------+-------+-------+
|1 |John |Oval |
|2 |Steve |Angular|
|3 |John |Aviator|
|4 |Kevin |Supra |
|5 |Jamie |Oval |
|6 |Ben |Supra |
+-------+-------+-------+
(this is a way more simplified version, haha)
If I view John's record it shows
SELECT * FROM customer_inventory_tbl WHERE name='John'
+-------+-------+-------+
|id(pk) | name | spex |
+-------+-------+-------+
|1 |John |Oval |
|3 |John |Aviator|
+-------+-------+-------+
But what I require is when viewing John's record, it to show me
+-------+-------+-------+-----+
|id(pk) | name | spex |count|
+-------+-------+-------+-----+
|1 |John |Oval |2 |
|3 |John |Aviator|1 |
+-------+-------+-------+-----+
The "count" column is the number of records in the whole table that have the same spex value, "Oval" for instance.
That is easy enough if I wanted to count every record in the db, but how do I get the count across all records whilst looking up a specific name?
I hope this makes sense.
select c.*,
(
select count(1)
from customer_inventory_tbl
where spex = c.spex
) "count"
from customer_inventory_tbl c;
As a solution for the description above, try the following SQL query:
SELECT c.*,(select count(id) from customer_inventory_tbl where spex = c.spex)
as count FROM customer_inventory_tbl c WHERE c.name='John'
In this query the count is retrieved through a correlated subquery that counts the rows sharing the outer row's spex value. A subquery that only uses GROUP BY spex returns one row per spex value, which MySQL rejects in this position with "Subquery returns more than 1 row".
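A quick way to sanity-check the correlated-subquery approach against the sample data is an in-memory SQLite database in Python. This is only a sketch: the alias spex_count is my own name (chosen to avoid clashing with the COUNT function), and the query itself is plain SQL that should carry over to MySQL unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_inventory_tbl "
    "(id INTEGER PRIMARY KEY, name TEXT, spex TEXT)"
)
conn.executemany(
    "INSERT INTO customer_inventory_tbl (id, name, spex) VALUES (?, ?, ?)",
    [
        (1, "John", "Oval"),
        (2, "Steve", "Angular"),
        (3, "John", "Aviator"),
        (4, "Kevin", "Supra"),
        (5, "Jamie", "Oval"),
        (6, "Ben", "Supra"),
    ],
)

# The outer query filters by name; the subquery counts over the whole table.
john_rows = conn.execute(
    """
    SELECT c.id, c.name, c.spex,
           (SELECT COUNT(*)
            FROM customer_inventory_tbl t
            WHERE t.spex = c.spex) AS spex_count
    FROM customer_inventory_tbl c
    WHERE c.name = ?
    ORDER BY c.id
    """,
    ("John",),
).fetchall()

print(john_rows)
```

The WHERE clause only restricts which rows are shown; the count still looks at every row in the table because the subquery has no name filter.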

Get latest rows from my sql database

I have a table whose rows have a column called Version. Some rows share the same value in another column (Unique_Id) and differ only in Name and Version. The rows are as follows:
ID|Name|Version|Unique_Id
-------------------------
1 |abc |1 | 23
2 |abc1|2 |23
3 |xyz |1 |21
4 |tre |1 |20
I want the result as
ID|Name|Version|Unique_Id
-------------------------
2 |abc1|2 |23
3 |xyz |1 |21
4 |tre |1 |20
I have tried grouping by Unique_Id, the result is as follows
ID|Name|Version|Unique_Id
-------------------------
1 |abc |1 | 23
3 |xyz |1 |21
4 |tre |1 |20
Following is the query I am using
SELECT * FROM test
group by Unique_Id
order by Version desc;
I want the latest row (the top one when ordered by Version descending) for each Unique_Id. Please help. How can I achieve that?
How about something like
SELECT t.*
FROM test t
INNER JOIN (
    SELECT Unique_Id, MAX(Version) AS MaxVersion
    FROM test
    GROUP BY Unique_Id
) m
    ON m.Unique_Id = t.Unique_Id
    AND m.MaxVersion = t.Version
Use a sub select to determine the id and its max version number, then join back to the original table to retrieve the other values.
SQL Fiddle DEMO
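The sub select plus join described above can be checked against the question's sample rows with a small in-memory SQLite sketch in Python. The alias names t, m and MaxVersion are my own, and the sketch assumes that (Unique_Id, Version) pairs are unique; if two rows tie on the highest version, both would be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test "
    "(ID INTEGER PRIMARY KEY, Name TEXT, Version INTEGER, Unique_Id INTEGER)"
)
conn.executemany(
    "INSERT INTO test (ID, Name, Version, Unique_Id) VALUES (?, ?, ?, ?)",
    [
        (1, "abc", 1, 23),
        (2, "abc1", 2, 23),
        (3, "xyz", 1, 21),
        (4, "tre", 1, 20),
    ],
)

# Inner query: highest Version per Unique_Id.
# Join back to the table to pick up the remaining columns of that row.
latest_rows = conn.execute(
    """
    SELECT t.ID, t.Name, t.Version, t.Unique_Id
    FROM test t
    INNER JOIN (
        SELECT Unique_Id, MAX(Version) AS MaxVersion
        FROM test
        GROUP BY Unique_Id
    ) m
        ON m.Unique_Id = t.Unique_Id
       AND m.MaxVersion = t.Version
    ORDER BY t.ID
    """
).fetchall()

print(latest_rows)
```

Unlike GROUP BY Unique_Id on its own, this does not leave it to the database to pick an arbitrary row per group.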

A little help in my Mysql statement

Is there a SQL SELECT statement that matches on the ending of a value?
I mean like this:
database table 1
|id| code| name |
|1 | abc | absent |
|2 | cbd | tabsent|
|3 | def | late |
|4 | efg | kalant |
How do I get the rows whose name ends with "ent"? Something like:
SELECT * FROM table 1
WHERE (endValue of name)= "ent"
SELECT *
FROM Table1
WHERE name like "%ent"
SELECT *
FROM Table1
WHERE RIGHT(name, 3) = "ent"
would also be a possibility, instead of a LIKE "%ent" statement. See http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_right
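Both variants can be compared on the sample rows with an in-memory SQLite sketch in Python. Note the assumption: SQLite has no RIGHT function, so substr(name, -3) is used here as a stand-in for MySQL's RIGHT(name, 3); the LIKE form is the same in both databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER PRIMARY KEY, code TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO Table1 (id, code, name) VALUES (?, ?, ?)",
    [
        (1, "abc", "absent"),
        (2, "cbd", "tabsent"),
        (3, "def", "late"),
        (4, "efg", "kalant"),
    ],
)

# Suffix match with a leading wildcard.
like_rows = conn.execute(
    "SELECT id, name FROM Table1 WHERE name LIKE '%ent' ORDER BY id"
).fetchall()

# Suffix match by comparing the last three characters
# (SQLite stand-in for MySQL's RIGHT(name, 3) = 'ent').
suffix_rows = conn.execute(
    "SELECT id, name FROM Table1 WHERE substr(name, -3) = 'ent' ORDER BY id"
).fetchall()

print(like_rows)
print(suffix_rows)
```

On this data both predicates select the same rows; "kalant" is excluded because it ends in "ant".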

multicount on SQL query

I am looking for the proper SQL query to pull data from the database and COUNT specific rows to come up with a total... here's my table:
------------------------------------------
|name |App |Dep |Sold |
------------------------------------------
|Joe |1 |1 |2 |
|Joe |1 |2 |2 |
|Steve |1 |1 |1 |
|Steve |1 |2 |1 |
------------------------------------------
So I need to count the "1" values in each column for each name and output the totals like this:
Joe | 2 App | 1 Dep | 0 Sold
Steve | 2 App | 1 Dep | 2 Sold
Anyone have a starting point for me? I'm not sure if I need JOINs or if I can just add separate COUNTs for each column.
SELECT Name,
SUM(App = 1) TotalApp,
SUM(Dep = 1) TotalDep,
SUM(Sold = 1) TotalSold
FROM tableName
GROUP BY Name
SQLFiddle Demo
App = 1 is MySQL-specific syntax: the comparison is evaluated as a boolean that yields 1 or 0, so SUM adds up the matching rows. To make it more RDBMS friendly, you can use CASE, e.g. SUM(CASE WHEN App = 1 THEN 1 ELSE 0 END).
SQL Fiddle Demo using CASE statement
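The CASE variant can be tried on the question's sample rows with an in-memory SQLite database in Python. This is just a sketch to confirm the totals; the table name tableName is taken from the answer above, and the query is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableName (name TEXT, App INTEGER, Dep INTEGER, Sold INTEGER)"
)
conn.executemany(
    "INSERT INTO tableName (name, App, Dep, Sold) VALUES (?, ?, ?, ?)",
    [
        ("Joe", 1, 1, 2),
        ("Joe", 1, 2, 2),
        ("Steve", 1, 1, 1),
        ("Steve", 1, 2, 1),
    ],
)

# Each CASE turns a matching row into 1 and anything else into 0,
# so SUM counts the rows where the column equals 1.
totals = conn.execute(
    """
    SELECT name,
           SUM(CASE WHEN App = 1 THEN 1 ELSE 0 END) AS TotalApp,
           SUM(CASE WHEN Dep = 1 THEN 1 ELSE 0 END) AS TotalDep,
           SUM(CASE WHEN Sold = 1 THEN 1 ELSE 0 END) AS TotalSold
    FROM tableName
    GROUP BY name
    ORDER BY name
    """
).fetchall()

print(totals)
```

No JOINs are needed because all three totals come from the same rows of the same table.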

Creating an "average grade list" for ranked IDs in SQL

Problem description
I'm trying to get a comma-separated list of average grades for each recommendation, which consists of another comma-separated list of recommended content IDs. A recommendation is an object which consists of content that will receive the recommendation (ContentID) and a list of other contents that will be recommended (RecommendedContentIDs).
Table structure, sample data and other limitations
I have a two-table database structure. The first table contains the recommended content IDs saved as a comma-separated ranked list. The second table contains grades for each of the recommended content IDs. The ranked lists have up to 10 comma-separated values and grades range from 0 to 5.
To better illustrate the problem, here are the table structures and some sample data:
Table Recommendations
|ID |ContentID |RecommendedContentIDs |Type |
+------+-------------+----------------------+-----+
|1 |2051 |9706,14801,13354,... |a |
+------+-------------+----------------------+-----+
|67 |2051 |8103,16366,8795,... |b |
+------+-------------+----------------------+-----+
|133 |2051 |8795,8070,15341,... |c |
+------+-------------+----------------------+-----+
|22 |1234 |4782,283,33,... |a |
+------+-------------+----------------------+-----+
...
Table Grades
|ID |RecommendationID |RecommendedDocumentID |Grade |EvaluatorHash|
+------+-----------------+----------------------+------+-------------+
|1 |1 |9706 |4 |123456789 |
+------+-----------------+----------------------+------+-------------+
|2 |1 |14801 |5 |123456789 |
+------+-----------------+----------------------+------+-------------+
|3 |1 |13354 |3 |987654321 |
+------+-----------------+----------------------+------+-------------+
|3 |1 |9706 |3 |987654321 |
+------+-----------------+----------------------+------+-------------+
|4 |67 |8103 |5 |123456789 |
+------+-----------------+----------------------+------+-------------+
|1 |67 |16366 |4 |987654321 |
+------+-----------------+----------------------+------+-------------+
|1 |133 |8795 |2 |123456789 |
+------+-----------------+----------------------+------+-------------+
...
I've transformed the RecommendedContentIDs column in table Recommendations into a separate table that looks like this:
Table RecommendedContent
|ID |RecommendationID |RecommendedContentID |Rank |
+------+-----------------+---------------------+-----+
|1 |1 |9706 |1 |
+------+-----------------+---------------------+-----+
|2 |1 |14801 |2 |
+------+-----------------+---------------------+-----+
|3 |1 |13354 |3 |
+------+-----------------+---------------------+-----+
|4 |1 |12787 |4 |
+------+-----------------+---------------------+-----+
...
+------+-----------------+---------------------+-----+
|11 |2 |19042 |1 |
+------+-----------------+---------------------+-----+
|12 |2 |13376 |2 |
+------+-----------------+---------------------+-----+
|13 |2 |9853 |3 |
+------+-----------------+---------------------+-----+
Expected result
I would now like to make a query that would return a result set that contains two comma-separated lists which are correspondent, so that I'll be able to display the average grade for each recommended content ID. It should look something like this:
|ContentID |RecommendedContentIDs |RecommendedContentAverageGrades |Type |
+-------------+-------------------------+----------------------------------+------+
|2051 |9706,14801,13354,... |3.5,5.0,3.0,... |a |
+-------------+-------------------------+----------------------------------+------+
|2051 |8103,16366,8795,... |5.0,4.0,0.0,... |b |
+-------------+-------------------------+----------------------------------+------+
|2051 |8795,8070,15341,... |2.0,0.0,0.0,... |c |
+-------------+-------------------------+----------------------------------+------+
...
As you can see, the RecommendedContentAverageGrades column contains the average grades for each corresponding ContentID in the column RecommendedContentIDs (Content with ID 9706 was graded twice, once with 4 and once with 3 therefore the average is 3.5). If the content hasn't been graded, the average grade should be 0. What is really important here is that the two comma-separated lists are correspondent, because the list in RecommendedContentIDs is a ranked list.
I would normally implement something like this in C#, but I was wondering whether it can be done with SQL. I was thinking of using GROUP_CONCAT but I wasn't able to get a proper result set. I would be very grateful if someone would provide a working SQL query for MySQL and/or T-SQL, but just suggestions will be fine too.
Edits
#1 - Laurence mentioned using separate tables instead of comma-separated lists. I'm using them due to an old design, which I cannot change. However, I am open to answers which assume that data in comma-separated lists is stored in separate tables.
#2 - Changed structure like Laurence suggested (using separated tables - see updated structure).
This just follows up the answer given by @Laurence:
http://sqlfiddle.com/#!2/7d236/6
Updated with Akrigg's fix and sql fiddle, also with how to order by values in the recommendation table
Also updated using order by in the group_concat clause as per brozo's fix:
Table RecommendedContent
+-----------------+----------------------+
|RecommendationID | RecommendedContentID |
+-----------------+----------------------+
| 1 | 9706 |
| 1 | 14801 |
| 1 | 13354 |
| 67 | 8103 |
| ... | ... |
+-----------------+----------------------+
Select
a.RecommendationID,
a.ContentID,
Group_Concat(a.RecommendedContentId Order By a.Rank),
Group_Concat(Trim(Trailing '.' From Trim(Trailing '0' From a.AverageGrade)) Order By a.Rank),
a.Type
From (
Select
r.RecommendationID,
r.ContentID,
r.Type,
rc.RecommendedContentID,
rc.Rank,
Coalesce(Avg(g.Grade), 0) As AverageGrade
From
Recommendations r
Left Outer Join
RecommendedContent rc
On r.RecommendationID = rc.RecommendationID
Left Outer Join
Grades g
On rc.RecommendedContentID = g.RecommendedDocumentID And
rc.RecommendationID = g.RecommendationID
Group By
r.RecommendationID,
r.ContentID,
r.Type,
rc.RecommendedContentID,
rc.Rank
) as a
Group By
a.RecommendationID,
a.ContentID,
a.Type
Order By
a.ContentID, -- Or other way round if that's what you prefer
a.RecommendationID
http://sqlfiddle.com/#!2/ca8b8/8
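For comparison, the application-side approach the question mentions (it names C#) could look roughly like the Python sketch below. The function name average_grade_lists and the tuple layouts are my own assumptions, and averages are formatted to one decimal place to match the expected output in the question. The point it illustrates is the same as in the query: both lists are built in one pass over the rank-ordered items, so they stay aligned.

```python
from collections import defaultdict


def average_grade_lists(recommended_content, grades):
    """Return {recommendation_id: (ids_csv, averages_csv)}, both in rank order.

    recommended_content: iterable of (recommendation_id, content_id, rank)
    grades: iterable of (recommendation_id, content_id, grade)
    """
    # Collect every grade given to a (recommendation, content) pair.
    grades_by_pair = defaultdict(list)
    for recommendation_id, content_id, grade in grades:
        grades_by_pair[(recommendation_id, content_id)].append(grade)

    # Collect (rank, content_id) per recommendation so they can be sorted.
    ranked = defaultdict(list)
    for recommendation_id, content_id, rank in recommended_content:
        ranked[recommendation_id].append((rank, content_id))

    result = {}
    for recommendation_id, items in ranked.items():
        ids, averages = [], []
        for _rank, content_id in sorted(items):
            pair_grades = grades_by_pair.get((recommendation_id, content_id), [])
            # Ungraded content gets an average of 0, as the question asks.
            average = sum(pair_grades) / len(pair_grades) if pair_grades else 0.0
            ids.append(str(content_id))
            averages.append(f"{average:.1f}")
        result[recommendation_id] = (",".join(ids), ",".join(averages))
    return result


# Sample rows for recommendation 1, taken from the tables in the question.
recommended_content = [(1, 9706, 1), (1, 14801, 2), (1, 13354, 3), (1, 12787, 4)]
grades = [(1, 9706, 4), (1, 14801, 5), (1, 13354, 3), (1, 9706, 3)]

lists = average_grade_lists(recommended_content, grades)
print(lists[1])
```

Content 9706 was graded 4 and 3, so its average is 3.5, and 12787 has no grades, so it shows 0.0.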
You could create a custom aggregate in SQL Server to do the comma-separated string concatenation and then use it like this:
SELECT ContentID, RecommendedContentIDs, CustomToCsv(AvgGrade), Type FROM
(
SELECT ContentID, RecommendedContentIDs, AVG(Grade) AvgGrade, Type
FROM Recommendations r INNER JOIN Grades g ON r.ID = g.RecommendationID
GROUP BY ContentID, RecommendedContentIDs, RecommendedDocumentID, Type
) as t
GROUP BY ContentID, RecommendedContentIDs, Type
This is done in Oracle:
WITH count_number AS
(SELECT
ContentID,
','
||RecommendedContentIDs
||',' new_ContentIDs,
RecommendedContentIDs,
type ,
LENGTH(RECOMMENDEDCONTENTIDS )-LENGTH(REPLACE(RECOMMENDEDCONTENTIDS ,','))+1 COUNT_ID
FROM Recommendations
) ,
RecommendedContentIDs_postion AS
(SELECT A1.*,
B1.CONTENTIDS_OCCURANCE_POSITION ,
SUBSTR(new_ContentIDs,instr(new_ContentIDs,',',1,ContentIDs_OCCURANCE_POSITION)+1 , INSTR(new_ContentIDs,',',1,ContentIDs_OCCURANCE_POSITION+1)-instr(new_ContentIDs,',',1,ContentIDs_OCCURANCE_POSITION)-1) ContentIDs
FROM count_number a1,
(SELECT I ContentIDs_OCCURANCE_POSITION
FROM DUAL model dimension BY (1 i) measures (0 X) (X[FOR I
FROM 2 TO 1000 increment 1] = 0)
) b1
WHERE b1.ContentIDs_OCCURANCE_POSITION<=a1.count_id
)
SELECT
CONTENTID,
WM_CONCAT(CONTENTIDS) RECOMMENDEDCONTENTIDS ,
WM_CONCAT(GRADE) avg_grade_contentid ,
type
FROM RECOMMENDEDCONTENTIDS_POSTION RCI,
(SELECT RECOMMENDEDDOCUMENTID,
AVG(GRADE) GRADE
FROM Grades
GROUP BY RECOMMENDEDDOCUMENTID
) GRD
WHERE TRIM(RCI.CONTENTIDS)=TRIM(GRD.RECOMMENDEDDOCUMENTID)
GROUP BY
ContentID,
type;