MySQL: Select Distinct for words in different order

I have a problem creating a query that returns no duplicate values from my table. Unfortunately, the Full Name column has Name and Surname in different orders.
For example:
+----+----------------------+
| ID | Full Name |
+----+----------------------+
| 1 | Marshall Wilson |
| 2 | Wilson Marshall |
| 3 | Lori Hill |
| 4 | Hill Lori |
| 5 | Casey Dean Davidson |
| 6 | Davidson Casey Dean |
+----+----------------------+
I would like to get this result:
+----+-----------------------+
| ID | Full Name |
+----+-----------------------+
| 1 | Marshall Wilson |
| 3 | Lori Hill |
| 5 | Casey Dean Davidson |
+----+-----------------------+
My goal is to create a query that treats such rows as the same, for example a select distinct that behaves as if Name and Surname were always in the same order.
Any thoughts?

This requires a lot of string operations and several derived tables, so it may not be efficient.
We first tokenize FullName into the words it is made of. For that we use a number generator table, gen. Here I have assumed that the maximum number of words is 3; you can easily extend it by adding more SELECTs (SELECT 4 UNION ALL ... and so on).
We use Substring_Index() with Replace() to extract each word, using a single space character (' ') as the delimiter: Substring_Index(FullName, ' ', idx) returns the first idx words, and Replace() strips the first idx-1 words from it, leaving the idx-th word. Trim() removes any leading or trailing spaces that remain.
Now, the trick is to use this result set as a derived table and do a Group_Concat() on the words, sorted in ascending order. This way, names that contain the same words in a different order get the same words_sorted value. Finally, we simply Group By words_sorted to weed out the duplicates.
Query #1
SELECT
  MIN(dt2.ID) AS ID,
  MIN(dt2.FullName) AS FullName
FROM
(
  SELECT
    dt1.ID,
    dt1.FullName,
    GROUP_CONCAT(IF(word = '', NULL, word) ORDER BY word ASC) words_sorted
  FROM
  (
    SELECT e.ID,
           e.FullName,
           TRIM(REPLACE(
             SUBSTRING_INDEX(e.FullName, ' ', gen.idx),
             SUBSTRING_INDEX(e.FullName, ' ', gen.idx-1),
             '')) AS word
    FROM employees AS e
    CROSS JOIN (SELECT 1 AS idx UNION ALL
                SELECT 2 UNION ALL
                SELECT 3) AS gen -- You can add more numbers if more than 3 substrings
  ) AS dt1
  GROUP BY dt1.ID, dt1.FullName
) AS dt2
GROUP BY dt2.words_sorted
ORDER BY ID;
| ID | FullName |
| --- | ------------------- |
| 1 | Marshall Wilson |
| 3 | Hill Lori |
| 5 | Casey Dean Davidson |
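The grouping idea can also be sketched outside the database. The following is a minimal Python illustration, not part of the original answer; the function name dedupe_by_word_set is hypothetical. It builds the same kind of key as words_sorted (the words of the name, sorted and re-joined) and keeps the lowest-ID row per key. Unlike the query, where MIN(ID) and MIN(FullName) are computed independently (which is why ID 3 is paired with "Hill Lori" in the output above), this sketch keeps the ID and the name from the same row.

```python
def dedupe_by_word_set(rows):
    """Keep the lowest-ID row for each name, ignoring word order."""
    seen = {}
    for row_id, full_name in sorted(rows):
        # Same idea as words_sorted: split into words, sort them, re-join.
        key = ",".join(sorted(full_name.split()))
        # setdefault keeps the first (lowest-ID) row seen for this key.
        seen.setdefault(key, (row_id, full_name))
    return sorted(seen.values())


rows = [
    (1, "Marshall Wilson"), (2, "Wilson Marshall"),
    (3, "Lori Hill"), (4, "Hill Lori"),
    (5, "Casey Dean Davidson"), (6, "Davidson Casey Dean"),
]
result = dedupe_by_word_set(rows)
for row in result:
    print(row)
```

Because split() handles any number of words, this version needs no equivalent of the gen table and its fixed maximum of 3 words.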

Related

Mysql - BIGINT value is out of range in error using substring_index

select substring_index(SUBSTRING_INDEX(title, ' ', title+1), ' ',-1) as word ,
COUNT(*) AS counter
from feed_collections
group by word
ORDER BY counter DESC;
The table has 1785123 rows and I think this is the problem.
This is the error query (1690): BIGINT value is out of range in '(feed_collections.title + 1)' and I don't know how to fix it.
The query worked until around 1500000 rows.
The table contains 3 columns: title(text), url(text), date(datetime).
The code finds the most common words in the title column.
Example:
Table
+----------------------------------+-----------------+
| title | url |
+----------------------------------+-----------------+
| the world of ukraine | www.ab |
| count the days until christmas | www.abc.com |
| EU and NATO wants to use bombs | www.abcd.com |
| Ukraine needs help from NATO | www.abce.com |
+----------------------------------+-----------------+
Result
+------+-------+
| word | total |
+------+-------+
| nato | 5 |
| of | 14 |
| and | 11 |
| To | 9 |
| that | 7 |
| ukraine | 2 |
| EU | 1 |
+------+-------+
I adapted the code from here:
How to find most popular word occurrences in MySQL?
This works with small data. The problem seems to appear when it tries to process large data.
What I'm trying to achieve in the future is to find the most common phrases in the title column, grouped by length: 1, 2, 3, 4, 5, 6 or 7 words.
There will be a select box to choose how many words to use.
Example:
I select to find the most common phrases of 4 words.
Titles: 1. Nato is using force, 2. Eu and Nato is using force.
Results with 4 words:
'nato is using force' found 2 times in title.
Any idea how to fix this or how to write a query for it?
I'm working with Laravel; one solution would be to create a PHP method...
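The application-side approach mentioned above could look roughly like the sketch below. It is written in Python rather than PHP purely for illustration, the function name count_phrases is hypothetical, and it assumes the titles have already been fetched from the table (for a table this size, probably in chunks). It counts every run of n consecutive words, case-insensitively.

```python
from collections import Counter


def count_phrases(titles, n):
    """Count every run of n consecutive words across all titles."""
    counts = Counter()
    for title in titles:
        words = title.lower().split()
        # A title with fewer than n words yields an empty range.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


titles = ["Nato is using force", "Eu and Nato is using force"]
top = count_phrases(titles, 4).most_common(1)
print(top)  # [('nato is using force', 2)]
```

With n = 1 the same function gives plain word counts, so one routine could serve every option in the select box.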

How to find data based on comma separated parameter in comma separated data in my SQL query

We have the data below:
plant table
----------------------------
| name | classification |
| A | 1,4,7 |
| B | 2,3,7 |
| C | 3,4,9,8 |
| D | 1,5,6,9 |
Now the front end will send multiple parameters, like "4,9",
and the expected output should be like this:
plant table
---------------------------
| name | classification |
| A | 1,4,7 |
| C | 3,4,9,8 |
| D | 1,5,6,9 |
We already tried FIND_IN_SET, but were only able to fetch with one parameter:
select * from plant o where find_in_set('4',classification ) <> 0
Another solution is to run multiple queries: for example, if the parameter is "4,9", we loop the query twice, with parameters 4 and 9. But that would consume a lot of resources, since the data is around 10000+ rows and there can be more than 5 parameters.
If the table design is bad practice, then OK, but we are unable to change it since the table belongs to a third party.
Any solution or insight will be appreciated.
Thank you
Schema (MySQL v8.0)
CREATE TABLE broken_table (name CHAR(12) PRIMARY KEY,classification VARCHAR(12));
INSERT INTO broken_table VALUES
('A','1,4,7'),
('B','2,3,7'),
('C','3,4,9,8'),
('D','1,5,6,9');
Query #1
WITH RECURSIVE cte (n) AS
(
SELECT 1
UNION ALL
SELECT n + 1 FROM cte WHERE n < 5
)
SELECT DISTINCT x.name, x.classification FROM broken_table x JOIN cte
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(classification,',',n),',',-1) IN (4,9);
| name | classification |
| ---- | -------------- |
| A | 1,4,7 |
| C | 3,4,9,8 |
| D | 1,5,6,9 |
EDIT:
or, for older versions...
SELECT DISTINCT x.name, x.classification FROM broken_table x JOIN
(
SELECT 1 n UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5
) cte
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(classification,',',n),',',-1) IN (4,9)
Let's just avoid the CSV altogether and fix your table design:
plant table
----------------------------
| name | classification |
| A | 1 |
| A | 4 |
| A | 7 |
| B | 2 |
| B | 3 |
| B | 7 |
| ... | ... |
Now with this design, you may use the following statement:
SELECT *
FROM plant
WHERE classification IN (?);
To the ? placeholder, you may bind your collection of values to match (e.g. (4,9)).
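How a collection is bound depends on the client library; many drivers expect one placeholder per value rather than a single ? for the whole list. The sketch below illustrates that variant under some assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and it fills in the rows for C and D from the question's original data. DISTINCT is added so each plant is listed once even when it matches several values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plant (name TEXT, classification INTEGER)")
conn.executemany(
    "INSERT INTO plant VALUES (?, ?)",
    [("A", 1), ("A", 4), ("A", 7),
     ("B", 2), ("B", 3), ("B", 7),
     ("C", 3), ("C", 4), ("C", 9), ("C", 8),
     ("D", 1), ("D", 5), ("D", 6), ("D", 9)],
)

wanted = [4, 9]
# Build one "?" per value: "?, ?"
placeholders = ", ".join("?" for _ in wanted)
sql = (
    "SELECT DISTINCT name FROM plant "
    f"WHERE classification IN ({placeholders}) ORDER BY name"
)
names = [row[0] for row in conn.execute(sql, wanted)]
print(names)  # ['A', 'C', 'D']
```

Only the placeholder string is interpolated into the SQL text; the values themselves are still passed as bound parameters.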
You want OR semantics, so you can use regular expressions. If everything were one digit:
where classification regexp replace('4,9', ',', '|')
However, this would match 42 and 19, which I'm guessing you do not want. So, make this a little more complicated so you have comma delimiters:
where classification regexp concat('(^|,)(', replace('4,9', ',', '|'), ')(,|$)')
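To check the delimiter logic, here is a small sketch using Python's re module as a stand-in for MySQL's REGEXP; only constructs common to both are used. It groups the alternatives in parentheses so that the leading and trailing delimiters apply to every value. The helper name build_pattern and row E are hypothetical; row E is added only to show that 42 and 19 are not matched.

```python
import re


def build_pattern(params):
    """Turn "4,9" into the pattern (^|,)(4|9)(,|$)."""
    alternatives = "|".join(re.escape(p) for p in params.split(","))
    return re.compile(f"(^|,)({alternatives})(,|$)")


pattern = build_pattern("4,9")
rows = {
    "A": "1,4,7",
    "B": "2,3,7",
    "C": "3,4,9,8",
    "D": "1,5,6,9",
    "E": "42,19",  # hypothetical row: must not match 4 or 9
}
matched = [name for name, cls in rows.items() if pattern.search(cls)]
print(matched)  # ['A', 'C', 'D']
```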

How to check if some records share the same name but have different types in MySQL

There is a table with 97972561 rows (records) and 4 columns (attributes). The format looks like:
+------+-------------+-------------+-------------+
| PMID | SUBJECT_NAME| SUBJECT_TYPE| Sentence_ID |
+------+-------------+-------------+-------------+
I would like to check whether some subjects share the same name but have different types.
For example, there are three records in a table:
+------+-------------+-------------+-------------+
| PMID | SUBJECT_NAME| SUBJECT_TYPE| Sentence_ID |
+------+-------------+-------------+-------------+
| 1 | Bob | F | 1 |
+------+-------------+-------------+-------------+
| 2 | Bob | B | 2 |
+------+-------------+-------------+-------------+
| 3 | Bob | F | 3 |
+------+-------------+-------------+-------------+
I do not care how many such cases there are; I just want to check whether there are two records with the same subject_name but different subject_type. Any help would be appreciated!
I would aggregate by subject name and then assert that the max and min types are different:
SELECT SUBJECT_NAME
FROM yourTable
GROUP BY SUBJECT_NAME
HAVING MIN(SUBJECT_TYPE) <> MAX(SUBJECT_TYPE);
Note the way I wrote the HAVING clause leaves it sargable, meaning that any index on SUBJECT_TYPE could potentially be used. The following index might speed up this query:
CREATE INDEX idx ON yourTable (SUBJECT_NAME, SUBJECT_TYPE);
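As a quick way to try the aggregation, the sketch below runs the same GROUP BY / HAVING query against an in-memory SQLite database through Python's sqlite3 module. SQLite is an assumption standing in for MySQL here, and the two Alice rows are hypothetical data added to show a name that is not reported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable ("
    "PMID INTEGER, SUBJECT_NAME TEXT, SUBJECT_TYPE TEXT, Sentence_ID INTEGER)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [(1, "Bob", "F", 1), (2, "Bob", "B", 2), (3, "Bob", "F", 3),
     (4, "Alice", "F", 4), (5, "Alice", "F", 5)],  # Alice rows are hypothetical
)
conn.execute("CREATE INDEX idx ON yourTable (SUBJECT_NAME, SUBJECT_TYPE)")

# A name qualifies when its smallest and largest type differ,
# i.e. when it has at least two distinct types.
mixed = [row[0] for row in conn.execute(
    "SELECT SUBJECT_NAME FROM yourTable "
    "GROUP BY SUBJECT_NAME "
    "HAVING MIN(SUBJECT_TYPE) <> MAX(SUBJECT_TYPE)"
)]
print(mixed)  # ['Bob']
```

Since the question only asks whether any such name exists, appending LIMIT 1 to the query may be enough in practice.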

Mysql split column into 3 columns for view in phpMyAdmin

I'm extremely new to writing SQL queries - I am hoping to create some charts in a front end application, but have to manipulate the data to create a view because the front end is not well suited to running complicated queries.
Here is my current situation:
I have a table that has client data as well as a date that record was created. Here is a sample not in any particular order.
| ID | post_date | post_title |
-------------------------------------------
| 1654 | 2017-09-04 | Bill Smith (5678)|
| 1658 | 2017-09-05 | Jan Jones (3423) |
| 1878 | 2017-08-17 | Jim Tanz (7890) |
| 1659 | 2017-09-06 | Jan Jones (3425) |
I would like to display unique values by last name, but at the moment all the names are in one column. The ID is unique as it is incremented for each record and the number in parentheses (transaction ID) appended to the last name is also unique and comes from another application we are pulling the name from.
I have been able to split the post_title column, but only into 2 columns, which leaves me with FName and LastName (TrID). That doesn't allow me to pick distinct entries by last name to do a client count, because the TrIDs are all different.
My intent was to create a view with 3 columns then display distinct entries by last name and count the clients, each month to see if there has been any client growth, but I am still at the very early step.
Any assistance would be greatly appreciated (and remembered forever :>)
Thanks!
Some text operations and it may work:
SELECT t.post_title
  -- Everything before the first space (the -1 drops the space itself)
  ,LEFT(t.post_title, LOCATE(' ', post_title) - 1) AS FName
  -- Text between the first and second space
  ,SUBSTR(t.post_title, LOCATE(' ', post_title)+1, LOCATE(' ',post_title,LOCATE(' ', post_title)+1)-LOCATE(' ', post_title)-1) AS LName
  -- Last word, with the parentheses removed
  ,REPLACE(REPLACE(TRIM(RIGHT(t.post_title,LOCATE(' ', REVERSE(post_title)))), '(', ''), ')','') AS ID
FROM (SELECT 'Bill Smith (5678)' AS post_title
      UNION SELECT 'Jan Jones (3423)'
      UNION SELECT 'Jim Tanz (7890)') t;
You can use SUBSTRING_INDEX to separate the string, so to retrieve the first name:
SUBSTRING_INDEX(post_title," ",1)
SUBSTRING_INDEX returns everything up to the nth occurrence of the delimiter, so getting the last name is a bit messier: with a count of 2 we get everything up to the second space, and then we extract the last word of that result (a count of -1 counts from the end). Therefore, the last name is obtained with:
SUBSTRING_INDEX(SUBSTRING_INDEX(post_title," ",2)," ",-1)
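To make the nesting easier to follow, here is a rough Python model of how SUBSTRING_INDEX behaves for a non-empty delimiter. The function substring_index is a hypothetical helper written for this illustration, not a MySQL or Python API.

```python
def substring_index(text, delim, count):
    """Rough model of MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    if count == 0:
        return ""
    parts = text.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    # Negative count: everything to the right, counting from the end.
    return delim.join(parts[count:])


title = "Bill Smith (5678)"
first_name = substring_index(title, " ", 1)
last_name = substring_index(substring_index(title, " ", 2), " ", -1)
print(first_name, last_name)  # Bill Smith
```

The inner call keeps "Bill Smith" and the outer call with -1 keeps only what follows the last space in that result.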
Scenario 1: Splitting post_title into three fields:
SELECT
SUBSTRING_INDEX(post_title," ",1) as firstName,
SUBSTRING_INDEX(SUBSTRING_INDEX(post_title," ",2)," ",-1) as lastName,
SUBSTRING_INDEX(REPLACE(REPLACE(post_title,"(",""),")","")," ",-1) as post_ID
FROM tableName;
Output:
+-----------+----------+---------+
| firstName | lastName | post_ID |
+-----------+----------+---------+
| Bill | Smith | 5678 |
| Jan | Jones | 3423 |
| Jim | Tanz | 7890 |
| Jan | Jones | 3425 |
+-----------+----------+---------+
Scenario 2: Grouping functions
You could also use the named field to group and count by Last Name
SELECT
COUNT(*) as Qty,
SUBSTRING_INDEX(SUBSTRING_INDEX(post_title," ",2)," ",-1) as lastName
FROM tableName
GROUP BY lastName;
Output:
+-----+----------+
| Qty | lastName |
+-----+----------+
| 2 | Jones |
| 1 | Smith |
| 1 | Tanz |
+-----+----------+
And so on. Hard to tailor this any further, as I'm not fully sure what you're intending to do, but hopefully the above is of use.

A better way to search for tags in mysql table

Say I have a table and one of the columns is titled tags with data that is comma separated like this.
"tag1,tag2,new york,tag4"
As you can see, some of tags will have spaces.
What's the best or most accurate way of querying the table for any tags that are equal to "new york"?
In the past I've used:
SELECT id WHERE find_in_set('new york',tags) <> 0
But find_in_set does not work when the value has a space.
I'm currently using this:
SELECT id WHERE concat(',',tags,',') LIKE concat(',%new york%,')
But I'm not sure if this is the best approach.
How would you do it?
When item A can be associated with many of item B, and item B can be associated with many of item A, this is called a many-to-many relationship.
Data with this kind of relationship should be stored in separate tables and joined together only at query time.
Example
Table 1
| product_uid | price | amount |
| 1 | 12000 | 3000 |
| 2 | 30000 | 600 |
Table 2
| tag_uid | tag_value |
| 1 | tag_01 |
| 2 | tag_02 |
| 3 | tag_03 |
| 4 | tag_04 |
Then we use a join table to relate them
Table 3
| entry_uid | product_uid | tag_uid |
| 1 | 1 | 3 |
| 2 | 1 | 4 |
| 3 | 2 | 1 |
| 4 | 2 | 2 |
| 5 | 4 | 2 |
The query will be (if you want to select product 1 and its tags):
SELECT t1.*, t2.tag_value
FROM Table1 AS t1
JOIN Table3 AS join_table ON t1.product_uid = join_table.product_uid
JOIN Table2 AS t2 ON t2.tag_uid = join_table.tag_uid
WHERE t1.product_uid = 1
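The three tables can be tried out with the sketch below, which loads the sample rows into an in-memory SQLite database via Python's sqlite3 module (an assumption standing in for MySQL). It runs the join in both directions: the tags of product 1, and the reverse lookup by tag value, which is what the original question is after.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (product_uid INTEGER PRIMARY KEY, price INTEGER, amount INTEGER);
CREATE TABLE Table2 (tag_uid INTEGER PRIMARY KEY, tag_value TEXT);
CREATE TABLE Table3 (entry_uid INTEGER PRIMARY KEY, product_uid INTEGER, tag_uid INTEGER);
INSERT INTO Table1 VALUES (1, 12000, 3000), (2, 30000, 600);
INSERT INTO Table2 VALUES (1, 'tag_01'), (2, 'tag_02'), (3, 'tag_03'), (4, 'tag_04');
INSERT INTO Table3 VALUES (1, 1, 3), (2, 1, 4), (3, 2, 1), (4, 2, 2), (5, 4, 2);
""")

join_sql = """
SELECT t1.product_uid, t2.tag_value
FROM Table1 AS t1
JOIN Table3 AS join_table ON t1.product_uid = join_table.product_uid
JOIN Table2 AS t2 ON t2.tag_uid = join_table.tag_uid
"""

# Tags attached to product 1.
tags_of_1 = [row[1] for row in conn.execute(
    join_sql + "WHERE t1.product_uid = ? ORDER BY t2.tag_value", (1,))]

# Reverse lookup: products carrying an exact tag value.
products_with_tag = [row[0] for row in conn.execute(
    join_sql + "WHERE t2.tag_value = ? ORDER BY t1.product_uid", ("tag_02",))]

print(tags_of_1)          # ['tag_03', 'tag_04']
print(products_with_tag)  # [2]
```

The tag is compared with plain equality, so a value containing spaces such as "new york" needs no special handling.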
If I needed to ignore the spaces before and after the commas in tags, for example if tags had a value of:
'atlanta,boston , chicago, los angeles , new york '
and assuming spaces are the only character I want to ignore, and the tag I'm searching for doesn't have any leading or trailing spaces, then I'd likely use a regular expression. Something like this:
SELECT ...
FROM t
WHERE t.tags REGEXP CONCAT('(^|,) *', 'new york', ' *(,|$)')
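The matching rule can be checked with a short sketch in Python's re module, used here as a stand-in for MySQL's REGEXP. The start and end alternatives are wrapped in parentheses so that they bind to the whole tag rather than splitting the pattern into independent alternatives. The helper name has_tag is hypothetical.

```python
import re


def has_tag(tags, wanted):
    """True if wanted is a complete tag, ignoring spaces around commas."""
    # (^|,) and (,|$) are grouped so the delimiters apply to the whole tag.
    pattern = r"(^|,) *" + re.escape(wanted) + r" *(,|$)"
    return re.search(pattern, tags) is not None


tags = "atlanta,boston , chicago, los angeles , new york "
print(has_tag(tags, "new york"))  # True
print(has_tag(tags, "york"))      # False
```

A partial word such as "york" is rejected because it is not preceded by a comma or the start of the string.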
I recommend Bill Karwin's excellent book "SQL Antipatterns: Avoiding the Pitfalls of Database Programming"
https://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557
Chapter 2 Jaywalking covers the antipattern of comma separated lists.