Removing duplicate rows in MySQL by merging info

I have a large table with person info. Every record has an ID and is referenced by other tables. I noticed that a lot of records have duplicate keys, but they vary in the amount of information in the other fields. I'd like to merge the info from the various fields into one 'master' record, and replace all references to the other records with references to that master record.
An example:
| id | key1 | key2 | name | city | dob |
|--- | ---- | ---- | ---- | ---- | -------- |
| 1 | 1 | 2 | John | | |
| 2 | 1 | 2 | | Town | |
| 3 | 1 | 2 | John | | 70/09/12 |
I need to end up with a single record (id is either 1, 2 or 3) with values
key1 = 1, key2 = 2, name = John, city = Town, dob = 70/09/12.
Is there a clever way to merge these records without testing for every field (my actual table has a lot of fields)?

You can use MAX() to pick the non-empty value of each column within every key group, since an empty string or NULL sorts below any non-empty value.
SELECT key1, key2, MAX(id) AS id, MAX(name) AS name, MAX(city) AS city, MAX(dob) AS dob
FROM yourTable
GROUP BY key1, key2
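If rows in a group can hold conflicting non-empty values and you want to leave those groups out, you can add a check per column (a multi-expression COUNT(DISTINCT ...) skips every row that contains a NULL, so the columns are tested separately):
HAVING COUNT(DISTINCT NULLIF(name, '')) <= 1 AND COUNT(DISTINCT NULLIF(city, '')) <= 1 AND COUNT(DISTINCT NULLIF(dob, '')) <= 1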
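To try the grouping on the three sample rows, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL, which is an assumption made only so the snippet runs on its own; MAX() with GROUP BY gives the same result on this data. The table name person and the id_map query are illustrative and not part of the question. The map shows which master id each old id would be re-pointed to in the referencing tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " id INTEGER PRIMARY KEY, key1 INTEGER, key2 INTEGER,"
    " name TEXT, city TEXT, dob TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 2, "John", "", ""),
        (2, 1, 2, "", "Town", ""),
        (3, 1, 2, "John", "", "70/09/12"),
    ],
)

# One merged row per (key1, key2): MAX() keeps the non-empty value of each column.
merged = conn.execute(
    """
    SELECT key1, key2, MAX(id), MAX(name), MAX(city), MAX(dob)
    FROM person
    GROUP BY key1, key2
    """
).fetchall()

# Hypothetical helper: map every existing id to the id chosen as master.
id_map = dict(
    conn.execute(
        """
        SELECT p.id, g.master_id
        FROM person AS p
        JOIN (SELECT key1, key2, MAX(id) AS master_id
              FROM person
              GROUP BY key1, key2) AS g
          ON g.key1 = p.key1 AND g.key2 = p.key2
        """
    ).fetchall()
)

print(merged)
print(id_map)
```

With a map like id_map, each referencing table can be updated to the master id before the non-master rows are deleted.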

Related

How to calculate count of each value in MySQL JSON array?

I have a MySQL table with the following definition:
mysql> desc person;
+--------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| name | text | YES | | NULL | |
| fruits | json | YES | | NULL | |
+--------+---------+------+-----+---------+-------+
The table has some sample data as follows:
mysql> select * from person;
+----+------+----------------------------------+
| id | name | fruits |
+----+------+----------------------------------+
| 1 | Tom | ["apple", "orange"] |
| 2 | John | ["apple", "mango"] |
| 3 | Tony | ["apple", "mango", "strawberry"] |
+----+------+----------------------------------+
How can I calculate the total number of occurrences for each fruit? For example:
+------------+-------+
| fruit | count |
+------------+-------+
| apple | 3 |
| orange | 1 |
| mango | 2 |
| strawberry | 1 |
+------------+-------+
Some research shows that the JSON_LENGTH function can be used but I cannot find an example similar to my scenario.
You can use the JSON_EXTRACT() function to extract the value at each array position (the arrays here hold at most three elements), and then apply UNION ALL to combine the queries:
SELECT comp, count(*)
FROM
(
SELECT JSON_EXTRACT(fruits, '$[0]') as comp FROM person UNION ALL
SELECT JSON_EXTRACT(fruits, '$[1]') as comp FROM person UNION ALL
SELECT JSON_EXTRACT(fruits, '$[2]') as comp FROM person
) q
WHERE comp is not null
GROUP BY comp
If your database is MySQL 8, you can also use the JSON_TABLE() function:
SELECT j.fruit, count(*)
FROM person p
JOIN JSON_TABLE(
p.fruits,
'$[*]' columns (fruit varchar(50) path '$')
) j
GROUP BY j.fruit;
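Conceptually, JSON_TABLE() with the '$[*]' path turns each array into one row per element, and the GROUP BY then counts those rows. A rough sketch of the same two steps in plain Python (not MySQL; the variable names are made up for illustration) might look like this:

```python
import json
from collections import Counter

# Sample rows from the question: (id, name, fruits as a JSON string).
rows = [
    (1, "Tom", '["apple", "orange"]'),
    (2, "John", '["apple", "mango"]'),
    (3, "Tony", '["apple", "mango", "strawberry"]'),
]

# Step 1: expand each JSON array into one (person_id, fruit) row per element.
expanded = [
    (person_id, fruit)
    for person_id, _name, fruits in rows
    for fruit in json.loads(fruits)
]

# Step 2: group by fruit and count the rows in each group.
counts = Counter(fruit for _person_id, fruit in expanded)

for fruit, n in counts.items():
    print(fruit, n)
```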
You can't do it without first creating a table with one row per fruit.
CREATE TABLE allfruits (fruit VARCHAR(10) PRIMARY KEY);
INSERT INTO allfruits VALUES ('apple'), ('orange'), ('mango'), ('strawberry');
There is not a good way to generate this from the JSON.
Once you have that table, you can join it to the JSON and then use GROUP BY to count the occurrences.
SELECT fruit, COUNT(*) AS count
FROM allfruits
JOIN person ON JSON_SEARCH(person.fruits, 'one', fruit) IS NOT NULL
GROUP BY fruit;
Output:
+------------+-------+
| fruit | count |
+------------+-------+
| apple | 3 |
| mango | 2 |
| orange | 1 |
| strawberry | 1 |
+------------+-------+
Note that it will do a table-scan on the person table to find each fruit. This is pretty inefficient, and as your person table gets larger, it will become a performance problem.
If you want to optimize for this type of query, then you shouldn't use JSON to store an array of fruits. You should store data in a normalized way, representing the many-to-many relationship between persons and fruits with another table.
This is related to my answer to "Is storing a delimited list in a database column really that bad?"
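As a rough illustration of the normalized layout, here is a sketch in Python with the standard-library sqlite3 module standing in for MySQL (an assumption, so the snippet is runnable on its own). The person_fruit table and the index name are hypothetical; the point is only that the count becomes a plain GROUP BY over an indexable column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    -- One row per (person, fruit) pair instead of a JSON array.
    CREATE TABLE person_fruit (
        person_id INTEGER NOT NULL REFERENCES person(id),
        fruit     TEXT    NOT NULL,
        PRIMARY KEY (person_id, fruit)
    );
    CREATE INDEX idx_person_fruit_fruit ON person_fruit (fruit);
    """
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(1, "Tom"), (2, "John"), (3, "Tony")],
)
conn.executemany(
    "INSERT INTO person_fruit VALUES (?, ?)",
    [
        (1, "apple"), (1, "orange"),
        (2, "apple"), (2, "mango"),
        (3, "apple"), (3, "mango"), (3, "strawberry"),
    ],
)

# Counting is now an ordinary aggregate over the link table.
counts = conn.execute(
    "SELECT fruit, COUNT(*) FROM person_fruit GROUP BY fruit ORDER BY fruit"
).fetchall()
print(counts)
```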
I think the simplest solution would be to use the JSON_TABLE function.
The query you need is
select ft.fruit, count(ft.fruit) from person,
json_table(
fruits,
'$[*]' columns(
fruit varchar(128) path '$'
)
) as ft
group by ft.fruit
;
You can find a working example in this dbfiddle.

Insert concatenated values from multiple rows as new row in MySQL query

I currently have a table with columns for "field_type" (username, city, state), "user_id", and "value". The user_id column obviously has lots of repeats. I'd like to merge the city and state data into a single "location" field_type value. I need something that will:
for every integer in the user_id column:
-check if there exist corresponding (not null) table rows for field_type "city" and "state"
-if yes, insert a new row into the table with field_type "location" which concatenates the corresponding city and state values for that user_id
I haven't worked with MySQL much so I don't really know where to start. I've tried to simplify the problem a bit - it's actually a somewhat more complicated wordpress table and I'm trying to reformat the data to be compatible with a new plugin, but this covers the basics of what has to happen so I should hopefully be able to extrapolate an actual solution from the answers. Thanks for any pointers!
Edit: Current structure looks like this:
|-id (key)-|- field_type -|- user_id -|- value -|
| 1 | username | 1 | Joe |
| 2 | city | 1 | Albany |
| 3 | state | 1 | NY |
| 4 | username | 2 | Bob |
| 5 | city | 2 | Toledo |
| 6 | state | 2 | OH |
And I would like to get something like this:
|-id (key)-|- field_type -|- user_id -|- value ---------|
| 1 | username | 1 | Joe |
| 2 | city | 1 | Albany |
| 3 | state | 1 | NY |
| 4 | username | 2 | Bob |
| 5 | city | 2 | Toledo |
| 6 | state | 2 | OH |
| 7 | location | 1 | Albany, NY |
| 8 | location | 2 | Toledo, OH |
Duplicate user_id values are how it's supposed to work, so they don't need to be removed.
You could use an INSERT ... SELECT query to do this, naming the target columns so that id is left out (assuming it is generated automatically):
INSERT INTO yourtable (field_type, user_id, value)
SELECT 'location' AS field_type, t1.user_id, CONCAT(t1.value, ', ', t2.value) AS value
FROM yourtable t1
JOIN yourtable t2 ON t2.user_id = t1.user_id AND t1.field_type = 'city' AND t2.field_type = 'state'
WHERE t1.value IS NOT NULL AND t2.value IS NOT NULL
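To see the self-join at work on the sample rows, here is a small sketch in Python using the standard-library sqlite3 module instead of MySQL (an assumption, chosen so it runs without a server). The table name usermeta is made up, and SQLite's || operator takes the place of CONCAT(); the join itself has the same shape as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usermeta ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " field_type TEXT, user_id INTEGER, value TEXT)"
)
conn.executemany(
    "INSERT INTO usermeta (field_type, user_id, value) VALUES (?, ?, ?)",
    [
        ("username", 1, "Joe"), ("city", 1, "Albany"), ("state", 1, "NY"),
        ("username", 2, "Bob"), ("city", 2, "Toledo"), ("state", 2, "OH"),
    ],
)

# Pair each user's city row with the same user's state row and
# insert the concatenation as a new 'location' row.
conn.execute(
    """
    INSERT INTO usermeta (field_type, user_id, value)
    SELECT 'location', c.user_id, c.value || ', ' || s.value
    FROM usermeta AS c
    JOIN usermeta AS s
      ON s.user_id = c.user_id AND s.field_type = 'state'
    WHERE c.field_type = 'city'
      AND c.value IS NOT NULL AND s.value IS NOT NULL
    """
)

locations = conn.execute(
    "SELECT user_id, value FROM usermeta "
    "WHERE field_type = 'location' ORDER BY user_id"
).fetchall()
print(locations)
```

A user who has only a city or only a state row produces no match in the join, so no location row is inserted for that user.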
Since you mentioned that there are duplicates with the same user_id, I don't think inserting a new row is a good idea.
So I have written an update query to update the existing data instead. Note that it assumes separate city, state and locations columns rather than the field_type/value layout shown in the question.
You can clean up the duplicates later.
UPDATE your_table SET locations = CONCAT_WS('-', state, city) WHERE city IS NOT NULL OR state IS NOT NULL

MySQL split column into 3 columns for view in phpMyAdmin

I'm extremely new to writing SQL queries - I am hoping to create some charts in a front end application, but have to manipulate the data to create a view because the front end is not well suited to running complicated queries.
Here is my current situation:
I have a table that has client data as well as a date that record was created. Here is a sample not in any particular order.
| ID | post_date | post_title |
-------------------------------------------
| 1654 | 2017-09-04 | Bill Smith (5678)|
| 1658 | 2017-09-05 | Jan Jones (3423) |
| 1878 | 2017-08-17 | Jim Tanz (7890) |
| 1659 | 2017-09-06 | Jan Jones (3425) |
I would like to display unique values by last name, but at the moment all the names are in one column. The ID is unique as it is incremented for each record and the number in parentheses (transaction ID) appended to the last name is also unique and comes from another application we are pulling the name from.
I have been able to split the post_title column, but only into 2 columns: FName and LastName (TrID). That doesn't let me pick distinct entries by last name for a client count, because the TrIDs are all different.
My intent was to create a view with 3 columns, then display distinct entries by last name and count the clients each month to see if there has been any client growth, but I am still at a very early step.
Any assistance would be greatly appreciated (and remembered forever :>)
Thanks!
Some text operations may do the job:
SELECT t.post_title
,LEFT(t.post_title, LOCATE(' ', post_title)-1) AS FName
,SUBSTR(t.post_title, LOCATE(' ', post_title)+1, LOCATE(' ',post_title,LOCATE(' ', post_title)+1)-LOCATE(' ', post_title)-1) AS LName
,REPLACE(REPLACE(TRIM(RIGHT(t.post_title,LOCATE(' ', REVERSE(post_title)))), '(', ''), ')','') AS ID
FROM (SELECT 'Bill Smith (5678)' AS post_title
UNION SELECT 'Jan Jones (3423)'
UNION SELECT 'Jim Tanz (7890)') t;
You can use SUBSTRING_INDEX to separate the string, so to retrieve the first name:
SUBSTRING_INDEX(post_title," ",1)
SUBSTRING_INDEX returns everything up to the nth occurrence of the delimiter, so getting the last name is a bit messier: with a count of 2 we get everything up to the second space, and from that we extract the last space-separated piece (count -1, which counts from the right). The last name is therefore:
SUBSTRING_INDEX(SUBSTRING_INDEX(post_title," ",2)," ",-1)
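If the counting rule is hard to picture, the following Python function is a rough model of how SUBSTRING_INDEX treats positive and negative counts. It is an illustrative sketch rather than MySQL's implementation, and it assumes a non-empty delimiter.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Rough model of MySQL's SUBSTRING_INDEX (non-empty delimiter assumed)."""
    if count == 0:
        return ""
    parts = text.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter, counting from the left.
        return delim.join(parts[:count])
    # Everything to the right of the |count|-th delimiter, counting from the right.
    return delim.join(parts[count:])


title = "Bill Smith (5678)"
first_name = substring_index(title, " ", 1)
last_name = substring_index(substring_index(title, " ", 2), " ", -1)
trans_id = substring_index(title.replace("(", "").replace(")", ""), " ", -1)
print(first_name, last_name, trans_id)
```

Note that a title with a middle name or a two-word last name would shift the pieces, in the model and in the SQL alike.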
Scenario 1: Splitting post_title into three fields:
SELECT
SUBSTRING_INDEX(post_title," ",1) as firstName,
SUBSTRING_INDEX(SUBSTRING_INDEX(post_title," ",2)," ",-1) as lastName,
SUBSTRING_INDEX(REPLACE(REPLACE(post_title,"(",""),")","")," ",-1) as post_ID
FROM tableName;
Output:
+-----------+----------+---------+
| firstName | lastName | post_ID |
+-----------+----------+---------+
| Bill | Smith | 5678 |
| Jan | Jones | 3423 |
| Jim | Tanz | 7890 |
| Jan | Jones | 3425 |
+-----------+----------+---------+
Scenario 2: Grouping functions
You could also use the named field to group and count by Last Name
SELECT
COUNT(*) as Qty,
SUBSTRING_INDEX(SUBSTRING_INDEX(post_title," ",2)," ",-1) as lastName
FROM tableName
GROUP BY lastName;
Output:
+-----+----------+
| Qty | lastName |
+-----+----------+
| 2 | Jones |
| 1 | Smith |
| 1 | Tanz |
+-----+----------+
And so on. Hard to tailor this any further, as I'm not fully sure what you're intending to do, but hopefully the above is of use.

SQL select statement optimizing (id, parent_id, child_ids)

We have a very old custom DB (Oracle, MySQL, Derby) with these restrictions: no new table fields, no views, no functions, no procedures.
My table MYTABLE:
| id | ... | parent_id |
------------------------
| 1 | ... | |
| 2 | ... | 1 |
| 3 | ... | 1 |
| 4 | ... | 2 |
| 5 | ... | 1 |
and my first statement:
select * from MYTABLE where id in ('1','2','3','4','5');
gives me 5 records.
Then I need the ids of the direct children (first level only, no deeper).
My current solution:
for (record in records) {
// get child ids as comma separated string list
// e.g. "2,3,5" for id 1
String childIds = getChildIds(record.id);
}
with the second statement in getChildIds(record.Id):
select id from MYTABLE where parent_id='record.Id';
So I have 1 + 5 = 6 statements for the required information.
I'm looking for a solution to select the records from the following "imaginary" table with the "imaginary" field "child_ids":
| id | ... | parent_id | child_ids |
------------------------------------
| 1 | ... | | 2,3,5 |
| 2 | ... | 1 | 4 |
| 3 | ... | 1 | |
| 4 | ... | 2 | |
| 5 | ... | 1 | |
Does anyone have an idea how I can get this information with only one statement (or with 2 statements)?
Thanks for your help, Thomas
FOR MYSQL:
How about a self-join combined with the GROUP_CONCAT() function, like the following:
SELECT p.id, p.parent_id,
GROUP_CONCAT(c.id ORDER BY c.id SEPARATOR ',') AS child_ids
FROM MYTABLE p
LEFT JOIN MYTABLE c ON c.parent_id = p.id
WHERE p.id IN ('1','2','3','4','5')
GROUP BY p.id, p.parent_id
FOR ORACLE:
If you have a later version of Oracle (11g Release 2 or newer) you could use the LISTAGG() function with the same self-join:
SELECT p.id, p.parent_id,
LISTAGG(c.id, ',') WITHIN GROUP (ORDER BY c.id) "child_ids"
FROM MYTABLE p
LEFT JOIN MYTABLE c ON c.parent_id = p.id
WHERE p.id IN ('1','2','3','4','5')
GROUP BY p.id, p.parent_id
FOR DERBY:
I don't know anything about Derby, but from a little research it appears to use IBM DB2 SQL syntax. So maybe a combination of XMLSERIALIZE(), XMLAGG(), and XMLTEXT() will work for you (untested), grouping the row ids by their parent:
SELECT parent_id,
XMLSERIALIZE(XMLAGG(XMLTEXT(id) ORDER BY id) AS CLOB(30K))
FROM MYTABLE GROUP BY parent_id
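If none of these aggregate functions is available on one of the engines, a different route is to do the grouping in application code: one extra statement fetches the (id, parent_id) pairs of all children at once, and the lists are built in memory, which gives 2 statements instead of 1 + 5. The sketch below is in Python rather than the Java-like pseudocode of the question, and the rows list stands in for the result of that extra statement; both are assumptions for illustration.

```python
from collections import defaultdict

# Stand-in for the result of a single statement such as:
#   select id, parent_id from MYTABLE where id in (...) or parent_id in (...)
rows = [(1, None), (2, 1), (3, 1), (4, 2), (5, 1)]  # (id, parent_id)

# Collect the direct children of every parent in one pass.
children = defaultdict(list)
for row_id, parent_id in rows:
    if parent_id is not None:
        children[parent_id].append(row_id)

# Build the comma-separated "child_ids" value for every record.
child_ids = {
    row_id: ",".join(str(c) for c in sorted(children[row_id]))
    for row_id, _parent_id in rows
}
print(child_ids)
```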

Merge rows into one TSQL

ID | First Name | Last Name |
-----------------------------
1 | Test | NULL |
2 | Test | ABC1 |
I need to merge these two rows into one, grouping by first name, so that the NULL in 'Last Name' is replaced by the text from the second row:
ID | First Name | Last Name |
-----------------------------
1 | Test | ABC1|
Try this:
select MIN(id), First_Name, MAX(Last_Name)
from your_table
group by First_Name