MySQL Select all differences between 2 tables? - mysql

I have 3 tables, 'old', 'new' and a 'result' table (from a phonebook database), they have the same structure and nearly the same entries.
old:
ID | name | number | email | ...
----+--------------------+--------+-------+-----
1 | foo | 123 | ...
2 | bar | 456 |
3 | entrry with typo | 012345 |
4 | John Doe | 123345 |
new:
ID | name | number | email | ...
----+--------------------+--------+-------+-----
1 | foo | 123 | ...
2 | bar | 456 |
3 | entry without typo | 012345 |
4 | John Doe | 12345 |
5 | newly added entry | 09876 |
From this 'new' table I would like to select all rows that are different from the 'old' table, so the result would be:
result:
ID | name | number | email | ...
----+--------------------+--------+-------+-----
3 | entry without typo | 012345 | ...
4 | John Doe | 12345 |
5 | newly added entry | 09876 |
including all entries that have changed data plus all entries that don't appear in 'old' table...
Not only to make it more complicated, there are about 10 columns in those tables (including ID, name, number, email and several flags and other info).
Is there any most performant solution for doing this or will I have to compare each column with a new query..?

You'll have to do some comparison against the old records for correctness, but I think this is the most straightforward solution.
Update: I was a little confused about "including all entries that have changed data plus all entries that don't appear in 'old' table", so I added the WHERE clause and modified the join:
insert into result (id, name, number, email, ...)
select new.id, new.name, new.number, new.email, ...
from new
LEFT JOIN old
ON new.ID = old.id
WHERE
old.ID is null
OR
( new.name <> old.name
or
new.number <> old.number
or
new.email <> old.email
...)

SELECT new.*
FROM new
JOIN old ON new.id = old.id
WHERE (CONCAT(new.ID,new.name,new.number,etc...) <> CONCAT(old.ID,old.name,old.number,etc...))
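That should pull up any records in the new table where at least one of its fields differs from the equivalent record in the old table. Note that CONCAT returns NULL if any argument is NULL, and that the inner join skips rows that exist only in the new table.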

Assuming the IDs must match up in order to make the comparisons legitimate:
select n.*
from new n
left join old o on o.id = n.id
where o.id is null
or not (
o.name <=> n.name
and o.number <=> n.number
and o.email <=> n.email
and ...)
Note: this solution handles the case where some of the fields can be NULL, because it uses MySQL's NULL-safe equality operator <=>. With a plain = or <>, a comparison involving NULL evaluates to NULL rather than true, so a change from NULL to a non-NULL value (or back) would not be reported as a difference.
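The effect of NULLs on the comparison can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so SQLite's IS NOT operator plays the role of a NULL-safe "not equal" (in MySQL that would be NOT (a <=> b)). The table names phone_old and phone_new and the email values are assumptions made up for the illustration; row 2 changes its email from NULL to a value so that the two queries disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phone_old (id INTEGER PRIMARY KEY, name TEXT, number TEXT, email TEXT);
CREATE TABLE phone_new (id INTEGER PRIMARY KEY, name TEXT, number TEXT, email TEXT);
""")

# Hypothetical sample data modelled on the question; emails are invented.
conn.executemany("INSERT INTO phone_old VALUES (?, ?, ?, ?)", [
    (1, "foo", "123", None),
    (2, "bar", "456", None),
    (3, "entrry with typo", "012345", None),
    (4, "John Doe", "123345", None),
])
conn.executemany("INSERT INTO phone_new VALUES (?, ?, ?, ?)", [
    (1, "foo", "123", None),
    (2, "bar", "456", "bar@example.com"),   # email changed from NULL to a value
    (3, "entry without typo", "012345", None),
    (4, "John Doe", "12345", None),
    (5, "newly added entry", "09876", None),
])

# Plain <> : a comparison with NULL yields NULL, so row 2 is missed.
NAIVE = """
SELECT n.id FROM phone_new n
LEFT JOIN phone_old o ON o.id = n.id
WHERE o.id IS NULL
   OR n.name <> o.name OR n.number <> o.number OR n.email <> o.email
ORDER BY n.id
"""

# NULL-safe comparison: IS NOT treats NULL as an ordinary comparable value.
NULL_SAFE = """
SELECT n.id FROM phone_new n
LEFT JOIN phone_old o ON o.id = n.id
WHERE o.id IS NULL
   OR n.name IS NOT o.name OR n.number IS NOT o.number OR n.email IS NOT o.email
ORDER BY n.id
"""

naive_ids = [row[0] for row in conn.execute(NAIVE)]
null_safe_ids = [row[0] for row in conn.execute(NULL_SAFE)]
print(naive_ids, null_safe_ids)
```

With this data the plain comparison reports rows 3, 4 and 5, while the NULL-safe version also reports row 2.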

Related

How to get all data from a table which you also query for AVG

Hello I need help on solving the query problem below.. Thanks in advance.
I have three tables
attachments
sun_individual
sun_reviews
I want to select user names, profession and reviewed by his/her ID from sun_individual and join with his profile photo from table attachments then get his/her reviews (each review and rate) and the average RATE
TABLE : sun_individual
id|sun_id|first_name|last_name |sun_profession|sun_verified_date|sun_verified|flg
---------------------------------------------------------------------------------
20|SV-001|Alex | James | Doctor |2017-12-08 | 1 |1
21|SV-002|Jane | Rose | Programmer |2017-12-08 | 1 |1
TABLE: sun_reviews
id|user_id|rev_txt |rev_rate|rev_date |flg
----------------------------------------------------
1 |20 | the best | 4 |2017-12-09|1
2 |21 | know CLI | 2 |2017-12-09|1
3 |20 | recommend| 3 |2017-12-09|0
4 |20 | so far | 3 |2017-12-09|1
TABLE: attachments
id|user|type |path |flg
----------------------------------------
88|20 |passport|/upload/img128.jpg|1
89|21 |passport|/upload/img008.jpg|1
flg:1 means the value is active; flg:0 means the value is to be ignored.
My Code is :
SELECT
sun_reviews.rev_txt As txtReview, sun_reviews.rev_date As dateReview,
sun_reviews.rev_rate As rateReview,
AVG(sun_reviews.rev_rate) As avgREV,
concat(sun_individual.first_name, sun_individual.last_name) As name,
sun_individual.sun_profession As profession,
sun_individual.sun_verified_date As dateVerified,
CASE when sun_verified = 1 then 'VERIFIED' else 'EXPIRED' END As status,
attachments.path As photo
FROM `sun_individual`
LEFT JOIN sun_reviews ON sun_reviews.user_id = sun_individual.id
INNER JOIN attachments ON attachments.user = sun_individual.id
WHERE attachments.type = 'passport' AND attachments.flg = 1
AND sun_reviews.flg = 1 AND sun_individual.flg = 1
AND sun_individual.sun_id LIKE '%SV-001'
What I want to achieve: when someone looks up a user (let's say SV-001) by entering the code, the result should look like this:
result for: SV-001
txtReview|dateReview|rateReview|avgREV|name |profession | dateVerified | photo
------------------------------------------------------------------------- -----------------
the best |2017-12-09|4 |3.5000|Alex James| Doctor | 2017-12-08 |/upload/img128.jpg
so far |2017-12-09|3 |3.5000|Alex James| Doctor | 2017-12-08 |/upload/img128.jpg
I want to get a result like the one above; however, when I run the query I get only one review:
txtReview|dateReview|rateReview|avgREV|name |profession | dateVerified | photo
------------------------------------------------------------------------- -----------------
the best |2017-12-09|4 |3.5000|Alex James| Doctor | 2017-12-08 |/upload/img128.jpg
I think there is something I am doing wrong... If you know the solution to my problem, kindly help me.
Thanks.
select
a.rev_txt as txtReview,
a.rev_date as dateReview,
a.rev_rate as rateReview,
d.avgRev as avgRev,
concat(b.first_name, b.last_name) as name,
b.sun_profession as profession,
b.sun_verified_date as dateVerified,
case
when sun_verified = 1 then 'VERIFIED'
else 'EXPIRED'
end as status,
c.path as photo
from
sun_reviews a
join
sun_individual b
on a.user_id = b.id
join
attachments c
on c.user = b.id
join
(
select
user_id,
avg(rev_rate) as avgRev
from
sun_reviews
where
flg = 1
group by
user_id
) d
on d.user_id = b.id
where
c.type = 'passport' and
c.flg = 1 and
a.flg = 1 and
b.flg = 1 and
b.sun_id LIKE '%SV-001';
txtreview | datereview | ratereview | avgrev | name | profession | dateverified | status | photo
-----------+------------+------------+--------------------+-----------+------------+--------------+----------+--------------------
the best | 2017-12-09 | 4 | 3.5000000000000000 | AlexJames | Doctor | 2017-12-08 | VERIFIED | /upload/img128.jpg
so far | 2017-12-09 | 3 | 3.5000000000000000 | AlexJames | Doctor | 2017-12-08 | VERIFIED | /upload/img128.jpg
(2 rows)
When a query uses an aggregate or a GROUP BY, every column in the result set must be either a GROUP BY column or an aggregate. I guess you are using MySQL, which does not always enforce this (it depends on the ONLY_FULL_GROUP_BY SQL mode). You should enforce it yourself though, otherwise it gets very confusing.
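The idea of computing the average in a separate grouped subquery and joining it back to the individual reviews can be sketched as follows. This is a minimal illustration using Python's sqlite3 module rather than MySQL, and it only models the sun_reviews table (without rev_date), so the reduced column list is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sun_reviews (
    id INTEGER PRIMARY KEY, user_id INTEGER,
    rev_txt TEXT, rev_rate INTEGER, flg INTEGER
);
INSERT INTO sun_reviews VALUES
    (1, 20, 'the best', 4, 1),
    (2, 21, 'know CLI', 2, 1),
    (3, 20, 'recommend', 3, 0),
    (4, 20, 'so far', 3, 1);
""")

# The derived table holds one average per user; joining it to the
# detail rows repeats that average on every active review.
QUERY = """
SELECT r.rev_txt, r.rev_rate, a.avg_rate
FROM sun_reviews r
JOIN (
    SELECT user_id, AVG(rev_rate) AS avg_rate
    FROM sun_reviews
    WHERE flg = 1
    GROUP BY user_id
) a ON a.user_id = r.user_id
WHERE r.flg = 1 AND r.user_id = ?
ORDER BY r.id
"""

rows = conn.execute(QUERY, (20,)).fetchall()
print(rows)
```

Because the aggregate lives in its own subquery, the outer query has no GROUP BY and keeps one row per review.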

Removing duplicates based on one column, and keeping the row that has value in different column, and if there isn't any, keep lowest ID row

Using MySQL 5.7 on Google Cloud, I'm trying to deduplicate MySQL data based on an "EmailAddress" column, but some of the rows have a value in the "FullName" column and some of them don't. I want to keep the ones that have a value in the FullName column, but if none of the rows with that EmailAddress value have a FullName value, then just keep the duplicate with the lowest ID number (first column - primary key).
I've finally broken it down into two separate queries, one to first remove the rows with no value in the FullName column IF there's another duplicate row that does have a value in the FullName column:
DELETE
FROM customer_info
WHERE id IN
(
SELECT *
FROM
(
SELECT c1.id
FROM customer_info c1
INNER JOIN customer_info c2 on c1.EmailAddress=c2.EmailAddress and c1.id!=c2.id
WHERE
(trim(c1.FullName)='' or c1.FullName is NULL)
and c2.FullName is not NULL
and length(trim(c2.FullName))!=0
) t
)
and another query to remove the rows with the bigger IDs where no value was found in the FullName column:
DELETE
FROM customer_info
WHERE id IN
(
SELECT *
FROM
(
SELECT c1.id
FROM customer_info c1
INNER JOIN customer_info c2 on c1.EmailAddress=c2.EmailAddress and c1.id>c2.id
) t
)
This "works", but not really. It worked one time when I left it running overnight for a smaller segment of the data, and when I woke up there was an error, but I looked at the data and it was complete.
Am I missing something in my query that's making it highly inefficient, or is it just par for the course for this type of query, and there's no optimization possible in my code that would make a tangible improvement? I've maxed out a Google Cloud SQL instance to their db-n1-highmem-32 size, with 32 GB of memory and 1000 GB of storage space, and it still chokes up and spits out a 2013 error after running for an hour. I need to do this for a total of a little over 3 million rows.
For example, this:
id | FullName | EmailAddress |
----------------------------------------------
1 | John Doe | john.doe@email.com |
2 | null | janedoe@box.com |
3 | null | billybob@bobby.com |
4 | null | john.doe@email.com |
5 | John Lennon | jlennon@yoohoo.com |
6 | null | james.smith@coolmail.com|
7 | null | billybob@bobby.com |
8 | Jane Doe | janedoe@box.com |
would result in this:
id | FullName | EmailAddress |
----------------------------------------------
1 | John Doe | john.doe@email.com |
3 | null | billybob@bobby.com |
5 | John Lennon | jlennon@yoohoo.com |
6 | null | james.smith@coolmail.com|
8 | Jane Doe | janedoe@box.com |
Using EXISTS might be simpler in this situation:
delete
from customer_info c
where (trim(c.FullName)='' or c.FullName is null)
and exists (
select 1
from customer_info i
where i.EmailAddress = c.EmailAddress
and trim(i.FullName)>''
)
delete
from customer_info c
where exists (
select 1
from customer_info i
where i.EmailAddress = c.EmailAddress
and i.id < c.id
)
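As an alternative sketch, the "keeper" row for each email can be picked in a single grouped pass: the lowest id among rows that have a FullName, falling back to the lowest id overall. The example below runs against SQLite through Python's sqlite3 module, not MySQL, and uses the sample rows from the question. MySQL is likely to reject a DELETE whose subquery reads the same table directly, so there the subquery would probably need to be wrapped in a derived table, as the question's own queries do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_info (id INTEGER PRIMARY KEY, FullName TEXT, EmailAddress TEXT)"
)
conn.executemany("INSERT INTO customer_info VALUES (?, ?, ?)", [
    (1, "John Doe", "john.doe@email.com"),
    (2, None, "janedoe@box.com"),
    (3, None, "billybob@bobby.com"),
    (4, None, "john.doe@email.com"),
    (5, "John Lennon", "jlennon@yoohoo.com"),
    (6, None, "james.smith@coolmail.com"),
    (7, None, "billybob@bobby.com"),
    (8, "Jane Doe", "janedoe@box.com"),
])

# One keeper id per email: prefer the lowest id with a non-blank FullName,
# otherwise the lowest id in the group. Everything else is deleted.
conn.execute("""
DELETE FROM customer_info
WHERE id NOT IN (
    SELECT COALESCE(
        MIN(CASE WHEN TRIM(COALESCE(FullName, '')) <> '' THEN id END),
        MIN(id)
    )
    FROM customer_info
    GROUP BY EmailAddress
)
""")

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM customer_info ORDER BY id")]
print(remaining_ids)
```

The grouped subquery reads the table once instead of self-joining it, which may matter with millions of rows; an index on EmailAddress would still be worth checking.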

Optimization SQL for getting data from two joined tables (usernames for user-from-id and user-to-id msgs from two tables)

I have table "msgs" with messages between users (their ids):
+--------+-------------+------------+---------+---------+
| msg_id |user_from_id | user_to_id | message | room_id |
+--------+-------------+------------+---------+---------+
| 1 | 1 | 4 |Hello! | 2 |
| 2 | 1 | 5 |Hi there | 1 |
| 3 | 2 | 1 |CU soon | 2 |
| 4 | 3 | 7 |nice... | 1 |
+--------+-------------+------------+---------+---------+
I also have two tables with users names.
Table: user1
+--------+----------+
|user_id |user_name |
+--------+----------+
| 5 | Ann |
| 6 | Sam |
| 7 | Michael |
+--------+----------+
Table: user2
+--------+----------+
|user_id |user_name |
+--------+----------+
| 1 | John |
| 2 | Alice |
| 3 | Tom |
| 4 | Jane |
+--------+----------+
I need to get the usernames for both user IDs in every row. Each user ID can be in either the first or the second usernames table.
I wrote this SQL query:
SELECT DISTINCT
m.msg_id,
m.user_from_id,
CASE WHEN c1.user_name IS NULL THEN c3.user_name ELSE c1.user_name END AS from_name,
m.user_to_id,
CASE WHEN c2.user_name IS NULL THEN c4.user_name ELSE c2.user_name END AS to_name,
m.message
FROM msgs m
LEFT JOIN users1 c1 ON c1.user_id=m.user_from_id
LEFT JOIN users1 c2 ON c2.user_id=m.user_to_id
LEFT JOIN users2 c3 ON c3.user_id=m.user_from_id
LEFT JOIN users2 c4 ON c4.user_id=m.user_to_id
WHERE m.room_id=1
LIMIT 0, 8
It works.
Executing the query to get the raw data without usernames (without any join) takes about 0.1 sec. But joining just one usernames table (user1 or user2 only) pushes that to about 6.2 sec. I have quite a lot of rows in these tables: 35K rows in msgs, 0.5K in user1, 25K in user2.
Executing the query with both tables joined (with all this data) is impossible.
How can I optimize this query? I just need the usernames for the user_ids in the first "msgs" table.
There are potentially many differences between the queries with and without the joins. I am going to assume that the ids have the appropriate indexes -- primary keys automatically do. If not, then check that.
The obvious solution is to use the original query as a subquery:
SELECT m.msg_id, m.user_from_id,
(CASE WHEN c1.user_name IS NULL THEN c3.user_name ELSE c1.user_name
END) AS from_name,
m.user_to_id,
(CASE WHEN c2.user_name IS NULL THEN c4.user_name ELSE c2.user_name
END) AS to_name,
m.message
FROM (SELECT m.*
FROM msgs m
WHERE m.room_id = 1
LIMIT 0, 8
) m LEFT JOIN
users1 c1
ON c1.user_id = m.user_from_id LEFT JOIN
users1 c2
ON c2.user_id = m.user_to_id LEFT JOIN
users2 c3
ON c3.user_id = m.user_from_id LEFT JOIN
users2 c4
ON c4.user_id = m.user_to_id;
For most data structures, the DISTINCT is also unnecessary.
This also assumes (reasonably) that user_id is unique in the users tables.
Also, use of LIMIT without ORDER BY is highly discouraged. The particular rows you get are indeterminate and might change from one execution to the next.
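The "filter and limit first, then join" shape can be sketched like this. It is a small illustration on SQLite via Python's sqlite3 module, with the sample rows from the question; the ORDER BY msg_id inside the subquery and the use of COALESCE in place of the CASE expressions are choices made for this sketch, not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE msgs (msg_id INTEGER PRIMARY KEY, user_from_id INTEGER,
                   user_to_id INTEGER, message TEXT, room_id INTEGER);
CREATE TABLE users1 (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE users2 (user_id INTEGER PRIMARY KEY, user_name TEXT);
INSERT INTO msgs VALUES
    (1, 1, 4, 'Hello!', 2), (2, 1, 5, 'Hi there', 1),
    (3, 2, 1, 'CU soon', 2), (4, 3, 7, 'nice...', 1);
INSERT INTO users1 VALUES (5, 'Ann'), (6, 'Sam'), (7, 'Michael');
INSERT INTO users2 VALUES (1, 'John'), (2, 'Alice'), (3, 'Tom'), (4, 'Jane');
""")

# The subquery narrows msgs down to at most 8 rows before any join runs,
# so each LEFT JOIN only has to look up a handful of ids.
QUERY = """
SELECT m.msg_id,
       COALESCE(c1.user_name, c3.user_name) AS from_name,
       COALESCE(c2.user_name, c4.user_name) AS to_name,
       m.message
FROM (SELECT * FROM msgs WHERE room_id = 1 ORDER BY msg_id LIMIT 8) m
LEFT JOIN users1 c1 ON c1.user_id = m.user_from_id
LEFT JOIN users1 c2 ON c2.user_id = m.user_to_id
LEFT JOIN users2 c3 ON c3.user_id = m.user_from_id
LEFT JOIN users2 c4 ON c4.user_id = m.user_to_id
ORDER BY m.msg_id
"""

named_msgs = conn.execute(QUERY).fetchall()
print(named_msgs)
```

COALESCE returns the first non-NULL argument, so it picks the name from whichever users table actually contains the id.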

Mysql - sql query to delete duplicate rows based on condition

I have a database table with nearly 1 million records. When I wrote a query to see how many of them are duplicates, it turned out that close to 90K records are duplicates. By duplicate I mean records with the same email address; for one email address there could be 10 records.
Sample data
ID | Name | Email | phone
1 | abc | abc@gmail.com | 12345
2 | def | def@gmail.com | 12533
3 | abc | abc@gmail.com |
4 | hij | hij@gmail.com | 50633
5 | abc | abc@gmail.com | 12345
6 | def | def@gmail.com |
1) ID is the autoincrement primary key of the table
2) If there are two records present, like def@gmail.com, I need to keep the record that has the phone and delete the other record
3) Now in the case of abc@gmail.com there are 3 records: the one without a phone gets deleted, and of the remaining two, although both have all data, keep the first one and delete the second
Is it possible to write a delete statement based on a condition or is there an easier way to accomplish this.
A SQLfiddle to play around with - http://sqlfiddle.com/#!2/cf8c7
Thanks much
DELETE FROM phoney ph
WHERE ph.zphone IS NULL
AND EXISTS (SELECT *
FROM phoney ex
WHERE ex.zname = ph.zname
AND ex.zemail = ph.zemail
AND ex.zphone IS NOT NULL
);
DELETE FROM phoney ph
WHERE ph.zphone IS NOT NULL
AND EXISTS (SELECT *
FROM phoney ex
WHERE ex.zname = ph.zname
AND ex.zemail = ph.zemail
AND ex.id < ph.id
);
SELECT * FROM phoney;
RESULT:
DELETE 2
DELETE 1
id | zname | zemail | zphone
----+-------+---------------+--------
1 | abc | abc@gmail.com | 12345
2 | def | def@gmail.com | 12533
4 | hij | hij@gmail.com | 50633
NOTE: You could combine the two delete-queries, but that will result in a messy kludge of AND/OR conditions in the WHERE CLAUSE, which is very error-prone.
Try below query:
DELETE b.* FROM table1 a INNER JOIN table1 b ON a.name = b.name AND a.id < b.id

Generate unique username from first and last name?

I've got a bunch of users in my database and I want to reset all their usernames to the first letter of their first name, plus their full last name. As you can imagine, there are some dupes. In this scenario, I'd like to add a "2" or "3" or something to the end of the username. How would I write a query to generate a unique username like this?
UPDATE user
SET username=lower(concat(substring(first_name,1,1), last_name), UNIQUETHINGHERE)
CREATE TABLE bar LIKE foo;
INSERT INTO bar (id,user,first,last)
(SELECT f.id,CONCAT(SUBSTRING(f.first,1,1),f.last,
(SELECT COUNT(*) FROM foo f2
WHERE SUBSTRING(f2.first,1,1) = SUBSTRING(f.first,1,1)
AND f2.last = f.last AND f2.id <= f.id
)),f.first,f.last from foo f);
DROP TABLE foo;
RENAME TABLE bar TO foo;
This relies on a primary key id: for each record inserted into bar, we only count the duplicates found in foo whose id is less than or equal to that record's id.
Given foo:
select * from foo;
+----+------+--------+--------+
| id | user | first | last |
+----+------+--------+--------+
| 1 | aaa | Roger | Hill |
| 2 | bbb | Sally | Road |
| 3 | ccc | Fred | Mount |
| 4 | ddd | Darren | Meadow |
| 5 | eee | Sharon | Road |
+----+------+--------+--------+
The above INSERTs into bar, resulting in:
select * from bar;
+----+----------+--------+--------+
| id | user | first | last |
+----+----------+--------+--------+
| 1 | RHill1 | Roger | Hill |
| 2 | SRoad1 | Sally | Road |
| 3 | FMount1 | Fred | Mount |
| 4 | DMeadow1 | Darren | Meadow |
| 5 | SRoad2 | Sharon | Road |
+----+----------+--------+--------+
To remove the "1" from the end of user names,
INSERT INTO bar (id,user,first,last)
(SELECT f3.id,
CONCAT(
SUBSTRING(f3.first,1,1),
f3.last,
CASE f3.cnt WHEN 1 THEN '' ELSE f3.cnt END),
f3.first,
f3.last
FROM (
SELECT
f.id,
f.first,
f.last,
(
SELECT COUNT(*)
FROM foo f2
WHERE SUBSTRING(f2.first,1,1) = SUBSTRING(f.first,1,1)
AND f2.last = f.last AND f2.id <= f.id
) as cnt
FROM foo f) f3)
As a two-parter:
SELECT max(username)
FROM user
WHERE username LIKE concat(lower(concat(substring(first_name,1,1),last_name)), '%')
to retrieve the "highest" username for that name combo. Extract the numeric suffix, increment it, then insert back into the database for your new user.
This is racy, of course. Two users with the same first/last names might stomp on each other's usernames, depending on how things work out. You'd definitely want to sprinkle some transaction/locking onto the queries to make sure you don't have any users conflicting.
Nevermind.... I just found the dupes:
select LOWER(CONCAT(SUBSTRING(first_name,1,1),last_name)) as new_login,count(*) as cnt from wx_user group by new_login having count(*)>1;
And set those ones manually. Was only a handful.
Inspired by unutbu's answer: there is no need to create an extra table or run several queries:
UPDATE USER a
LEFT JOIN (
SELECT USR_ID,
REPLACE(
CONCAT(
SUBSTRING(f.`USR_FIRSTNAME`,1,1),
f.`USR_LASTNAME`,
(
(SELECT IF(COUNT(*) > 1, COUNT(*), '')
FROM USER f2
WHERE SUBSTRING(f2.`USR_FIRSTNAME`,1,1) =
SUBSTRING(f.`USR_FIRSTNAME`,1,1)
AND f2.`USR_LASTNAME` = f.`USR_LASTNAME`
AND f2.`USR_ID` <= f.`USR_ID`)
)
),
' ',
'') as login
FROM USER f) b
ON a.USR_ID = b.USR_ID
SET a.USR_NICKNAME = b.login
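The counting trick behind these answers (number each row by how many rows with the same initial and last name have an id less than or equal to its own) can be tried out in a small sketch. It runs on SQLite through Python's sqlite3 module, so it uses substr and || where MySQL would use SUBSTRING and CONCAT, and the table name users and its column names are assumptions chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, "
    "first_name TEXT, last_name TEXT)"
)
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", [
    (1, "aaa", "Roger", "Hill"),
    (2, "bbb", "Sally", "Road"),
    (3, "ccc", "Fred", "Mount"),
    (4, "ddd", "Darren", "Meadow"),
    (5, "eee", "Sharon", "Road"),
])

# For each row, count the rows with the same initial and last name whose id
# is <= this row's id. The first occurrence gets no suffix; later ones get
# 2, 3, ... appended to the lower-cased base name.
conn.execute("""
UPDATE users
SET username = lower(substr(first_name, 1, 1) || last_name) || (
    SELECT CASE WHEN COUNT(*) > 1 THEN COUNT(*) ELSE '' END
    FROM users u2
    WHERE substr(u2.first_name, 1, 1) = substr(users.first_name, 1, 1)
      AND u2.last_name = users.last_name
      AND u2.id <= users.id
)
""")

usernames = [row[0] for row in conn.execute("SELECT username FROM users ORDER BY id")]
print(usernames)
```

The subquery only reads columns that the UPDATE does not modify, so the numbering does not depend on the order in which rows are updated. A suffixed name could still collide with a real name that happens to end in a digit, so a unique index on the username column remains a sensible safety net.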