Insert into a table with a select statement in MySQL

I have a table of training history that has been modified by many different users over the years. This has caused the same training record to be entered twice. I want to create a table that replicates the main table and insert all duplicate records into it.
What constitutes a duplicate record is if the employee_id, course_code, and completion_date all match.
I can create the duplicate table and I have a select statement that appears to pull duplicates, but it pulls only one of them and I need it to pull both (or more) of them. This is because one person may have entered the training record with a different course name but the id, code, and date are the same so it is a duplicate entry. So by pulling all the duplicates I can validate that that is the case.
Here is my SELECT statement:
SELECT *
FROM
training_table p1
JOIN
training_table p2 ON (
p1.employee_id = p2.employee_id
AND p1.course_code = p2.course_code
AND p1.completion_date = p2.completion_date)
GROUP BY p1.ssn;
The query runs and returns what appear to be unique rows. I would like all of the duplicates. And whenever I try to INSERT it into an identical table I get an error stating my column count doesn't match my value count.
Any help would be great.

This will select any duplicate rows for insertion into your new table.
SELECT p1.*
FROM training_table p1
JOIN
(SELECT employee_id, course_code, completion_date
FROM training_table
GROUP BY employee_id, course_code, completion_date
HAVING COUNT(*) > 1
) dups
ON p1.employee_id = dups.employee_id
AND p1.course_code = dups.course_code
AND p1.completion_date = dups.completion_date
;
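As a rough illustration of the query above, here is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL, and the `training_duplicates` table, the `course_name` column, and the sample rows are assumptions made up for the example. The INSERT names its columns explicitly so the column count of the SELECT always matches the target table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE training_table (
    employee_id INTEGER, course_code TEXT, completion_date TEXT, course_name TEXT
);
-- Hypothetical copy of the main table that will hold the duplicates.
CREATE TABLE training_duplicates (
    employee_id INTEGER, course_code TEXT, completion_date TEXT, course_name TEXT
);
INSERT INTO training_table VALUES
    (1, 'C100', '2015-01-10', 'Safety'),
    (1, 'C100', '2015-01-10', 'Safety Basics'),
    (2, 'C100', '2015-02-01', 'Safety'),
    (3, 'C200', '2015-03-05', 'Forklift');
""")

# Copy every row whose (employee_id, course_code, completion_date) occurs more than once.
conn.execute("""
INSERT INTO training_duplicates (employee_id, course_code, completion_date, course_name)
SELECT p1.employee_id, p1.course_code, p1.completion_date, p1.course_name
FROM training_table p1
JOIN (SELECT employee_id, course_code, completion_date
      FROM training_table
      GROUP BY employee_id, course_code, completion_date
      HAVING COUNT(*) > 1) dups
  ON p1.employee_id = dups.employee_id
 AND p1.course_code = dups.course_code
 AND p1.completion_date = dups.completion_date
""")

duplicate_rows = conn.execute(
    "SELECT employee_id, course_name FROM training_duplicates ORDER BY course_name"
).fetchall()
print(duplicate_rows)  # both copies of employee 1's record, with their differing names
```

Both copies of the duplicated record land in the second table, so the differing course names can be compared side by side.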

Try using CROSS JOIN (Cartesian product join) instead of a plain JOIN. For the insert, try INSERT INTO TABLE (column1, column2, column3) SELECT column1, column2, column3 FROM TABLE; with the columns in the same order.

Thanks for the help. I had discovered the answer shortly after I posted the question (even though I had looked for the answer for over an hour :) ) Here is what I used:
SELECT *
FROM training_table mto
WHERE EXISTS
(
SELECT 1
FROM training_table mti
WHERE mti.employee_id = mto.employee_id
AND mti.course_code = mto.course_code
AND mti.completion_date = mto.completion_date
LIMIT 1, 1
)
I just added the INSERT statement and it worked.
Thanks.

Related

Compare data for two versions of the same MySQL database table

I have two MySQL tables that have the exact same structure and mostly the same data. Some of the rows would be different between the two because my client updated the old website instead of the new website. There are hundreds of records and a column is not in place for the last modified date. I have created a new database on localhost and imported the old and new tables. All of the rows of data will need to be compared and differences between the old and new databases will need to be returned. Once the differences are identified, would there be a way to easily migrate the updated data from the old table to the new table? I am a MySQL novice, but I can usually muddle my way through issues. Thanks in advance for your assistance.
I have been looking at the following code, but I am not sure if it is the best answer.
SELECT *,'table_1' AS o FROM table_1
UNION
SELECT *,'table_2' AS o FROM table_2
WHERE some_id IN (
SELECT some_id
FROM (
SELECT * FROM table_1
UNION
SELECT * FROM table_2
) AS x
GROUP BY some_id
HAVING COUNT(*) > 1
)
ORDER BY some_id, o;
This should do the trick. In the subselect used in the WHERE clause, you find the primary keys of all rows where every value is the same across both tables. You then exclude rows with those primary keys from the unioned result set. How you go about reconciling the differences is a totally different story :)
SELECT * FROM (
SELECT *, 'table 1' FROM table_1
UNION ALL
SELECT *, 'table 2' FROM table_2
) AS combined
WHERE combined.primary_key_field
NOT IN (
SELECT t1.primary_key_field
FROM table_1 AS t1
INNER JOIN table_2 AS t2
ON t1.primary_key_field = t2.primary_key_field
AND t1.some_other_field = t2.some_other_field
AND ... /* join on all fields in tables */
)
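To make the idea concrete, here is a small sketch of the same query run from Python against in-memory SQLite, used here only as a stand-in for MySQL. The two-column schema and the sample rows are assumptions; a real table would need every column listed in the join condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (primary_key_field INTEGER PRIMARY KEY, some_other_field TEXT);
CREATE TABLE table_2 (primary_key_field INTEGER PRIMARY KEY, some_other_field TEXT);
INSERT INTO table_1 VALUES (1, 'same'), (2, 'old value'), (3, 'only in table 1');
INSERT INTO table_2 VALUES (1, 'same'), (2, 'new value');
""")

# Keep only rows whose key does NOT have an identical twin in the other table.
differing_rows = conn.execute("""
SELECT * FROM (
    SELECT primary_key_field, some_other_field, 'table 1' AS source FROM table_1
    UNION ALL
    SELECT primary_key_field, some_other_field, 'table 2' AS source FROM table_2
) AS combined
WHERE combined.primary_key_field NOT IN (
    SELECT t1.primary_key_field
    FROM table_1 AS t1
    INNER JOIN table_2 AS t2
        ON t1.primary_key_field = t2.primary_key_field
       AND t1.some_other_field = t2.some_other_field
)
ORDER BY primary_key_field, source
""").fetchall()

for row in differing_rows:
    print(row)
```

Key 1 is identical in both tables and drops out; key 2 appears twice (once per version) and key 3 appears once because it exists in only one table.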
A single INSERT INTO ... SELECT query will do.
insert into table_new
select * from table_old
where some_id NOT IN (select some_id from table_new)
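A quick sketch of what this query does and does not do, again using in-memory SQLite from Python as a stand-in for MySQL. The `payload` column and the rows are assumptions for illustration. Rows whose `some_id` is missing from `table_new` are copied, while rows that exist in both tables with different values are left untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_old (some_id INTEGER PRIMARY KEY, payload TEXT);
CREATE TABLE table_new (some_id INTEGER PRIMARY KEY, payload TEXT);
INSERT INTO table_old VALUES (1, 'a'), (2, 'b-updated'), (3, 'c');
INSERT INTO table_new VALUES (1, 'a'), (2, 'b');
""")

# Copy only rows whose key is not yet present in table_new.
conn.execute("""
INSERT INTO table_new (some_id, payload)
SELECT some_id, payload
FROM table_old
WHERE some_id NOT IN (SELECT some_id FROM table_new)
""")

new_rows = conn.execute("SELECT some_id, payload FROM table_new ORDER BY some_id").fetchall()
print(new_rows)  # row 3 is added; row 2 keeps its table_new value
```

So this approach covers missing rows only; changed rows would still need a separate UPDATE or a manual review.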

Insert duplicate rows for a new id based on result of another query

So I'll do my best to describe the query that I'm trying to build.
I have a table I'll call user_records that has some data (several rows) for a relational ID, say userId (from a table users). For each row, I need to duplicate each row for another user. I know I could run this:
INSERT INTO user_records (userId, column1, column2, ...)
SELECT 10 as userId, column1, column2...
FROM user_records
WHERE userId = 1
This will copy the existing rows for userId 1 to userId 10.
But I want to run this for all userIds that are active and don't already exist in this table. So I basically want to execute this query first:
SELECT userID
FROM users
WHERE users.active = 1
AND NOT EXISTS (
SELECT * FROM user_records
WHERE users.userId = user_records.userId)
Using JOINS or simply combining the 2 queries, can I run this query and replace the 10 in the former query so that it duplicates the rows for a series of userIds?
Thanks in advance!
One way is to create a CROSS JOIN:
insert into user_records (userId, column1)
select u.userId, ur.column1
from user_records ur
cross join users u
where ur.userId = 1
and u.active = 1
and u.userId not in (select userId from user_records);
This will insert new rows into user_records for each userId that doesn't exist in the table, copying the data from UserId 1.
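Here is a minimal sketch of the CROSS JOIN insert, run from Python against in-memory SQLite as a stand-in for MySQL. The sample users and the single `column1` value column are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userId INTEGER PRIMARY KEY, active INTEGER);
CREATE TABLE user_records (userId INTEGER, column1 TEXT);
-- User 10 is active with no records, 11 is inactive, 12 already has records.
INSERT INTO users VALUES (1, 1), (10, 1), (11, 0), (12, 1);
INSERT INTO user_records VALUES (1, 'a'), (1, 'b'), (12, 'z');
""")

# Pair every template row of user 1 with every active user that has no records yet.
conn.execute("""
INSERT INTO user_records (userId, column1)
SELECT u.userId, ur.column1
FROM user_records ur
CROSS JOIN users u
WHERE ur.userId = 1
  AND u.active = 1
  AND u.userId NOT IN (SELECT userId FROM user_records)
""")

records = conn.execute(
    "SELECT userId, column1 FROM user_records ORDER BY userId, column1"
).fetchall()
print(records)
```

Only user 10 receives copies: user 11 is inactive and users 1 and 12 already have rows in `user_records`.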

Copy rows if value exists x amount of times

I have two tables Board1 and Board2 with the identical structure. They both have a primary index column of id. I have a THIRD table called Table1, which has a non-indexed column board_id, where the same board_id occurs multiple times. board_id always corresponds to an id in Board1. Board2 is currently empty, and I want to add rows from Board1, but only where the same board_id occurs at least six times in Table1. Table1 will be changing periodically, so I'll be needing to do the query in the future, but without doubling id rows which are already in Board2.
So to recap:
There are three tables: Board1, Board2, and Table1. I want to copy rows from Board1 to Board2, but only where the id in Board1 occurs (at least) six times in Table1 as `board_id`.
I'd appreciate any help!
EDIT: I'm dreadfully sorry, but I realized I made a huge mistake in my question. I've rewritten it to reflect what I actually needed. I'm truly sorry.
You can do it like this
INSERT INTO Table2
SELECT
id,
board_id
FROM (SELECT
b.id,
b.board_id,
bl.Count
FROM board as b
LEFT JOIN (SELECT
board_id,
COUNT(board_id) as `Count`
FROM board
GROUP BY board_id) as bl
on bl.board_id = b.board_id
group by b.id
having bl.Count >= 6) as L
If you need more columns you can select them in inner and outer queries.
Here is what you asked for, with fiddle
INSERT Table2
SELECT
Table1.*
FROM
Table1
JOIN
(
SELECT
Board_Id,
count(*) cnt
FROM
Table1
GROUP BY
Board_Id
) BoardIds
ON BoardIds.Board_Id = Table1.Board_Id
WHERE
BoardIds.cnt > 5
AND
NOT EXISTS (SELECT id FROM Table2 WHERE Table2.id = Table1.id)
Try something like the below:
Add your column names where specified (excluding any ID columns), as I'm assuming each row will have a unique ID, so you won't be able to GROUP and COUNT by doing SELECT * FROM Table1
You may need to test / validate this
INSERT INTO Board2 (Your Column Names)
SELECT (Your Column Names)
FROM Board1
WHERE id IN (SELECT board_id
FROM Table1
GROUP BY board_id
HAVING COUNT(*) >= 6)
AND id NOT IN (SELECT id FROM Board2)
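The following sketch tries the idea end to end from Python with in-memory SQLite standing in for MySQL. The `name` column and the row counts are assumptions for illustration, and the exclusion is done on `id`, since the question says that is the key both board tables share. Running the statement twice shows that rows already in Board2 are not copied again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Board1 (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Board2 (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Table1 (board_id INTEGER);
INSERT INTO Board1 VALUES (1, 'six rows'), (2, 'five rows'), (3, 'seven rows');
INSERT INTO Board2 VALUES (3, 'seven rows');  -- already copied earlier
""")

# Board 1 occurs 6 times in Table1, board 2 only 5 times, board 3 occurs 7 times.
occurrences = {1: 6, 2: 5, 3: 7}
conn.executemany(
    "INSERT INTO Table1 (board_id) VALUES (?)",
    [(board_id,) for board_id, n in occurrences.items() for _ in range(n)],
)

COPY_SQL = """
INSERT INTO Board2 (id, name)
SELECT id, name
FROM Board1
WHERE id IN (SELECT board_id FROM Table1 GROUP BY board_id HAVING COUNT(*) >= 6)
  AND id NOT IN (SELECT id FROM Board2)
"""
conn.execute(COPY_SQL)
conn.execute(COPY_SQL)  # a later run finds nothing new to copy

board2_ids = [row[0] for row in conn.execute("SELECT id FROM Board2 ORDER BY id")]
print(board2_ids)
```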

Check if multiple records match a set of values

Is there a way to write a single query to check if a set of rows matches a set of values? I have one row per set of values that I need to match and I'd like to know if all rows are matched or not. I could perform this via multiple queries such as:
select * from tableName where (value1, value2) = ('someValue1', 'someValue2')
select * from tableName where (value1, value2) = ('someOtherValue1', 'someOtherValue2')
...and so on, up to an arbitrary number of queries. How could this sort of thing be re-written as a single query where the query returns ONLY if all values are matched?
You could try something like:
select t.*
from tableName t
join (select 'someValue1' value1, 'someValue2' value2 union all
select 'someOtherValue1', 'someOtherValue2') v
on t.value1 = v.value1 and t.value2 = v.value2
where 2=
(select count(distinct concat(v1.value1, v1.value2))
from (select 'someValue1' value1, 'someValue2' value2 union all
select 'someOtherValue1', 'someOtherValue2') v1
join tableName t1
on t1.value1 = v1.value1 and t1.value2 = v1.value2)
If you have a large number of value pairs that you want to check, it may be easier to insert them into a temporary table and use the temporary table in the above query, instead of two separate hard-coded virtual tables.
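The temporary-table variant could look roughly like the sketch below, written in Python against in-memory SQLite as a stand-in for MySQL. The helper `rows_if_all_pairs_match` and the `wanted_pairs` table are hypothetical names introduced for the example. Instead of hard-coding `2`, the query compares the number of wanted pairs with the number of wanted pairs that were actually found.

```python
import sqlite3

QUERY = """
SELECT t.value1, t.value2
FROM tableName t
JOIN wanted_pairs v ON t.value1 = v.value1 AND t.value2 = v.value2
WHERE (SELECT COUNT(*) FROM (SELECT DISTINCT value1, value2 FROM wanted_pairs))
    = (SELECT COUNT(*) FROM (
           SELECT DISTINCT v1.value1, v1.value2
           FROM wanted_pairs v1
           JOIN tableName t1 ON t1.value1 = v1.value1 AND t1.value2 = v1.value2))
ORDER BY t.value1
"""

def rows_if_all_pairs_match(table_rows, wanted_pairs):
    """Return the matching rows, or an empty list if any wanted pair is missing."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tableName (value1 TEXT, value2 TEXT)")
    conn.execute("CREATE TEMP TABLE wanted_pairs (value1 TEXT, value2 TEXT)")
    conn.executemany("INSERT INTO tableName VALUES (?, ?)", table_rows)
    conn.executemany("INSERT INTO wanted_pairs VALUES (?, ?)", wanted_pairs)
    return conn.execute(QUERY).fetchall()

wanted = [("someValue1", "someValue2"), ("someOtherValue1", "someOtherValue2")]
complete = rows_if_all_pairs_match(wanted + [("x", "y")], wanted)
incomplete = rows_if_all_pairs_match([("someValue1", "someValue2"), ("x", "y")], wanted)
print(complete)
print(incomplete)
```

When every wanted pair exists the matching rows come back; when one pair is missing the result is empty.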
What about:
SELECT *
FROM tableName
WHERE value1 IN ('someValue1', 'someOtherValue1') AND
value2 IN ('someValue2', 'someOtherValue2')
Match if exactly two records found
Select students who got q13 wrong and Q14 right
SELECT qa.StudentID FROM questionAnswer qa, Student s
WHERE qa.StudentID=s.StudentID AND
((QuestionID=13 AND Pass=0) OR (QuestionID=14 AND Pass=1))
GROUP BY qa.StudentID
HAVING COUNT(*)=2;
The WHERE clause matches any record where Q13 is incorrect or Q14 is correct.
We then group by StudentID.
The HAVING clause requires exactly two matching records, so only students who satisfy both conditions are returned.
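A small sketch of this grouping trick, run from Python against in-memory SQLite as a stand-in for MySQL. The sample students are made up, and the sketch assumes there is at most one answer row per student and question; with repeated rows the `COUNT(*) = 2` test would no longer be reliable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StudentID INTEGER PRIMARY KEY);
CREATE TABLE questionAnswer (StudentID INTEGER, QuestionID INTEGER, Pass INTEGER);
INSERT INTO Student VALUES (1), (2), (3);
INSERT INTO questionAnswer VALUES
    (1, 13, 0), (1, 14, 1),   -- Q13 wrong, Q14 right: should match
    (2, 13, 1), (2, 14, 1),   -- both right: only one row passes the WHERE
    (3, 13, 0), (3, 14, 0);   -- both wrong: only one row passes the WHERE
""")

matching_students = conn.execute("""
SELECT qa.StudentID
FROM questionAnswer qa
JOIN Student s ON qa.StudentID = s.StudentID
WHERE (qa.QuestionID = 13 AND qa.Pass = 0)
   OR (qa.QuestionID = 14 AND qa.Pass = 1)
GROUP BY qa.StudentID
HAVING COUNT(*) = 2
""").fetchall()
print(matching_students)
```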

How to remove duplicate entries from a mysql db?

I have a table with some ids + titles. I want to make the title column unique, but it has over 600k records already, some of which are duplicates (sometimes several dozen times over).
How do I remove all duplicates, except one, so I can add a UNIQUE key to the title column after?
This command adds a unique key, and drops all rows that generate errors (due to the unique key). This removes duplicates.
ALTER IGNORE TABLE table ADD UNIQUE KEY idx1(title);
Edit: Note that this command may not work for InnoDB tables for some versions of MySQL. See this post for a workaround. (Thanks to "an anonymous user" for this information.)
Create a new table with just the distinct rows of the original table. There may be other ways but I find this the cleanest.
CREATE TABLE tmp_table AS SELECT DISTINCT [....] FROM main_table
More specifically:
The faster way is to insert distinct rows into a temporary table. Using delete, it took me a few hours to remove duplicates from a table of 8 million rows. Using insert and distinct, it took just 13 minutes.
CREATE TABLE tempTableName LIKE tableName;
CREATE INDEX ix_all_id ON tableName(cellId,attributeId,entityRowId,value);
INSERT INTO tempTableName(cellId,attributeId,entityRowId,value) SELECT DISTINCT cellId,attributeId,entityRowId,value FROM tableName;
TRUNCATE TABLE tableName;
INSERT tableName SELECT * FROM tempTableName;
DROP TABLE tempTableName;
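The copy-distinct-rows approach can be tried out with the sketch below, which uses Python and in-memory SQLite as a stand-in for MySQL. SQLite has no `CREATE TABLE ... LIKE`, so the second table's columns are spelled out, and the original table is emptied with `DELETE` rather than dropped so that the rows can be copied back into it. The sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableName (cellId INTEGER, attributeId INTEGER, entityRowId INTEGER, value TEXT);
INSERT INTO tableName VALUES (1, 1, 1, 'a'), (1, 1, 1, 'a'), (2, 1, 1, 'b'), (1, 1, 1, 'a');

-- Stand-in for CREATE TABLE tempTableName LIKE tableName.
CREATE TABLE tempTableName (cellId INTEGER, attributeId INTEGER, entityRowId INTEGER, value TEXT);

-- Keep one copy of each distinct row.
INSERT INTO tempTableName (cellId, attributeId, entityRowId, value)
SELECT DISTINCT cellId, attributeId, entityRowId, value FROM tableName;

-- Empty the original table, then copy the de-duplicated rows back.
DELETE FROM tableName;
INSERT INTO tableName SELECT * FROM tempTableName;
DROP TABLE tempTableName;
""")

remaining = conn.execute("SELECT cellId, value FROM tableName ORDER BY cellId").fetchall()
print(remaining)
```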
Since MySQL's ALTER IGNORE TABLE has been deprecated, you need to actually delete the duplicate data before adding an index.
First write a query that finds all the duplicates. Here I'm assuming that email is the field that contains duplicates.
SELECT
s1.email,
s1.id,
s1.created,
s2.id,
s2.created
FROM
student AS s1
INNER JOIN
student AS s2
WHERE
/* Emails are the same */
s1.email = s2.email AND
/* DON'T select both accounts,
only select the one created later.
The serial id could also be used here */
s2.created > s1.created
;
Next select only the unique duplicate ids:
SELECT
DISTINCT s2.id
FROM
student AS s1
INNER JOIN
student AS s2
WHERE
s1.email = s2.email AND
s2.created > s1.created
;
Once you are sure the result contains only the duplicate ids you want to delete, run the delete. You have to wrap the table as (SELECT * FROM tblname) so that MySQL doesn't complain.
DELETE FROM
student
WHERE
id
IN (
SELECT
DISTINCT s2.id
FROM
(SELECT * FROM student) AS s1
INNER JOIN
(SELECT * FROM student) AS s2
WHERE
s1.email = s2.email AND
s2.created > s1.created
);
Then create the unique index:
ALTER TABLE
student
ADD UNIQUE INDEX
idx_student_unique_email(email)
;
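The delete-then-index sequence can be rehearsed with the sketch below, written in Python against in-memory SQLite as a stand-in for MySQL. SQLite does not raise MySQL's error 1093, so the `(SELECT * FROM student)` wrapper is not needed here, and `CREATE UNIQUE INDEX` replaces the `ALTER TABLE ... ADD UNIQUE INDEX` form. The sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, email TEXT, created TEXT);
INSERT INTO student VALUES
    (1, 'a@example.com', '2020-01-01'),
    (2, 'a@example.com', '2020-02-01'),
    (3, 'b@example.com', '2020-01-15'),
    (4, 'a@example.com', '2020-03-01');
""")

# Delete every row that has an earlier-created row with the same email.
conn.execute("""
DELETE FROM student
WHERE id IN (
    SELECT DISTINCT s2.id
    FROM student AS s1
    INNER JOIN student AS s2
        ON s1.email = s2.email AND s2.created > s1.created
)
""")
conn.execute("CREATE UNIQUE INDEX idx_student_unique_email ON student (email)")

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM student ORDER BY id")]

# With the index in place, a new duplicate email is rejected.
try:
    conn.execute("INSERT INTO student VALUES (9, 'a@example.com', '2021-01-01')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(remaining_ids, duplicate_rejected)
```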
The query below can be used to delete all duplicates except the row with the lowest "id" value:
DELETE t1 FROM table_name t1, table_name t2 WHERE t1.id > t2.id AND t1.name = t2.name
Similarly, we can keep the row with the highest "id" value as follows:
DELETE t1 FROM table_name t1, table_name t2 WHERE t1.id < t2.id AND t1.name = t2.name
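MySQL's multi-table `DELETE t1 FROM ...` syntax is not available in every database, so the sketch below swaps in a correlated `EXISTS` subquery that expresses the same "keep the lowest id" rule. It runs from Python against in-memory SQLite, used only as a stand-in; the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_name (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO table_name VALUES (1, 'bob'), (2, 'bob'), (3, 'alice'), (4, 'bob'), (5, 'alice');
""")

# Delete a row when another row with the same name and a lower id exists.
conn.execute("""
DELETE FROM table_name
WHERE EXISTS (
    SELECT 1
    FROM table_name AS t2
    WHERE t2.name = table_name.name
      AND t2.id < table_name.id
)
""")

kept = conn.execute("SELECT id, name FROM table_name ORDER BY id").fetchall()
print(kept)
```

Flipping the comparison to `t2.id > table_name.id` would keep the highest id per name instead.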
This shows how to do it in SQL Server 2000. I'm not completely familiar with MySQL syntax, but I'm sure there's something comparable.
create table #titles (iid int identity (1, 1), title varchar(200))
-- Repeat this step many times to create duplicates
insert into #titles(title) values ('bob')
insert into #titles(title) values ('bob1')
insert into #titles(title) values ('bob2')
insert into #titles(title) values ('bob3')
insert into #titles(title) values ('bob4')
DELETE T FROM
#titles T left join
(
select title, min(iid) as minid from #titles group by title
) D on T.title = D.title and T.iid = D.minid
WHERE D.minid is null
Select * FROM #titles
delete from student where id in (
SELECT distinct(s1.`student_id`) from student as s1 inner join student as s2
where s1.`sex` = s2.`sex` and
s1.`student_id` > s2.`student_id` and
s1.`sex` = 'M'
ORDER BY `s1`.`student_id` ASC
)
The solution posted by Nitin seems to be the most elegant / logical one.
However it has one issue:
ERROR 1093 (HY000): You can't specify target table 'student' for
update in FROM clause
This can however be resolved by using (SELECT * FROM student) instead of student:
DELETE FROM student WHERE id IN (
SELECT distinct(s1.`student_id`) FROM (SELECT * FROM student) AS s1 INNER JOIN (SELECT * FROM student) AS s2
WHERE s1.`sex` = s2.`sex` AND
s1.`student_id` > s2.`student_id` AND
s1.`sex` = 'M'
ORDER BY `s1`.`student_id` ASC
)
Give your +1's to Nitin for coming up with the original solution.
Deleting duplicates from MySQL tables is a common issue that usually comes with specific needs. In case anyone is interested, here (Remove duplicate rows in MySQL) I explain how to use a temporary table to delete MySQL duplicates in a reliable and fast way (with examples for different use cases).
In this case, something like this should work:
-- create a new temporary table
CREATE TABLE tmp_table1 LIKE table1;
-- add a unique constraint
ALTER TABLE tmp_table1 ADD UNIQUE(title);
-- scan over the table to insert entries
INSERT IGNORE INTO tmp_table1 SELECT * FROM table1 ORDER BY id;
-- rename tables
RENAME TABLE table1 TO backup_table1, tmp_table1 TO table1;
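Here is a rough rehearsal of the same steps in Python with in-memory SQLite as a stand-in for MySQL. It assumes the unique constraint goes on `title` alone, which is what the question asks for. SQLite spells the statements a little differently: `INSERT OR IGNORE` instead of `INSERT IGNORE`, an explicit `CREATE TABLE` instead of `LIKE`, and two `ALTER TABLE ... RENAME TO` statements instead of one `RENAME TABLE`. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, title TEXT);
INSERT INTO table1 VALUES (1, 'bob'), (2, 'bob'), (3, 'alice'), (4, 'bob');

-- New table with the same columns plus a unique constraint on title.
CREATE TABLE tmp_table1 (id INTEGER PRIMARY KEY, title TEXT);
CREATE UNIQUE INDEX idx_tmp_title ON tmp_table1 (title);

-- Rows that would violate the unique index are silently skipped.
INSERT OR IGNORE INTO tmp_table1 SELECT * FROM table1 ORDER BY id;

-- Swap the tables, keeping the original as a backup.
ALTER TABLE table1 RENAME TO backup_table1;
ALTER TABLE tmp_table1 RENAME TO table1;
""")

deduplicated = conn.execute("SELECT id, title FROM table1 ORDER BY id").fetchall()
backup_count = conn.execute("SELECT COUNT(*) FROM backup_table1").fetchone()[0]
print(deduplicated, backup_count)
```

Because the rows are inserted in `id` order, the copy with the lowest id survives for each title, and the untouched original remains available as `backup_table1`.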