Let's say I have created the following dimension table:
create table schema1.DOMAIN (
ID INT AUTO_INCREMENT PRIMARY KEY NOT NULL,
DOMAIN_NAME VARCHAR(10)
);
And I have a table of logs with records where DOMAIN_NAME is a column. My goal here is to write an insert statement that will populate this dimension table with values for DOMAIN_NAME, but only when they don't already exist. For example:
INSERT INTO schema1.DOMAIN (ID, DOMAIN_NAME)
select distinct DOMAIN_NAME from LOGS l where not exists (select 1 from schema1.DOMAIN d where d.domain_name = l.domain_name);
I haven't actually run this on a MySQL db yet, but I have the following questions:
Notice I didn't supply a value for the ID column in schema1.DOMAIN for the insert. Does this matter? If it's not supplied, will it simply auto-increment the primary key? Or will it throw an error? Is there a way to avoid supplying this ID and have it auto-increment automatically? This is the desired behavior for me. What is the best way to do this?
Is there a more performant way to do this?
I want this to work whether schema1.DOMAIN is empty or already has records and we are parsing a log for new values. Are these two objectives incompatible?
1. Notice I didn't supply a value for the ID column in schema1.DOMAIN for the insert. Does this matter? If it's not supplied, will it simply auto-increment the primary key? Or will it throw an error? Is there a way to avoid supplying this ID and have it auto-increment automatically? This is the desired behavior for me. What is the best way to do this?
Ans.
INSERT INTO schema1.DOMAIN (DOMAIN_NAME)
select distinct DOMAIN_NAME from LOGS l where not exists (select 1 from schema1.DOMAIN d where d.domain_name = l.domain_name);
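2. Is there a more performant way to do this?
Ans. A LEFT OUTER JOIN with an IS NULL filter may perform better, depending on the MySQL version and the available indexes.
3. I want this to work whether schema1.DOMAIN is empty or already has records and we are parsing a log for new values. Are these two objectives incompatible?
Ans. They are compatible: the NOT EXISTS filter behaves the same whether the table is empty or already populated.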
Here is the query you wanted to write. I just removed id from the list of columns for the insert, so it is auto-incremented for every inserted row:
insert into schema1.domain (domain_name)
select distinct domain_name
from logs l
where not exists (select 1 from schema1.domain d where d.domain_name = l.domain_name);
You could also use the insert ... on duplicate key syntax. This requires defining a unique constraint on the domain_name column:
create table schema1.domain (
id int auto_increment primary key not null,
domain_name varchar(10) unique
);
Then you can do:
insert into schema1.domain (domain_name)
select distinct domain_name from logs l
on duplicate key update domain_name = values(domain_name);
When a domain that already exists in the table is met, the query goes to the on duplicate key clause, where a dummy operation is performed.
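The NOT EXISTS variant above can be tried out without a MySQL server. The following is only a sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL: the table names mirror the question, while the helper names make_domain_demo_db and load_new_domains and the sample rows are hypothetical. SQLite spells the auto-increment column as INTEGER PRIMARY KEY AUTOINCREMENT, but the INSERT itself is the same statement.

```python
import sqlite3


def make_domain_demo_db():
    """Build an in-memory database with a dimension table and a log table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE domain (
            id INTEGER PRIMARY KEY AUTOINCREMENT,  -- SQLite spelling of AUTO_INCREMENT
            domain_name VARCHAR(10) UNIQUE
        );
        CREATE TABLE logs (domain_name VARCHAR(10));
        INSERT INTO logs (domain_name) VALUES ('a.com'), ('b.org'), ('a.com');
        """
    )
    return conn


def load_new_domains(conn):
    """Insert log domains that are not in the dimension yet; return rows added."""
    cur = conn.execute(
        "INSERT INTO domain (domain_name) "
        "SELECT DISTINCT l.domain_name FROM logs AS l "
        "WHERE NOT EXISTS ("
        "  SELECT 1 FROM domain AS d WHERE d.domain_name = l.domain_name)"
    )
    return cur.rowcount


conn = make_domain_demo_db()
first_run = load_new_domains(conn)   # dimension is empty: both names are added
second_run = load_new_domains(conn)  # nothing new in the logs: no rows added
```

Because id is left out of the column list, the database assigns it, and running the same statement again is harmless whether the dimension table is empty or already populated.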
There are a few similar questions on here. None provide a solution. I would like to INSERT a NEW record into table B, but only if a foreign key exists in table A. To be clear, I do not wish to insert the result of a select. I just need to know that the foreign key exists.
INSERT INTO tableB (tableA_ID,code,notes,created) VALUES ('24','1','test',NOW())
SELECT tableA_ID FROM tableA WHERE tableA_ID='24' AND owner_ID='9'
Clearly, the above does not work. But is this even possible? I want to insert the NEW data into tableB, only if the record for the row in tableA exists and belongs to owner_ID.
The queries I have seen so far relate to INSERTING the results from the SELECT query - I do not wish to do that.
Try this:
INSERT INTO tableB (tableA_ID,code,notes,created)
SELECT id, code, notes, created
FROM ( SELECT '24' as id, '1' as code, 'test' as notes, NOW() as created) t
WHERE EXISTS
(
SELECT tableA_ID
FROM tableA
WHERE tableA_ID='24' AND owner_ID='9'
)
I know this is an old, already answered question, but it now ranks highly in Google search results, and I think an addition may help someone in the future.
In some database designs, you may want to insert a row into a table that has two or more foreign keys. Let's say we have four tables in a chat application:
Users, Threads, Thread_Users and Messages
If we want a User to join a Thread, we need to insert a row into Thread_Users, which has two foreign keys: user_id and thread_id.
We can then use a query like this to insert the row if both foreign keys exist, and silently do nothing otherwise:
INSERT INTO `thread_users` (thread_id,user_id,status,creation_date)
SELECT 2,3,'pending',1601465161690 FROM (SELECT 1 as nb_threads, 1 as nb_users) as tmp
WHERE tmp.nb_threads = (SELECT count(*) FROM `threads` WHERE threads.id = 2)
AND tmp.nb_users = (SELECT count(*) FROM `users` WHERE users.id = 3)
It's a little verbose, but it does the job pretty well.
Application-side, we just have to raise an error if affectedRows = 0 and perhaps check which of the keys doesn't exist. IMHO, this is better than executing two SELECT queries and then the INSERT, especially when the probability of a nonexistent foreign key is very low.
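The application-side check can be sketched as follows. This is an illustration under assumptions, not the answer's exact query: it uses Python's sqlite3 module in place of MySQL, it swaps the count(*) comparisons for two EXISTS tests (which express the same condition), and the helper names make_chat_demo_db and join_thread are hypothetical.

```python
import sqlite3


def make_chat_demo_db():
    """In-memory tables with one thread (id 2) and one user (id 3)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE threads (id INTEGER PRIMARY KEY);
        CREATE TABLE users (id INTEGER PRIMARY KEY);
        CREATE TABLE thread_users (thread_id INTEGER, user_id INTEGER, status TEXT);
        INSERT INTO threads (id) VALUES (2);
        INSERT INTO users (id) VALUES (3);
        """
    )
    return conn


def join_thread(conn, thread_id, user_id):
    """Insert the membership row only if both parent rows exist.

    Returns True when a row was inserted, False when a key was missing.
    """
    cur = conn.execute(
        "INSERT INTO thread_users (thread_id, user_id, status) "
        "SELECT ?, ?, 'pending' "
        "WHERE EXISTS (SELECT 1 FROM threads WHERE id = ?) "
        "AND EXISTS (SELECT 1 FROM users WHERE id = ?)",
        (thread_id, user_id, thread_id, user_id),
    )
    # rowcount plays the role of affectedRows in the text above.
    return cur.rowcount == 1


chat_conn = make_chat_demo_db()
joined = join_thread(chat_conn, 2, 3)     # both keys exist
rejected = join_thread(chat_conn, 2, 99)  # user 99 does not exist
```

A False result tells the caller that at least one key was missing; finding out which one would take a follow-up query.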
I have a table like this:
uuid | username | first_seen | last_seen | score
Before, the table used the primary key of a "player_id" column that ascended. I removed this player_id as I no longer needed it. I want to make the 'uuid' the primary key, but there's a lot of duplicates. I want to remove all these duplicates from the table, but keep the first one (based off the row number, the first row stays).
How can I do this? I've searched up everywhere, but they all show how to do it if you have a row ID column...
I highly advocate having auto-incremented integer primary keys. So, I would encourage you to go back. These are useful for several reasons, such as:
They tell you the insert order of rows.
They are more efficient for primary keys.
Because primary keys are clustered in MySQL (InnoDB), new rows always go at the end of the table.
But, you don't have to follow that advice. My recommendation would be to insert the data into a new table and reload into your desired table:
create temporary table tt as
select t.*
from t
group by t.uuid;
truncate table t;
alter table t add constraint pk_uuid primary key (uuid);
insert into t
select * from tt;
Note: I am using a (mis)feature of MySQL that allows you to group by one column while pulling columns not in the group by. I don't like this extension, but you do not specify how to choose the particular row you want. This will give values for the other columns from matching rows. There are other ways to get one row per uuid.
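The copy, empty, and reload sequence can be exercised end to end with the sketch below. It assumes Python's sqlite3 module rather than MySQL, so two substitutions are needed: DELETE stands in for TRUNCATE, and a unique index stands in for the ALTER TABLE ... ADD PRIMARY KEY step. The helper names and sample rows are hypothetical. SQLite also allows the bare GROUP BY, and as in MySQL, which row survives for each uuid is not specified.

```python
import sqlite3


def make_players_demo_db():
    """In-memory table t with a duplicated uuid."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE t (uuid TEXT, username TEXT, score INTEGER);
        INSERT INTO t VALUES ('u1', 'alice', 10), ('u1', 'alice', 12), ('u2', 'bob', 7);
        """
    )
    return conn


def dedupe_by_uuid(conn):
    """Keep one row per uuid, then enforce uniqueness on uuid."""
    conn.executescript(
        """
        CREATE TEMPORARY TABLE tt AS
            SELECT * FROM t GROUP BY uuid;        -- one (unspecified) row per uuid
        DELETE FROM t;                            -- SQLite has no TRUNCATE
        CREATE UNIQUE INDEX pk_uuid ON t (uuid);  -- stand-in for ADD PRIMARY KEY
        INSERT INTO t SELECT * FROM tt;
        DROP TABLE tt;
        """
    )


players_conn = make_players_demo_db()
dedupe_by_uuid(players_conn)
remaining = [r[0] for r in players_conn.execute("SELECT uuid FROM t ORDER BY uuid")]
```

After the reload, a second row with an existing uuid is rejected by the unique index, which is the behavior the new primary key is meant to provide.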
If I try to insert data into a table which already contains that primary key, it will clearly fail.
Is there a simple way to check whether the data I've failed to insert matches what is already in the table? (ie, if the non-primary key fields are the same as are already there for that primary key)
Ideally rather than get a single error, I would like to get 2 different errors when I attempt to insert a primary key that is already used:
- Error1: primary key constraint broken - data being inserted is already in table
- Error2: primary key constraint broken - attempt to enter different data for existing primary key
To check you can do something like this
SELECT COUNT(*) FROM
(
SELECT * FROM tab1
UNION
SELECT * FROM tab2
) AS u;
UNION removes duplicates, so if the rows in both tables are identical, the query above returns the same result as
SELECT COUNT(*) FROM tab1;
OR
SELECT COUNT(*) FROM tab2;
Your question is not very detailed (e.g. how do you insert this data?), so my answer is also quite generic, but I believe it will be useful for you.
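The UNION count comparison can be wrapped in a small helper. This is a sketch under assumptions: it uses Python's sqlite3 module instead of MySQL, tab2 is treated as a staging table holding the rows you tried to insert, and the helper names and sample data are hypothetical. The comparison is only conclusive when neither table contains duplicate rows, which a primary key guarantees.

```python
import sqlite3


def make_compare_demo_db():
    """Two in-memory tables that start out with identical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tab1 (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE tab2 (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO tab1 VALUES (1, 'a'), (2, 'b');
        INSERT INTO tab2 VALUES (1, 'a'), (2, 'b');
        """
    )
    return conn


def tables_identical(conn):
    """True when tab1 and tab2 hold exactly the same set of rows."""
    union_count = conn.execute(
        "SELECT COUNT(*) FROM (SELECT * FROM tab1 UNION SELECT * FROM tab2)"
    ).fetchone()[0]
    count1 = conn.execute("SELECT COUNT(*) FROM tab1").fetchone()[0]
    count2 = conn.execute("SELECT COUNT(*) FROM tab2").fetchone()[0]
    # UNION drops duplicates, so all three counts agree only for identical sets.
    return union_count == count1 == count2


compare_conn = make_compare_demo_db()
same_before = tables_identical(compare_conn)
# Change one non-key field: same primary key, different data.
compare_conn.execute("UPDATE tab2 SET name = 'changed' WHERE id = 2")
same_after = tables_identical(compare_conn)
```

In terms of the question, a True result corresponds to "data being inserted is already in table", and a False result to "different data for an existing primary key".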
Try it with this:
INSERT INTO yourTable (field1, field2, field3...)
SELECT yourValue1, yourValue2, yourValue3...
FROM dual
WHERE NOT EXISTS (SELECT *
FROM yourTable
WHERE field1 = yourValue1
AND field2 = yourValue2
AND field3 = yourValue3...);
This query checks your fields and only inserts when the record is not already there.
I have a huge table of products, but there are a lot of duplicate entries. The table has more than 10 thousand entries, and I want to remove the duplicates without manually finding and deleting them. Please let me know if you can provide a solution for this.
You could use SELECT DISTINCT INTO TempTable, drop the original table, and then rename the temp one.
You should also add primary and unique keys to avoid this sort of thing in the future.
For full-row duplicates, try this:
create table mytable_tmp as select distinct * from mytable;
drop table mytable;
alter table mytable_tmp rename to mytable;
The statements below should cover your requirements.
If the table (foo) has a primary key field
First step
Store the key values in a temporary table, and put your uniqueness conditions in the group by clause.
For example, if you want to delete duplicate email ids, put the email id in the group by clause and the primary key in the
select clause as either min(primarykey) or max(primarykey).
CREATE TEMPORARY TABLE temptable AS SELECT min( primarykey ) FROM foo GROUP BY uniquefields;
Second step
Run the delete statement below with your table name and primary key column:
DELETE FROM foo WHERE primarykey NOT IN (SELECT * FROM temptable );
Execute both queries together in your query analyser or db tool.
If the table (foo) doesn't have a primary key field
step 1
CREATE TABLE temp_table AS SELECT * FROM foo GROUP BY field or fields;
step 2
DELETE FROM foo;
step 3
INSERT INTO foo select * from temp_table;
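The primary-key variant above can be sketched as a runnable example. It assumes Python's sqlite3 module in place of MySQL; the columns id and email, the sample rows, and the helper names are hypothetical stand-ins for primarykey and uniquefields.

```python
import sqlite3


def make_foo_demo_db():
    """In-memory table foo where one email appears three times."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE foo (id INTEGER PRIMARY KEY, email TEXT);
        INSERT INTO foo VALUES (1, 'a@x'), (2, 'b@x'), (3, 'a@x'), (4, 'a@x');
        """
    )
    return conn


def delete_duplicate_emails(conn):
    """Keep the lowest id per email and delete the rest; return rows deleted."""
    # First step: remember which key to keep for each group.
    conn.execute(
        "CREATE TEMPORARY TABLE temptable AS "
        "SELECT MIN(id) AS keep_id FROM foo GROUP BY email"
    )
    # Second step: delete every row whose key is not in the keep list.
    cur = conn.execute(
        "DELETE FROM foo WHERE id NOT IN (SELECT keep_id FROM temptable)"
    )
    deleted = cur.rowcount
    conn.execute("DROP TABLE temptable")
    return deleted


foo_conn = make_foo_demo_db()
deleted_rows = delete_duplicate_emails(foo_conn)
kept_ids = [r[0] for r in foo_conn.execute("SELECT id FROM foo ORDER BY id")]
```

Using MAX(id) instead of MIN(id) would keep the most recent row of each group rather than the oldest.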
There are different ways to remove duplicate rows, and which one to use depends on your scenario. The simplest method is to alter the table, adding a unique index on the product name field (note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this only works on older versions):
alter ignore table products add unique index `unique_index` (product_name);
You can remove the index after getting all the duplicate rows deleted:
alter table products drop index `unique_index`;
Please let me know if this resolves the issue. If not I can give you alternate solutions for that.
You can add more than one column to a group by, e.g.
SELECT * from tableName GROUP BY prod_name HAVING count(prod_name) > 1
That will show one row for each product name that occurs more than once. To deduplicate, drop the HAVING clause, write the result to a new table, and drop the existing one.
I've read some posts about this but none cover this issue.
I guess it's not possible, but I'll ask anyway.
I have a table with more than 50,000 records. It's an old table where various insert/delete operations have taken place.
As a result, there are various 'holes', some of about 300 records, e.g.: ..., 1340, 1341, 1660, 1661, 1662,...
The question is: is there a simple/easy way to make new inserts fill these 'holes'?
I agree with @Aaron Digulla and @Shane N. The gaps are meaningless. If they DO mean something, that is a flawed database design. Period.
That being said, if you absolutely NEED to fill these holes, AND you are running at least MySQL 3.23, you can utilize a TEMPORARY TABLE to create a new set of IDs. The idea here being that you are going to select all of your current IDs, in order, into a temporary table as such:
CREATE TEMPORARY TABLE NewIDs
(
NewID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
OldID INT UNSIGNED
);
INSERT INTO NewIDs (OldID)
SELECT
Id
FROM
OldTable
ORDER BY
Id ASC;
This will give you a table mapping each old Id to a brand new Id that is sequential, due to the AUTO_INCREMENT property of the NewID column.
Once this is done, you need to update any other reference to the Id in "OldTable" and any foreign key it utilizes. To do this, you will probably need to DROP any foreign key constraints you have, update any reference in tables from the OldId to the NewId, and then re-institute your foreign key constraints.
However, I would argue that you should not do ANY of this, and just understand that your Id field exists for the sole purpose of referencing a record, and should NOT have any specific relevance.
UPDATE: Adding an example of updating the Ids
For example:
Let's say you have the following 2 table schemas:
CREATE TABLE Parent
(
ParentId INT UNSIGNED AUTO_INCREMENT,
Value INT UNSIGNED,
PRIMARY KEY (ParentId)
);
CREATE TABLE Child
(
ChildId INT UNSIGNED AUTO_INCREMENT,
ParentId INT UNSIGNED,
PRIMARY KEY(ChildId),
FOREIGN KEY(ParentId) REFERENCES Parent(ParentId)
);
Now, the gaps are appearing in your Parent table.
In order to update your values in Parent and Child, you first create a temporary table with the mappings:
CREATE TEMPORARY TABLE NewIDs
(
Id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
ParentId INT UNSIGNED
);
INSERT INTO NewIDs (ParentId)
SELECT
ParentId
FROM
Parent
ORDER BY
ParentId ASC;
Next, we need to tell MySQL to ignore the foreign key constraint so we can correctly UPDATE our values. We will use this syntax:
SET foreign_key_checks = 0;
This causes MySQL to ignore foreign key checks when updating the values, but it will still enforce the correct value type is used (see MySQL reference for details).
Next, we need to update our Parent and Child tables with the new values. We will use the following UPDATE statement for this:
UPDATE
Parent,
Child,
NewIDs
SET
Parent.ParentId = NewIDs.Id,
Child.ParentId = NewIDs.Id
WHERE
Parent.ParentId = NewIDs.ParentId AND
Child.ParentId = NewIDs.ParentId;
We now have updated all of our ParentId values correctly to the new, ordered Ids from our temporary table. Once this is complete, we can re-institute our foreign key checks to maintain referential integrity:
SET foreign_key_checks = 1;
Finally, we will drop our temporary table to clean up resources:
DROP TABLE NewIDs;
And that is that.
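The whole renumbering procedure can be sketched as one runnable example. It assumes Python's sqlite3 module in place of MySQL, so some details differ from the statements above: SQLite does not enforce foreign keys unless asked to, which plays the role of SET foreign_key_checks = 0, and the single multi-table UPDATE is replaced by separate updates for child and parent. The pass through negative values is an addition of this sketch, not part of the answer; it keeps a new id from colliding with an old id that has not been rewritten yet. Table, column, and helper names are hypothetical.

```python
import sqlite3


def make_parent_child_demo_db():
    """Parent ids with gaps (1, 2, 5, 9) and children pointing at them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE parent (parent_id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE child (child_id INTEGER PRIMARY KEY, parent_id INTEGER);
        INSERT INTO parent VALUES (1, 'a'), (2, 'b'), (5, 'c'), (9, 'd');
        INSERT INTO child VALUES (1, 5), (2, 9), (3, 1);
        """
    )
    return conn


def close_gaps(conn):
    """Renumber parent ids to 1..n in order and repoint the child rows."""
    conn.executescript(
        """
        -- Mapping table: new_id is assigned sequentially in old_id order.
        CREATE TEMPORARY TABLE new_ids (new_id INTEGER PRIMARY KEY, old_id INTEGER);
        INSERT INTO new_ids (old_id)
            SELECT parent_id FROM parent ORDER BY parent_id;

        UPDATE child
           SET parent_id = (SELECT new_id FROM new_ids
                             WHERE old_id = child.parent_id);

        -- Go through negative values so no new id can clash with an old one.
        UPDATE parent
           SET parent_id = -(SELECT new_id FROM new_ids
                              WHERE old_id = parent.parent_id);
        UPDATE parent SET parent_id = -parent_id;

        DROP TABLE new_ids;
        """
    )


pc_conn = make_parent_child_demo_db()
close_gaps(pc_conn)
parents = pc_conn.execute(
    "SELECT parent_id, value FROM parent ORDER BY parent_id").fetchall()
children = pc_conn.execute(
    "SELECT child_id, parent_id FROM child ORDER BY child_id").fetchall()
```

Each child still points at the same logical parent afterwards; only the numbers have changed.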
What is the reason you need this functionality? Your db should be fine with the gaps, and if you're approaching the max size of your key, just make it unsigned or change the field type.
You generally don't need to care about gaps. If you're getting to the end of the datatype for the ID it should be relatively easy to ALTER the table to upgrade to the next biggest int type.
If you absolutely must start filling gaps, here's a query to return the lowest available ID (hopefully not too slowly):
SELECT MIN(table0.id)+1 AS newid
FROM table AS table0
LEFT JOIN table AS table1 ON table1.id=table0.id+1
WHERE table1.id IS NULL
(remember to use a transaction and/or catch duplicate key inserts if you need concurrent inserts to work.)
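A quick way to see what this query returns is to run it against a small table. The sketch below assumes Python's sqlite3 module in place of MySQL; the table name items, the sample ids, and the helper names are hypothetical. The fallback to 1 for an empty table is an addition of this sketch.

```python
import sqlite3


def make_items_demo_db():
    """In-memory table with ids 1, 2, 3, 7, 8 (a gap from 4 to 6)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE items (id INTEGER PRIMARY KEY);
        INSERT INTO items (id) VALUES (1), (2), (3), (7), (8);
        """
    )
    return conn


def lowest_free_id(conn):
    """Return the smallest id whose predecessor exists but which is unused."""
    row = conn.execute(
        "SELECT MIN(t0.id) + 1 FROM items AS t0 "
        "LEFT JOIN items AS t1 ON t1.id = t0.id + 1 "
        "WHERE t1.id IS NULL"
    ).fetchone()
    # MIN() over an empty table yields NULL; treat that as "start at 1".
    return row[0] if row[0] is not None else 1


items_conn = make_items_demo_db()
first_free = lowest_free_id(items_conn)
items_conn.execute("INSERT INTO items (id) VALUES (?)", (first_free,))
next_free = lowest_free_id(items_conn)
```

Note that the query only finds gaps that follow an existing id, so a hole below the smallest id in the table (for example ids starting at 5) is not reported.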
INSERT INTO prueba(id)
VALUES (
(SELECT IFNULL( MAX( id ) , 0 )+1 FROM prueba target))
IFNULL handles the empty-table case, where MAX(id) is NULL.
The alias target is added to avoid the MySQL error about the target table appearing in the FROM clause.
There is a simple way but it doesn't perform well: Just try to insert with an id and when that fails, try the next one.
Alternatively, select an ID and when you don't get a result, use it.
If you're looking for a way to tell the DB to automatically fill the gaps, then that's not possible. Moreover, it should never be necessary. If you feel you need it, then you're abusing an internal technical key for something other than the single purpose it has: to allow you to join tables.
[EDIT] If this is not a primary key, then you can use this update statement:
update (
select *
from table
order by reg_id -- this makes sure that the order stays the same
)
set reg_id = x.nextval
where x is a new sequence which you must create. This will renumber all existing elements preserving the order. This will fail if you have foreign key constraints. And it will corrupt your database if you reference these IDs anywhere without foreign key constraints.
Note that during the next insert, the database will create a huge gap unless you reset the identity column.
As others have said, it doesn't matter, and if it does then something is wrong in your database design. But personally I just like them to be in order anyway!
Here is some SQL that will recreate your IDs in the same order, but without the gaps.
It is done first in a temp_id field (which you will need to create), so you can see that it is all good before overwriting your old IDs. Replace Tbl and id as appropriate.
SELECT @i:=0;
UPDATE Tbl
JOIN
(
SELECT id
FROM Tbl
ORDER BY id
) t2
ON Tbl.id = t2.id
SET temp_id = @i:=@i+1;
You will now have a temp_id field with all of your shiny new IDs. You can make them live by simply:
UPDATE Tbl SET id = temp_id;
And then dropping your temp_id column.
I must admit I'm not quite sure why it works, since I would have expected the engine to complain about duplicate IDs, but it didn't when I ran it.
You might want to clean up gaps in a priority column.
The approach below produces an auto-incrementing value for the priority.
The extra left join on the same table makes sure the values are assigned in the same order as (in this case) the priority.
SET @a:=0;
REPLACE INTO footable
(id,priority)
(
SELECT tbl2.id, @a
FROM footable as tbl
LEFT JOIN footable as tbl2 ON tbl2.id = tbl.id
WHERE (select @a:=@a+1)
ORDER BY tbl.priority
)